PKL-Converter and Visualizer
About the Pkl Converter and Visualizer
With the spectrafit-pkl-converter command line tool you can convert pkl files containing nested dictionaries, lists, or numpy arrays into lists of dictionaries with numpy arrays. This is useful for further processing with other tools.
In general, pickle files can be very complex and contain nested dictionaries and lists, as shown in the following example:
```mermaid
stateDiagram
    [*] --> pkl
    pkl --> list
    pkl --> np.array
    pkl --> dict
    pkl --> else
    dict --> dict
    dict --> list
    dict --> np.array
    list --> list
    list --> np.array
    np.array --> np.array
    np.array --> list
    np.array --> dict
    dict --> list_of_dicts
    list_of_dicts --> [*]
```
For the visualization of the pkl files, the spectrafit-pkl-visualizer
command line tool can be used. It creates a graph of the pkl file and exports it as a graph file.
PKL Converter¶
The spectrafit-pkl-converter
command line tool can be used like this:
```shell
➜ spectrafit-pkl-converter -h
usage: spectrafit-pkl-converter [-h] [-f {utf-16,utf-8,latin1,utf-32}] [-e {pkl.gz,pkl,npy,npz}] infile

Converter for 'SpectraFit' from pkl files to CSV files.

positional arguments:
  infile                Filename of the pkl file to convert.

options:
  -h, --help            show this help message and exit
  -f {latin1,utf-16,utf-8,utf-32}, --file-format {latin1,utf-16,utf-8,utf-32}
                        File format for the optional encoding of the pickle file. Default is 'latin1'.
  -e {pkl.gz,pkl,npy,npz}, --export-format {pkl.gz,pkl,npy,npz}
                        File format for export of the output file. Default is 'pkl'.
```
The following export formats are possible:

- `pkl`: Pickle file as `pkl` file and compressed `pkl.gz` file.
- `npy`: Numpy array as `npy` file and compressed `npz` file.
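The exported `npy`/`npz` files can be read back with plain NumPy. A minimal sketch with hypothetical file names (the converter derives the real names from the pkl file name and its major keys):

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical array standing in for one converted major key.
arr = np.arange(5, dtype=float)

out_dir = Path(tempfile.mkdtemp())
np.save(out_dir / "dict_1.npy", arr)                   # what '-e npy' would produce
np.savez_compressed(out_dir / "dict_1.npz", data=arr)  # what '-e npz' would produce

# Reading the results back:
loaded_npy = np.load(out_dir / "dict_1.npy")
with np.load(out_dir / "dict_1.npz") as archive:
    loaded_npz = archive["data"]
```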
For decoding the pickle file, the spectrafit-pkl-converter
supports the following encodings:

- `utf-8`: UTF-8 encoded file.
- `utf-16`: UTF-16 encoded file.
- `utf-32`: UTF-32 encoded file.
- `latin1`: Latin-1 encoded file.
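The `-f/--file-format` option presumably maps to the `encoding` argument of Python's standard `pickle` module, which matters mainly when loading pickles written by Python 2; `latin1` is the usual safe choice for pickles containing numpy arrays. A minimal sketch:

```python
import pickle

data = {"x": [1, 2, 3]}
blob = pickle.dumps(data)

# pickle.loads accepts an `encoding` keyword; it only affects how
# Python 2 era str objects are decoded when loading old pickles.
restored = pickle.loads(blob, encoding="latin1")
```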
All keys up to the first key-value pair whose value is a `numpy.ndarray` or `list` are merged into a single string, which is used as a new filename. A `list` is converted to a `numpy.ndarray` with the shape `(len(list),)`.
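The key-merging rule can be sketched as follows; `flatten` is a hypothetical helper illustrating the rule, not the package's implementation:

```python
from typing import Any, Dict, List, Optional

import numpy as np


def flatten(
    data: Dict[str, Any], _key: Optional[List[str]] = None
) -> List[Dict[str, Any]]:
    """Merge nested keys into one underscore-joined key per array leaf."""
    _key = _key or []
    out: List[Dict[str, Any]] = []
    for key, value in data.items():
        if isinstance(value, dict):
            out.extend(flatten(value, _key + [str(key)]))
        elif isinstance(value, (list, np.ndarray)):
            # A list becomes a numpy array of shape (len(list),).
            out.append({"_".join(_key + [str(key)]): np.asarray(value)})
    return out


result = flatten({"a": {"b": {"c": [1, 2, 3]}}, "d": np.zeros(2)})
# keys of the result: "a_b_c" and "d"
```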
```mermaid
graph LR
    .pkl --> dict_1
    .pkl --> dict_2
    .pkl --> dict_3
    .pkl --> dict_4
    dict_1 --> dict_1.pkl
    dict_2 --> dict_2.pkl
    dict_3 --> dict_3.pkl
    dict_4 --> dict_4.pkl
```
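The per-key file naming shown in the diagram can be sketched with a hypothetical helper that mirrors the `Path(f"{fname}_{key}").with_suffix(...)` pattern from `PklConverter.save`:

```python
from pathlib import Path


def output_name(infile: Path, major_key: str, export_format: str) -> Path:
    """Sketch of the naming scheme: stem, '_', major key, export suffix."""
    stem = infile.parent / infile.stem  # strip the original .pkl suffix
    return Path(f"{stem}_{major_key}").with_suffix(f".{export_format}")


output_name(Path("data.pkl"), "dict_1", "npy")
# -> Path("data_dict_1.npy")
```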
Using the spectrafit-pkl-converter as a Python module¶
When using spectrafit-pkl-converter as a Python module, the conversion looks like this:
```python
from pathlib import Path

from spectrafit.plugins.pkl_converter import PklConverter

pkl_converter = PklConverter()
list_dict = pkl_converter.convert(
    infile=Path("test.pkl"),
    file_format="latin1",
)
```
The list_dict
variable contains the converted data as a dictionary that maps each major key of the pkl file to a list of dictionaries.
Bases: Converter
Convert pkl data to CSV files.
General information
The pkl data is converted to a CSV file. The CSV file is saved in the same directory as the input file. The name of the CSV file is the same as the input file with the suffix .csv
and prefixed with the name of the 'major' keys in the pkl file. Furthermore, a graph of the data is optionally saved as a PDF file to have a visual representation of the data structure.
Supported file formats
Currently supported file formats:
- [x] pkl
- [x] pkl.gz
- [x] ...
Attributes:
Name | Type | Description |
---|---|---|
choices_fformat | Set[str] | The choices for the file format. |
choices_export | Set[str] | The choices for the export format. |
Source code in spectrafit/plugins/pkl_converter.py
```python
class PklConverter(Converter):
    """Convert pkl data to CSV files.

    !!! info "General information"

        The pkl data is converted to a CSV file. The CSV file is saved in the same
        directory as the input file. The name of the CSV file is the same as the
        input file with the suffix `.csv` and prefixed with the name of the
        'major' keys in the pkl file. Furthermore, a graph of the data is optionally
        saved as a PDF file to have a visual representation of the data structure.

    !!! info "Supported file formats"

        Currently supported file formats:

        - [x] pkl
        - [x] pkl.gz
        - [x] ...

    Attributes:
        choices_fformat (Set[str]): The choices for the file format.
        choices_export (Set[str]): The choices for the export format.
    """

    choices_fformat = {"latin1", "utf-8", "utf-16", "utf-32"}
    choices_export = {"npy", "npz", "pkl", "pkl.gz"}

    def get_args(self) -> Dict[str, Any]:
        """Get the arguments from the command line.

        Returns:
            Dict[str, Any]: Return the input file arguments as a dictionary without
                additional information beyond the command line arguments.
        """
        parser = argparse.ArgumentParser(
            description="Converter for 'SpectraFit' from pkl files to CSV files.",
            usage="%(prog)s [options] infile",
        )
        parser.add_argument(
            "infile",
            type=Path,
            help="Filename of the pkl file to convert.",
        )
        parser.add_argument(
            "-f",
            "--file-format",
            help="File format for the optional encoding of the pickle file."
            " Default is 'latin1'.",
            type=str,
            default="latin1",
            choices=self.choices_fformat,
        )
        parser.add_argument(
            "-e",
            "--export-format",
            help="File format for export of the output file. Default is 'pkl'.",
            type=str,
            default="pkl",
            choices=self.choices_export,
        )
        return vars(parser.parse_args())

    @staticmethod
    def convert(infile: Path, file_format: str) -> Dict[str, Any]:
        """Convert the input file to the output file.

        Args:
            infile (Path): The input file as a path object.
            file_format (str): The output file format.

        Returns:
            Dict[str, Any]: The data as a dictionary, which can be a nested
                dictionary.
        """

        def _convert(
            data_values: Dict[str, Any], _key: Optional[List[str]] = None
        ) -> List[Dict[str, Any]]:
            """Convert the data to a list of dictionaries.

            The new key is the old key plus all the subkeys. The new value is the
            value of the subkey if the value is an instance of an array.

            To avoid the `pylint` error `dangerous-default-value`, the `_key`
            argument defaults to `None` and is replaced by an empty list if it
            is `None`. The `_key` argument keeps track of the keys of the nested
            dictionary and is used to create the new key for the new dictionary.
            Finally, the new dictionary is appended to the list of dictionaries.

            Args:
                data_values (Dict[str, Any]): The data as a dictionary.

            Returns:
                List[Dict[str, Any]]: The data as a list of dictionaries.
            """
            data_list = []
            if _key is None:
                _key = []
            for key, value in data_values.items():
                if isinstance(value, dict):
                    _key.append(str(key))
                    data_list.extend(_convert(value, _key))
                    _key.pop()
                elif isinstance(value, np.ndarray):
                    data_list.append({"_".join(_key + [key]): value})
            return data_list

        data_dict = {}
        for key, value in pkl2any(infile, file_format).items():
            if isinstance(value, dict):
                data_dict[key] = _convert(value)
        return data_dict

    def save(self, data: Any, fname: Path, export_format: str) -> None:
        """Save the converted pickle data to a file.

        Args:
            data (Any): The converted nested dictionary of the pkl data.
            fname (Path): The filename of the output file.
            export_format (str): The file format of the output file.

        Raises:
            ValueError: If the export format is not supported.
        """
        if export_format.lower() not in self.choices_export:
            raise ValueError(f"Unsupported file format '{export_format}'.")
        fname = pure_fname(fname)
        for key, value in data.items():
            _fname = Path(f"{fname}_{key}").with_suffix(f".{export_format}")
            ExportData(data=value, fname=_fname, export_format=export_format)()

    def __call__(self) -> None:
        """Run the converter."""
        args = self.get_args()
        data = self.convert(args["infile"], args["file_format"])
        self.save(data, args["infile"], args["export_format"])
```
__call__() ¶
Run the converter.
convert(infile, file_format) staticmethod ¶
Convert the input file to the output file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
infile | Path | The input file as a path object. | required |
file_format | str | The output file format. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The data as a dictionary, which can be a nested dictionary. |
Source code in spectrafit/plugins/pkl_converter.py
```python
@staticmethod
def convert(infile: Path, file_format: str) -> Dict[str, Any]:
    """Convert the input file to the output file.

    Args:
        infile (Path): The input file as a path object.
        file_format (str): The output file format.

    Returns:
        Dict[str, Any]: The data as a dictionary, which can be a nested
            dictionary.
    """

    def _convert(
        data_values: Dict[str, Any], _key: Optional[List[str]] = None
    ) -> List[Dict[str, Any]]:
        """Convert the data to a list of dictionaries.

        The new key is the old key plus all the subkeys. The new value is the
        value of the subkey if the value is an instance of an array.

        To avoid the `pylint` error `dangerous-default-value`, the `_key`
        argument defaults to `None` and is replaced by an empty list if it
        is `None`. The `_key` argument keeps track of the keys of the nested
        dictionary and is used to create the new key for the new dictionary.
        Finally, the new dictionary is appended to the list of dictionaries.

        Args:
            data_values (Dict[str, Any]): The data as a dictionary.

        Returns:
            List[Dict[str, Any]]: The data as a list of dictionaries.
        """
        data_list = []
        if _key is None:
            _key = []
        for key, value in data_values.items():
            if isinstance(value, dict):
                _key.append(str(key))
                data_list.extend(_convert(value, _key))
                _key.pop()
            elif isinstance(value, np.ndarray):
                data_list.append({"_".join(_key + [key]): value})
        return data_list

    data_dict = {}
    for key, value in pkl2any(infile, file_format).items():
        if isinstance(value, dict):
            data_dict[key] = _convert(value)
    return data_dict
```
get_args() ¶
Get the arguments from the command line.
Returns:
Type | Description |
---|---|
Dict[str, Any] | The input file arguments as a dictionary without additional information beyond the command line arguments. |
Source code in spectrafit/plugins/pkl_converter.py
```python
def get_args(self) -> Dict[str, Any]:
    """Get the arguments from the command line.

    Returns:
        Dict[str, Any]: Return the input file arguments as a dictionary without
            additional information beyond the command line arguments.
    """
    parser = argparse.ArgumentParser(
        description="Converter for 'SpectraFit' from pkl files to CSV files.",
        usage="%(prog)s [options] infile",
    )
    parser.add_argument(
        "infile",
        type=Path,
        help="Filename of the pkl file to convert.",
    )
    parser.add_argument(
        "-f",
        "--file-format",
        help="File format for the optional encoding of the pickle file."
        " Default is 'latin1'.",
        type=str,
        default="latin1",
        choices=self.choices_fformat,
    )
    parser.add_argument(
        "-e",
        "--export-format",
        help="File format for export of the output file. Default is 'pkl'.",
        type=str,
        default="pkl",
        choices=self.choices_export,
    )
    return vars(parser.parse_args())
```
save(data, fname, export_format) ¶
Save the converted pickle data to a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Any | The converted nested dictionary of the pkl data. | required |
fname | Path | The filename of the output file. | required |
export_format | str | The file format of the output file. | required |
Raises:
Type | Description |
---|---|
ValueError | If the export format is not supported. |
Source code in spectrafit/plugins/pkl_converter.py
```python
def save(self, data: Any, fname: Path, export_format: str) -> None:
    """Save the converted pickle data to a file.

    Args:
        data (Any): The converted nested dictionary of the pkl data.
        fname (Path): The filename of the output file.
        export_format (str): The file format of the output file.

    Raises:
        ValueError: If the export format is not supported.
    """
    if export_format.lower() not in self.choices_export:
        raise ValueError(f"Unsupported file format '{export_format}'.")
    fname = pure_fname(fname)
    for key, value in data.items():
        _fname = Path(f"{fname}_{key}").with_suffix(f".{export_format}")
        ExportData(data=value, fname=_fname, export_format=export_format)()
```
PKL Visualizer¶
The spectrafit-pkl-visualizer
should be used for the visualization of the pkl files. It creates a graph of the pkl file and exports it as a graph file.
The spectrafit-pkl-visualizer
command line tool can be used like this:
```shell
➜ spectrafit-pkl-visualizer -h
usage: spectrafit-pkl-visualizer [-h] [-f {utf-32,utf-16,latin1,utf-8}] [-e {jpg,pdf,jpeg,png}] infile

Converter for 'SpectraFit' from pkl files to a graph.

positional arguments:
  infile                Filename of the pkl file to convert to graph.

options:
  -h, --help            show this help message and exit
  -f {latin1,utf-16,utf-8,utf-32}, --file-format {latin1,utf-16,utf-8,utf-32}
                        File format for the optional encoding of the pickle file. Default is 'latin1'.
  -e {jpg,pdf,jpeg,png}, --export-format {jpg,pdf,jpeg,png}
                        File extension for the graph export.
```
Furthermore, the spectrafit-pkl-visualizer
allows exporting the structure of the pkl file as a JSON file, which stores information about the attributes and their types. The following example shows the structure of the JSON file:
Example of the JSON file

```json
{
  "file_1": {
    "attribute_1": "<class 'list'>",
    "attribute_2": "<class 'str'>",
    "attribute_3": "<class 'numpy.ndarray'> of shape (201,)",
    "attribute_4": "<class 'numpy.ndarray'> of shape (199,)",
    "attribute_5": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_6": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_7": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_8": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_9": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_10": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_11": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_12": "<class 'numpy.ndarray'> of shape (10000,)",
    "attribute_13": "<class 'list'>",
    "attribute_14": "<class 'numpy.ndarray'> of shape (10, 201)",
    "attribute_16": "<class 'int'>",
    "attribute_17": "<class 'str'>",
    "attribute_19": "<class 'str'>"
  },
  "file_2": {
    "attribute_1": "<class 'list'>",
    "attribute_2": "<class 'str'>",
    "attribute_3": "<class 'numpy.ndarray'> of shape (201,)",
    "attribute_4": "<class 'numpy.ndarray'> of shape (199,)",
    "attribute_5": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_6": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_7": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_8": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_9": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_10": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_11": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_12": "<class 'numpy.ndarray'> of shape (10000,)",
    "attribute_13": "<class 'list'>",
    "attribute_14": "<class 'numpy.ndarray'> of shape (10, 201)",
    "attribute_16": "<class 'int'>",
    "attribute_17": "<class 'str'>",
    "attribute_19": "<class 'str'>"
  }
}
```
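A type summary of this shape can be produced in a few lines of Python. This is a sketch of the idea, with a hypothetical `describe` helper, not the visualizer's actual implementation:

```python
import json

import numpy as np


def describe(obj) -> str:
    """Return a "<class ...>" string, with the shape for numpy arrays."""
    if isinstance(obj, np.ndarray):
        return f"{type(obj)} of shape {obj.shape}"
    return str(type(obj))


# Hypothetical data standing in for a loaded pkl file.
data = {"file_1": {"attribute_2": "text", "attribute_3": np.zeros(201)}}
summary = {
    fname: {attr: describe(value) for attr, value in attrs.items()}
    for fname, attrs in data.items()
}
print(json.dumps(summary, indent=2))
```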
Example of the graph
The resulting graph looks like this: