PKL-Converter and Visualizer
About the Pkl Converter and Visualizer
With the spectrafit-pkl-converter command line tool you can convert pkl files containing nested dictionaries, lists, or numpy arrays into lists of dictionaries with numpy arrays. This is useful for further processing with other tools.
In general, pickle files can be very complex and contain nested dictionaries and lists, as shown in the following example:
```mermaid
stateDiagram
    [*] --> pkl
    pkl --> list
    pkl --> np.array
    pkl --> dict
    pkl --> else
    dict --> dict
    dict --> list
    dict --> np.array
    list --> list
    list --> np.array
    np.array --> np.array
    np.array --> list
    np.array --> dict
    dict --> list_of_dicts
    list_of_dicts --> [*]
```
For the visualization of the pkl files, the spectrafit-pkl-visualizer
command line tool can be used. It creates a graph of the pkl file and exports it as a graph file.
PKL Converter¶
The spectrafit-pkl-converter
command line tool can be used like this:
```shell
➜ spectrafit-pkl-converter -h
usage: spectrafit-pkl-converter [-h] [-f {utf-16,utf-8,latin1,utf-32}] [-e {pkl.gz,pkl,npy,npz}] infile

Converter for 'SpectraFit' from pkl files to CSV files.

positional arguments:
  infile                Filename of the pkl file to convert.

options:
  -h, --help            show this help message and exit
  -f {latin1,utf-16,utf-8,utf-32}, --file-format {latin1,utf-16,utf-8,utf-32}
                        File format for the optional encoding of the pickle file. Default is 'latin1'.
  -e {pkl.gz,pkl,npy,npz}, --export-format {pkl.gz,pkl,npy,npz}
                        File format for export of the output file. Default is 'pkl'.
```
The following export formats are possible:

- `pkl`: Pickle file as `pkl` file and compressed `pkl.gz` file.
- `npy`: Numpy array as `npy` file and compressed `npz` file.
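The exported `npy`/`npz` files can be read back with plain NumPy. A minimal sketch with hypothetical file names (the converter derives the real names from the pkl file name and its major keys):

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical array standing in for one converted major key.
arr = np.arange(5, dtype=float)

out_dir = Path(tempfile.mkdtemp())
np.save(out_dir / "dict_1.npy", arr)                   # what '-e npy' would produce
np.savez_compressed(out_dir / "dict_1.npz", data=arr)  # what '-e npz' would produce

# Reading the results back:
loaded_npy = np.load(out_dir / "dict_1.npy")
with np.load(out_dir / "dict_1.npz") as archive:
    loaded_npz = archive["data"]
```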
For decoding the pickle file, the spectrafit-pkl-converter
supports the following encodings:

- `utf-8`: UTF-8 encoded file.
- `utf-16`: UTF-16 encoded file.
- `utf-32`: UTF-32 encoded file.
- `latin1`: Latin-1 encoded file.
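The `-f/--file-format` option presumably maps to the `encoding` argument of Python's standard `pickle` module, which matters mainly when loading pickles written by Python 2; `latin1` is the usual safe choice for pickles containing numpy arrays. A minimal sketch:

```python
import pickle

data = {"x": [1, 2, 3]}
blob = pickle.dumps(data)

# pickle.loads accepts an `encoding` keyword; it only affects how
# Python 2 era str objects are decoded when loading old pickles.
restored = pickle.loads(blob, encoding="latin1")
```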
All keys up to the first key-value pair whose value is a `numpy.ndarray` or `list` are merged into a single string, which is used as a new filename. A `list` is converted to a `numpy.ndarray` with the shape `(len(list),)`.
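The key-merging rule can be sketched as follows; `flatten` is a hypothetical helper illustrating the rule, not the package's implementation:

```python
from typing import Any, Dict, List, Optional

import numpy as np


def flatten(
    data: Dict[str, Any], _key: Optional[List[str]] = None
) -> List[Dict[str, Any]]:
    """Merge nested keys into one underscore-joined key per array leaf."""
    _key = _key or []
    out: List[Dict[str, Any]] = []
    for key, value in data.items():
        if isinstance(value, dict):
            out.extend(flatten(value, _key + [str(key)]))
        elif isinstance(value, (list, np.ndarray)):
            # A list becomes a numpy array of shape (len(list),).
            out.append({"_".join(_key + [str(key)]): np.asarray(value)})
    return out


result = flatten({"a": {"b": {"c": [1, 2, 3]}}, "d": np.zeros(2)})
# keys of the result: "a_b_c" and "d"
```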
```mermaid
graph LR
    .pkl --> dict_1
    .pkl --> dict_2
    .pkl --> dict_3
    .pkl --> dict_4
    dict_1 --> dict_1.pkl
    dict_2 --> dict_2.pkl
    dict_3 --> dict_3.pkl
    dict_4 --> dict_4.pkl
```
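The per-key file naming shown in the diagram can be sketched with a hypothetical helper that mirrors the `Path(f"{fname}_{key}").with_suffix(...)` pattern from `PklConverter.save`:

```python
from pathlib import Path


def output_name(infile: Path, major_key: str, export_format: str) -> Path:
    """Sketch of the naming scheme: stem, '_', major key, export suffix."""
    stem = infile.parent / infile.stem  # strip the original .pkl suffix
    return Path(f"{stem}_{major_key}").with_suffix(f".{export_format}")


output_name(Path("data.pkl"), "dict_1", "npy")
# -> Path("data_dict_1.npy")
```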
Using the spectrafit-pkl-converter as a Python module¶
When using spectrafit-pkl-converter as a Python module, the conversion looks like this:
```python
from pathlib import Path

from spectrafit.plugins.pkl_converter import PklConverter

pkl_converter = PklConverter()
list_dict = pkl_converter.convert(
    infile=Path("test.pkl"),
    file_format="latin1",
)
```
The list_dict
variable contains the converted data as a dictionary that maps each major key of the pkl file to a list of dictionaries.
Bases: Converter
Convert pkl data to CSV files.
General information
The pkl data is converted to a CSV file. The CSV file is saved in the same directory as the input file. The name of the CSV file is the same as the input file with the suffix .csv
and prefixed with the name of the 'major' keys in the pkl file. Furthermore, a graph of the data is optionally saved as a PDF file to have a visual representation of the data structure.
Supported file formats
Currently supported file formats:
- [x] pkl
- [x] pkl.gz
- [x] ...
Attributes:
Name | Type | Description |
---|---|---|
choices_fformat | Set[str] | The choices for the file format. |
choices_export | Set[str] | The choices for the export format. |
Source code in spectrafit/plugins/pkl_converter.py
```python
class PklConverter(Converter):
    """Convert pkl data to CSV files.

    !!! info "General information"

        The pkl data is converted to a CSV file. The CSV file is saved in the same
        directory as the input file. The name of the CSV file is the same as the
        input file with the suffix `.csv` and prefixed with the name of the
        'major' keys in the pkl file. Furthermore, a graph of the data is optionally
        saved as a PDF file to have a visual representation of the data structure.

    !!! info "Supported file formats"

        Currently supported file formats:

        - [x] pkl
        - [x] pkl.gz
        - [x] ...

    Attributes:
        choices_fformat (Set[str]): The choices for the file format.
        choices_export (Set[str]): The choices for the export format.
    """

    choices_fformat = {"latin1", "utf-8", "utf-16", "utf-32"}
    choices_export = {"npy", "npz", "pkl", "pkl.gz"}

    def get_args(self) -> Dict[str, Any]:
        """Get the arguments from the command line.

        Returns:
            Dict[str, Any]: Return the input file arguments as a dictionary without
                additional information beyond the command line arguments.
        """
        parser = argparse.ArgumentParser(
            description="Converter for 'SpectraFit' from pkl files to CSV files.",
            usage="%(prog)s [options] infile",
        )
        parser.add_argument(
            "infile",
            type=Path,
            help="Filename of the pkl file to convert.",
        )
        parser.add_argument(
            "-f",
            "--file-format",
            help="File format for the optional encoding of the pickle file."
            " Default is 'latin1'.",
            type=str,
            default="latin1",
            choices=self.choices_fformat,
        )
        parser.add_argument(
            "-e",
            "--export-format",
            help="File format for export of the output file. Default is 'pkl'.",
            type=str,
            default="pkl",
            choices=self.choices_export,
        )
        return vars(parser.parse_args())

    @staticmethod
    def convert(infile: Path, file_format: str) -> Dict[str, Any]:
        """Convert the input file to the output file.

        Args:
            infile (Path): The input file as a path object.
            file_format (str): The output file format.

        Returns:
            Dict[str, Any]: The data as a dictionary, which can be a nested
                dictionary.
        """

        def _convert(
            data_values: Dict[str, Any], _key: Optional[List[str]] = None
        ) -> List[Dict[str, Any]]:
            """Convert the data to a list of dictionaries.

            The new key is the old key plus all the subkeys. The new value is the
            value of the subkey if the value is an instance of an array.

            To avoid the `pylint` error `dangerous-default-value`, the `_key`
            argument defaults to `None` and is replaced by an empty list if it
            is `None`. The `_key` argument keeps track of the keys of the nested
            dictionary and is used to create the new key for the new dictionary.
            Finally, the new dictionary is appended to the list of dictionaries.

            Args:
                data_values (Dict[str, Any]): The data as a dictionary.

            Returns:
                List[Dict[str, Any]]: The data as a list of dictionaries.
            """
            data_list = []
            if _key is None:
                _key = []
            for key, value in data_values.items():
                if isinstance(value, dict):
                    _key.append(str(key))
                    data_list.extend(_convert(value, _key))
                    _key.pop()
                elif isinstance(value, np.ndarray):
                    data_list.append({"_".join(_key + [key]): value})
            return data_list

        data_dict = {}
        for key, value in pkl2any(infile, file_format).items():
            if isinstance(value, dict):
                data_dict[key] = _convert(value)
        return data_dict

    def save(self, data: Any, fname: Path, export_format: str) -> None:
        """Save the converted pickle data to a file.

        Args:
            data (Any): The converted nested dictionary of the pkl data.
            fname (Path): The filename of the output file.
            export_format (str): The file format of the output file.

        Raises:
            ValueError: If the export format is not supported.
        """
        if export_format.lower() not in self.choices_export:
            raise ValueError(f"Unsupported file format '{export_format}'.")
        fname = pure_fname(fname)
        for key, value in data.items():
            _fname = Path(f"{fname}_{key}").with_suffix(f".{export_format}")
            ExportData(data=value, fname=_fname, export_format=export_format)()

    def __call__(self) -> None:
        """Run the converter."""
        args = self.get_args()
        data = self.convert(args["infile"], args["file_format"])
        self.save(data, args["infile"], args["export_format"])
```
__call__() ¶
Run the converter.
convert(infile, file_format) staticmethod ¶
Convert the input file to the output file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
infile | Path | The input file as a path object. | required |
file_format | str | The output file format. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The data as a dictionary, which can be a nested dictionary. |
Source code in spectrafit/plugins/pkl_converter.py
```python
@staticmethod
def convert(infile: Path, file_format: str) -> Dict[str, Any]:
    """Convert the input file to the output file.

    Args:
        infile (Path): The input file as a path object.
        file_format (str): The output file format.

    Returns:
        Dict[str, Any]: The data as a dictionary, which can be a nested
            dictionary.
    """

    def _convert(
        data_values: Dict[str, Any], _key: Optional[List[str]] = None
    ) -> List[Dict[str, Any]]:
        """Convert the data to a list of dictionaries.

        The new key is the old key plus all the subkeys. The new value is the
        value of the subkey if the value is an instance of an array.

        To avoid the `pylint` error `dangerous-default-value`, the `_key`
        argument defaults to `None` and is replaced by an empty list if it
        is `None`. The `_key` argument keeps track of the keys of the nested
        dictionary and is used to create the new key for the new dictionary.
        Finally, the new dictionary is appended to the list of dictionaries.

        Args:
            data_values (Dict[str, Any]): The data as a dictionary.

        Returns:
            List[Dict[str, Any]]: The data as a list of dictionaries.
        """
        data_list = []
        if _key is None:
            _key = []
        for key, value in data_values.items():
            if isinstance(value, dict):
                _key.append(str(key))
                data_list.extend(_convert(value, _key))
                _key.pop()
            elif isinstance(value, np.ndarray):
                data_list.append({"_".join(_key + [key]): value})
        return data_list

    data_dict = {}
    for key, value in pkl2any(infile, file_format).items():
        if isinstance(value, dict):
            data_dict[key] = _convert(value)
    return data_dict
```
get_args() ¶
Get the arguments from the command line.
Returns:
Type | Description |
---|---|
Dict[str, Any] | The input file arguments as a dictionary without additional information beyond the command line arguments. |
Source code in spectrafit/plugins/pkl_converter.py
```python
def get_args(self) -> Dict[str, Any]:
    """Get the arguments from the command line.

    Returns:
        Dict[str, Any]: Return the input file arguments as a dictionary without
            additional information beyond the command line arguments.
    """
    parser = argparse.ArgumentParser(
        description="Converter for 'SpectraFit' from pkl files to CSV files.",
        usage="%(prog)s [options] infile",
    )
    parser.add_argument(
        "infile",
        type=Path,
        help="Filename of the pkl file to convert.",
    )
    parser.add_argument(
        "-f",
        "--file-format",
        help="File format for the optional encoding of the pickle file."
        " Default is 'latin1'.",
        type=str,
        default="latin1",
        choices=self.choices_fformat,
    )
    parser.add_argument(
        "-e",
        "--export-format",
        help="File format for export of the output file. Default is 'pkl'.",
        type=str,
        default="pkl",
        choices=self.choices_export,
    )
    return vars(parser.parse_args())
```
save(data, fname, export_format) ¶
Save the converted pickle data to a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Any | The converted nested dictionary of the pkl data. | required |
fname | Path | The filename of the output file. | required |
export_format | str | The file format of the output file. | required |
Raises:
Type | Description |
---|---|
ValueError | If the export format is not supported. |
Source code in spectrafit/plugins/pkl_converter.py
```python
def save(self, data: Any, fname: Path, export_format: str) -> None:
    """Save the converted pickle data to a file.

    Args:
        data (Any): The converted nested dictionary of the pkl data.
        fname (Path): The filename of the output file.
        export_format (str): The file format of the output file.

    Raises:
        ValueError: If the export format is not supported.
    """
    if export_format.lower() not in self.choices_export:
        raise ValueError(f"Unsupported file format '{export_format}'.")
    fname = pure_fname(fname)
    for key, value in data.items():
        _fname = Path(f"{fname}_{key}").with_suffix(f".{export_format}")
        ExportData(data=value, fname=_fname, export_format=export_format)()
```
PKL Visualizer¶
The spectrafit-pkl-visualizer
should be used for the visualization of the pkl files. It creates a graph of the pkl file and exports it as a graph file.
The spectrafit-pkl-visualizer
command line tool can be used like this:
```shell
➜ spectrafit-pkl-visualizer -h
usage: spectrafit-pkl-visualizer [-h] [-f {utf-32,utf-16,latin1,utf-8}] [-e {jpg,pdf,jpeg,png}] infile

Converter for 'SpectraFit' from pkl files to a graph.

positional arguments:
  infile                Filename of the pkl file to convert to graph.

options:
  -h, --help            show this help message and exit
  -f {latin1,utf-16,utf-8,utf-32}, --file-format {latin1,utf-16,utf-8,utf-32}
                        File format for the optional encoding of the pickle file. Default is 'latin1'.
  -e {jpg,pdf,jpeg,png}, --export-format {jpg,pdf,jpeg,png}
                        File extension for the graph export.
```
Furthermore, the spectrafit-pkl-visualizer
allows exporting the structure of the pkl file as a JSON file, which stores information about the attributes and their types. The following example shows the structure of the JSON file:
Example of the JSON file

```json
{
  "file_1": {
    "attribute_1": "<class 'list'>",
    "attribute_2": "<class 'str'>",
    "attribute_3": "<class 'numpy.ndarray'> of shape (201,)",
    "attribute_4": "<class 'numpy.ndarray'> of shape (199,)",
    "attribute_5": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_6": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_7": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_8": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_9": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_10": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_11": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_12": "<class 'numpy.ndarray'> of shape (10000,)",
    "attribute_13": "<class 'list'>",
    "attribute_14": "<class 'numpy.ndarray'> of shape (10, 201)",
    "attribute_16": "<class 'int'>",
    "attribute_17": "<class 'str'>",
    "attribute_19": "<class 'str'>"
  },
  "file_2": {
    "attribute_1": "<class 'list'>",
    "attribute_2": "<class 'str'>",
    "attribute_3": "<class 'numpy.ndarray'> of shape (201,)",
    "attribute_4": "<class 'numpy.ndarray'> of shape (199,)",
    "attribute_5": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_6": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_7": "<class 'numpy.ndarray'> of shape (10, 201, 10000)",
    "attribute_8": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_9": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_10": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_11": "<class 'numpy.ndarray'> of shape (10, 199, 10000)",
    "attribute_12": "<class 'numpy.ndarray'> of shape (10000,)",
    "attribute_13": "<class 'list'>",
    "attribute_14": "<class 'numpy.ndarray'> of shape (10, 201)",
    "attribute_16": "<class 'int'>",
    "attribute_17": "<class 'str'>",
    "attribute_19": "<class 'str'>"
  }
}
```
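A type summary of this shape can be produced in a few lines of Python. This is a sketch of the idea, with a hypothetical `describe` helper, not the visualizer's actual implementation:

```python
import json

import numpy as np


def describe(obj) -> str:
    """Return a "<class ...>" string, with the shape for numpy arrays."""
    if isinstance(obj, np.ndarray):
        return f"{type(obj)} of shape {obj.shape}"
    return str(type(obj))


# Hypothetical data standing in for a loaded pkl file.
data = {"file_1": {"attribute_2": "text", "attribute_3": np.zeros(201)}}
summary = {
    fname: {attr: describe(value) for attr, value in attrs.items()}
    for fname, attrs in data.items()
}
print(json.dumps(summary, indent=2))
```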
Example of the graph
The resulting graph looks like this: