IO#
The IO module in DMF Utils provides tools for handling file operations, including saving, loading, compression, and decompression of various file formats. These utilities simplify working with different data formats, making it easier to manage files and directories in your data workflows.
This module is installed as part of the base package.
pip install dmf-utils
Some file formats and operations require additional external dependencies. For example, to work with HDF5 files, you may need to install the h5py package. If a required dependency is missing, an error is raised when you use the corresponding feature. To install all optional dependencies at once, run:
pip install "dmf-utils[extra]"
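Because the optional dependencies are only needed for particular formats, it can be useful to check in advance which ones are present in your environment. The following is a minimal sketch that uses only the Python standard library. The helper name `missing_packages` and the list of package names are illustrative assumptions and are not part of DMF Utils.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Hypothetical list: adjust it to the formats you actually use.
optional = ["h5py", "pandas", "yaml"]
missing = missing_packages(optional)
if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("All listed optional packages are available.")
```

Note that `find_spec` only reports whether a package can be located; it does not import it or check its version.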
Overview#
The IO module allows you to:
Save and load data in various formats, including CSV, JSON, HDF5, NumPy, and more.
Compress and decompress files and directories with support for popular formats like ZIP, 7z, tar, gzip, etc.
Use other utilities, such as creating videos from image frames.
This module mainly contains wrappers for specific loaders and savers designed to facilitate common file operations in data processing.
Saving and Loading#
The saving and loading functions in the IO module support a wide variety of formats, automatically inferring the appropriate format based on the file extension or allowing you to specify it explicitly.
| Function | Description |
|---|---|
| dmf.io.save | Save data to a file using the appropriate saver. |
| dmf.io.load | Load data from a file using the appropriate loader. |
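To illustrate what inferring the format from the file extension can look like, here is a minimal sketch built on the standard library only. It is not the DMF Utils implementation: the names `HANDLERS`, `sketch_save`, `sketch_load`, and the `fmt` parameter are hypothetical and exist only for this example.

```python
import json
import pickle
from pathlib import Path


def _save_json(data, path):
    path.write_text(json.dumps(data), encoding="utf-8")


def _load_json(path):
    return json.loads(path.read_text(encoding="utf-8"))


def _save_pickle(data, path):
    path.write_bytes(pickle.dumps(data))


def _load_pickle(path):
    return pickle.loads(path.read_bytes())


# Hypothetical registry: file extension -> (saver, loader).
HANDLERS = {
    ".json": (_save_json, _load_json),
    ".pkl": (_save_pickle, _load_pickle),
    ".pickle": (_save_pickle, _load_pickle),
}


def _handlers_for(path, fmt=None):
    # An explicit format overrides the extension; otherwise use the file suffix.
    key = f".{fmt.lstrip('.')}" if fmt else path.suffix
    key = key.lower()
    if key not in HANDLERS:
        raise ValueError(f"Unsupported format: {key!r}")
    return HANDLERS[key]


def sketch_save(data, filename, fmt=None):
    path = Path(filename)
    saver, _ = _handlers_for(path, fmt)
    saver(data, path)


def sketch_load(filename, fmt=None):
    path = Path(filename)
    _, loader = _handlers_for(path, fmt)
    return loader(path)
```

With this sketch, `sketch_save({"a": 1}, "data.json")` picks the JSON handler from the `.json` suffix, while `sketch_save(obj, "data.bin", fmt="pkl")` forces the pickle handler regardless of the file name.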
Supported File Formats#
| Loader / Saver | Extensions |
|---|---|
| Pickle | .pkl, .pickle |
| Joblib | .joblib |
| HDF5 | .h5, .hdf5, .hdf |
| JSON | .json |
| Text | .txt, .html, .log, .md, .rst |
| NumPy | .npz, .npy |
| Pandas | .csv, .parquet, .xlsx, .xls, .feather |
| Pillow (Images) | .jpg, .jpeg, .png, .bmp, .gif, .tiff, .tif, .webp |
| PyTorch | .pt, .pth |
| YAML | .yaml, .yml |
| INI | .ini, .cfg |
| MATLAB (Scipy) | .mat |
| Audio (Librosa) | .wav, .mp3, .flac, .ogg |
| Video (OpenCV) | .mp4, .avi, .mov, .mkv |
Examples#
Saving a DataFrame to a CSV File:
import pandas as pd
from dmf.io import save
df = pd.DataFrame({"a": [1, 2, 3]})
save(df, "data.csv")
Loading a DataFrame from a CSV File:
from dmf.io import load
df = load("data.csv")
print(df)
Compression#
The IO module provides easy-to-use tools for compressing and decompressing files and directories. Supported formats include gzip, bzip2, xz, zip, 7z, and various tar-based formats.
| Function | Description |
|---|---|
| dmf.io.compress | Compress a file or directory into a specified format. |
| dmf.io.decompress | Decompress a compressed file. |
Compression Methods#
Examples#
Compressing a Directory:
from dmf.io import compress
compress("my_folder", compression="zip")
Decompressing a File:
from dmf.io import decompress
decompress("my_folder.zip")
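For comparison, the same compress and decompress round trip for a directory can be approximated with the Python standard library alone. This sketch is not how DMF Utils necessarily implements these functions; it only shows the underlying idea with `shutil`, and the directory and file names are made up for the example.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Build a small example directory to archive.
    source = tmp / "my_folder"
    source.mkdir()
    (source / "notes.txt").write_text("hello", encoding="utf-8")

    # Create my_folder.zip next to the source directory.
    archive = shutil.make_archive(
        base_name=str(tmp / "my_folder"),
        format="zip",
        root_dir=str(tmp),
        base_dir="my_folder",
    )

    # Extract into a separate directory and read the file back.
    target = tmp / "restored"
    shutil.unpack_archive(archive, extract_dir=str(target))
    restored_text = (target / "my_folder" / "notes.txt").read_text(encoding="utf-8")
    archive_name = Path(archive).name
```

The standard library covers zip and the tar-based formats through `shutil`, but not 7z, which requires a third-party package.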