IO#

The IO module in DMF Utils provides tools for handling file operations, including saving, loading, compression, and decompression of various file formats. These utilities simplify working with different data formats, making it easier to manage files and directories in your data workflows.

This module is installed as part of the base package.

pip install dmf-utils

Some file formats and operations require additional external dependencies. For example, to work with HDF5 files, you may need to install the h5py package. If a required dependency is missing, an error is raised when you use the corresponding feature. To install all optional dependencies at once, run:

pip install dmf-utils[extra]
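The missing-dependency behavior described above is commonly implemented with a lazy import. The following is a minimal sketch of that pattern, not the actual DMF Utils code: the helper name `require`, the exception type, and the message wording are all assumptions made for illustration.

```python
import importlib


def require(module_name, install_hint="pip install dmf-utils[extra]"):
    """Hypothetical helper: import a module on demand, or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user how to fix the problem.
        raise ImportError(
            f"{module_name!r} is required for this operation. "
            f"Install it directly or run: {install_hint}"
        ) from exc


# A standard-library module is always importable.
json_mod = require("json")

# A module that is not installed triggers the informative error.
try:
    require("surely_missing_module_xyz")
except ImportError as err:
    message = str(err)
```

With this pattern the base package stays lightweight, and the cost of an optional dependency is only paid by users who need the corresponding format.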

Overview#

The IO module allows you to:

  • Save and load data in various formats, including CSV, JSON, HDF5, NumPy, and more.

  • Compress and decompress files and directories, with support for popular formats such as ZIP, 7z, tar, and gzip.

  • Use other utilities, such as creating videos from image frames.

This module mainly contains wrappers for specific loaders and savers designed to facilitate common file operations in data processing.

Saving and Loading#

The saving and loading functions in the IO module support a wide variety of formats. By default, the format is inferred from the file extension; you can also specify it explicitly through the optional saver or loader argument.

dmf.io.save(data, file_path[, saver])

Save data to a file using the appropriate saver.

dmf.io.load(file_path[, loader])

Load data from a file using the appropriate loader.

Supported File Formats#

Supported File Formats for Saving and Loading#

| Loader / Saver  | Extensions                                        |
|-----------------|---------------------------------------------------|
| Pickle          | .pkl, .pickle                                     |
| Joblib          | .joblib                                           |
| HDF5            | .h5, .hdf5, .hdf                                  |
| JSON            | .json                                             |
| Text            | .txt, .html, .log, .md, .rst                      |
| NumPy           | .npz, .npy                                        |
| Pandas          | .csv, .parquet, .xlsx, .xls, .feather             |
| Pillow (Images) | .jpg, .jpeg, .png, .bmp, .gif, .tiff, .tif, .webp |
| PyTorch         | .pt, .pth                                         |
| YAML            | .yaml, .yml                                       |
| INI             | .ini, .cfg                                        |
| MATLAB (Scipy)  | .mat                                              |
| Audio (Librosa) | .wav, .mp3, .flac, .ogg                           |
| Video (OpenCV)  | .mp4, .avi, .mov, .mkv                            |

Examples#

Saving a DataFrame to a CSV File:

import pandas as pd
from dmf.io import save

# The saver is inferred from the .csv extension
df = pd.DataFrame({"a": [1, 2, 3]})
save(df, "data.csv")

Loading a DataFrame from a CSV File:

from dmf.io import load

df = load("data.csv")
print(df)

Compression#

The IO module provides easy-to-use tools for compressing and decompressing files and directories. Supported formats include gzip, bzip2, xz, zip, 7z, and various tar-based formats.

dmf.io.compress(input_file[, compression, ...])

Compress a file or directory into a specified format.

dmf.io.decompress(input_file[, output_dir, ...])

Decompress a compressed file.

Compression Methods#

Supported Compression Methods#

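| Method              | Extensions                 | Password | Folders |
|---------------------|----------------------------|----------|---------|
| gzip                | .gz, .gzip                 |          |         |
| bzip2               | .bz2, .bzip2               |          |         |
| xz                  | .xz                        |          |         |
| zip                 | .zip                       |          |         |
| 7z (py7zr)          | .7z                        |          |         |
| tar                 | .tar                       |          |         |
| tar (+ compression) | .tar.gz, .tar.bz2, .tar.xz |          |         |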

Examples#

Compressing a Directory:

from dmf.io import compress

# Compress the whole directory into a zip archive
compress("my_folder", compression="zip")

Decompressing a File:

from dmf.io import decompress

decompress("my_folder.zip")
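For comparison, the same directory round trip can be reproduced with the Python standard library alone. This sketch uses `shutil.make_archive` and `shutil.unpack_archive`; it is an independent illustration of the zip case and makes no claim about how `compress` and `decompress` are implemented internally. The folder and file names are made up for the demo.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Build a small directory to archive.
    folder = root / "my_folder"
    folder.mkdir()
    (folder / "notes.txt").write_text("hello", encoding="utf-8")

    # Create <root>/my_folder.zip containing the my_folder directory.
    archive = shutil.make_archive(
        str(root / "my_folder"), "zip", root_dir=str(root), base_dir="my_folder"
    )

    # Extract the archive into a separate directory.
    out_dir = root / "restored"
    shutil.unpack_archive(archive, str(out_dir))

    archive_name = Path(archive).name
    restored_text = (out_dir / "my_folder" / "notes.txt").read_text(encoding="utf-8")
```

Note that the standard library covers zip and the tar variants this way, but not 7z, which is why the table above lists py7zr for that method.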