
Releases: hristo-vrigazov/mmap.ninja

v0.7.0: Performance improvements in read-only mode

24 Aug 06:36

This release addresses issue #13: it allows the user to use a wrapper_fn without an additional copy.
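
As a rough sketch of what this enables (the exact constructor signature may differ across versions, so treat the arguments here as assumptions), a wrapper_fn can be supplied when opening an existing memory map so that each retrieved sample comes back already wrapped, e.g. as a torch.Tensor:

```python
# Hedged sketch: assumes a RaggedMmap already stored in ./samples_dir and that
# the constructor accepts wrapper_fn, as referenced in issue #13.
import torch
from mmap_ninja.ragged import RaggedMmap

samples = RaggedMmap('samples_dir', wrapper_fn=torch.tensor)
first = samples[0]  # wrapped by wrapper_fn on access, without an extra copy of the sample
```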

v0.2.4: First public version with docstrings

10 Jul 17:00

Accelerate iteration over your machine learning dataset by up to 20 times!

mmap_ninja is a library for storing your datasets in memory-mapped files, which leads to a dramatic speedup in training time.

The only dependencies are numpy and tqdm.

You can use mmap_ninja with any training framework (such as TensorFlow, PyTorch, or MXNet), as it stores your dataset as a memory-mapped numpy array.

A memory-mapped file is a file on disk that is mapped into a process's address space, so applications can treat the mapped portions as if they were primary memory, allowing very fast I/O!
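
For illustration only (this uses NumPy's built-in np.memmap rather than mmap_ninja's API), the basic mechanism looks like this:

```python
import numpy as np

# Create an array backed by a file on disk instead of RAM.
arr = np.memmap('example.dat', dtype=np.float32, mode='w+', shape=(1000, 128))
arr[0] = 1.0   # writes go to the mapped file
arr.flush()    # make sure the changes reach disk

# Reopening later is nearly instant: the OS pages data in lazily on access.
reopened = np.memmap('example.dat', dtype=np.float32, mode='r', shape=(1000, 128))
print(reopened[0, :5])
```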

When working on a machine learning project, one of the most time-consuming parts is training the model. However, a large portion of that time is actually spent just iterating over your dataset and on filesystem I/O!

This library, mmap_ninja, provides a high-level, easy-to-use, well-tested API for using memory maps for your datasets, reducing the time needed for training.

Memory maps usually take a little more disk space, though, so if you are willing to trade some disk space for fast filesystem-to-memory I/O, this is your library!
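
As a minimal sketch of a typical workflow (the helper names from_ndarray and open_existing and their argument order follow the project's examples, but verify them against the installed version), you convert a dataset once and then reopen it instantly in every later run:

```python
# Hedged sketch: assumes mmap_ninja exposes a numpy helper module with
# from_ndarray and open_existing, as shown in the project's examples.
import numpy as np
from mmap_ninja import numpy as np_ninja

images = np.random.rand(100, 224, 224, 3).astype(np.float32)

# One-time conversion: persist the array as a memory-mapped file on disk.
np_ninja.from_ndarray('images_mmap', images)

# Subsequent runs: open without loading the whole dataset into RAM.
images_mmap = np_ninja.open_existing('images_mmap')
print(images_mmap.shape, images_mmap.dtype)
```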