shelf - a lightweight Python artefact store client

What is it?

shelf combines the pytree registry from JAX with the fsspec project.

Just as you register pytrees in JAX, you register a pair of serialization and deserialization callbacks for your type, and shelf can then save your custom Python objects as files anywhere fsspec can reach!

A ⚡️-quick demo

Here's how you register a custom neural network type, using pickle to store trained models on disk.

# my_model.py
import pickle

import numpy as np

import shelf


class MyModel:
    def __call__(self):
        return 42
    
    def train(self, data: np.ndarray):
        pass
    
    def score(self, data: np.ndarray):
        return 1.


def save_to_disk(model: MyModel, ctx: shelf.Context) -> None:
    """Dumps the model to the directory ``tmpdir`` using `pickle`."""
    fp = ctx.file("my-model.pkl", mode="wb")
    pickle.dump(model, fp)


def load_from_disk(ctx: shelf.Context) -> MyModel:
    """Reloads the previously pickled model."""
    fname, = ctx.filenames
    fp = ctx.file(fname, mode="rb")
    model: MyModel = pickle.load(fp)
    return model


shelf.register_type(MyModel, save_to_disk, load_from_disk)
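Once the module is imported, the registration is in effect. As a quick sanity check, here is a hedged sketch that round-trips the model through a local directory; it assumes Shelf.put and Shelf.get also accept plain local paths via fsspec's local filesystem, which is worth verifying against your shelf version:

# sanity_check.py - hypothetical round-trip through a local directory.
from shelf import Shelf

from my_model import MyModel  # importing the module registers MyModel

shelf = Shelf()
model = MyModel()

# Write the model to a local path and read it back.
shelf.put(model, "/tmp/shelf-demo/my-model.pkl")
restored = shelf.get("/tmp/shelf-demo/my-model.pkl", MyModel)

assert restored() == 42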

Now you can save the model anywhere using a Shelf, for example in your training loop:

import numpy as np
from shelf import Shelf

from my_model import MyModel


def train():
    # Initialize a `Shelf` to handle remote I/O.
    shelf = Shelf()
    
    model = MyModel()
    data = np.random.randn(100)

    # Train your model...
    for epoch in range(10):
        model.train(data)
    
    # and save it to S3...
    shelf.put(model, "s3://my-bucket/my-model.pkl")
    # ... or GCS if you prefer...
    shelf.put(model, "gs://my-bucket/my-model.pkl")
    # ... or Azure!
    shelf.put(model, "az://my-blob/my-model.pkl")

Conversely, if you want to reinstantiate a remotely stored model:

def score():
    shelf = Shelf()
    model = shelf.get("s3://my-bucket/my-model.pkl", MyModel)
    accuracy = model.score(np.random.randn(100))

    print(f"And here's how accurately it predicts: {accuracy:.2%}")

And just like that, you can push and pull your custom models and data artifacts anywhere you like - your storage service of choice just needs an fsspec filesystem implementation.
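The remote protocols in the examples above are provided by separate fsspec backend packages. If they are not already present in your environment, the usual fsspec mapping (an assumption on my part, so check the documentation for your provider) is:

pip install s3fs    # s3:// (Amazon S3)
pip install gcsfs   # gs:// (Google Cloud Storage)
pip install adlfs   # az:// (Azure Blob Storage)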

Installation

⚠️ shelf is an experimental project - expect bugs and sharp edges.

Install it directly from source, using either pip or poetry:

pip install git+https://github.com/nicholasjng/shelf.git
# or
poetry add git+https://github.com/nicholasjng/shelf.git

A PyPI package release is planned for the future.