
Register custom_model with custom python package pygam #130

Open
benleit opened this issue Nov 12, 2024 · 1 comment

@benleit

benleit commented Nov 12, 2024

The goal is to use Snowflake's Model Registry to store and deploy a pygam model, leveraging Snowflake's Snowpark and ML features. However, registering the model fails because of compatibility constraints with the external package pygam.

Here is a minimal example that I ran in a Snowflake Notebook (pygam was available in the notebook via stage packages):

# Import necessary Snowpark and Snowflake ML modules
from snowflake.snowpark.context import get_active_session
session = get_active_session()

import numpy as np
import pandas as pd
from pygam import LinearGAM, s
from snowflake.ml.model import custom_model
from snowflake.ml.model import model_signature
from snowflake.ml.registry import Registry

# Step 1: Generate synthetic data for model training
np.random.seed(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 samples, single feature
y = np.sin(X).ravel() + np.random.normal(scale=0.1, size=X.shape[0])  # Noisy sine wave data

# Step 2: Train a simple `pygam` model with smoothing on the synthetic data
pygam_model = LinearGAM(s(0)).fit(X, y)

# Step 3: Test the model by making predictions on new data
X_test = np.linspace(0, 10, 10).reshape(-1, 1)  # New test data
predictions = pygam_model.predict(X_test)
print("Predictions on new data:", predictions)  # Expected output for verification

# Step 4: Define a Custom Model class to wrap `pygam` in Snowflake
class PyGAMModel(custom_model.CustomModel):
    @custom_model.inference_api
    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
        model_output = self.context["models"].predict(np.asarray(X))
        return pd.DataFrame({"prediction": model_output})  # match the declared DataFrame return type

# Step 5: Create the Model Context and pass in the trained `pygam` model
mc = custom_model.ModelContext(
    models=pygam_model
)

# Instantiate the Custom Model, then test prediction in the custom model context
pygam_model = PyGAMModel(mc)

X_test_df = pd.DataFrame(X_test, columns=["x"])  # inference APIs take a pandas DataFrame
output_pd = pygam_model.predict(X_test_df)
print("Custom Model Prediction Output:", output_pd)

# Step 6: Register the Model in the Snowflake Model Registry
registry = Registry(
    session=session,
    database_name="YOUR_DATABASE",  # Replace with your database
    schema_name="YOUR_SCHEMA",      # Replace with your schema
)

# Attempt to log the model
registry.log_model(
    model=pygam_model,
    model_name="pygam_model",
    version_name="v1",
    sample_input_data=X_test_df,
    comment="Test deployment of a pygam model as a Custom Model"
)

This is the error message I get:

AssertionError
Traceback:
File "Cell [cell25]", line 52, in <module>
    registry.log_model(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/_internal/telemetry.py", line 527, in wrap
    return ctx.run(execute_func_with_statement_params)
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/_internal/telemetry.py", line 503, in execute_func_with_statement_params
    result = func(*args, **kwargs)
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/registry/registry.py", line 288, in log_model
    return self._model_manager.log_model(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/registry/_manager/model_manager.py", line 82, in log_model
    return self._log_model(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/registry/_manager/model_manager.py", line 164, in _log_model
    model_metadata: model_meta.ModelMetadata = mc.save(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/model/_model_composer/model_composer.py", line 111, in save
    model_metadata: model_meta.ModelMetadata = self.packager.save(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/model/_packager/model_packager.py", line 87, in save
    handler.save_model(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/model/_packager/model_handlers/custom.py", line 101, in save_model
    assert handler is not None

I tried conda_dependencies, pip_requirements, ext_modules and code_paths without success. Support for installing arbitrary pip packages (beyond the Snowflake Anaconda Channel) in the Snowflake Model Registry would significantly improve the flexibility of deploying custom models that rely on niche or specialized packages such as pygam. Is there currently any solution to this problem? This is a major blocker for our migration to Snowflake's ML features.

@sfc-gh-sdas
Collaborator

sfc-gh-sdas commented Nov 22, 2024

Hi,
Thanks for filing this issue with us.

First of all, you are defining a custom model for pygam because pygam is not available in the Snowflake conda channel. To run it in a warehouse, you can package the library yourself, as long as all of its dependencies are available in the Snowflake conda channel. If so, there are two things to note:

  1. Make sure to include all of pygam's dependencies as dependencies of your model.
  2. The entire pygam Python code needs to be packaged with the model itself (the MODEL object must be self-contained and cannot refer to a stage).

Then there is a point of confusion in the CustomModel API:
  3. Every object passed in the models attribute of ModelContext must be a model type known to the registry. pygam is not known to the registry, which is why you get the error assert handler is not None (that is, the registry has no handler for that model type).

With that, here is what I would suggest:
a. Follow (1).
b. Instead of (2), pickle the model with the pygam library serialized by value (not by reference), so the pickle file includes all the Python modules it needs or refers to.
c. Pass the pickle file in the artifacts attribute of the ModelContext, then load the pickle file in __init__.

As you can see, there are a lot of caveats. An alternative to the warehouse is to run the model in SPCS (Snowpark Container Services); see https://docs.snowflake.com/en/developer-guide/snowflake-ml/model-registry/container for details. On SPCS, pip is supported natively, so you do not need to take care of (1), and (b) only has to take care of serializing the model (either via pickle or via a native save/load API, if pygam provides one).
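For the SPCS route described above, a hedged sketch of the logging call, assuming a snowflake-ml-python release whose Registry.log_model accepts pip_requirements and target_platforms. The model passed in is still assumed to be a CustomModel that unpickles a LinearGAM from an artifact, since pygam has no registry handler on either platform; the helper name log_pygam_for_spcs and the model and version names are hypothetical.

```python
from typing import Any


def log_pygam_for_spcs(registry: Any, model: Any, sample_input_data: Any) -> Any:
    """Log a pygam-backed CustomModel for Snowpark Container Services (SPCS).

    Assumes `registry` is a snowflake.ml.registry.Registry and `model` is a
    CustomModel that unpickles a LinearGAM from a ModelContext artifact. Because
    pygam is pip-installed in the container, an ordinary pickle is enough here.
    """
    return registry.log_model(
        model=model,
        model_name="pygam_model",  # hypothetical name
        version_name="v1",  # hypothetical version
        sample_input_data=sample_input_data,
        # pip packages are supported on SPCS, unlike warehouse deployment.
        pip_requirements=["pygam"],
        target_platforms=["SNOWPARK_CONTAINER_SERVICES"],
    )
```

With a real Registry, the returned model version is what you would then deploy as a service, following the container documentation linked above.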
