This repository contains a modular Streamlit application that provides comprehensive analysis and forecasting for Snowflake storage usage. The application is divided into several components that handle different aspects of the analysis, such as data querying, visualization, forecasting, and recommendations. The modular approach makes the code more maintainable, readable, and reusable.
This module contains the logic for creating and managing a Snowflake session using the Snowpark API. It handles both OAuth token-based authentication and traditional username/password-based authentication. The session creation is abstracted to allow reuse across different parts of the application.
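A minimal sketch of what such a session factory might look like. The token path, environment-variable names, and function names below are illustrative assumptions, not the repository's actual code:

```python
import os


def build_connection_params() -> dict:
    """Build Snowpark connection parameters, preferring an OAuth token
    (e.g. when running inside Snowflake) over username/password.
    The token path and env-var names are illustrative assumptions."""
    token_path = "/snowflake/session/token"  # hypothetical token location
    if os.path.isfile(token_path):
        with open(token_path) as f:
            return {
                "account": os.environ["SNOWFLAKE_ACCOUNT"],
                "authenticator": "oauth",
                "token": f.read(),
            }
    return {
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
        "user": os.environ["SNOWFLAKE_USER"],
        "password": os.environ["SNOWFLAKE_PASSWORD"],
    }


def get_session():
    """Create a Snowpark session from the parameters above."""
    from snowflake.snowpark import Session  # imported lazily
    return Session.builder.configs(build_connection_params()).create()
```

Abstracting the parameter-building from the session creation keeps the authentication logic testable without a live Snowflake connection.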
This module centralizes all the SQL queries used in the application. It contains functions to run these queries and return the results as pandas DataFrames. The queries themselves are organized into functions, which accept parameters to dynamically generate the correct SQL for different analyses.
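A sketch of how one such query function might look. `SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE` is Snowflake's real account-usage view; the function names and parameterization are illustrative assumptions:

```python
def storage_usage_query(days: int = 365) -> str:
    """Return SQL for daily account-level storage over the last `days` days."""
    return f"""
        SELECT USAGE_DATE,
               STORAGE_BYTES,
               STAGE_BYTES,
               FAILSAFE_BYTES
        FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE
        WHERE USAGE_DATE >= DATEADD('day', -{int(days)}, CURRENT_DATE())
        ORDER BY USAGE_DATE
    """


def run_query(session, sql: str):
    """Execute a query via Snowpark and return the result as a pandas DataFrame."""
    return session.sql(sql).to_pandas()
```

Keeping query builders as pure functions that return SQL strings makes them easy to unit-test without a connection.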
This module handles all the plotting and visualization tasks within the Streamlit app. It uses Plotly to create interactive charts, such as line charts for storage trends, pie charts for storage breakdowns, and bar charts for unused table costs. The visualizations are designed to be clear and informative, providing insights at a glance.
This module contains the logic for generating storage forecasts using Snowflake’s machine learning capabilities. The forecast generation involves creating training data, building a forecast model, and generating predictions for future storage usage. The forecast results are then returned for visualization and further analysis.
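The pipeline described above (training data, model, predictions) can be sketched as the SQL statements it might issue. The `SNOWFLAKE.ML.FORECAST` syntax follows Snowflake's documented Cortex ML interface, but the view and model names here are illustrative assumptions:

```python
def forecast_statements(training_days: int, prediction_days: int) -> list:
    """Build the SQL statements for a Cortex ML forecast of storage usage."""
    return [
        # 1. Training view: one row per day of historical storage, in TB.
        f"""CREATE OR REPLACE VIEW storage_training AS
            SELECT TO_TIMESTAMP_NTZ(USAGE_DATE) AS ts,
                   STORAGE_BYTES / POWER(1024, 4) AS storage_tb
            FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE
            WHERE USAGE_DATE >= DATEADD('day', -{int(training_days)}, CURRENT_DATE())""",
        # 2. Train the forecast model on that view.
        """CREATE OR REPLACE SNOWFLAKE.ML.FORECAST storage_model(
               INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'storage_training'),
               TIMESTAMP_COLNAME => 'ts',
               TARGET_COLNAME => 'storage_tb')""",
        # 3. Generate predictions, including lower/upper bounds.
        f"CALL storage_model!FORECAST(FORECASTING_PERIODS => {int(prediction_days)})",
    ]
```

Each statement would be executed in order via the session; the final `CALL` returns the predicted values with their bounds.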
This module generates recommendations based on the data analysis and forecast results. It assesses factors like projected storage growth, unused tables, and overall storage usage, and provides actionable insights to help optimize storage and reduce costs. The recommendations are then displayed in the Streamlit app.
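The recommendation logic reduces to mapping analysis results onto threshold checks; a minimal sketch, with thresholds and wording that are illustrative rather than the repository's actual values:

```python
def make_recommendations(growth_pct_per_month: float,
                         unused_table_cost: float,
                         total_storage_tb: float) -> list:
    """Turn analysis results into human-readable recommendations.
    All thresholds below are illustrative assumptions."""
    recs = []
    if growth_pct_per_month > 10:
        recs.append(
            f"Storage is growing ~{growth_pct_per_month:.0f}% per month; "
            "consider lifecycle policies or tighter retention settings.")
    if unused_table_cost > 0:
        recs.append(
            f"Dropping or archiving unused tables could save about "
            f"${unused_table_cost:,.2f} per month.")
    if total_storage_tb > 100:
        recs.append(
            "Overall storage is large; review Time Travel and Fail-safe "
            "retention on high-churn tables.")
    if not recs:
        recs.append("Storage usage looks healthy; no action needed.")
    return recs
```

The returned strings can then be rendered directly in the Streamlit app.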
This is the main entry point of the application. It ties together the components from the `storage/` directory to create a cohesive user interface. The application is structured to first fetch and display storage data, followed by unused tables analysis, storage forecasting, and recommendations. The session state is managed to ensure that data is only fetched or computed when necessary, reducing redundant operations and improving performance.
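Because `st.session_state` behaves like a mutable mapping, the fetch-once guard the app relies on can be sketched generically (the helper name is illustrative; inside a real Streamlit app you would pass `st.session_state` as `state`):

```python
from typing import Callable, MutableMapping


def fetch_once(state: MutableMapping, key: str, loader: Callable):
    """Run `loader` only if `key` is not already cached in `state`,
    so an expensive Snowflake query executes at most once per session."""
    if key not in state:
        state[key] = loader()
    return state[key]
```

Subsequent reruns of the script then reuse the cached result instead of re-querying Snowflake.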
This file combines all the components into a single script for easier deployment and testing. It is useful for running the application locally or deploying it to Streamlit on Snowflake. The script imports the necessary modules and creates the Streamlit app with all the required functionality. Feel free to delete this file, or to work exclusively from it.
- Data Fetching and Initialization: The application initializes session state variables to store the data fetched from Snowflake. It checks whether the data is already in the session state; if not, it runs the corresponding SQL queries to fetch it.
- Data Visualization: After fetching the data, the application uses the `visualization.py` module to display charts showing historical storage usage, daily trends, and a breakdown of storage components (active, stage, fail-safe).
- Unused Tables Analysis: Users can specify criteria (e.g., days since last access, storage cost per TB) to identify unused tables. The application queries Snowflake for tables that meet these criteria and displays the results, including potential cost savings from removing or archiving them.
- Storage Forecasting: Users can generate a forecast for future storage usage by specifying the number of training days and prediction days. The forecast includes upper and lower bounds, which are visualized alongside the predicted usage, and the application provides a cost estimate based on the forecasted storage.
- Recommendations: Based on the analysis and forecasts, the application generates recommendations to help users optimize their Snowflake storage. These include strategies for managing growth, cleaning up unused tables, and general best practices for storage management.
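The cost figures in the unused-tables and forecasting steps come down to a bytes-to-dollars conversion; a minimal sketch, where the default $/TB/month rate is an illustrative assumption rather than your account's actual rate:

```python
def monthly_storage_cost(num_bytes: float, cost_per_tb: float = 23.0) -> float:
    """Estimate monthly cost for `num_bytes` of storage at a flat
    $/TB/month rate (the default rate is illustrative; substitute
    your account's actual rate)."""
    tb = num_bytes / 1024 ** 4  # bytes -> terabytes
    return tb * cost_per_tb


# e.g. potential savings from 2 TB of unused tables at $23/TB/month:
savings = monthly_storage_cost(2 * 1024 ** 4)
```

The same helper applies to forecasted storage: feed it the predicted bytes (or the upper bound, for a conservative estimate).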
Before running the application, ensure you have the following dependencies installed:
- Streamlit: For creating the interactive web interface.
- Plotly: For generating interactive visualizations.
- Snowflake-Snowpark-Python: For interacting with Snowflake using the Snowpark API.
You can install these dependencies using the provided `environment.yml` file or directly via pip:

```
pip install streamlit plotly snowflake-snowpark-python
```
Note: This application uses Snowflake’s Cortex forecast function to predict future storage usage. Ensure your Snowflake account has the necessary permissions to use this function. If you prefer, you can replace this with another forecasting method.
To run the application locally, navigate to the project directory and use the following command:

```
streamlit run streamlit_app.py
```
This will start a local Streamlit server and automatically open the application in your web browser.
You can modify the application locally and test your changes immediately. Streamlit’s built-in hot-reloading feature automatically updates the app as you save your changes, making the development process seamless and efficient.
To deploy the application to Streamlit on Snowflake, you need to install the Snowflake CLI (`snow`). This CLI allows you to interact with your Snowflake instance and manage various resources, including Streamlit applications.
- Install the CLI: Install the Snowflake CLI by following the installation instructions.
- Create a Connection: Ensure you have a Snowflake connection configured using the Snowflake CLI. This connection will be used to authenticate and deploy your Streamlit application.
- Deploy the Application: Once you have set up the connection, you can deploy the application using the following command:

  ```
  snow streamlit deploy --open --replace
  ```

  This command deploys your application to Snowflake's Streamlit environment, replacing any existing deployment with the same name. The `--open` flag automatically opens the application in your browser after deployment.
You can automate the deployment of your Streamlit application to Snowflake using GitHub Actions. This ensures that every time you push changes to your repository, the application is automatically redeployed.
- GitHub Secrets Setup:
- You need to configure secrets in your GitHub repository for authentication with Snowflake. These secrets include your Snowflake account information, username, password, and any other required credentials.
- Refer to the GitHub Secrets documentation for guidance on setting up and managing secrets.
- GitHub Actions Workflow:
  - A workflow file (`.github/workflows/deploy.yml`) should be created to define the steps for deploying the application. This file specifies the actions to be taken on each push, such as running tests and deploying the app.
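A minimal workflow along these lines might look as follows. The action versions, the `snowflake-cli` package name, the `--temporary-connection` flag, and the secret names are all assumptions to verify against the current Snowflake CLI documentation and adapt to your setup:

```yaml
name: Deploy Streamlit app
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Snowflake CLI
        run: pip install snowflake-cli
      - name: Deploy to Streamlit on Snowflake
        # --temporary-connection builds a connection from the
        # SNOWFLAKE_* environment variables below
        run: snow streamlit deploy --replace --temporary-connection
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
```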
- Fork or Clone the Repository: Start by forking or cloning this repository to your GitHub account.
- Push the Repository: After setting up the repository and secrets, push your code to GitHub. Ensure GitHub Actions is enabled in your repository settings.
- Automated Deployment: With GitHub Actions configured, every push to the repository triggers an automated workflow that tests and deploys your application to Snowflake's Streamlit environment.
If you run into any issues, please open an issue in the repository and we will get to it as soon as possible.