
End-to-end ML pipeline written exclusively with open-source tools

arghhjayy/EndToEndML

End-to-End ML Project

Project setup:

  1. Open this repository in VS Code
  2. Install the Dev Containers extension
  3. Press Cmd + Shift + P -> Dev Containers: Rebuild Container Without Cache
  4. Activate the conda virtual environment: `source activate endtoend`
  5. Inside the Dev Container, start the local MLflow and Prefect servers: `nohup bash ./start_backend.sh`
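The steps above assume a dev container definition lives in the repo. As a rough illustration only (the image, env file name, and fields here are assumptions, not copied from this repo), a minimal `.devcontainer/devcontainer.json` for a conda-based setup could look like:

```json
{
  "name": "endtoend",
  "image": "continuumio/miniconda3",
  "postCreateCommand": "conda env create -f environment.yml"
}
```

Rebuilding without cache forces `postCreateCommand` to run again, which is why step 3 recreates the conda environment from scratch.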

Model training:

Run: `python main.py`
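To give a feel for what a `main.py` entrypoint in this kind of stack does, here is a minimal train-and-evaluate sketch with scikit-learn. The function name, model choice, and toy dataset are hypothetical stand-ins, not the repo's actual code; in the real pipeline the metric would additionally be logged to MLflow.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_flow():
    # Toy dataset as a stand-in for the project's real data source
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit a baseline model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate on the held-out split; the real flow would log this to MLflow
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


if __name__ == "__main__":
    model, accuracy = train_flow()
    print(f"test accuracy: {accuracy:.3f}")
```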

Model training using Docker:

Build: `docker build . -t endtoend:latest`

Run: `docker run endtoend:latest`
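The repo's actual Dockerfile may differ; a typical training image for a project like this (base image and file names assumed) is roughly:

```dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install dependencies first so Docker layer caching skips this on code-only changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
```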

Model serving (deployment):

For batch inference, do the following:

  1. Start the data generation worker process in one terminal: `make start_data_generator_worker`
  2. Start the batch inference worker process in another terminal: `make start_batch_inference_worker`
  3. Deploy the flows from steps 1 and 2: `prefect deploy --all`

To run the flows more frequently for debugging, adjust the cron expressions in `prefect.yaml`.
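For reference, each deployment's schedule in `prefect.yaml` is driven by a cron field. A sketch of the relevant section (deployment name and entrypoint are assumptions, not copied from this repo):

```yaml
deployments:
  - name: batch-inference
    entrypoint: batch_inference.py:batch_inference_flow
    schedule:
      cron: "*/5 * * * *"  # every 5 minutes while debugging
```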

Tools used

  • pandas for data processing/engineering
  • scikit-learn for feature engineering and model development
  • pytest for testing
  • MLflow for experiment tracking
  • Prefect for workflow management (done) and orchestration (TBD)
  • Black, isort, and Flake8 for code styling and linting
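Black and isort can disagree on import formatting unless isort is told to follow Black's style. A common `pyproject.toml` snippet for this (the exact config in this repo is an assumption):

```toml
[tool.black]
line-length = 88

[tool.isort]
profile = "black"
```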

TODO list:

  • ✅ Train and test flow
  • ✅ Log metrics and artifacts to MLflow
  • ✅ Prefect for workflows
  • ✅ Makefile
  • ✅ Basic tests
  • ⌛ Model monitoring + scheduling it
  • ⌛ Containerization
  • ✅ Use databases for input/output
  • 🔜 Feature store and vector store - TBD
  • 🔜 Streaming features
