Project setup:
- Open this repository in VS Code.
- Install the Dev Containers extension.
- Open the Command Palette (Cmd + Shift + P on macOS) and run:
Dev Containers: Rebuild Container Without Cache
- Activate the conda virtual environment:
source activate endtoend
- Inside the dev container, start the local MLflow and Prefect servers:
nohup bash ./start_backend.sh
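Because nohup sends the servers' output to nohup.out instead of the terminal, it can help to confirm that both servers actually came up before running anything else. Below is a minimal sketch using only the Python standard library. The ports are assumptions: 5000 is the default for `mlflow server` and 4200 the default for `prefect server start`, but start_backend.sh may configure different ones.

```python
import socket

# Assumed default ports; adjust if start_backend.sh uses other values.
BACKENDS = {"mlflow": 5000, "prefect": 4200}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    for name, port in BACKENDS.items():
        status = "up" if is_listening("127.0.0.1", port) else "DOWN"
        print(f"{name:8s} 127.0.0.1:{port} {status}")
```

Save it under any name (for example, a hypothetical check_backend.py) and run it inside the dev container. A plain TCP check only shows that a process is listening on the port, not that the MLflow or Prefect API behind it is healthy.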
Run: python main.py
Build: docker build . -t endtoend:latest
Run: docker run endtoend:latest
For batch inference, do the following:
- Start the data generation worker process in one terminal:
make start_data_generator_worker
- Start the batch inference worker process in a second terminal:
make start_batch_inference_worker
- Deploy the flows for both workers:
prefect deploy --all
For debugging, edit the cron expressions in prefect.yaml before deploying so the flows run on a shorter schedule.
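The two workers act roughly as a producer and a consumer. The sketch below is hypothetical. It assumes both jobs share one database, as the roadmap item "Use databases for input/output" suggests. The in-memory SQLite database, the table and column names, and `toy_model` are invented for illustration. In the project itself these steps run as Prefect flows on the schedules defined in prefect.yaml.

```python
import random
import sqlite3

# Hypothetical schema: generated inputs and their predictions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inputs (
    id INTEGER PRIMARY KEY,
    x1 REAL NOT NULL,
    x2 REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS predictions (
    input_id INTEGER PRIMARY KEY REFERENCES inputs(id),
    score REAL NOT NULL
);
"""


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def generate_data(conn: sqlite3.Connection, n_rows: int, seed: int = 0) -> None:
    """Stand-in for the data generation worker: append n_rows synthetic rows."""
    rng = random.Random(seed)
    rows = [(rng.random(), rng.random()) for _ in range(n_rows)]
    conn.executemany("INSERT INTO inputs (x1, x2) VALUES (?, ?)", rows)
    conn.commit()


def toy_model(x1: float, x2: float) -> float:
    """Placeholder scoring rule standing in for the trained model."""
    return 0.3 * x1 + 0.7 * x2


def batch_inference(conn: sqlite3.Connection, model=toy_model) -> int:
    """Stand-in for the batch inference worker: score every input row that
    has no prediction yet and return how many rows were scored."""
    pending = conn.execute(
        "SELECT i.id, i.x1, i.x2 FROM inputs AS i "
        "LEFT JOIN predictions AS p ON p.input_id = i.id "
        "WHERE p.input_id IS NULL"
    ).fetchall()
    conn.executemany(
        "INSERT INTO predictions (input_id, score) VALUES (?, ?)",
        [(row_id, model(x1, x2)) for row_id, x1, x2 in pending],
    )
    conn.commit()
    return len(pending)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    generate_data(conn, n_rows=5)
    print(batch_inference(conn))  # prints 5: all new rows are scored
    print(batch_inference(conn))  # prints 0: nothing left to score
```

The sketch selects only rows without a prediction (the `LEFT JOIN ... IS NULL` filter). As a result, a scheduled inference run that fires twice does not score the same rows again.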
Tech stack:
- Pandas for data processing and data engineering
- scikit-learn for feature engineering and model development
- Pytest for testing
- MLflow for experiment tracking
- Prefect for workflow management (done) and orchestration (TBD)
- Black, isort and Flake8 for code styling and linting
Roadmap:
- ✅ Train and test flow
- ✅ Log metrics and artifacts to MLflow
- ✅ Prefect for workflows
- ✅ Makefile
- ✅ Basic tests
- ⌛ Model monitoring and scheduling
- ⌛ Containerization
- ✅ Use databases for input/output
- 🔜 Feature store and vector store (TBD)
- 🔜 Streaming features