This is a cookiecutter template focused on AI, designed for model architecture development, dataset creation, pipeline development, and model deployment using several open and free state-of-the-art tools.
This project was developed with multi-model and multi-dataset studies and implementations in mind. It is designed to use mlflow, dvc, pre-commit, git, docker or podman, jupyter lab, hydra, bentoml, pipenv, and, at your choice, various databases like duckdb or PostgreSQL in local or cloud environments.
The project manages its own environment variables through a .env file integrated with Hydra configurations, offering two development branches (development and production) at the Hydra level. The project also uses pdoc to generate useful documentation in HTML format.
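The pairing of a .env file with Hydra's development and production configurations can be illustrated with a minimal sketch. It is not the template's actual code: the `BRANCH` variable name and both helper functions are hypothetical, and only the option names (`development_mlflow`, `production_optuna`, and so on) come from the `configs` tree shown below. The `group=option` form is Hydra's standard command-line override syntax.

```python
from typing import Dict, List, Mapping, Sequence


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def hydra_overrides(
    env: Mapping[str, str],
    groups: Sequence[str] = ("mlflow", "optuna", "type"),
) -> List[str]:
    """Build Hydra overrides such as 'mlflow=production_mlflow'.

    BRANCH is a hypothetical variable name chosen for this sketch;
    it falls back to 'development' when unset.
    """
    branch = env.get("BRANCH", "development")
    if branch not in ("development", "production"):
        raise ValueError(f"unknown branch: {branch!r}")
    return [f"{group}={branch}_{group}" for group in groups]
```

The resulting list could be passed on the command line of a Hydra application to switch every config group to the same branch at once.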
mindmap
markdown[Root **folder_name**]
markdown[**configs**]
database
duckdb
mysql
postgres
sqlite
mlflow
development_mlflow
production_mlflow
optuna
development_optuna
production_optuna
pipeline
modelv1
type
development_type
production_type
markdown[Source **short_title**]
markdown[**notebooks**]
example.ipynb
markdown[**datasets**]
datasetV1
markdown[**deploy**]
modelV1_deploy
docker
markdown[**models**]
modelexample
notebooks
markdown[**train**]
trainV1
steps
dataset
final
processed
raw
docs
This project takes care of configuring all its dependencies and tools. However, it requires that you have the Python package manager (pip) and cookiecutter installed.
sudo apt install python3-pip git && \
pip install --upgrade pip && \
pip install --upgrade cookiecutter
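After running the commands above, a quick way to confirm that the prerequisites are reachable on your `PATH` is a small check like the following. This is only a sketch; the `check_tools` helper is hypothetical, and on some systems pip is exposed only as `pip3`, in which case you would pass that name instead.

```python
import shutil
from typing import Dict, Iterable


def check_tools(
    tools: Iterable[str] = ("pip", "git", "cookiecutter"),
) -> Dict[str, bool]:
    """Report, for each executable name, whether it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```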
Two options are recommended:
- Create the Python environment inside the project folder
- Use main as the default git branch
export PIPENV_VENV_IN_PROJECT=1
git config --global init.defaultBranch main
To instantiate a project, just run:
cookiecutter https://github.com/kascesar/artificial-inteligence-template.git
Then follow the instructions.
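If you prefer to script the instantiation rather than answer the prompts, cookiecutter also exposes a Python API. The sketch below assumes that the template variables are named `folder_name` and `short_title`, as suggested by the project tree above; the real names are defined in the template's `cookiecutter.json` and may differ. The `build_context` and `generate` helpers are hypothetical.

```python
from typing import Dict

TEMPLATE_URL = "https://github.com/kascesar/artificial-inteligence-template.git"


def build_context(folder_name: str, short_title: str) -> Dict[str, str]:
    """Assemble answers for the prompts (variable names are assumptions)."""
    if not folder_name or " " in folder_name:
        raise ValueError("folder_name must be non-empty and contain no spaces")
    return {"folder_name": folder_name, "short_title": short_title}


def generate(folder_name: str, short_title: str, output_dir: str = ".") -> str:
    """Render the template without interactive prompts."""
    # Imported lazily so the helper above works without cookiecutter installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        TEMPLATE_URL,
        no_input=True,  # use defaults plus extra_context instead of prompting
        extra_context=build_context(folder_name, short_title),
        output_dir=output_dir,
    )
```

Calling `generate("my_project", "myproj")` would fetch the template and write the project into the current directory, which requires network access.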
After cloning, set up the git hooks:
chmod +x setup_hooks.sh && \
sh setup_hooks.sh
The git hooks that this script sets up for us are:
- black ... (Python code formatter; it tries to fix issues by itself first)
- check-yaml
- end-of-file-fixer
- trailing-whitespace
- dvc-pre-commit ... (before commit)
- dvc-pre-push ... (before push)
- dvc-post-checkout ... (after switching branch)
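To give an idea of what the two whitespace hooks in the list above do to a file, here is a rough approximation in plain Python. It is not the hooks' actual implementation, and the function names are hypothetical; the real hooks also handle edge cases (binary files, Markdown line breaks, Windows line endings) that this sketch ignores.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def fix_end_of_file(text: str) -> str:
    """Make a non-empty file end with exactly one newline."""
    stripped = text.rstrip("\n")
    return stripped + "\n" if stripped else ""


def tidy(text: str) -> str:
    """Apply both fixes, in the order a commit would trigger them."""
    return fix_end_of_file(strip_trailing_whitespace(text))
```

Because the hooks rewrite files in place, a commit that needed fixing is aborted once; you then stage the corrected files and commit again.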
A: Anyone, whether a developer, data scientist, or machine learning engineer, who wants a clean, simple, scalable, and replicable development environment.
A: For developers using free and/or open-source MLOps and artificial intelligence tools such as mlflow, optuna, bentoml, docker, and tensorflow, aimed at studying, developing, and deploying models to production.
A: You should have at least a moderate understanding of Python, MLflow, DVC, Git, and Hydra.