radema/datascience-personal-templates

Data Science Project Template

This template was built after reading the Medium article by khuyetran1401. Forking that repo would have been much simpler, but I preferred to build the template myself in order to understand each component. It is designed to be quick and easy to use.

For 'industrial' or more 'business' projects, I still prefer tools like Kedro.

Features and Roadmap

✅ Automatically build repository structure for DS personal projects

✅ Create and Build an environment using conda

🔲 Run Tests automatically

🔲 Manage configuration variables for data pipelines and projects

✅ Enforce type hints and code quality

🔲 Automatically Document Code

🔲 Automate Code

DVC for Data Management and Experiment Management

To Do

  • Automate setup of dvc repo and .gitignore
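One way this to-do item could eventually be automated is sketched below. It is only an illustration, not part of the template: the helper names `ensure_gitignore` and `init_dvc` are hypothetical, and the sketch assumes the project folder is already a Git repository and that the `dvc` command is installed.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def ensure_gitignore(repo: Path, entries: list[str]) -> list[str]:
    """Append the entries missing from repo/.gitignore and return them."""
    gitignore = repo / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = text.splitlines()
    missing = [entry for entry in entries if entry not in existing]
    if missing:
        with gitignore.open("a") as handle:
            # Start on a fresh line if the file does not end with a newline.
            if text and not text.endswith("\n"):
                handle.write("\n")
            handle.write("\n".join(missing) + "\n")
    return missing


def init_dvc(repo: Path) -> bool:
    """Run `dvc init` in repo; skip if dvc is absent or already initialised."""
    if shutil.which("dvc") is None or (repo / ".dvc").exists():
        return False
    # Assumption: repo is already a Git repository, which `dvc init` expects.
    subprocess.run(["dvc", "init"], cwd=repo, check=True)
    return True
```

For example, `ensure_gitignore(Path("."), ["__pycache__/", ".ipynb_checkpoints/"])` adds the two entries only if they are not already listed, so the helper is safe to call more than once. The entries shown are illustrative, not a list the template prescribes.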

Tools used

  • Conda: Package, dependency and environment management
  • pre-commit: Framework for managing and maintaining multi-language pre-commit hooks

Template Structure

.
├── config                       # Project configuration files
│   └── environment.yml          # Environment file for conda
├── data                         # Local project data (not committed to version control)
│   ├── 01_raw                   # Raw immutable data
│   ├── 02_primary               # Domain model data
│   ├── 03_feature               # Model features
│   ├── 04_model_input           # Often called 'master tables'
│   ├── 05_model_output          # Data generated by model runs
│   └── 06_reporting             # Ad hoc descriptive cuts
├── docs                         # Project documentation
├── models                       # Trained models
├── notebooks                    # Project related Jupyter notebooks (used for experimental code before moving code to src)
├── README.md                    # Project README
└── src                          # Project source code
    └── main.py
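To make the layout concrete, here is a minimal sketch that recreates the same folders with `pathlib`. It is not how the template generates projects (Cookiecutter does that); the function name `build_layout` and the `LAYOUT` constant are hypothetical, and the files are created empty.

```python
from __future__ import annotations

from pathlib import Path

# Directory names copied from the tree above.
LAYOUT = [
    "config",
    "data/01_raw",
    "data/02_primary",
    "data/03_feature",
    "data/04_model_input",
    "data/05_model_output",
    "data/06_reporting",
    "docs",
    "models",
    "notebooks",
    "src",
]


def build_layout(root: Path) -> list[Path]:
    """Create the template's directories under root and return their paths."""
    created = []
    for relative in LAYOUT:
        path = root / relative
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # Empty placeholders for the files the tree lists explicitly.
    for relative in ("config/environment.yml", "README.md", "src/main.py"):
        (root / relative).touch()
    return created
```

Calling `build_layout(Path("my-project"))` creates the tree under a new `my-project` folder; running it again is harmless because existing directories are left in place.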

How to use this template

Install Cookiecutter:

pip install cookiecutter

Create a project based on the template:

cookiecutter https://github.com/radema/datascience-personal-templates

Activate the new environment (replace the placeholder with the environment name you chose when creating the project):

conda activate {{cookiecutter.environment_name}}

Run the setup from a terminal (again replacing the placeholder with your repository name):

cd {{cookiecutter.repository-name}}; make setup
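The project-creation step can also be scripted through Cookiecutter's Python API instead of the command line. The sketch below is an assumption-laden example: the context keys `repository-name` and `environment_name` are inferred from the placeholders shown above and may not match the template's actual `cookiecutter.json`, and the helper names `build_context` and `generate_project` are hypothetical.

```python
from __future__ import annotations

TEMPLATE_URL = "https://github.com/radema/datascience-personal-templates"


def build_context(repository_name: str, environment_name: str) -> dict[str, str]:
    """Answers for the template prompts.

    Assumption: the key names are inferred from the {{cookiecutter.*}}
    placeholders in this README, not read from cookiecutter.json.
    """
    return {
        "repository-name": repository_name,
        "environment_name": environment_name,
    }


def generate_project(
    repository_name: str, environment_name: str, output_dir: str = "."
) -> str:
    """Render the template without interactive prompts; return the project path."""
    # Imported here so this module loads even where cookiecutter is not installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        TEMPLATE_URL,
        no_input=True,  # use extra_context and defaults instead of prompting
        extra_context=build_context(repository_name, environment_name),
        output_dir=output_dir,
    )
```

A call such as `generate_project("my-analysis", "my-analysis-env")` would download the template and render it into the current directory. The conda activation and `make setup` steps still need to be run from a shell afterwards.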

Resources and references

About

Repository to track my templates for personal projects
