Data Science Project Template

This template was built after reading the Medium article by khuyetran1401. Forking that repo would have been simpler, but I preferred to build the template myself to understand each component. It is designed to be quick and easy to use.

For 'industrial' or more 'business' projects, I still prefer tools like Kedro.

Features and Roadmap

✅ Automatically build repository structure for DS personal projects

✅ Create and Build an environment using conda

🔲 Run Tests automatically

🔲 Manage configuration variables for data pipelines and projects

✅ Enforce hints and code quality

🔲 Automatically Document Code

🔲 Automate Code

✅ DVC for Data Management and Experiment Management

To Do

Automate setup of dvc repo and .gitignore
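
One way this to-do item could be approached is a small Python helper run after the project is generated, for example from a Cookiecutter `hooks/post_gen_project.py` script. The sketch below is an assumption about how it might look, not something this template currently ships: the function names `ensure_gitignore_entries` and `init_dvc` and the example ignore patterns are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def ensure_gitignore_entries(project_dir: Path, entries: list[str]) -> list[str]:
    """Append missing patterns to .gitignore and return the ones that were added."""
    gitignore = project_dir / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    # dict.fromkeys drops duplicate requests while keeping their order.
    missing = [entry for entry in dict.fromkeys(entries) if entry not in existing]
    if missing:
        # Start on a fresh line if the file does not already end with one.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        gitignore.write_text(text + prefix + "\n".join(missing) + "\n")
    return missing


def init_dvc(project_dir: Path) -> bool:
    """Run `dvc init` if DVC is installed and the project is not yet initialised."""
    if shutil.which("dvc") is None or (project_dir / ".dvc").exists():
        return False
    # `dvc init` expects an existing Git repository unless --no-scm is passed.
    result = subprocess.run(["dvc", "init"], cwd=project_dir, check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Demonstrate the .gitignore helper in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        patterns = ["__pycache__/", ".ipynb_checkpoints/"]  # illustrative only
        print(ensure_gitignore_entries(Path(tmp), patterns))
```

Both helpers are safe to rerun: the first only appends patterns that are not already present, and the second skips initialisation when a `.dvc` directory exists.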

Tools used

Conda: package, dependency and environment management
pre-commit: framework for managing and maintaining multi-language pre-commit hooks

Template Structure

.
├── config                       # Project configuration files
│   └── environment.yml          # Environment file for conda
├── data                         # Local project data (not committed to version control)
│   ├── 01_raw                   # Raw immutable data
│   ├── 02_primary               # Domain model data
│   ├── 03_feature               # Model features
│   ├── 04_model_input           # Often called 'master tables'
│   ├── 05_model_output          # Data generated by model runs
│   └── 06_reporting             # Ad hoc descriptive cuts
├── docs                         # Project documentation
├── models                       # Trained models
├── notebooks                    # Project related Jupyter notebooks (used for experimental code before moving code to src)
├── README.md                    # Project README
└── src                          # Project source code
    └── main.py
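
For readers who want to see what the generated layout amounts to, the directory part of the tree above can be reproduced with a few lines of Python. This is only an illustrative sketch: the template itself is rendered by Cookiecutter, and the function name `scaffold_project` is hypothetical. The directory names are copied from the tree.

```python
import tempfile
from pathlib import Path

# Directory names copied from the tree above; the rest is illustrative.
TOP_LEVEL_DIRS = ["config", "docs", "models", "notebooks", "src"]
DATA_LAYERS = [
    "01_raw",
    "02_primary",
    "03_feature",
    "04_model_input",
    "05_model_output",
    "06_reporting",
]


def scaffold_project(root: Path) -> list[Path]:
    """Create the template's directories under `root` and return them, sorted."""
    targets = [root / name for name in TOP_LEVEL_DIRS]
    targets += [root / "data" / layer for layer in DATA_LAYERS]
    for path in targets:
        # parents=True creates `data/` on the way; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
    return sorted(targets)


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for created in scaffold_project(Path(tmp) / "demo_project"):
            print(created.relative_to(tmp))
```

The numbered prefixes on the data layers keep them sorted in pipeline order, from raw inputs to reporting outputs.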

How to use this template

Install Cookiecutter:

pip install cookiecutter

Create a project based on the template:

cookiecutter https://github.com/radema/datascience-personal-templates

Activate the new environment:

conda activate {{cookiecutter.environment_name}}

Run the setup from the project directory:

cd {{cookiecutter.directory_name}} && make setup
