Paste the following into your terminal and press Enter:
git clone https://github.com/esemoney/research-skeleton.git
Next, move into the new folder:
cd research-skeleton
Adapted from Good Research Code by Patrick Mineault. Setting up a research project involves five steps:
- Pick a name and create a folder for the project
- Initialize a git repository and sync it to GitHub
- Set up a virtual environment
- Create a project skeleton
- Install a project package
The end result will be a logically organized project skeleton that’s synced to version control.
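After picking a name and creating a folder for your research project, initialize a git repository and sync it to GitHub:
- Create a README.md file in the folder.
- Create an empty repository on GitHub. The final push fails if the remote repository does not exist.
- Run the following commands from the project folder, substituting your repository's URL in the git remote add line: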
git init
git add README.md
git commit -m "first commit"
git branch -M main
git remote add origin https://github.com/your-user-name/your-repo-name.git
git push -u origin main
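If you set up many projects, the same sequence can be scripted. The following is a minimal Python sketch, not part of git or GitHub: the helper names first_commit_commands and run_first_commit are hypothetical, it assumes git is installed and the empty GitHub repository already exists, and by default it only prints the commands (a dry run).

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def first_commit_commands(user: str, repo: str) -> List[List[str]]:
    """Return the git commands listed above as argument lists.

    Hypothetical helper: `user` and `repo` stand for your GitHub username
    and the name of an empty GitHub repository you have already created.
    """
    remote = f"https://github.com/{user}/{repo}.git"
    return [
        ["git", "init"],
        ["git", "add", "README.md"],
        ["git", "commit", "-m", "first commit"],
        ["git", "branch", "-M", "main"],
        ["git", "remote", "add", "origin", remote],
        ["git", "push", "-u", "origin", "main"],
    ]


def run_first_commit(project_dir: Path, user: str, repo: str, dry_run: bool = True) -> None:
    """Print (dry_run=True) or execute the commands inside project_dir."""
    for cmd in first_commit_commands(user, repo):
        if dry_run:
            # shlex.join quotes arguments with spaces, e.g. 'first commit'.
            print(shlex.join(cmd))
        else:
            # check=True stops at the first failing command,
            # e.g. a push to a remote repository that does not exist.
            subprocess.run(cmd, cwd=project_dir, check=True)
```

For example, run_first_commit(Path("."), "your-user-name", "your-repo-name") prints the six commands; pass dry_run=False to execute them from the project folder.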
Virtual environments are used to manage dependencies. Each virtual environment specifies which versions of software and packages a project uses. These specifications can differ between projects, and each virtual environment can easily be created, swapped, duplicated or destroyed. Virtual environments solve the problem of a single, monolithic Python environment in which every package for every project is installed. Because such an environment is usually not documented anywhere, moving to another computer or recreating it from scratch several months later can mean hours or days of frustration.
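conda is both a package manager (something that installs packages on your system) and a virtual environment manager (something that can easily swap between different combinations of packages and binaries, i.e. virtual environments). The commands below create an environment named codebook with Python 3.8, activate it, install common scientific packages, and export the environment's full specification to environment.yml. Keeping that file in the repository documents the environment, and it can be recreated later with conda env create -f environment.yml.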
conda create --name codebook python=3.8
conda activate codebook
conda install pandas numpy scipy matplotlib seaborn
conda env export > environment.yml
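To double-check that you are working inside the new environment, a small Python check can help. This is a minimal sketch, not part of conda: it assumes that conda activate has set the CONDA_DEFAULT_ENV environment variable (standard conda installs do this) and that the installed packages expose standard Python package metadata, which importlib.metadata (Python 3.8+) reads. The function names active_conda_env and installed_versions are hypothetical.

```python
import os
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def active_conda_env() -> Optional[str]:
    """Name of the activated conda environment, or None if none is active.

    Assumption: `conda activate` exports CONDA_DEFAULT_ENV.
    """
    return os.environ.get("CONDA_DEFAULT_ENV")


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if not found."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not visible to this interpreter.
            found[name] = None
    return found


if __name__ == "__main__":
    print("python:", sys.executable)
    print("conda env:", active_conda_env())
    print(installed_versions(["pandas", "numpy", "scipy", "matplotlib", "seaborn"]))
```

Run it with the python from the activated environment. A None version means the package is not visible to that interpreter, which often indicates that the wrong environment is active.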
Next, create a project skeleton with the following folders:
- data: Where you put raw data for your project. You usually won’t sync this to source control, unless you use very small, text-based datasets (< 10 MB).
- docs: Where you put documentation, including Markdown and reStructuredText (reST). Calling it docs makes it easy to publish documentation online through GitHub Pages.
- results: Where you put results, including checkpoints, HDF5 files and pickle files, as well as figures and tables. Large files here are usually kept out of source control.
- scripts: Where you put scripts, Python and bash alike, as well as .ipynb notebooks.
- src: Where you put reusable Python modules for your project. This is the Python code that you import.
- tests: Where you put tests for your code.
You can create this folder structure from a bash or zsh terminal with:
mkdir {data,docs,results,scripts,src,tests}
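The mkdir one-liner relies on shell brace expansion, which Windows cmd and PowerShell do not support. A cross-platform alternative is a short Python sketch using pathlib. Two assumptions go beyond the steps above: it writes a .gitignore that excludes data/ and results/, following the advice that raw data and heavy results usually stay out of source control; and, because git does not track empty folders, it places an empty .gitkeep file (a common convention, not a git feature) in each folder that is not ignored. The function name create_skeleton is hypothetical.

```python
from pathlib import Path
from typing import Iterable

# Folder names from the list above.
SKELETON_DIRS = ("data", "docs", "results", "scripts", "src", "tests")

# Assumption: data/ and results/ stay out of git. Adjust this if you
# version small, text-based datasets.
GITIGNORE_LINES = ("data/", "results/")


def create_skeleton(root: Path, dirs: Iterable[str] = SKELETON_DIRS) -> None:
    """Create the skeleton folders under root; safe to run more than once."""
    ignored = {line.rstrip("/") for line in GITIGNORE_LINES}
    for name in dirs:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        if name not in ignored:
            # git ignores empty folders, so a placeholder file lets them be committed.
            (folder / ".gitkeep").touch()

    # Append only the .gitignore entries that are not already present.
    gitignore = root / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    missing = [line for line in GITIGNORE_LINES if line not in text.splitlines()]
    if missing:
        if text and not text.endswith("\n"):
            text += "\n"
        gitignore.write_text(text + "\n".join(missing) + "\n")
```

Run it from the project root, e.g. create_skeleton(Path(".")). Running it twice does no harm, since existing folders and .gitignore entries are left alone.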