This repository is a part of 3rd place solution in Kaggle PLAsTiCC Astronomical Classification. Our team (major tom) consists of 3 members (mamas, yuvals, nyanp), and this repository contains code of nyanp's models and features which was used by models of teammates.
- features/ : Feature engineering functions
- model/ : Training LightGBM
- lsst/ : LSST throuputs files copied from sncosmo
- config.py : Configuration relates to I/O directory and debug mode
- step*.py : Entrypoint of feature engineering and modeling
You can run script both Windows and Linux. If you use your local machine, ~200GB RAM is required to run.
- Prepare a virtual environment for this project (python 3.5 is recommended)
- Clone this repository
- Run
pip install -r requirements.txt
Recommended environment:
- OS: Ubuntu 16.04 LTS
- CPU: vCPU x 32
- RAM: 208GB
- Storage: 100GB+ (SSD storage is recommended)
After creating GCP instance, connect it with SSL and follow these commands:
sudo apt-get update
sudo apt install gcc g++ git tmux
wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh
sh Anaconda3-5.3.1-Linux-x86_64.sh
conda create -n plasticc python=3.5
source activate plasticc
git clone https://github.com/nyanp/kaggle-PLASTiCC
cd kaggle-PLASTiCC
git checkout cleanup
pip install numpy
pip install -r requirements.txt
mkdir input
cd input
- download following files to
input
directory (via Kaggle API or any way you want)- training_set.csv
- test_set.csv
- training_set_metadata.csv
- test_set_metadata.csv
You can change I/O directory and running mode by editing config.py. See comments in the file in detail.
See entry_points.md. If you just want to use precompiled feature set to train yuval/mamas's model, use feature files in share/ directory.