Logistic regression classification of SF crimes

The SF crime dataset used here contains over 800,000 rows of data in the below format:

2003-01-06 00:31:00,"OTHER, OFFENSES","DRIVERS LICENSE, SUSPENDED OR REVOKED",Monday,RICHMOND,"ARREST, CITED",CLEMENT ST / 14TH AV,-122.472984835661,37.7825523645525

Each row includes the time, place, and category of crime. There are 39 different categories. These scripts build a logistic regression model that can predict the category of crime given the time and place.

To run, first build and run the container:

docker build -t sfcrime .
docker run -v `pwd`/src:/src/sfcrime -v `pwd`/data:/data/sfcrime -i -t docker bash

Next from inside the container use the make.sh script to parse the data into a format that can be easily loaded into Octave:

./make.sh parsetrain

Finally use octave to build engineered features, train the classifier on a random sampled subset of the data, and finally classify using that model:

octave
octave> sfcrimeFeatures
octave> sfcrimeLogisticTrain
octave> sfcrimeLogisticClassify

That's it! You should get a 23% prediction accuracy.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Files

README.md

Latest commit

History

README.md

File metadata and controls