Skip to content

The SF crime repository and the data-set I am working with

Notifications You must be signed in to change notification settings

necro351/sfcrime

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Logistic regression classification of SF crimes

The SF crime dataset used here contains over 800,000 rows of data in the below format:

2003-01-06 00:31:00,"OTHER, OFFENSES","DRIVERS LICENSE, SUSPENDED OR REVOKED",Monday,RICHMOND,"ARREST, CITED",CLEMENT ST / 14TH AV,-122.472984835661,37.7825523645525

Each row includes the time, place, and category of crime. There are 39 different categories. These scripts build a logistic regression model that can predict the category of crime given the time and place.

To run, first build and run the container:

docker build -t sfcrime .
docker run -v `pwd`/src:/src/sfcrime -v `pwd`/data:/data/sfcrime -i -t docker bash

Next from inside the container use the make.sh script to parse the data into a format that can be easily loaded into Octave:

./make.sh parsetrain

Finally use octave to build engineered features, train the classifier on a random sampled subset of the data, and finally classify using that model:

octave
octave> sfcrimeFeatures
octave> sfcrimeLogisticTrain
octave> sfcrimeLogisticClassify

That's it! You should get a 23% prediction accuracy.

About

The SF crime repository and the data-set I am working with

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published