TaxLess - TeamTax

See below for the prizes we won at GovHack Finals 2016!

https://2016.hackerspace.govhack.org/content/taxless

This platform was designed for the GovHack hackathon hosted in Sydney in July 2016.

TaxLess uses open ATO data to optimise your next return, by comparing your deductions with people like you.

By analysing over 250 thousand tax returns, our web tool discovers where your deductions are falling short, and what you can do about them.

Using a stack built on scikit-learn and other machine learning tools, we were able to estimate the amount of tax that someone should be receiving, by analysing individuals from the data set using deep learning and linking them to the data provided by the user.

The information provided by the user consisted of:

Gender
Age Range
Occupation Category
Marital Status
Postcode
Whether a tax agent was used
Sum of salary and wages

Using your age, occupation, earnings, region and gender, a clustering algorithm groups you with similar taxpayers, and we then work out how you rank among them in each deduction category.
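As a rough illustration of the ranking step, the sketch below computes a percentile rank for one deduction category within a peer group. It is not taken from the TaxLess code base: the function name `percentile_rank`, the sample amounts and the choice to count ties as half are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(peer_values, user_value):
    """Percentage of peers whose value is below the user's (ties count half)."""
    ordered = sorted(peer_values)
    if not ordered:
        raise ValueError("peer group is empty")
    below = bisect_left(ordered, user_value)          # peers strictly below
    ties = bisect_right(ordered, user_value) - below  # peers with the same value
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical work-related expense claims (AUD) from one peer group.
peer_work_expenses = [0, 150, 300, 300, 420, 800, 1200, 1500]
rank = percentile_rank(peer_work_expenses, 150)
print(f"Your claim sits at roughly the {rank:.0f}th percentile of similar taxpayers")
```

A low rank in a category is only a prompt to look closer; it does not by itself show that a deduction was missed.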

A significant difference likely indicates that you are either under-claiming or under-investing. For example, you may be self-contributing nothing to superannuation when others are contributing 10%, or you may simply have forgotten to report expenses.

By using the ATO's open tax return datasets, which cover 2% of taxpayers, we can also show users with no statistical background the key differences and similarities between different demographics.

Visualised population clusters

http://plnkr.co/oV69rrF338cHuLqm3lEG

Video Entry for GovHack2016

https://youtu.be/x9Xy2qFaL24

Dataset Name:

Taxation statistics - Individual tax return sample files

Publishing Organisation/Agency:

ATO

Jurisdiction of Data:

Australian Government

Dataset URL:

http://portal.govhack.org/datasets/2016/australia/australian-taxation-office/tax...

How did you use this data in your entry?:

First, we use K-means clustering (on occupation code, salary and wages amount, region, age, marital status and gender) to group users into clusters. Second, within each cluster we use ridge regression (with dummy variables for the discrete variables) to understand, in a fine-grained way, how tax-effective particular users are and where they get their deductions. Finally, we profile the clusters by showing the distribution of each variable within them, to study how the groups differ.
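The first two steps could be sketched with scikit-learn roughly as follows. This is not the team's actual code: the data is synthetic, only a subset of the variables is used, and the column layout, the cluster count of 4 and the ridge penalty `alpha=1.0` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the sample file (all values are made up).
# Columns: occupation code, gender, age-range code, salary and wages.
occupation = rng.integers(0, 5, n)
gender = rng.integers(0, 2, n)
age_range = rng.integers(0, 6, n)
salary = rng.normal(70_000, 20_000, n).clip(min=0)
X = np.column_stack([occupation, gender, age_range, salary])
# Made-up target: total deductions loosely tied to salary.
deductions = 0.03 * salary + rng.normal(0, 300, n)

# Dummy-encode the discrete columns (0-2) and scale salary (column 3).
encode = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), [0, 1, 2]),
     ("num", StandardScaler(), [3])],
    sparse_threshold=0.0,  # always return a dense array
)
features = encode.fit_transform(X)

# Step 1: group taxpayers into clusters.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

# Step 2: fit one ridge regression per cluster.
models = {
    int(c): Ridge(alpha=1.0).fit(features[labels == c], deductions[labels == c])
    for c in np.unique(labels)
}

# Assign a hypothetical user to a cluster and predict their deductions there.
user = np.array([[2, 1, 3, 65_000.0]])
user_features = encode.transform(user)
cluster = int(kmeans.predict(user_features)[0])
expected = float(models[cluster].predict(user_features)[0])
print(f"cluster {cluster}: predicted deductions of about ${expected:,.0f}")
```

Comparing `expected` with what the user actually claimed gives a simple signal of how far they sit from similar taxpayers. Scaling salary before clustering keeps it from dominating the distance computation over the dummy variables.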

Prizes Won

Data Intelligence Prize

Smarter Data Prize

Google Machine Learning Prize
