Commit c99ef09: Unbreak links

sjmiller8182 committed Mar 9, 2020 (parent: a61b8bc)
Showing 25 changed files with 40,182 additions and 40,193 deletions.
4 changes: 2 additions & 2 deletions .gitattributes
@@ -1,2 +1,2 @@
# override linguist - ignore documentation
*.html linguist-documentation
274 changes: 137 additions & 137 deletions .gitignore
@@ -1,137 +1,137 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
*.zip
data.csv
crisps-dm2.png
bureau.csv
application_train.csv
application_test.csv
.DS_Store
home-credit-default-risk/newFeatures.csv
68 changes: 34 additions & 34 deletions README.md
@@ -1,35 +1,35 @@
# An Analysis of Loan Default Risk: Executive Summary

## Problem Statement
This analysis of Home Credit's Default Risk dataset will focus on generating accurate loan default risk probabilities. Predicting loan defaults is essential to the profitability of banks and, given the competitive nature of the loan market, a bank that collects the right data can offer and service more loans. The target variable of the dataset is the binary label `TARGET`, indicating whether or not the loan entered default status.
## Data Source
```bash
https://www.kaggle.com/c/home-credit-default-risk/overview
```
Given the binary nature of the target variable, the analytic task is classification. The final model will produce a probability of default for each loan, and the predicted probabilities will be evaluated on the area under the Receiver Operating Characteristic (ROC) curve between the predicted probability of default and whether the loan actually defaulted. We believe a good predictive model can achieve an AUC between 0.70 and 0.80.
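
To make the evaluation concrete, here is a minimal sketch of computing that metric with scikit-learn. The logistic-regression baseline and the numeric-only feature handling are placeholders of our own, not the project's actual model or pipeline.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Load the training applications; TARGET is the binary default label.
apps = pd.read_csv("application_train.csv")
X = apps.select_dtypes("number").drop(columns=["TARGET"]).fillna(0)
y = apps["TARGET"]

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Placeholder baseline; the final model will be developed separately.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# AUC is computed from the predicted probability of the positive
# (default) class, not from hard 0/1 labels.
probs = model.predict_proba(X_valid)[:, 1]
print(f"Validation ROC AUC: {roc_auc_score(y_valid, probs):.3f}")
```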

## Data Understanding
#### 1. Exploratory Analysis: Meaning of the data
The dataset consists of 307,511 individual loans. For the purpose of this assignment, the analysis will be limited to the initial training and test sets, with the addition of three engineered features obtained from `bureau.csv` and labeled `newFeatures.csv` (see the join sketch after the table below).

Data Table | Number of Features
------------ | -------------
application_train.csv | 122
application_test.csv | 121
newFeatures.csv | 3
bureau.csv | 17
bureau_balance.csv | 3
POS_CASH_balance.csv | 8
installments_payments.csv | 8
credit_card_balance.csv | 23
previous_application.csv | 37
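
As a minimal sketch of how the engineered features plug back in, the snippet below left-joins `newFeatures.csv` onto the training applications. The file path and the assumption that the file carries the shared loan key `SK_ID_CURR` are ours, not confirmed by the repository.

```python
import pandas as pd

# Load the base application data and the three engineered bureau features.
train = pd.read_csv("application_train.csv")
new_features = pd.read_csv("newFeatures.csv")

# Assumption: newFeatures.csv carries SK_ID_CURR, the loan identifier the
# Home Credit tables share, so a left join keeps every training loan.
train = train.merge(new_features, on="SK_ID_CURR", how="left")

# Expect the 122 original columns plus the 3 engineered ones.
print(train.shape)
```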


- The data are a collection of nine tables arranged in the schema shown below:
![Home Credit Schema](https://storage.googleapis.com/kaggle-media/competitions/home-credit/home_credit.png)

- The data are a mixture of binary indicators, integer values, and continuous floating-point values. The scale of the data varies: large-scale features such as income will need scaling before they can be precisely compared with binary indicators (see the preprocessing sketch below). Data types span the full range from nominal to ordinal categories.
- A detailed description of all features in the dataset can be found [here](/HomeCredit_columns_description.csv).
- A list of all features and their associated datatypes can be found [here]().
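
Given the mixed scales and types described above, a preprocessing step along these lines is one plausible approach. This is a sketch assuming scikit-learn's `ColumnTransformer`, not the pipeline actually used in the analysis.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train = pd.read_csv("application_train.csv")
X = train.drop(columns=["TARGET"])

numeric_cols = X.select_dtypes("number").columns
categorical_cols = X.select_dtypes("object").columns

# Simple imputation so the transformers below see no missing values.
X[numeric_cols] = X[numeric_cols].fillna(X[numeric_cols].median())
X[categorical_cols] = X[categorical_cols].fillna("missing")

# Standardize large-scale numeric features (e.g. incomes) so they are
# comparable with binary indicators; one-hot encode nominal categories.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X_prepared = preprocess.fit_transform(X)
```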

## Conclusions