---
title : EDA on LendingClub Data
subtitle :
author : Liangquan Zhou, Lu Han, Xiaoyao Yang
job :
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js # {highlight.js, prettify, highlight}
hitheme : tomorrow #
widgets : [mathjax, interactive] # {mathjax, quiz, bootstrap}
ext_widgets : {rCharts: [libraries/nvd3, libraries/morris, libraries/highcharts]}
mode : selfcontained # {standalone, draft}
knit : slidify::knit2slides
---
<!-- Limit image width and height -->
<style type='text/css'>
t1 {
border: 2px solid black;
border-radius: 5px;
background-color: #308014;
/*Add your CSS here!*/
float:right;
margin: 20px
}
t2 {
float: right;
margin: 40px
}
t3 {
border: 2px solid black;
border-radius: 5px;
/*Add your CSS here!*/
}
t4 {
float: left;
margin: 10px
}
</style>
```{r, echo=FALSE, message=FALSE, warning=FALSE}
require(rCharts)
require(googleVis)
require(rHighcharts)
require(devtools)
require(slidify)
require(slidifyLibraries)
library(knitr)
library(ggplot2)
library(reshape2)
library(scales)
library(utils)
library(zoo)
```
## Outline
* <h3>Background</h3>
* <h3>Data Source</h3>
* <h3>Visualization</h3>
--- .segue .dark
## Background
---
<a href = "http://en.wikipedia.org/wiki/Lending_Club"> <img src = "LendingClub_logo.jpg"/></a>
> * The biggest [Peer-to-Peer lending](http://en.wikipedia.org/wiki/Peer-to-peer_lending) (P2P) company in the world.
> * Operates an online lending platform that enables borrowers to obtain a loan, and investors to purchase notes backed by payments made on loans.
> * Founded in 2006; went public on December 11, 2014.
---
## How Does P2P Lending Work?
<t1><img src = "LC.jpg"/></t1>
* Borrowers complete applications on LendingClub.com
* Lending Club evaluates the information, determines an interest rate, and instantly presents a variety of offers to qualified borrowers.
* Investors select loans to invest in and earn monthly returns.
- https://www.lendingclub.com/browse/browse.action
---
## Advantages
> * For Borrowers
- Easy online application
- Low fixed rates
- Fixed monthly payments
- Flexible terms
- No prepayment penalties
<t2>
> * For Investors
- Returns from 5% to 25%
- Monthly cash flow
- Easy to diversify across many loans
</t2>
---
## Concerns
> * Loans can default.
> * If a loan defaults:
- the borrower's credit score is damaged.
- investors lose part of their investment.
---
## Objective: Exploratory Data Analysis on Lending Club Data
<q>Explore the influential factors among loan properties and major borrower features.</q>
--- .segue .dark
## Data Source
---
## Data Introduction
> * Lending Club publishes data on all issued loans.
- [https://www.lendingclub.com/info/download-data.action](https://www.lendingclub.com/info/download-data.action)
> * More than 400,000 loans, issued from July 2007 to September 2014.
> * Each loan record contains about 100 features, including:
- <font color = "red">Loan Properties</font>: Loan Amount, Loan Grade, Term, Interest Rate, Loan Purpose, etc.
- <font color = "red">Borrower Profile</font>: Annual Income, Credit Score ([FICO Score](http://en.wikipedia.org/wiki/Credit_score_in_the_United_States#FICO_score)), [Debt-to-Income Ratio](http://en.wikipedia.org/wiki/Debt-to-income_ratio), Address, Home Ownership, Open Credit Lines, Delinquency History, etc.
- Loan Statistics: Principal received, Interest received, Last Payment, Next Payment, Late Fees, Recoveries, etc.
---
## A snapshot of data...
<t4><img src = "data_sample.jpg"/></t4> <br> <br> <br> <br> <br> <h2>...</h2>
<br> <br> <br> <br> <br> <br>
<h2 align = "center">...</h2>
---
## Data Processing
> * Keep only the variables of interest:
- <font color = "red">Loan Properties</font>: Loan Amount, Loan Grade, Term, Loan Purpose.
- <font color = "red">Borrower Profile</font>: Annual Income, Credit Score, Debt-to-Income Ratio, Address, Home Ownership, Open Accounts, Delinquency History.
> * Drop missing values and outliers.
> * Reformat and reshape the data.
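---
## Data Processing: An Illustrative Sketch
The processing steps listed above (keep variables, drop missing values and outliers, reformat) could look roughly like the following base R sketch. The toy rows, the column names (`loan_amnt`, `grade`, `term`, `annual_inc`, `dti`) and the "more than 3 times the median income" outlier rule are all assumptions for illustration, not the authors' actual pipeline.

```r
# Hypothetical toy rows standing in for the raw LendingClub export;
# the column names are assumptions for this sketch
raw <- data.frame(
  loan_amnt  = c(10000, 5000, NA, 35000, 12000),
  grade      = c("A", "B", "C", "A", "D"),
  term       = c(" 36 months", " 60 months", " 36 months", " 36 months", " 60 months"),
  annual_inc = c(60000, 45000, 52000, 9500000, 70000),
  dti        = c(12.5, 20.1, 8.3, 3.2, 15.0),
  stringsAsFactors = FALSE
)

# 1. Keep only the variables of interest
keep <- c("loan_amnt", "grade", "term", "annual_inc", "dti")
loans <- raw[, keep]

# 2. Drop rows that contain any missing value
loans <- loans[complete.cases(loans), ]

# 3. Drop income outliers; the "3 x median" threshold is an
#    illustrative assumption, not the authors' rule
loans <- loans[loans$annual_inc <= 3 * median(loans$annual_inc), ]

# 4. Reformat: turn " 36 months" into the integer 36
loans$term <- as.integer(gsub("[^0-9]", "", loans$term))
loans
```

A median-based cutoff is used only because it is easy to read; a quantile cutoff would be an equally reasonable choice.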
--- .segue .dark
## Visualization
---
## Visualization
> * For Loan Properties:
- Amount
- Grade
- Term
- Purpose
> * For Borrower Profiles:
- Annual income
- Credit Score
- Debt-to-Income Ratio
- Bank Account Delinquency History
---
## Total Loan Amount Trend (in millions)
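```{r nvd3plot, results='asis', message=FALSE, echo = FALSE}
# Embed rCharts output as iframes and load the JS libraries from a CDN
options(rcharts.mode = 'iframesrc', rcharts.cdn = TRUE)
# Pre-aggregated totals by issue date and grade; drop the first column
# (presumably the row index written by write.csv)
dat1 = read.csv("total_amount.csv", header = TRUE)
dat1 = dat1[, -1]
# Convert dollars to millions
dat1$loan_amnt = round(dat1$loan_amnt / 1e6)
# One stacked area per loan grade
p8 <- nPlot(loan_amnt ~ issue_d, group = 'grade', data = dat1, type = 'stackedAreaChart')
p8$yAxis(axisLabel = "Total Loan Amount", width = 62)
p8$xAxis(axisLabel = "Year")
p8$print("chart1")
```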
> * The P2P lending market has grown rapidly over the past three years.
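---
## Sketch: Building the Monthly Totals
The chart above reads pre-aggregated totals from `total_amount.csv`. A table of that shape could plausibly be produced by summing `loan_amnt` within each issue month and grade, as in this minimal sketch; the toy rows and the `"2013-01"` month format are assumptions.

```r
# Hypothetical loan-level rows; issue_d is assumed to be a month label
loans <- data.frame(
  issue_d   = c("2013-01", "2013-01", "2013-01", "2013-02", "2013-02"),
  grade     = c("A", "A", "B", "A", "B"),
  loan_amnt = c(1500000, 2500000, 1000000, 3000000, 4000000),
  stringsAsFactors = FALSE
)

# Sum loan amounts within each (month, grade) cell
dat1 <- aggregate(loan_amnt ~ issue_d + grade, data = loans, FUN = sum)

# Express totals in millions, as the chart does
dat1$loan_amnt <- round(dat1$loan_amnt / 1e6)
dat1
```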
---
## Loan Amount by States
```{r echo=FALSE, message=FALSE, fig.width=12}
require(ggplot2)
require(maps)   # US state boundaries used by map_data()
require(plyr)
# State polygons; 'region' holds lowercase full state names
states <- map_data("state")
loanbystate <- read.csv('loanbystate.csv', header = TRUE)
# Attach each state's total to its polygon vertices, then restore drawing order
choro <- merge(states, loanbystate, sort = FALSE, by = "region")
choro <- choro[order(choro$order), ]
q <- ggplot(choro, aes(x = long, y = lat, group = group, fill = loan_amnt)) +
  geom_polygon() +
  labs(title = 'Loan Amount by State', x = '', y = '') +
  theme(panel.background = element_blank(), axis.text = element_blank(),
        axis.ticks = element_blank())
# Diverging blue-white-red palette centred on the median value
q + scale_fill_gradient2(name = 'Total Loan Amount \n(in Million $)',
                         low = "#0000FF", mid = "#FFFFFF", high = "#FF0000",
                         midpoint = median(choro$loan_amnt), guide = "colourbar")
```
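---
## Sketch: Matching States to Map Regions
The merge above joins on `region`, which `map_data("state")` fills with lowercase full state names. If the raw data stores states as two-letter codes (assumed here, as is the toy `bystate` table), base R's built-in `state.abb` and `state.name` vectors can translate them before the merge. This is a hedged sketch, not necessarily how `loanbystate.csv` was built.

```r
# Hypothetical per-state totals keyed by two-letter postal code
bystate <- data.frame(addr_state = c("CA", "NY", "TX", "DC"),
                      loan_amnt  = c(950, 520, 430, 15),
                      stringsAsFactors = FALSE)

# Translate codes to lowercase full names to match map_data("state")
bystate$region <- tolower(state.name[match(bystate$addr_state, state.abb)])

# state.name covers only the 50 states, so patch DC by hand
bystate$region[bystate$addr_state == "DC"] <- "district of columbia"
bystate
```

Any code left as `NA` after the translation would silently drop out of the inner merge, so it is worth checking for `NA` regions before plotting.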
---
## Loan Purpose Decomposition
```{r results='asis', echo = FALSE, message=FALSE}
# Loan counts per purpose; drop the first (index) column and rename
dat4 = read.csv(file = "purpose.csv", header = TRUE)
dat4 = dat4[, -1]
names(dat4) = c("purpose", "number_of_loans")
# Highcharts bar chart via rCharts
n2 <- hPlot(number_of_loans ~ purpose, type = "bar", data = dat4)
n2$print("chart3")
```
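---
## Sketch: Counting Loans by Purpose
A table like `purpose.csv` (one row per purpose with a loan count) could be derived from loan-level data with `table()`. The purpose values below are hypothetical toy data.

```r
# Hypothetical loan-level purposes
purpose <- c("debt_consolidation", "credit_card", "debt_consolidation",
             "home_improvement", "debt_consolidation", "credit_card")

# Count loans per purpose, largest first
counts <- sort(table(purpose), decreasing = TRUE)
dat4 <- data.frame(purpose = names(counts),
                   number_of_loans = as.integer(counts),
                   stringsAsFactors = FALSE)
dat4
```

Sorting by count is only a presentation choice that makes the largest categories easy to find.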
---
## Default Rate Decomposition by Loan Grade
```{r results='asis', echo = FALSE, message=FALSE, fig.width = 12}
# Default rate per period and loan grade; drop the first (index) column
dat2 = read.csv(file = "default_rate.csv", header = TRUE)
dat2 = dat2[, -1]
# One Highcharts line per grade
n1 <- hPlot(x = "time", y = "default_rate", group = "grade", type = "line", data = dat2)
n1$print("chart2", include_assets = TRUE)
# n1$save("default_rate.html")
# n1$publish("default_rate.html", host = 'rpubs')
```
> * Default rates have steadily declined over time.
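---
## Sketch: Computing Default Rates
A default rate per period and grade, matching the `time`, `grade` and `default_rate` columns the chart reads, can be computed as the mean of a 0/1 default flag. Treating `"Charged Off"` and `"Default"` as defaults is an assumption, and the rows are toy data.

```r
# Hypothetical loan-level rows
loans <- data.frame(
  time        = c(2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011),
  grade       = c("A", "A", "B", "B", "A", "A", "B", "B"),
  loan_status = c("Fully Paid", "Charged Off", "Charged Off", "Default",
                  "Fully Paid", "Fully Paid", "Charged Off", "Fully Paid"),
  stringsAsFactors = FALSE
)

# 0/1 flag, so a group mean equals the share of defaulted loans
# (which statuses count as a default is an assumption)
loans$is_default <- as.integer(loans$loan_status %in% c("Charged Off", "Default"))

# Default rate = defaulted loans / all loans in each (time, grade) cell
dat2 <- aggregate(is_default ~ time + grade, data = loans, FUN = mean)
names(dat2)[names(dat2) == "is_default"] <- "default_rate"
dat2
```

One caveat when reading such a chart: loans issued recently have had less time to default, so a simple per-period rate like this can understate risk for the latest periods.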
---
## Borrower Profiles Analysis
[D3.js](http://www.columbia.edu/~xy2231/hist_interactive.html)
---
## Further Work...
There is still a lot to explore in Lending Club's loan dataset:
> * Combinations of variables
> * Historical return on investment analysis
> * Loan interest rate prediction
---
## Packages used
> * plyr - the split-apply-combine paradigm for R
> * lubridate, zoo - Dates and times
> * ggplot2
> * rCharts - interactive charts
> * D3.js
> * slidify - slides
---
<br>
<br>
<br>
<br>
<br>
<br>
<center><h2>Thank You!</h2></center>