# A Simple Regression Example
```{r, echo = FALSE, message = FALSE}
library(nnls)
library(kableExtra)
#' Print a matrix in a stylized way using row and column names if specified
#' @param m the matrix to be printed
#' @param row_names optional row names to use; may contain math
#' @param col_names optional column names to use; may contain math
print_matrix <- function(m, row_names = NULL, col_names = NULL) {
  if (!is.null(row_names)) rownames(m) <- row_names
  if (!is.null(col_names)) colnames(m) <- col_names
  knitr::kable(m, format = "html") %>%
    kable_styling("striped") %>%
    column_spec(1:2, background = "#ececec")
}
```
## Goals
- Basic introduction to `CVXR`
- An exercise on formulating a different objective, demonstrating how
`CVXR` works with native R functions
- Exercises on formulating linear dependence constraints on estimates
using linear algebra
- Exercises on formulating monotonicity constraints using `CVXR`
atoms
## Ordinary Least-Squares Regression
Consider a simple linear regression problem in which we wish to
estimate a set of parameters using a least-squares criterion.
We generate some synthetic data in which we know the model completely,
i.e.
$$
Y = X\beta + \epsilon,
$$
where $Y$ is a $50\times 1$ vector, $X$ is a $50\times 10$ matrix,
$\beta = [-4, -3, \ldots, 4, 5]$ is a $10\times 1$ vector, and
$\epsilon \sim N(0, 1)$.
```{r}
set.seed(123)
n <- 50
p <- 10
beta <- -4:5  # the true coefficients are just -4 through 5
X <- matrix(rnorm(n * p), nrow = n)
colnames(X) <- paste0("beta_", beta)
Y <- X %*% beta + rnorm(n)
```
Given the data $X$ and $Y$, we can estimate the $\beta$ vector using the
`lm` function in R, which fits a standard regression model.
```{r}
ls.model <- lm(Y ~ 0 + X) # There is no intercept in our model above
m <- matrix(coef(ls.model), ncol = 1)
```
```{r, echo = FALSE}
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
These are the least-squares estimates, and they are reasonably close
to the original $\beta$ values of $-4$ through $5$.
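As a quick sanity check (a sketch, using the true coefficients we
generated above), we can look at the largest absolute deviation of the
`lm` estimates from the truth.
```{r}
## Largest absolute deviation of the lm estimates from the true coefficients
max(abs(coef(ls.model) - (-4:5)))
```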
## The `CVXR` Formulation
The `CVXR` formulation states the above as an optimization problem:
$$
\begin{array}{ll}
\underset{\beta}{\mbox{minimize}} & \|y - X\beta\|_2^2,
\end{array}
$$
which directly translates into a problem that `CVXR` can solve as shown
in the steps below.
- Step 0. Load the `CVXR` library
```{r, message = FALSE}
library(CVXR, warn.conflicts=FALSE)
```
- Step 1. Define the variable to be estimated
```{r}
beta <- Variable(p)
```
- Step 2. Define the objective to be optimized
```{r}
objective <- Minimize(sum((Y - X %*% beta)^2))
```
Notice how the objective is specified using functions such as `sum`,
`%*%`, and `^` that are familiar to R users, even though `beta` is not
an ordinary R object but a `CVXR` expression.
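As a small aside (a sketch), `CVXR` expressions carry their own
classes and curvature information, which is how the package verifies
that a problem obeys the rules of disciplined convex programming.
```{r}
## beta is a CVXR Variable, and expressions built from it are CVXR objects
class(beta)
## CVXR can certify the curvature of the objective expression
is_convex(sum((Y - X %*% beta)^2))
```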
- Step 3. Create a problem to solve
```{r}
problem <- Problem(objective)
```
- Step 4. Solve it!
```{r}
result <- solve(problem)
```
- Step 5. Examine solution status and obtain objective value and estimate
```{r}
solution_status <- result$status
objective_value <- result$value
solution <- result$getValue(beta)
cat(sprintf("OLS Solution Status: %s, OLS Objective value: %f\n", solution_status, objective_value))
```
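In practice, it is worth checking the status before relying on the
solution. A minimal defensive pattern (a sketch) might look like this:
```{r}
## Only extract the estimate when the solver reports an acceptable status
if (result$status %in% c("optimal", "optimal_inaccurate")) {
  betaHat <- result$getValue(beta)
}
```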
We can indeed satisfy ourselves that the results we get match those
from `lm`.
```{r}
m <- cbind(result$getValue(beta), coef(ls.model))
```
```{r, echo = FALSE}
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = c("CVXR est.", "lm est."))
```
### Exercise
Modify the objective to perform least absolute deviation regression
and solve the problem. Compare the results to OLS. Which objective has
a lower value?
_Hint_: In LAD regression, we minimize the sum of the absolute values
of the residuals. Also, since the default solver `OSQP` has some
issues at the time of this writing, pass the parameter `solver =
"ECOS"` in your call to `CVXR::solve`.
#### Solution
```{r}
objective2 <- Minimize(sum(abs(Y - X %*% beta)))
problem2 <- Problem(objective2)
result2 <- solve(problem2, solver = "ECOS")
cat(sprintf("LAD Solution Status: %s, LAD Objective value: %f\n", result2$status, result2$value))
m2 <- cbind(result2$getValue(beta), coef(ls.model))
```
```{r, echo = FALSE}
print_matrix(m2, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = c("CVXR LAD est.", "lm est."))
```
```{r}
cat(sprintf("LAD objective value: %f, OLS objective value: %f\n",
            result2$value, result$value))
```
__N.B.__ We reuse `beta` in `objective2`. The value of `beta` will
change depending on the problem context, and `result$getValue()` or
`result2$getValue()` will account for that context, as shown below.
```{r}
m3 <- cbind(result$getValue(beta), result2$getValue(beta))
```
```{r, echo = FALSE}
print_matrix(m3, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = c("Problem 1 est.", "Problem 2 est."))
```
## Adding Constraints
On the surface, it appears that we have replaced one call to `lm` with
at least five or six lines of new R code. On top of that, the code
actually runs slower, so it is not clear what we really achieved.
However, suppose we knew that the $\beta$s were nonnegative and wished to
take this fact into account in our model. This
is [nonnegative least squares regression](https://en.wikipedia.org/wiki/Non-negative_least_squares), and
`lm` would no longer do the job.
In `CVXR`, the modified problem merely requires the addition of a constraint to the
problem definition.
```{r}
problem <- Problem(objective, constraints = list(beta >= 0))
result <- solve(problem)
betaEstimate <- result$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
We can verify once again that these values are comparable to those
obtained from another R package,
say [nnls](https://CRAN.R-project.org/package=nnls).
```{r}
nnls.fit <- nnls::nnls(X, Y)$x
m <- cbind(betaEstimate, nnls.fit)
```
```{r, echo = FALSE}
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = c("CVXR NNLS est.", "nnls est."))
```
### Exercise
Suppose it is known that $\sum_{i=1}^4\beta_i \leq
0$. Modify the original OLS problem to add this constraint.
#### Solution
The obvious solution is to add a constraint of the form
```{r, eval = FALSE}
constraint1 <- beta[1] + beta[2] + beta[3] + beta[4] <= 0
```
but it is generally easier to work with matrices in `CVXR`, so we
construct a row vector with ones in positions 1 through 4 and zeros
elsewhere.
```{r}
A <- matrix(c(rep(1, 4), rep(0, 6)), nrow = 1)
```
```{r, echo = FALSE}
print_matrix(A, col_names = paste0("$\\beta_{", 1:p, "}$"))
```
The sum constraint on $\beta$ is therefore
$$
A\beta \leq 0
$$
which we express in R as
```{r}
constraint1 <- A %*% beta <= 0
```
We are ready to solve the problem.
```{r}
problem <- Problem(objective, constraints = list(constraint1))
ex1 <- solve(problem)
```
And we can get the estimates of $\beta$.
```{r}
betaEstimate <- ex1$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
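We can also verify (a quick sketch) that the estimate satisfies the
constraint, up to solver tolerance.
```{r}
## The sum of the first four estimated coefficients should be <= 0
sum(betaEstimate[1:4])
```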
### Exercise
Add an additional constraint to the previous exercise that
$\beta_i \leq 4$ for $i=5,\ldots,10$.
#### Solution
We create a diagonal matrix with ones along the diagonal entries $i=5,\ldots,10$.
```{r}
B <- diag(c(rep(0, 4), rep(1, 6)))
```
```{r, echo = FALSE}
print_matrix(B, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = paste0("$\\beta_{", 1:p, "}$"))
```
This new constraint is then simply
```{r}
constraint2 <- B %*% beta <= 4
problem2 <- Problem(objective, constraints = list(constraint1, constraint2))
ex2 <- solve(problem2)
betaEstimate <- ex2$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
### Exercise
Solve the OLS regression problem under the constraint that the
$\beta_i$ are nonnegative and monotonically nondecreasing.
_Hint_: What function in R computes lagged differences?
#### Solution
This requires some additional knowledge about R and `CVXR`
functions. The `base::diff` generic function generates lagged
differences of any order. `CVXR` provides a method for the generic. So
the monotonicity constraint can be succinctly expressed as `diff(beta) >= 0`.
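For intuition, here is a small illustration of `base::diff` on an
ordinary numeric vector; `CVXR` applies the same operation
symbolically to `beta`.
```{r}
diff(c(1, 4, 9, 16))                   # first-order lagged differences: 3 5 7
diff(c(1, 4, 9, 16), differences = 2)  # second-order differences: 2 2
```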
```{r}
problem3 <- Problem(objective,
                    constraints = list(beta >= 0, diff(beta) >= 0))
ex3 <- solve(problem3)
betaEstimate <- ex3$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
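A quick check (a sketch) confirms that both constraints hold up to
solver tolerance.
```{r}
min(betaEstimate) >= -1e-6        # nonnegativity
all(diff(betaEstimate) >= -1e-6)  # monotone nondecreasing
```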
### Exercise
Fit OLS with just the following order constraints on $\beta$:
$\beta_{i} \leq \beta_{i+1}$ for $i=1,\ldots,4$, and $\beta_i \geq
\beta_{i+1}$ for $i=5,\ldots,p-1$.
#### Solution
We have to combine all that we have learned earlier.
```{r}
D1 <- cbind(diag(5), diag(0, 5))       # selects beta_1, ..., beta_5
D2 <- cbind(matrix(0, 6, 4), diag(6))  # selects beta_5, ..., beta_10
constraints <- list(diff(D1 %*% beta) >= 0, diff(D2 %*% beta) <= 0)
problem4 <- Problem(objective, constraints)
ex4 <- solve(problem4)
betaEstimate <- ex4$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
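As before, we can verify (a sketch) that the two order constraints are
satisfied up to solver tolerance.
```{r}
all(diff(betaEstimate[1:5]) >= -1e-6)   # nondecreasing over beta_1, ..., beta_5
all(diff(betaEstimate[5:10]) <= 1e-6)   # nonincreasing over beta_5, ..., beta_10
```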
## Summary
This introduction demonstrates a chief advantage of `CVXR`:
_flexibility_. Users can quickly modify and re-solve a problem, which
is ideal for prototyping and experimenting with new statistical
methods. The `CVXR` syntax is simple and mathematically
intuitive. Furthermore, `CVXR` combines seamlessly with native R code
as well as several popular packages, allowing it to be incorporated
easily into a larger analytical framework. Users are free to
construct statistical estimators defined as solutions to convex
optimization problems, even when no closed-form solution or existing
implementation is available. Later, we will see how such solutions can
be used with resampling techniques like the bootstrap to estimate
variability.