# A Simple Regression Example
```{r, echo = FALSE, message = FALSE}
library(nnls)
library(kableExtra)
#' Print a matrix in a stylized way using row and column names if specified
#' @param m the matrix to be printed
#' @param row_names optional row names to use; may contain math
#' @param col_names optional column names to use; may contain math
print_matrix <- function(m, row_names = NULL, col_names = NULL) {
  if (!is.null(row_names)) rownames(m) <- row_names
  if (!is.null(col_names)) colnames(m) <- col_names
  knitr::kable(m, format = "html") %>%
    kable_styling("striped") %>%
    column_spec(1:2, background = "#ececec")
}
```
## Goals
- Basic introduction to `CVXR`
- An exercise on formulating a different objective, demonstrating how
`CVXR` works with native R functions
- Exercises on formulating linear dependence constraints on estimates
using linear algebra
- Exercises on formulating monotonicity constraints using `CVXR`
atoms
## Ordinary Least-Squares Regression
Consider a simple linear regression problem in which we wish to
estimate a set of parameters using a least-squares criterion.
We generate some synthetic data in which we know the model completely,
i.e.
$$
Y = X\beta + \epsilon,
$$
where $Y$ is a $50\times 1$ vector, $X$ is a $50\times 10$ matrix,
$\beta = [-4, -3, \ldots, 4, 5]$ is a $10\times 1$ vector, and
$\epsilon \sim N(0, 1)$.
```{r}
set.seed(123)
n <- 50
p <- 10
beta <- -4:5  # the true coefficients are just -4 through 5
X <- matrix(rnorm(n * p), nrow = n)
colnames(X) <- paste0("beta_", beta)
Y <- X %*% beta + rnorm(n)
```
Given the data $X$ and $Y$, we can estimate the $\beta$ vector using the
`lm` function in R, which fits a standard regression model.
```{r}
ls.model <- lm(Y ~ 0 + X) # There is no intercept in our model above
m <- matrix(coef(ls.model), ncol = 1)
```
```{r, echo = FALSE}
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
These are the least-squares estimates, and they are reasonably close
to the original $\beta$ values of $-4$ through $5$.
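As a quick sanity check (a sketch, using the true coefficients we
generated above), we can look at the largest absolute deviation of the
`lm` estimates from the truth.
```{r}
## Largest absolute deviation of the lm estimates from the true coefficients
max(abs(coef(ls.model) - (-4:5)))
```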
## The `CVXR` Formulation
The `CVXR` formulation states the above as an optimization problem:
$$
\begin{array}{ll}
\underset{\beta}{\mbox{minimize}} & \|y - X\beta\|_2^2,
\end{array}
$$
which directly translates into a problem that `CVXR` can solve as shown
in the steps below.
- Step 0. Load the `CVXR` library
```{r, message = FALSE}
library(CVXR, warn.conflicts=FALSE)
```
- Step 1. Define the variable to be estimated
```{r}
beta <- Variable(p)
```
- Step 2. Define the objective to be optimized
```{r}
objective <- Minimize(sum((Y - X %*% beta)^2))
```
Notice how the objective is specified using functions such as `sum`,
`%*%`, and `^` that are familiar to R users, even though `beta` is not
an ordinary R object but a `CVXR` expression.
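As a small aside (a sketch), `CVXR` expressions carry their own
classes and curvature information, which is how the package verifies
that a problem obeys the rules of disciplined convex programming.
```{r}
## beta is a CVXR Variable, and expressions built from it are CVXR objects
class(beta)
## CVXR can certify the curvature of the objective expression
is_convex(sum((Y - X %*% beta)^2))
```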
- Step 3. Create a problem to solve
```{r}
problem <- Problem(objective)
```
- Step 4. Solve it!
```{r}
result <- solve(problem)
```
- Step 5. Examine solution status and obtain objective value and estimate
```{r}
solution_status <- result$status
objective_value <- result$value
solution <- result$getValue(beta)
cat(sprintf("OLS Solution Status: %s, OLS Objective value: %f\n", solution_status, objective_value))
```
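In practice, it is worth checking the status before relying on the
solution. A minimal defensive pattern (a sketch) might look like this:
```{r}
## Only extract the estimate when the solver reports an acceptable status
if (result$status %in% c("optimal", "optimal_inaccurate")) {
  betaHat <- result$getValue(beta)
}
```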
We can indeed satisfy ourselves that the results we get match those
from `lm`.
```{r}
m <- cbind(result$getValue(beta), coef(ls.model))
```
```{r, echo = FALSE}
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = c("CVXR est.", "lm est."))
```
### Exercise
Modify the objective to perform least absolute deviation regression
and solve the problem. Compare the results to OLS. Which objective has
a lower value?
_Hint_: In LAD regression, we minimize the sum of the absolute values
of the residuals. Also, since the default solver `OSQP` has some
issues at the time of this writing, pass the parameter `solver =
"ECOS"` in your call to `CVXR::solve`.
#### Solution
```{r}
objective2 <- Minimize(sum(abs(Y - X %*% beta)))
problem2 <- Problem(objective2)
result2 <- solve(problem2, solver = "ECOS")
cat(sprintf("LAD Solution Status: %s, LAD Objective value: %f\n", result2$status, result2$value))
m2 <- cbind(result2$getValue(beta), coef(ls.model))
```
```{r, echo = FALSE}
print_matrix(m2, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = c("CVXR LAD est.", "lm est."))
```
```{r}
cat(sprintf("LAD objective value: %f, OLS objective value: %f\n",
            result2$value, result$value))
```
__N.B.__ We reuse `beta` in `objective2`. The value of `beta` will
change depending on the problem context, and `result$getValue()` or
`result2$getValue()` will account for that context, as shown below.
```{r}
m3 <- cbind(result$getValue(beta), result2$getValue(beta))
```
```{r, echo = FALSE}
print_matrix(m3, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = c("Problem 1 est.", "Problem 2 est."))
```
## Adding Constraints
On the surface, it appears that we have replaced one call to `lm` with
at least five or six lines of new R code. On top of that, the code
actually runs slower, so it is not clear what we really achieved.
However, suppose we knew that the $\beta$s were nonnegative and wished to
take this fact into account in our model. This
is [nonnegative least squares regression](https://en.wikipedia.org/wiki/Non-negative_least_squares), and
`lm` would no longer do the job.
In `CVXR`, the modified problem merely requires the addition of a constraint to the
problem definition.
```{r}
problem <- Problem(objective, constraints = list(beta >= 0))
result <- solve(problem)
betaEstimate <- result$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
We can verify once again that these values are comparable to those
obtained from another R package,
say [nnls](https://CRAN.R-project.org/package=nnls).
```{r}
nnls.fit <- nnls::nnls(X, Y)$x
m <- cbind(betaEstimate, nnls.fit)
```
```{r, echo = FALSE}
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = c("CVXR NNLS est.", "nnls est."))
```
### Exercise
Suppose it is known that $\sum_{i=1}^4\beta_i \leq
0$. Modify the original OLS problem to add this constraint.
#### Solution
The obvious solution is to add a constraint of the form
```{r, eval = FALSE}
constraint1 <- beta[1] + beta[2] + beta[3] + beta[4] <= 0
```
but it is generally easier to work with matrices in `CVXR`, so we
construct a row vector with ones in positions 1 through 4 and zeros
elsewhere.
```{r}
A <- matrix(c(rep(1, 4), rep(0, 6)), nrow = 1)
```
```{r, echo = FALSE}
print_matrix(A, col_names = paste0("$\\beta_{", 1:p, "}$"))
```
The sum constraint on $\beta$ is therefore
$$
A\beta \leq 0
$$
which we express in R as
```{r}
constraint1 <- A %*% beta <= 0
```
We are ready to solve the problem.
```{r}
problem <- Problem(objective, constraints = list(constraint1))
ex1 <- solve(problem)
```
And we can get the estimates of $\beta$.
```{r}
betaEstimate <- ex1$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
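We can also verify (a quick sketch) that the estimate satisfies the
constraint, up to solver tolerance.
```{r}
## The sum of the first four estimated coefficients should be <= 0
sum(betaEstimate[1:4])
```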
### Exercise
Add an additional constraint to the previous exercise that
$\beta_i \leq 4$ for $i=5,\ldots,10$.
#### Solution
We create a diagonal matrix with ones along the diagonal entries $i=5,\ldots,10$.
```{r}
B <- diag(c(rep(0, 4), rep(1, 6)))
```
```{r, echo = FALSE}
print_matrix(B, row_names = paste0("$\\beta_{", 1:p, "}$"), col_names = paste0("$\\beta_{", 1:p, "}$"))
```
This new constraint is then simply
```{r}
constraint2 <- B %*% beta <= 4
problem2 <- Problem(objective, constraints = list(constraint1, constraint2))
ex2 <- solve(problem2)
betaEstimate <- ex2$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
### Exercise
Solve the OLS regression problem under the constraint that the
$\beta_i$ are nonnegative and monotonically nondecreasing.
_Hint_: What function in R computes lagged differences?
#### Solution
This requires some additional knowledge about R and `CVXR`
functions. The `base::diff` generic function generates lagged
differences of any order. `CVXR` provides a method for the generic. So
the monotonicity constraint can be succinctly expressed as `diff(beta) >= 0`.
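For intuition, here is a small illustration of `base::diff` on an
ordinary numeric vector; `CVXR` applies the same operation
symbolically to `beta`.
```{r}
diff(c(1, 4, 9, 16))                   # first-order lagged differences: 3 5 7
diff(c(1, 4, 9, 16), differences = 2)  # second-order differences: 2 2
```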
```{r}
problem3 <- Problem(objective,
                    constraints = list(beta >= 0, diff(beta) >= 0))
ex3 <- solve(problem3)
betaEstimate <- ex3$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
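A quick check (a sketch) confirms that both constraints hold up to
solver tolerance.
```{r}
min(betaEstimate) >= -1e-6        # nonnegativity
all(diff(betaEstimate) >= -1e-6)  # monotone nondecreasing
```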
### Exercise
Fit OLS with just the following order constraints on $\beta$:
$\beta_{i} \leq \beta_{i+1}$ for $i=1,\ldots,4$, and $\beta_i \geq
\beta_{i+1}$ for $i=5,\ldots,p-1$.
#### Solution
We have to combine all that we have learned earlier.
```{r}
D1 <- cbind(diag(5), diag(0, 5))       # selects beta_1, ..., beta_5
D2 <- cbind(matrix(0, 6, 4), diag(6))  # selects beta_5, ..., beta_10
constraints <- list(diff(D1 %*% beta) >= 0, diff(D2 %*% beta) <= 0)
problem4 <- Problem(objective, constraints)
ex4 <- solve(problem4)
betaEstimate <- ex4$getValue(beta)
```
```{r, echo = FALSE}
m <- matrix(betaEstimate, ncol = 1)
print_matrix(m, row_names = paste0("$\\beta_{", 1:p, "}$"))
```
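As before, we can verify (a sketch) that the two order constraints are
satisfied up to solver tolerance.
```{r}
all(diff(betaEstimate[1:5]) >= -1e-6)   # nondecreasing over beta_1, ..., beta_5
all(diff(betaEstimate[5:10]) <= 1e-6)   # nonincreasing over beta_5, ..., beta_10
```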
## Summary
This introduction demonstrates a chief advantage of `CVXR`:
_flexibility_. Users can quickly modify and re-solve a problem, which
is ideal for prototyping and experimenting with new statistical
methods. The `CVXR` syntax is simple and mathematically
intuitive. Furthermore, `CVXR` combines seamlessly with native R code
as well as several popular packages, allowing it to be incorporated
easily into a larger analytical framework. Users are free to
construct statistical estimators defined as solutions to convex
optimization problems, even when no closed-form solution or existing
implementation is available. Later, we will see how such solutions can
be used with resampling techniques like the bootstrap to estimate
variability.