ex3.Rmd

---
title: "Exercise 3"
author: "Jai Broome"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: 
    html_document:
        toc: true
        number_sections: true
---

```{r dependencies, message=FALSE}
require(knitr)
require(ggplot2)
read_chunk("ex2/ex2_chunks.R")
```

# Multi-class Classification

## Dataset

```{r read-data-1}
ex3data1 <- cbind(1, as.matrix(read.csv("data/ex3data1.csv")))
initial_theta <- rep(0, times = ncol(ex3data1) - 1)
```

##  Visualizing the data

The script to visualize the greyscale digits was included in the resources and not actually a part of the assignment. I skipped it for now, but may revisit in the future. I should be able to modify part of exercise 7 to print the handwritten digits.

## Vectorizing Logistic Regression

### Vectorizing the cost function

I already implemented a vectorized vesion for exercise 2. I make use of `ex2_chunks.R` which contains all of the functions written for the previous exercise.

### Vectorizing the gradient

### Vectorizing regularized logistic regression

```{r sig}
```

```{r h}
```

```{r cost-function}
```

## One-vs-all Classification 

For one-vs-all classification, we just loop through each of the K classes and run logistic regression like we did in the previous exercise

```{r learn-thetas, cache=TRUE}
thetas <- data.frame()

for(i in 1:10){
    Mi <- cbind(ex3data1[, 1:401], ex3data1[, 402] == i)
    thetai <- optim(par = initial_theta,
                       fn = function(x){costFunction(Mi, x)$J},
                       gr = function(x){costFunction(Mi, x)$grad},
                       method = "BFGS", control = list(maxit = 400))
    thetas <- rbind(thetas, thetai$par)
}

```

### One-vs-all Prediction

The predicted class is just the class with the highest assigned probability

```{r eval-pred, cache=TRUE}
ex3pred1 <- apply(ex3data1, 1, FUN = function(x){
    which.max(as.vector(apply(thetas, 1, FUN = function(y){
        h(y, x[1:401])
    })))
})

sum(ex3data1[, 402] == ex3pred1) / nrow(ex3data1)
```

This is higher than the expected accuracy of 94.9%, although I'm not sure why.

# Neural Networks

This is just a quick, non-generalized implementation of forward propagation. A more generalized version is implemented in the next exercise

```{r nn, cache=TRUE}

Theta1 <- as.matrix(read.csv("data/ex3weights_Theta1.csv"))
Theta2 <- as.matrix(read.csv("data/ex3weights_Theta2.csv"))

z2 <- Theta1 %*% t(ex3data1[, 1:401])
a2 <- sig(z2)
a2 <- rbind(1, a2)

z3 <- Theta2 %*% a2
a3 <- sig(z3)

ex3pred2 <- apply(a3, 2, which.max)
sum(ex3data1[, 402] == ex3pred2) / nrow(ex3data1)
```

The expected accuracy is 97.5%.