Introduction to Neural Networks. Perceptron.

One of the first attempts to implement something similar to a modern neural network was made by Frank Rosenblatt of the Cornell Aeronautical Laboratory in 1957. It was a hardware implementation called the "Mark-1", designed to recognize primitive geometric figures such as triangles, squares and circles.

Frank Rosenblatt with the Mark 1 Perceptron

An input image was represented by a 20×20 photocell array, so the neural network had 400 inputs and one binary output. This simple network contained a single neuron, also called a threshold logic unit. The neural network weights were implemented as potentiometers that required manual adjustment during the training phase.

The New York Times wrote about the perceptron at that time: "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

Perceptron Model

Suppose we have N features in our model; the input would then be a vector of size N. The perceptron is a binary classification model, i.e. it can distinguish between two classes of input data. We will assume that for each input vector x the output of our perceptron is either +1 or -1, depending on the class. The output is computed using the formula

y(x) = f(wᵀx)

where f is a step activation function: f(z) = +1 if z ≥ 0, and f(z) = -1 otherwise
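As a quick illustration, the formula above can be computed directly (the weight and input values here are made up for the example):

```python
import numpy as np

def step(z):
    """Step activation: +1 for non-negative input, -1 otherwise."""
    return 1 if z >= 0 else -1

def perceptron_output(weights, x):
    """Compute y(x) = f(w^T x) for a single input vector."""
    return step(np.dot(weights, x))

# Hypothetical weights and input, just for illustration
w = np.array([0.5, -1.0, 0.2])
x = np.array([1.0, 0.3, 2.0])
print(perceptron_output(w, x))  # w^T x = 0.5 - 0.3 + 0.4 = 0.6 ≥ 0, so +1
```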

Training the Perceptron

To train a perceptron we need to find a weight vector w that classifies most of the inputs correctly, i.e. results in the smallest error. This error is defined by the perceptron criterion in the following manner:

E(w) = -∑ wᵀxᵢtᵢ

where

  • the sum is taken over those training data points i that result in wrong classification
  • xᵢ is the input data, and tᵢ is either -1 or +1 for negative and positive examples respectively.
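To make the criterion concrete, here is a small worked computation on made-up toy data (the points, labels, and weights are all illustrative):

```python
import numpy as np

# Toy data: three points with labels t ∈ {-1, +1}
X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, 1.0]])
t = np.array([1, -1, 1])
w = np.array([0.5, 0.5])

# A point i is misclassified when the sign of w^T x_i disagrees with t_i,
# i.e. when t_i * (w^T x_i) <= 0
scores = X @ w                      # w^T x_i for each point
misclassified = t * scores <= 0

# Perceptron criterion: sum of -w^T x_i t_i over misclassified points only
E = -np.sum(scores[misclassified] * t[misclassified])
print(E)  # → 0.5
```

Note that E(w) is always non-negative: each misclassified point contributes -wᵀxᵢtᵢ > 0 (or 0 on the boundary), so minimizing E pushes misclassified points toward the correct side.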

This criterion is considered as a function of the weights w, and we need to minimize it. Often, a method called gradient descent is used, in which we start with some initial weights w(0), and then at each step update the weights according to the formula

w(t+1) = w(t) - η∇E(w)

Here η is the so-called learning rate, and ∇E(w) denotes the gradient of E. For the perceptron criterion the gradient is ∇E(w) = -∑ xᵢtᵢ (summed over the misclassified points), so after substituting it we end up with

w(t+1) = w(t) + η∑xᵢtᵢ

The algorithm in Python looks like this:

import random
import numpy as np

def train(positive_examples, negative_examples, num_iterations = 100, eta = 1):
    # positive_examples and negative_examples are lists of NumPy feature vectors
    weights = np.zeros(len(positive_examples[0])) # Initialize weights with zeros

    for i in range(num_iterations):
        pos = random.choice(positive_examples)
        neg = random.choice(negative_examples)

        z = np.dot(pos, weights) # compute perceptron output
        if z < 0: # positive example classified as negative
            weights = weights + eta*pos

        z = np.dot(neg, weights)
        if z >= 0: # negative example classified as positive
            weights = weights - eta*neg

    return weights
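As a quick sanity check, the training loop can be run on made-up, linearly separable toy data (the function body is repeated here so the snippet runs on its own; the data points are invented for illustration):

```python
import random
import numpy as np

random.seed(1)

def train(positive_examples, negative_examples, num_iterations=100, eta=1):
    # Same algorithm as above, repeated so this snippet is self-contained
    weights = np.zeros(len(positive_examples[0]))
    for _ in range(num_iterations):
        pos = random.choice(positive_examples)
        neg = random.choice(negative_examples)
        if np.dot(pos, weights) < 0:    # positive misclassified
            weights = weights + eta * pos
        if np.dot(neg, weights) >= 0:   # negative misclassified
            weights = weights - eta * neg
    return weights

# Toy data: positives in the upper-right, negatives in the lower-left,
# separable by a line through the origin
positives = [np.array([2.0, 2.0]), np.array([1.5, 2.5]), np.array([2.5, 1.0])]
negatives = [np.array([-2.0, -2.0]), np.array([-1.0, -2.5]), np.array([-2.5, -1.5])]

w = train(positives, negatives)
print(all(np.dot(x, w) >= 0 for x in positives))  # True
print(all(np.dot(x, w) < 0 for x in negatives))   # True
```

Since this toy data is separable by a line through the origin, the perceptron convergence theorem guarantees the loop settles on weights that classify every point correctly after a small number of updates.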

To see how we can use the perceptron to solve some toy as well as real-life problems, and to continue learning, go to the Perceptron notebook.