Neural Networks#

Artificial neural networks (ANN) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains.

The neural network itself isn’t an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs.

Such systems “learn” to perform tasks by considering examples, generally without being programmed with any task-specific rules.

For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the results to identify cats in other images.

They do this without any prior knowledge about cats, e.g., that they have fur, tails, whiskers and cat-like faces. Instead, they automatically generate identifying characteristics from the learning material that they process.

An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.

Each connection, like the synapses in a biological brain, can transmit a signal from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it.

[Figure: artificial neuron]

In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs.

The connections between artificial neurons are called edges. Artificial neurons and edges typically have a weight that adjusts as learning proceeds.

The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold.
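The behaviour described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `neuron_output`, the choice of `tanh` as the non-linear function, and the sample numbers are all assumptions made for the example.

```python
import numpy as np

def neuron_output(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs passed through a non-linear function.

    The neuron stays silent (returns 0.0) unless the aggregate signal
    exceeds the threshold. tanh is an assumed choice of non-linearity.
    """
    total = float(np.dot(weights, inputs))  # aggregate weighted signal
    if total <= threshold:
        return 0.0
    return float(np.tanh(total))

signals = np.array([0.5, -1.0, 2.0])   # hypothetical incoming signals
weights = np.array([0.8, 0.2, 0.1])    # hypothetical connection weights

fired = neuron_output(signals, weights)                  # sum is about 0.4, above 0.0
silent = neuron_output(signals, weights, threshold=0.5)  # sum is below 0.5
print(fired, silent)
```

Raising a weight strengthens the contribution of its input to the sum, and raising the threshold makes the neuron harder to activate.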

Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the inner layers multiple times.

[Figure: neural network]

A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of, at least, three layers of nodes:

  1. an input layer,

  2. a hidden layer and

  3. an output layer.

Except for the input nodes, each node is a neuron that uses a nonlinear activation function.

MLP utilizes a supervised learning technique called backpropagation for training.

Its multiple layers and non-linear activation distinguish MLP from a linear perceptron.

It can distinguish data that is not linearly separable.
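XOR is a classic example of data that is not linearly separable. The sketch below shows a tiny MLP with one hidden layer computing XOR. The weights are hand-picked for illustration rather than learned by backpropagation, and the step activation and the names `xor_mlp`, `W1`, `b1`, `W2`, `b2` are assumptions of this example.

```python
import numpy as np

def step(z):
    """Non-linear step activation: 1.0 where z > 0, else 0.0."""
    return (np.asarray(z) > 0).astype(float)

# Hidden layer: unit 0 acts like OR, unit 1 acts like AND (hand-picked weights).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output unit fires when OR is on and AND is off, which is XOR.
W2 = np.array([1.0, -1.0])
b2 = -0.5

def xor_mlp(x):
    hidden = step(W1 @ x + b1)
    return float(step(W2 @ hidden + b2))

inputs = [np.array([a, b], dtype=float) for a in (0, 1) for b in (0, 1)]
outputs = [xor_mlp(x) for x in inputs]
print(outputs)
```

No single linear unit can produce this output pattern, because no straight line separates the points (0,1) and (1,0) from (0,0) and (1,1).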

A model is linear if the relationship between the input features and the target is linear (for example, converting degrees Celsius to degrees Fahrenheit). It is non-linear if that relationship is not linear (for example, classifying cats vs. dogs, which depends on many image features such as edges, the cat's sitting posture, etc.).
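The Celsius to Fahrenheit case can be sketched as a one-feature linear fit. This is only an illustration: the sample temperatures are made up for the example, and `np.polyfit` is used here as a convenient least-squares fitter, not as a neural network.

```python
import numpy as np

# Hypothetical sample temperatures; the targets follow F = 1.8 * C + 32.
celsius = np.array([-40.0, 0.0, 20.0, 37.0, 100.0])
fahrenheit = 1.8 * celsius + 32.0

# Fit a degree-1 polynomial; coefficients come back highest degree first.
slope, intercept = np.polyfit(celsius, fahrenheit, 1)
print(slope, intercept)
```

A straight line recovers this relationship exactly (up to floating-point error), which is what makes the problem linear.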

Neuron Model (Logistic Unit)#

Here is a model of one neuron unit.

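[Figure: model of a single neuron]

The input vector includes a bias unit:

$$x_0 = 1$$

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$$

Weights:

$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$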

Here, we add non-linearity by passing the weighted sum of the input features through an activation function.
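A single logistic unit can be sketched as follows. The section title suggests the sigmoid as the activation function, and the sketch assumes the common convention of a bias input fixed at 1; the names `sigmoid` and `logistic_unit` and the sample weights are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(features, theta):
    """Output of one neuron: sigmoid of the weighted sum, bias included."""
    x = np.concatenate(([1.0], features))  # prepend the assumed bias input x_0 = 1
    return sigmoid(theta @ x)

theta = np.array([-1.0, 2.0, 0.5])   # hypothetical weights: bias, then one per feature
features = np.array([0.5, 0.0])

activation = logistic_unit(features, theta)  # z = -1 + 2*0.5 + 0.5*0 = 0
print(activation)
```

Without the activation function the unit would be a plain linear model of its inputs.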

Network Model (Set of Neurons)#

Above we saw just one neuron, but a neural network consists of multiple layers of neuron units interconnected with each other.

Let’s take a look at a simple example model with one hidden layer.

[Figure: network model with one hidden layer]

$a_i^{(j)}$ - “activation” of unit i in layer j.

$\Theta^{(j)}$ - matrix of weights controlling the function mapping from layer j to layer j + 1. For example, for the first layer: $\Theta^{(1)}$.

$L$ - total number of layers in the network (3 in our example).

$s_l$ - number of units (not counting the bias unit) in layer l.

$K$ - number of output units (1 in our example, but it could be any positive integer for multi-class classification).
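The notation above maps onto a short forward pass. The sketch below is illustrative only: it assumes sigmoid activations, random weights, and hypothetical layer sizes of 3, 4 and 1 units, with each weight matrix shaped (units in next layer) x (units in current layer + 1) to account for the bias unit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(a):
    """Prepend the bias unit (fixed at 1) to a layer's activations."""
    return np.concatenate(([1.0], a))

def forward(x, thetas):
    """Propagate x through every layer; thetas[j] maps layer j+1 to layer j+2."""
    a = x
    for theta in thetas:
        a = sigmoid(theta @ add_bias(a))
    return a

rng = np.random.default_rng(0)
layer_sizes = [3, 4, 1]  # hypothetical s_1, s_2, s_3, so L = 3 and K = 1
thetas = [
    rng.normal(size=(layer_sizes[j + 1], layer_sizes[j] + 1))
    for j in range(len(layer_sizes) - 1)
]

output = forward(np.array([0.2, -0.4, 1.0]), thetas)
print(output.shape)
```

With L layers there are L - 1 weight matrices, and the final activation vector has K entries.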

Multi-class Classification#

To make a neural network work for multi-class classification, we may use the One-vs-All approach.

Let’s say we want our network to determine whether there is a pedestrian, a car, a motorcycle or a truck in the image.

  • The input layer will be much bigger, as it will contain all the pixels of the image.

  • The output layer of our network will have 4 units, each representing one of the classes (pedestrian, car, motorcycle and truck).

If the model recognizes a pedestrian, the pedestrian unit will output a true signal (1) and the rest will output false (0).

If all our images are 20x20 pixels, the input layer will have 400 units, each containing the black-and-white (grayscale) intensity of the corresponding pixel.
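Preparing such an image for the input layer can be sketched like this. The helper name `image_to_input`, the 0-255 pixel range, and the scaling to [0, 1] are assumptions of this example, and the random image is only a stand-in for a real picture.

```python
import numpy as np

def image_to_input(image):
    """Flatten a 20x20 grayscale image into a vector of 400 input values."""
    pixels = np.asarray(image, dtype=float)
    if pixels.shape != (20, 20):
        raise ValueError("expected a 20x20 image")
    return pixels.reshape(-1) / 255.0  # assumed 0-255 range, scaled to [0, 1]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 20))  # stand-in for a real picture
x = image_to_input(image)
print(x.shape)
```

Each of the 400 entries of `x` feeds one unit of the input layer.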

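[Figure: multi-class network with four output units]

$$h_\Theta(x) \in \mathbb{R}^4$$

In this case we would expect our final hypothesis to have the following values:

$$h_\Theta(x) \approx \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \text{ (pedestrian)}$$

$$h_\Theta(x) \approx \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \text{ (car)}$$

$$h_\Theta(x) \approx \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \text{ (motorcycle)}$$

In this case, for the training set:

$$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$$

We would have:

$$y^{(i)} \in \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}$$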
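The label encoding for the four classes can be sketched as follows. The helper names `one_hot` and `predicted_class` are hypothetical, and picking the class with the largest output (argmax) is an assumed decoding rule rather than something stated in the text.

```python
import numpy as np

CLASSES = ["pedestrian", "car", "motorcycle", "truck"]

def one_hot(label):
    """Encode a class name as a vector with 1 at its index and 0 elsewhere."""
    y = np.zeros(len(CLASSES))
    y[CLASSES.index(label)] = 1.0
    return y

def predicted_class(h):
    """Decode network outputs by picking the unit with the largest value."""
    return CLASSES[int(np.argmax(h))]

y_car = one_hot("car")
guess = predicted_class(np.array([0.1, 0.2, 0.9, 0.3]))  # hypothetical outputs
print(y_car, guess)
```

Training targets are built with `one_hot`, while `predicted_class` turns the four output activations back into a class name.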