Neural Network introduction

November 23, 2019


This is a very basic introduction to the topic behind the widely used buzzword "Neural Networks". This first post introduces the terms needed to understand how complete networks work.


This post is written in blocks, with possibly unknown words highlighted. These words are explained in text blocks located under the blocks that use them.


The goal behind machine learning is to feed a machine some input and "train" the machine using that input so that the machine learns. After that, new input can be fed to the machine and things happen.

But "what is learning?" you might ask yourself, and what "things" happen? Let's dive into this:

First, we want to define a goal that should be reached. Let's use a simple example: we want to predict whether someone is asleep, using the state of the person's bike lock, the state of the person's lamp and the state of the person's door as input values (you'll quickly see why we take these values into account).

First of all, we've got a predefined input and a predefined output. Let's create a table with this in it:

| Lock | Lamp | Door | Asleep | Annotation |
|------|------|------|--------|------------|
| \(0\) | \(0\) | \(0\) | \(0\) | Person not home |
| \(1\) | \(1\) | \(0\) | \(0\) | Person arrived home, locked bike |
| \(1\) | \(1\) | \(1\) | \(0\) | Person opened the door, light is on |
| \(1\) | \(0\) | \(0\) | \(1\) | Person sleeping |
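
If you prefer to see the table as data, here is a minimal sketch of how it could be written down in Python. The name `TRAINING_DATA` and the tuple layout are assumptions made for this illustration only; the values are copied from the table above.

```python
# Each entry: ((lock, lamp, door), asleep), copied row by row from the table.
# TRAINING_DATA is a hypothetical name chosen for this sketch.
TRAINING_DATA = [
    ((0, 0, 0), 0),  # person not home
    ((1, 1, 0), 0),  # person arrived home, locked bike
    ((1, 1, 1), 0),  # person opened the door, light is on
    ((1, 0, 0), 1),  # person sleeping
]

for inputs, asleep in TRAINING_DATA:
    lock, lamp, door = inputs
    print(f"lock={lock} lamp={lamp} door={door} -> asleep={asleep}")
```

The three input values and the expected output are kept together in one entry so that, later on, every input can be compared against the result we want.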

We want to create a "network" that takes the values Lock, Lamp and Door as input. The output should be \(1\) if the person is asleep and \(0\) if the person is not asleep.

We can do this using a single neuron:


A neuron is the most basic component of a neural network. It multiplies each input value by a weight, sums up the results and passes that sum through a transfer function \(\varphi\):

\[ y = \varphi \left(\sum_{j=0}^{m} w_{j} x_{j} \right) \]

The above equation can be visualized as seen below:

The inputs \(x_0\), \(x_1\), ..., \(x_m\) correspond to the input values we've got: Lock, Lamp and Door. The values \(w_0\), \(w_1\), ..., \(w_m\) are called weights. These are the values the neural network gains its "experience" from. The circle in the middle of the above image represents the neuron. The weighted input values coming in from the left get summed up. The sum then gets "piped" through the transfer function, and the result is output to the right.


The neuron's weights define how the input is interpreted or, better, how strongly each input influences the output. In order to understand what the weights do, think of a neuron getting inputs that result in an output. You know what output should be reached, and you adjust the weights so that, for a given input, that output is reached.

transfer function

For future work, we'll want the input and output of a single neuron to be inside a specific number range. In this case, the input values should be between \(0\) and \(1\), and the output values should be "formatted" so that we can feed the output of one neuron into the input of another neuron. \(\varphi\) is the transfer function used to do this. The sigmoid function used later in this post converts the output value to a value in the range \((0, 1)\).
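
To get a feeling for what such a transfer function does, here is a small sketch of the sigmoid function, which this post uses further down. The function name `sigmoid` and the sample values are picked for illustration only. Whatever number goes in, the result stays strictly between \(0\) and \(1\).

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative values end up close to 0, large positive values close to 1.
for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```

An input of \(0\) lands exactly in the middle at \(0.5\), which makes the output easy to read as "rather not asleep" below that and "rather asleep" above it.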

Returning to our initial example, below, you'll find an image displaying our neuron:

The weights \(w_0\), \(w_1\) and \(w_2\) initially get assigned random values. Let's assume that we assign these random values: \(w_0 = 0.6\), \(w_1 = 0.9\) and \(w_2 = 0.7\).

As an example, let's assume that the person just arrived home, locked their bike and turned the lamp on, but has not opened the door yet:

The output of the neuron would be calculated like this:

\[ y = \varphi \left( (1 * 0.6) + (1 * 0.9) + (0 * 0.7) \right) = \varphi \left( 0.6 + 0.9 \right) = \varphi(1.5) \]

We'll use the sigmoid function as a transfer function:

\[ \varphi(1.5) = \frac{1}{1 + e ^{-1.5}} \approx 0.82 \]
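
The calculation above can be reproduced with a few lines of Python. This is only a sketch: the helper names `sigmoid` and `neuron_output` are made up for this example, while the weights and inputs are the ones from the text.

```python
import math

def sigmoid(z):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights):
    """Weighted sum of the inputs, passed through the sigmoid."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum)

weights = [0.6, 0.9, 0.7]  # w_0, w_1, w_2 as assumed in the text
inputs = [1, 1, 0]         # lock, lamp, door: arrived home, door not opened

y = neuron_output(inputs, weights)
print(round(y, 3))  # 0.818
```

Note that the weight for the door (\(0.7\)) has no effect here, because the door input is \(0\).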

As the person is not asleep, we'd expect a result close to \(0\), but we get a result closer to \(1\), so we need to adjust the weights.

This is the learning process: we run our input through the neuron, compare the output with the output we want to achieve and adjust the weights, "teaching" the neuron what is right and what is not.

In order to adjust the weights, we first need to calculate the actual "loss" using a loss function.

loss function

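The loss function is used to calculate the "loss", that is, how bad the result we've got is. With the loss function used here, the error grows quadratically with the distance between the expected result and the actual result. The loss function we use can be seen below: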

\[ L(t,~y) = (t-y)^2 = E \]

In the equation above, \(t\) is the expected result and \(y\) is the result we've got. The result \(E\) is the final error.

When calculating the loss for our example above, we get:

\[ L(0,~0.82) = (0 - 0.82)^2 \approx 0.67 \]

If we had gotten an output of \(0.2\) instead, we'd have had a much lower error:

\[ L(0,~0.2) = (0 - 0.2)^2 = 0.04 \]
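
As a quick check, the loss function can be sketched in Python like this. The name `squared_error` is an assumption for this example, and \(0.82\) stands in for the rounded output of our neuron.

```python
def squared_error(target, output):
    """Loss L(t, y) = (t - y)^2 for expected result t and actual result y."""
    return (target - output) ** 2

# Expected result is 0 (not asleep); try a few possible neuron outputs.
for output in (0.2, 0.4, 0.82):
    print(f"L(0, {output}) = {squared_error(0, output):.2f}")
```

Doubling the distance from \(0.2\) to \(0.4\) makes the error four times as large, so results that are far off get punished much harder than results that are only slightly off.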

Using this information we've got a value we can use when adjusting the weights.

Adjusting the weights

In order to make this as easy as possible, let's imagine we've got a single neuron with a single input. The expected output we want is \(0\). We now calculate the error \(E\) for every possible output value of the neuron, written as \(x\) here:

\[ L(0, x) = (0-x)^2 = x^2 = E \]

As you can see, the further away we are from the desired output, the bigger the error gets.

The goal now is to find out which input has what impact on the output. We know we've got a big error, but we need to find out where it came from. We can start doing this by calculating the derivative of the transfer function. The sigmoid function and its derivative are:

\( \varphi(z) = \frac{1}{1 + e^{-z}} \)

\( \frac{d\varphi(z)}{dz} = \varphi(z)(1 - \varphi(z)) \)
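
In case you wonder where the second formula comes from, here is a short derivation using the chain rule. It is only a side note and not needed to follow the rest of the post:

\[ \frac{d\varphi(z)}{dz} = \frac{d}{dz}\left(1 + e^{-z}\right)^{-1} = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \varphi(z)\left(1 - \varphi(z)\right) \]

The last step uses the fact that \(1 - \varphi(z) = \frac{e^{-z}}{1 + e^{-z}}\).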

We do this to find out what input values impact the output.
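
If you don't want to take the derivative formula on faith, it can be compared against a numerical estimate. The sketch below does this with a central difference; the function names and the step size `h` are assumptions chosen for this example.

```python
import math

def sigmoid(z):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    """Derivative from the formula: phi(z) * (1 - phi(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numerical_derivative(f, z, h=1e-6):
    """Estimate the slope of f at z with a central difference."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in (-2.0, 0.0, 1.5):
    formula = sigmoid_derivative(z)
    estimate = numerical_derivative(sigmoid, z)
    print(f"z={z:+.1f}  formula={formula:.6f}  estimate={estimate:.6f}")
```

Both columns should agree to several decimal places. The slope is largest at \(z = 0\) and gets smaller the further \(z\) moves away from it, which means a change to the weighted sum moves the output most when the neuron is still "undecided".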