Tuesday, 31 May 2016

Neural networks on the Raspberry Pi: Sigmoid, tanh and RL neurons

A brief introduction to ANNs - part 3

In the previous post about ANNs we looked at the linear neuron and the perceptron.

Perceptrons have been used in neural networks for decades, but they are not the only type of neuron in use today. When they were first invented, they seemed capable of learning almost anything.

However, in 1969, Minsky and Papert published their book 'Perceptrons', which showed that a single perceptron could never be trained to perform the XOR function. You'll see in the next post why this is so (and why it's not a huge problem), but for now, let's look at three other common neuron models.

Like the linear neuron and perceptron, these start by calculating the weighted sum of their inputs. Recall that you can implement the linear neuron like this:
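
A minimal sketch, assuming the linear neuron from the previous post is simply the inner product of inputs and weights (the name ln matches the one used in the definition of sn later in this post; the definition itself is an assumption):

      ln ← +.×     ⍝ assumed: multiply inputs by weights pairwise, then add up the products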


A sigmoid neuron calculates the same weighted sum of its inputs, but then applies the sigmoid function to the result. Wikipedia defines the sigmoid function like this:

      S(x) = 1 / (1 + e^(-x))

Here's how you define that function in APL:
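
A minimal sketch, assuming the name sigmoid that the neuron definition further down relies on:

      sigmoid ← {÷1+*-⍵}     ⍝ ÷ is reciprocal, * is e to the power, ⍵ is the argument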


That definition says 'take the reciprocal of 1 plus e to the power minus ⍵', where ⍵ is the argument to the function.

You can implement a sigmoid neuron by combining the sigmoid function with a linear neuron.

      sn ← sigmoid ln

You can test it like this: 

      inputs ← 0.2 0.3 0.1
      weights ← 1 2 0.5
      inputs sn weights
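
As a rough check, you can work the example through by hand. This sketch repeats the definitions it needs; ln ← +.× is an assumption about the linear neuron from the previous post.

      ln ← +.×                  ⍝ assumed linear neuron: weighted sum of the inputs
      sigmoid ← {÷1+*-⍵}        ⍝ reciprocal of 1 plus e to the power minus ⍵
      sn ← sigmoid ln           ⍝ apply sigmoid to the result of ln
      inputs ← 0.2 0.3 0.1
      weights ← 1 2 0.5
      inputs ln weights         ⍝ (0.2×1)+(0.3×2)+(0.1×0.5)
      inputs sn weights         ⍝ same as sigmoid inputs ln weights

The weighted sum is 0.85, so the neuron's output should be roughly 0.70.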

As the name suggests, the sigmoid function is S-shaped. Here is a graph of the function, plotted using Dyalog APL's SharpPlot library:

As you can see, the sigmoid function's value is close to zero for large negative arguments; it has the value 0.5 when its input is zero; and it rises towards one as its input grows larger.
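
You can check those three values directly. A small sketch, assuming the sigmoid definition described earlier (¯ is APL's negative sign):

      sigmoid ← {÷1+*-⍵}
      sigmoid ¯10 0 10          ⍝ close to 0, exactly 0.5, close to 1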

Another commonly used function, with a similar shape, is the tanh function.

APL has implementations of all common trigonometry-related functions. Sin is 1○⍵, and Cos is 2○⍵. You can find a complete list here.

The definition you need is just 7○⍵.


Here is its graph:

As you can see, tanh ranges from -1 for large negative arguments to +1 for large positive arguments. Its value at zero is zero.
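
A quick way to confirm this, as a sketch (the name tanh is an assumption, not necessarily the one used in the original listing):

      tanh ← {7○⍵}              ⍝ 7○ is APL's hyperbolic tangent
      tanh ¯5 0 5               ⍝ close to ¯1, exactly 0, close to 1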

The last neuron we'll consider in this post is the Rectified Linear Neuron or RLN.

The transfer function for this neuron is zero for inputs that are negative or zero, and is equal to the input for inputs that are positive.

Here is the APL definition:
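
A minimal sketch, assuming the name rln (hypothetical):

      rln ← {0⌈⍵}               ⍝ the larger of 0 and the input
      rln ¯2 0 3                ⍝ gives 0 0 3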


The symbol ⌈ (max) returns the larger of its two arguments. Here's a plot of the RLN function:

I mentioned earlier that the perceptron has some limitations, but why are these other functions popular? A future post will cover back-propagation - one of the most widely used techniques for training a network - and the functions you've been looking at work well for that purpose.

Before then, you'll take another look at the perceptron: you'll see how to train it, review its limitations, and look at ways of avoiding them.