Training a perceptron in Python
Perfect! We created a simple perceptron that takes inputs and spits out an output, but it doesn't really do anything yet. Our perceptron needs to have its weights trained before it can do anything useful. Fortunately, there is a well-defined method, known as gradient descent, that we can use to adjust each of those weights. Open up your Python editor again and update or enter the following code, or open Chapter_1_3.py from the code download:
def perceptron_predict(inputs, weights):
    # Start from the bias, then add each weighted input; the last
    # element of inputs is the label, so it is skipped here
    activation = weights[0]
    for i in range(len(inputs)-1):
        activation += weights[i + 1] * inputs[i]
    # Step activation: fire (1.0) when the activation is non-negative
    return 1.0 if activation >= 0.0 else 0.0

def train_weights(train, learning_rate, epochs):
    # One weight per input plus the bias, all initialized to 0.0
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(epochs):
        sum_error = 0.0
        for inputs in train:
            prediction = perceptron_predict(inputs, weights)
            error = inputs[-1] - prediction
            sum_error += error**2
            # Update the bias, then each weight, by the scaled error
            weights[0] = weights[0] + learning_rate * error
            for i in range(len(inputs)-1):
                weights[i + 1] = weights[i + 1] + learning_rate * error * inputs[i]
        print('>epoch=%d, learning_rate=%.3f, error=%.3f' % (epoch, learning_rate, sum_error))
    return weights

train = [[1.5,2.5,0.0],[2.5,3.5,0.0],[1.0,11.0,1.0],[2.3,2.3,1.0],[3.6,3.6,1.0],[4.2,2.4,0.0],[2.4,5.4,0.0],[5.1,5.1,1.0],[4.3,1.3,0.0],[4.8,4.8,1.0]]
learning_rate = 0.1
epochs = 10

weights = train_weights(train, learning_rate, epochs)
print(weights)
The train_weights function is new. It trains the perceptron by iterative error minimization, and it will be the basis for the gradient descent we use in more complex networks. There is a lot going on here, so we will break it down piece by piece. First, we initialize the weights list to a value of 0.0 with this line:
weights = [0.0 for i in range(len(train[0]))]
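Because each training row is [x1, x2, label], len(train[0]) is 3, which gives us one bias weight plus one weight per input. Here is a quick check you can run in a Python shell (the two-row train list is just an abbreviated sample of ours):

# Each row is [x1, x2, label], so len(train[0]) == 3;
# weights[0] is the bias, weights[1] and weights[2] pair with x1 and x2
train = [[1.5, 2.5, 0.0], [2.5, 3.5, 0.0]]
weights = [0.0 for i in range(len(train[0]))]
print(weights)  # [0.0, 0.0, 0.0]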
Then we start training each epoch in a for loop. An epoch is essentially one pass through our training data. The reason we make multiple passes is to give the weights a chance to converge toward the global minimum of the error rather than settling in a local one. During each epoch, the weights are trained using the following equation:
$$w_i = w_i + \alpha \, (y - \hat{y}) \, x_i$$

Consider the following:
- $w_i$ = the weight for input $x_i$
- $\alpha$ = the rate at which the perceptron learns (the learning rate)
- $y$ = the labeled training value
- $\hat{y}$ = the value returned from the perceptron
- the error = $y - \hat{y}$
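To make the update concrete, here is a single hand-worked step using the first training row, starting from all-zero weights and a learning rate of 0.1. This standalone snippet is our illustration; it simply mirrors the inner loop of train_weights:

# One update step for the first training row [1.5, 2.5, 0.0]
inputs = [1.5, 2.5, 0.0]
weights = [0.0, 0.0, 0.0]
learning_rate = 0.1

# With all-zero weights the activation is 0.0, so the step function fires
prediction = 1.0
error = inputs[-1] - prediction                   # 0.0 - 1.0 = -1.0
weights[0] += learning_rate * error               # bias becomes -0.1
weights[1] += learning_rate * error * inputs[0]   # becomes about -0.15
weights[2] += learning_rate * error * inputs[1]   # becomes -0.25
print(weights)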
The bias is trained in a similar manner; just recall that the bias is weights[0], so its update has no input term. Note also how we now label our data points in the train list, with an end value of 0.0 or 1.0. A value of 0.0 means no match, while a value of 1.0 means a perfect match, as shown in the following code excerpt:
train = [[1.5,2.5,0.0],[2.5,3.5,0.0],[1.0,11.0,1.0],[2.3,2.3,1.0],[3.6,3.6,1.0],[4.2,2.4,0.0],[2.4,5.4,0.0],[5.1,5.1,1.0],[4.3,1.3,0.0],[4.8,4.8,1.0]]
This labeling of data is common in training neural networks and is called supervised training. We will explore other unsupervised and semi-supervised training methods in later chapters. If you run the preceding code, you will see the error printed for each of the 10 epochs.
Now, if you have some previous ML experience, you will immediately recognize that the error is wobbling around a local minimum, and that our training is unable to converge. You will likely come across this type of wobble several more times in your DL career, so it is helpful to understand how to fix it.
In this case, our issue is likely the choice of activation function, which, as you may recall, was just a simple step function. We can fix this by introducing a new activation function, called the Rectified Linear Unit (ReLU). An example of the step and ReLU functions, side by side, is shown in the following diagram:
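If you would like to reproduce a comparison like that diagram yourself, here is a minimal sketch. The use of numpy and matplotlib here is our own choice, not part of the chapter code:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2.0, 2.0, 200)
step = np.where(x >= 0.0, 1.0, 0.0)  # step: jumps from 0 to 1 at zero
relu = np.maximum(0.0, x)            # ReLU: 0 below zero, the input itself above

plt.plot(x, step, label='step')
plt.plot(x, relu, label='ReLU')
plt.legend()
plt.show()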
In order to change the activation function, open up the previous code listing and follow along:
- Locate the following line of code:
return 1.0 if activation >= 0.0 else 0.0
- Modify it, like so:
return activation * (activation > 0)
- That subtle change, multiplying the activation by the result of the activation > 0 comparison, returns the activation itself when it is positive and 0.0 otherwise. That is the implementation of the ReLU function. Yes, it is that deceptively easy. The complete modified function is sketched after these steps.
- Run the code and observe the change in output.
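For reference, here is how perceptron_predict should look after the change (a sketch assuming the rest of Chapter_1_3.py is left as is):

def perceptron_predict(inputs, weights):
    activation = weights[0]
    for i in range(len(inputs)-1):
        activation += weights[i + 1] * inputs[i]
    # ReLU activation: pass positive activations through, return 0.0 otherwise
    return activation * (activation > 0)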
When you run the code, the values quickly converge and remain stable. This is a tremendous improvement in our training, and it is a direct result of changing the activation function to ReLU. The reason is that our perceptron weights can now converge gradually toward the global minimum, whereas before they just wobbled around a local minimum with the step function. There are plenty of other activation functions we will test over the course of this book. In the next section, we look at how things get much more complicated when we start to combine our perceptrons into multiple layers.
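If you want to see this convergence for yourself, one option (our suggestion, not part of the chapter code) is a variation of train_weights that records the per-epoch error so you can plot it. This assumes perceptron_predict, train, learning_rate, and epochs from the listing are already defined:

import matplotlib.pyplot as plt

def train_weights_tracked(train, learning_rate, epochs):
    # Identical to train_weights, except the per-epoch error is collected
    weights = [0.0 for i in range(len(train[0]))]
    errors = []
    for epoch in range(epochs):
        sum_error = 0.0
        for inputs in train:
            prediction = perceptron_predict(inputs, weights)
            error = inputs[-1] - prediction
            sum_error += error**2
            weights[0] += learning_rate * error
            for i in range(len(inputs)-1):
                weights[i + 1] += learning_rate * error * inputs[i]
        errors.append(sum_error)
    return weights, errors

weights, errors = train_weights_tracked(train, learning_rate, epochs)
plt.plot(errors)  # a flat, stable curve indicates converged training
plt.xlabel('epoch')
plt.ylabel('sum squared error')
plt.show()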