Building a self-driving CNN
In 2017, Nvidia presented a multi-layer CNN called PilotNet that was able to steer a vehicle after being shown only a series of images, or video. This was a compelling demonstration of the power of neural networks and, in particular, of convolution. A diagram of the PilotNet neural architecture is shown here:
PilotNet neural architecture
In the diagram, data flows from the input image at the bottom up through the network to a single output neuron that represents the steering direction. Since this is such a great example, several individuals have posted blogs showing implementations of PilotNet, and some of them actually work. We will examine the code from one of these blog posts to see how a similar architecture is constructed with Keras. Next is an image from the original PilotNet blog, showing a few of the types of images our self-driving network will use to train:
Example of PilotNet training images
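Before we turn to the chapter's code, it may help to see what the architecture in the earlier diagram looks like as code. The following is a minimal Keras sketch of the PilotNet stack; the input size, filter counts, and strides come from Nvidia's published PilotNet paper, while the ReLU activations, the normalization Lambda, and the optimizer are assumptions. This sketch is for orientation only and is not the model we build in this chapter:
from keras.models import Model
from keras.layers import Input, Lambda, Conv2D, Flatten, Dense

img = Input(shape=(66, 200, 3), name='img')    # PilotNet takes 66 x 200 YUV frames
x = Lambda(lambda im: im / 127.5 - 1.0)(img)   # normalize pixels to [-1, 1] (assumed)
x = Conv2D(24, (5, 5), strides=(2, 2), activation='relu')(x)
x = Conv2D(36, (5, 5), strides=(2, 2), activation='relu')(x)
x = Conv2D(48, (5, 5), strides=(2, 2), activation='relu')(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = Flatten()(x)
x = Dense(100, activation='relu')(x)
x = Dense(50, activation='relu')(x)
x = Dense(10, activation='relu')(x)
steering = Dense(1, name='steering')(x)        # single neuron: the steering value

pilotnet = Model(inputs=[img], outputs=[steering])
pilotnet.compile(optimizer='adam', loss='mean_squared_error')
Notice that PilotNet uses strided convolutions rather than pooling layers to shrink the image, a design choice we return to at the end of this section.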
The goal of training in this example is to output the degree to which the steering wheel should be turned in order to keep the vehicle on the road. Open up the code listing in Chapter_2_2.py and follow these steps:
- We will now switch to using Keras for the next few examples. While the TensorFlow-embedded version of Keras has served us well, there are a couple of features we need that are only found in the full version. To install Keras and the other dependencies, open a shell or Anaconda window and run the following commands (note that pickle is part of the Python standard library, so it does not need to be installed separately):
pip install keras
pip install matplotlib
- At the start of the code file (Chapter_2_2.py), we begin with some imports and load the sample data using the following code:
import os
import urllib.request
import pickle
import matplotlib
import matplotlib.pyplot as plt

# download the sample driving data (about 450 MB)
data_url = 'https://s3.amazonaws.com/donkey_resources/indoor_lanes.pkl'
file_path, headers = urllib.request.urlretrieve(data_url)
print(file_path)

with open(file_path, 'rb') as f:
    X, Y = pickle.load(f)
- This code just does some imports and then downloads the sample driving frames from the author's source data. The original source of this example is a notebook on Roscoe's blog and can be found at https://wroscoe.github.io/keras-lane-following-autopilot.html.
pickle is Python's object serialization library; the pickle.load call at the bottom of the previous listing unpacks the downloaded file into the datasets X and Y. - Then we shuffle the order of the frames, essentially randomizing the data. We often randomize data this way to make training more robust. By randomizing the data order, the network needs to learn an absolute steering value for each image, rather than a possible relative or incremental value inferred from consecutive frames. The following code does this shuffle:
import numpy as np

def unison_shuffled_copies(X, Y):
    # shuffle both arrays with the same random permutation
    assert len(X) == len(Y)
    p = np.random.permutation(len(X))
    return X[p], Y[p]

shuffled_X, shuffled_Y = unison_shuffled_copies(X, Y)
print(len(shuffled_X))
- All this code does is use numpy to randomly shuffle the image frames and their matching steering values in unison. It then prints out the length of the shuffled set shuffled_X so we can confirm the training data is not getting lost.
- Next, we need to create training and test sets of data. The training set is used to train the network's weights, and the held-out test, or validation, set is used to confirm its accuracy on new or unseen data. As we have seen before, this is a common theme when using supervised training or labeled data. Here we break the data into 80% for training, 10% for validation, and 10% for a final test set. The following code is what does this:
test_cutoff = int(len(X) * .8)               # first 80% of the data is used for training
val_cutoff = test_cutoff + int(len(X) * .1)  # next 10% for validation; the final 10% is left for testing
train_X, train_Y = shuffled_X[:test_cutoff], shuffled_Y[:test_cutoff]
val_X, val_Y = shuffled_X[test_cutoff:val_cutoff], shuffled_Y[test_cutoff:val_cutoff]
test_X, test_Y = shuffled_X[val_cutoff:], shuffled_Y[val_cutoff:]
print(len(train_X) + len(val_X) + len(test_X))  # should equal the full dataset size
- After creating the training and test sets, we now want to augment, or expand, the training data. In this particular case, the author augmented the data simply by flipping the original images left to right and adding those to the dataset. Note that mirroring an image also reverses the steering direction, which is why the corresponding steering values are negated. There are many other ways of augmenting data that we will discover in later chapters, but this simple and effective flipping trick is something to add to your belt of machine learning tools. The code to do this flip is shown here:
X_flipped = np.array([np.fliplr(i) for i in train_X])  # mirror each frame left to right
Y_flipped = np.array([-i for i in train_Y])            # a mirrored frame reverses the steering value
train_X = np.concatenate([train_X, X_flipped])
train_Y = np.concatenate([train_Y, Y_flipped])
print(len(train_X))                                    # the training set has now doubled in size
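Since matplotlib was imported earlier but has not been used yet, a quick optional way to sanity check the augmentation is to display a frame next to its flipped copy. This short sketch is not part of the chapter's listing and assumes the frames are standard 8-bit RGB arrays:
# Display one training frame beside its mirrored copy.
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(train_X[0].astype('uint8'))
ax1.set_title('original')
ax2.imshow(np.fliplr(train_X[0]).astype('uint8'))
ax2.set_title('flipped')
plt.show()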
- Now comes the heavy lifting part. The data is prepped, and it is time to build the model as shown in the code:
from keras.models import Model, load_model
from keras.layers import Input, Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

img_in = Input(shape=(120, 160, 3), name='img_in')  # camera frames are 120 x 160 RGB
angle_in = Input(shape=(1,), name='angle_in')       # defined but not used by this model

# three blocks of convolution, activation, and pooling
x = Conv2D(8, (3, 3))(img_in)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)

x = Conv2D(16, (3, 3))(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)

x = Conv2D(32, (3, 3))(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)

merged = Flatten()(x)         # flatten the feature maps into a single vector

x = Dense(256)(merged)
x = Activation('linear')(x)
x = Dropout(.2)(x)            # randomly zero 20% of this layer's outputs during training

angle_out = Dense(1, name='angle_out')(x)  # single output: the steering value

model = Model(inputs=[img_in], outputs=[angle_out])
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
- The code to build the model should at this point be fairly self-explanatory. Take note of how the architecture, and the way the code is written, vary from our previous examples. Also note the two new layer types. The first, Flatten, simply flattens the 2D feature maps output by the last pooling layer into a vector that is then input into a standard Dense, fully connected, hidden layer. The second new layer type, Dropout, needs a bit more explanation and will be covered in more detail at the end of this section.
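Before moving on to training, if you want to see exactly what Flatten is doing here, you can inspect the tensor shapes on either side of it. The following optional sketch (assuming the same 120 x 160 x 3 input used above, and reusing the layers imported earlier) shows the three convolution and pooling blocks reducing the image to 32 feature maps of 13 x 18, which Flatten turns into one long vector:
# Probe the shapes around Flatten (shapes assume a 120 x 160 x 3 input).
from keras import backend as K

probe = Input(shape=(120, 160, 3))
h = MaxPooling2D((2, 2))(Activation('relu')(Conv2D(8, (3, 3))(probe)))
h = MaxPooling2D((2, 2))(Activation('relu')(Conv2D(16, (3, 3))(h)))
h = MaxPooling2D((2, 2))(Activation('relu')(Conv2D(32, (3, 3))(h)))
print(K.int_shape(h))             # (None, 13, 18, 32) - 32 feature maps of 13 x 18
print(K.int_shape(Flatten()(h)))  # (None, 7488) - 13 * 18 * 32 values in one vector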
- Finally comes the training part, which this code sets up:
import os
from keras import callbacks

model_path = os.path.expanduser('~/best_autopilot.hdf5')

# save the model whenever the validation loss improves
save_best = callbacks.ModelCheckpoint(model_path, monitor='val_loss', verbose=1,
                                      save_best_only=True, mode='min')
# stop training early if the validation loss stops improving for 5 epochs
early_stop = callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=5,
                                     verbose=0, mode='auto')
callbacks_list = [save_best, early_stop]

model.fit(train_X, train_Y, batch_size=64, epochs=4,
          validation_data=(val_X, val_Y), callbacks=callbacks_list)
- This last piece of code sets up a set of callbacks to update and control the training. We have already used callbacks to update the TensorBoard server with logs. In this case, we use a ModelCheckpoint callback to save the model at the end of any epoch in which the validation loss improves, and an EarlyStopping callback to stop training if the validation loss fails to improve for several epochs. Note the format in which we are saving the model: an hdf5 file. This file format represents a hierarchical data structure.
- Run the code as you have already been doing. This sample can take a while to train, so again, be patient. When it completes, there will be no visual output, but pay special attention to how the loss value is minimized.
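If you want some tangible output, the checkpoint callback wrote the best model to ~/best_autopilot.hdf5, so you can load it back with the load_model function we imported earlier and ask it for a prediction. This short sketch is optional and not part of the chapter's listing:
# Reload the best saved model and predict a steering value for one test frame.
best_model = load_model(os.path.expanduser('~/best_autopilot.hdf5'))
predicted = best_model.predict(test_X[:1])  # one (120, 160, 3) frame in, one value out
print(predicted[0][0])                      # the steering value for that frame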
While this example produces no output, in order to keep things simple, try to appreciate what is happening. After all, this could just as easily be set up as a driving game, where the network drives the vehicle by just looking at screenshots. We have omitted the results from the author's original blog post, but if you want to see how this model performs, go back and check out the source link.
One thing the author did in his blog post was to use pooling layers, which, as we have seen, is quite standard when working with convolution. However, when and how to use pooling layers is a bit contentious right now and requires further detailed discussion, which is provided in the next section.