Multilayer perceptron in TF
Thus far, we have been looking at a simple example of a single perceptron and how to train it. This worked well for our small dataset, but as the number of inputs and neurons increases, so does the complexity of our networks, and the math grows along with it. The following diagram shows a multilayer perceptron, or what we commonly refer to as an ANN:
In the diagram, we see a network with one input layer, one hidden layer, and one output layer. The inputs are now shared across an input layer of neurons. The first layer of neurons processes the inputs and passes its results on to the hidden layer, and so on, until the results finally reach the output layer.
Multilayer networks can get quite complex, and the code for these models is often abstracted away by high-level interfaces such as Keras, PyTorch, and so on. These tools work well for quickly exploring network architectures and understanding DL concepts. However, when it comes to performance, which is key in games, the models really need to be built in TensorFlow or another API that supports low-level math operations. In this book, we will swap between Keras, a higher-level SDK, and TensorFlow for the introductory DL chapters. This will allow you to see the differences and similarities between working with either interface.
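To give a sense of how much those high-level interfaces hide, here is a minimal sketch of how a comparable multilayer perceptron might be declared in Keras. The layer sizes and activations are illustrative choices for this sketch, not code we will run in this chapter:
from keras.models import Sequential
from keras.layers import Dense

# a small MLP: 784 inputs -> two hidden layers of 256 neurons -> 10 classes
model = Sequential()
model.add(Dense(256, activation='relu', input_dim=784))
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Compare this handful of lines with the TensorFlow version of roughly the same network later in this section; the extra code you write in TensorFlow is the price of the extra control you get.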
Unity ML-Agents was first prototyped with Keras but has since progressed to TensorFlow. The team at Unity, like others, almost certainly did this for reasons of performance and, to some extent, control. Working with TensorFlow is akin to writing your own shaders. While it is quite difficult to write shaders and TF code, the ability to customize your own rendering, and now your own learning, will make your game unique and help it stand out.
There is a great TensorFlow example of a multilayer perceptron next for your reference, in the Chapter_1_4.py listing. In order to run this code using TensorFlow, follow these steps:
- First, install TensorFlow using the following command from a Python 3.5/3.6 window on Windows or macOS. You can also use an Anaconda Prompt, with administrator rights:
pip install tensorflow
OR
conda install tensorflow # use this if you are installing with Anaconda
- Make sure you install TensorFlow into the default Python environment. We will worry about creating more structured virtual environments later. If you are not sure what a Python virtual environment is, step away from the book and take a course in Python right away.
- The following Python code is from the Chapter_1_4.py listing, with each section explained in the following steps:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
- We start by loading the MNIST training set. The MNIST dataset is a collection of 28 x 28 pixel images showing hand-drawn representations of the digits 0-9, or what we will refer to as 10 classes.
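If you are curious, you can confirm what was just loaded by printing the shapes of the training arrays. The two lines below are an optional check you can add after the load; the values in the comments are what the standard MNIST training split produces:
print(mnist.train.images.shape) # (55000, 784) - each 28 x 28 image flattened to 784 values
print(mnist.train.labels.shape) # (55000, 10) - one-hot encoded labels, one column per class
With the data loaded, the next block of the listing imports TensorFlow and sets the parameters of the network: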
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1
# Network Parameters
n_hidden_1 = 256 # 1st layer number of neurons
n_hidden_2 = 256 # 2nd layer number of neurons
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
- Then we import the tensorflow library as tf. Next, we set a number of parameters we will use later. Note how we also define the network dimensions: 784 inputs (one per pixel), two hidden layers of 256 neurons each, and 10 output classes (one per digit):
# tf Graph input
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_classes])
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
- Next, we set up a couple of TensorFlow placeholders with tf.placeholder to hold the input images and their labels as type float; the second dimension of each shape comes from n_input and n_classes, while None leaves the batch size flexible. Then we create and initialize variables using tf.Variable, first the weights and then the biases. Inside each variable declaration, we call tf.random_normal, which fills a tensor with normally distributed random values, to initialize a 2D matrix or tensor with dimensions such as n_input by n_hidden_1.
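As a quick aside, these dimensions tell you how many trainable values the network holds. The following is just illustrative arithmetic based on the parameters we defined, not part of the listing:
# weights plus biases for each of the three layers
params = (784 * 256 + 256) + (256 * 256 + 256) + (256 * 10 + 10)
print(params) # 269322 trainable values
The next part of the listing uses these weights and biases to construct the model itself: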
# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer
# Construct model
logits = multilayer_perceptron(X)
- Then we create the model by multiplying each layer's inputs by its weights and adding its biases. What we are doing here is essentially converting our activation equation into a matrix/tensor of equations. Now, instead of doing a single pass, we perform multiple passes in one operation using matrix/tensor multiplication. This allows us to run multiple training images, or sets of data, through the network at a time, which is a technique we use to better generalize learning.
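To make that concrete, here is a tiny NumPy sketch, separate from our listing, of how one matrix multiplication pushes a whole batch of images through a single layer at once. The random data simply stands in for real inputs and trained weights:
import numpy as np

batch = np.random.rand(100, 784) # 100 flattened images at once
w1 = np.random.rand(784, 256)    # first hidden layer weights
b1 = np.random.rand(256)         # first hidden layer biases

layer_1 = batch.dot(w1) + b1     # one operation produces all 100 sets of activations
print(layer_1.shape)             # (100, 256)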
For each layer in our neural network, we use tf.add and tf.matmul to add matrix multiplication operations to what we commonly call a TensorFlow inference graph. You can see from the code that we are creating two hidden layers and one output layer for our model:
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
- Next, we define a loss function and an optimizer. loss_op calculates the total loss (or cost) of the network across a batch by comparing the model's outputs, the logits, against the true labels. AdamOptimizer is what does the optimizing, adjusting the weights and biases to minimize that loss. We will explain these terms in detail later, so don't worry if things are still fuzzy.
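To get a rough feel for what this loss measures, here is a small NumPy sketch of softmax followed by cross-entropy for one made-up example with three classes. TensorFlow's fused softmax_cross_entropy_with_logits computes the same quantity in a more numerically stable way, and reduce_mean then averages it over the batch:
import numpy as np

logits = np.array([2.0, 1.0, 0.1]) # raw network outputs for one example
label = np.array([1.0, 0.0, 0.0])  # one-hot encoded true class

probs = np.exp(logits) / np.sum(np.exp(logits)) # softmax probabilities
loss = -np.sum(label * np.log(probs))           # cross-entropy for this example
print(probs, loss) # the more confident and correct the prediction, the lower the loss
With the loss and optimizer defined, we can initialize the variables and run the training session: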
# Initializing the variables
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([train_op, loss_op], feed_dict={X: batch_x, Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
- Then we initialize the variables and start a new TensorFlow session to run the graph in. We use that epoch-based iterative training method again, looping over the training set once per epoch. Remember, an entire batch of images goes through the network at the same time, not just one image; with 55,000 training images and a batch size of 100, each epoch runs 550 batches. For each batch, we run the optimization op (which backpropagates the error and updates the weights) together with the loss op, and accumulate the average cost, or minimize the cost if you will:
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
    print("Optimization Finished!")
- Then we output the results of each epoch run, showing how the network is minimizing the error:
    # Test model
    pred = tf.nn.softmax(logits) # Apply softmax to logits
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
- Next, we build the actual prediction by applying softmax to the logits output of our model, converting the raw outputs into class probabilities. tf.argmax then picks the most probable class for each test image, and tf.equal checks it against the true class.
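The following toy NumPy sketch, with made-up numbers, shows what tf.argmax and tf.equal are doing for a batch of three predictions:
import numpy as np

pred = np.array([[0.8, 0.1, 0.1],  # predicted class 0
                 [0.2, 0.7, 0.1],  # predicted class 1
                 [0.3, 0.4, 0.3]]) # predicted class 1
true = np.array([[1, 0, 0],        # true class 0
                 [0, 1, 0],        # true class 1
                 [0, 0, 1]])       # true class 2

correct = np.argmax(pred, 1) == np.argmax(true, 1) # [True, True, False]
print(correct.mean()) # 0.666..., two of the three predictions are correct
Casting those boolean matches to floats and averaging them is exactly what the final lines of the listing do to report the overall accuracy: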
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({X: mnist.test.images, Y: mnist.test.labels}))
- Finally, we calculate and output the accuracy of our model. If you run the exercise, don't just note how accurate the model is; think of ways the accuracy could be improved, for instance by adjusting the learning rate, the number of epochs, or the size of the hidden layers.
There is plenty going on in the preceding reference example, and we will break it down further in the next sections. Hopefully, you can see at this point how complex things can get. This is why for most of the fundamental chapters in this book, we will teach the concepts with Keras first. Keras is a powerful and simple framework that will help us build complex networks in no time and makes it much simpler for us to teach and for you to learn. We will also provide duplicate examples developed in TensorFlow and show some of the key differences as we progress through the book.
In the next section, we explain the basic concepts of TensorFlow, what it is, and how we use it.