Introduction to Keras
Building ANNs involves creating layers of nodes. Each node can be thought of as a tensor of weights that are learned during the training process. Once the ANN has been fitted to the data, a prediction is made by multiplying the input data by the weight matrices layer by layer, applying any other transformations where needed, such as activation functions, until the final output layer is reached. The size of each weight tensor is determined by the shape of the input it receives and the shape of the output it produces. For example, in a single-layer ANN, the size of our single hidden layer can be determined as follows:
If the input matrix of features has n rows, or observations, and m columns, or features, and we want our predicted target to have n rows (one for each observation) and one column (the predicted value), we can determine the size of our hidden layer by what is needed to make the matrix multiplication valid. Here is the representation of a single-layer ANN:
Here, we can determine that the weight matrix will be of size (m x 1) to ensure the matrix multiplication is valid.
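To make this concrete, the following NumPy sketch (using hypothetical values of n=100 observations and m=5 features, which are not taken from the text) shows that multiplying an (n x m) feature matrix by an (m x 1) weight matrix produces an (n x 1) output, one prediction per observation:
import numpy as np
n, m = 100, 5                 # hypothetical: 100 observations, 5 features
X = np.random.rand(n, m)      # input feature matrix of shape (n, m)
W = np.random.rand(m, 1)      # weight matrix of shape (m, 1)
predictions = X @ W           # matrix multiplication
print(predictions.shape)      # prints (100, 1): one value per observation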
If we have more than one hidden layer in an ANN, then we have much more freedom with the size of these weight matrices. In fact, the possibilities are endless, depending on how many layers there are and how many nodes we want in each layer. In practice, however, certain architecture designs work better than others, as we will be learning throughout this book.
In general, Keras abstracts much of the linear algebra out of building neural networks so that users can focus on designing the architecture. For most networks, only the input size, output size, and the number of nodes in each hidden layer are needed to create networks in Keras.
The simplest model structure in Keras is the Sequential model, which can be imported from keras.models. The model of the Sequential class describes an ANN that consists of a linear stack of layers. A Sequential model can be instantiated as follows:
from keras.models import Sequential
model = Sequential()
Layers can be added to this model instance to create the structure of the model.
Note
Before initializing your model, it is helpful to set a seed using the seed function in NumPy's random library and the set_seed function from TensorFlow's random library.
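For example, the seeds can be set as follows (the value of 42 is an arbitrary choice, and TensorFlow 2.x is assumed):
import numpy as np
import tensorflow as tf
np.random.seed(42)        # seed NumPy's random number generator
tf.random.set_seed(42)    # seed TensorFlow's random number generator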
Layer Types
The notion of layers is part of the Keras core API. A layer can be thought of as a composition of nodes, and at each node, a set of computations happens. In Keras, all the nodes of a layer can be initialized simply by initializing the layer itself. The individual operation of a generalized layer node can be seen in the following diagram. At each node, the input data is multiplied by a set of weights using matrix multiplication, as we learned earlier in this chapter. The weighted inputs are then summed, and this sum may or may not include a bias term, shown by the input node equal to 1 in the following diagram. Further functions may then be applied to the output of this matrix multiplication, such as activation functions:
Some common layer types in Keras are as follows:
- Dense: This is a fully connected layer in which all the nodes of the layer are directly connected to all the inputs and all the outputs. ANNs for classification or regression tasks on tabular data are usually composed largely of this layer type.
- Convolutional: This layer type creates a convolutional kernel that is convolved with the input layer to produce a tensor of outputs. This convolution can occur in one or multiple dimensions. ANNs for the classification of images usually feature one or more convolutional layers in their architecture.
- Pooling: This type of layer is used to reduce the dimensionality of an input layer. Common types of pooling include max pooling, in which the maximum value of a given window is passed through to the output, or average pooling, in which the average value of a window is passed through. These layers are often used in conjunction with a convolutional layer, and their purpose is to reduce the dimensions of the subsequent layers, allowing for fewer training parameters to be learned with little information loss.
- Recurrent: Recurrent layers learn patterns from sequences, so each output is dependent on the results from the previous step. ANNs that model sequential data such as natural language or time-series data often feature one or more recurrent layer types.
There are other layer types in Keras; however, these are the most common types when it comes to building models using Keras.
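As a quick illustration, the following imports show where some of these layer classes can be found in Keras (the two-dimensional convolutional and pooling variants are used here as examples):
from keras.layers import Dense           # fully connected layer
from keras.layers import Conv2D          # two-dimensional convolutional layer
from keras.layers import MaxPooling2D    # two-dimensional max pooling layer
from keras.layers import LSTM            # a common recurrent layer for sequence data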
Let's demonstrate how to add layers to a model by instantiating a model of the Sequential class and adding a Dense layer to it. Layer classes can be imported from keras.layers, and successive layers are added to the model in the order in which we wish the computation to be performed. The number of units, or nodes, needs to be specified; this value also determines the shape of the layer's output. A Dense layer can be added to a Sequential model in the following way:
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
input_shape = 20    # number of input features
units = 1           # number of nodes in the layer
model.add(Dense(units, input_dim=input_shape))
Note
After the first layer, the input dimension does not need to be specified since it is determined from the previous layer.
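For example, a second Dense layer can be stacked on top of the first without specifying its input dimension (the layer sizes here are arbitrary and chosen only for illustration):
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(8, input_dim=20))    # first layer: the input dimension must be given
model.add(Dense(1))                  # later layers infer their input size automatically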
Activation Functions
An activation function is generally applied to the output of a node to limit or bound its value. Without one, the value from each node is unbounded and may take any value, from negative to positive infinity. Such unbounded values can be troublesome within neural networks, where the calculated weights and losses can head toward infinity and produce unusable results. Activation functions help in this regard by bounding the value; often, they push the value toward one of two limits. Activation functions are also useful for deciding whether a node should be "fired" or not. Common activation functions are as follows:
- The Step function: The value is nonzero if it is above a certain threshold; otherwise, it is zero.
- The Linear function: f(x) = cx, which is a scalar multiplication of the input value by a constant c.
- The Sigmoid function: f(x) = 1 / (1 + e^(-x)), which acts like a smoothed-out step function with smooth gradients. This activation function is useful for classification since its values are bounded between zero and one.
- The Tanh function: f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), which is a scaled version of the sigmoid with steeper gradients around x=0.
- The ReLU function: f(x) = x if x > 0; otherwise, 0.
Now that we have looked at some of the main components, we can begin to see how we might create useful neural networks out of these components. In fact, we can create a logistic regression model with all the concepts we have learned about in this chapter. A logistic regression model operates by taking the sum of the product of an input and a set of learned weights, followed by the output being passed through a logistic function. This can be achieved with a single-layer neural network with a sigmoid activation function.
Activation functions can be added to models in the same manner that layers are added to models. The activation function will be applied to the output of the previous step in the model. A tanh activation function can be added to a Sequential model as follows:
from keras.layers import Dense, Activation
from keras.models import Sequential
model = Sequential()
input_shape = 20    # number of input features
units = 1           # number of nodes in the layer
model.add(Dense(units, input_dim=input_shape))
model.add(Activation('tanh'))    # apply tanh to the layer's output
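Following the same pattern, the logistic regression model described earlier can be sketched by using a single Dense layer with one node followed by a sigmoid activation (the input dimension of 20 is an arbitrary value used for illustration):
from keras.layers import Dense, Activation
from keras.models import Sequential
model = Sequential()
model.add(Dense(1, input_dim=20))    # weighted sum of the input features
model.add(Activation('sigmoid'))     # logistic function bounds the output to (0, 1)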
Note
Activation functions can also be added to a model by including them as an argument when defining the layers.
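For example, the tanh activation from the earlier snippet could instead be passed directly to the Dense layer as an argument:
model.add(Dense(units, input_dim=input_shape, activation='tanh'))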
Model Fitting
Once a model's architecture has been created, the model must be compiled. The compilation process configures the learning parameters, including which optimizer to use, which loss function to minimize, and any optional metrics, such as accuracy, to calculate at various stages of model training. Models are compiled using the compile method, as follows:
model.compile(optimizer='adam', loss='binary_crossentropy', \
metrics=['accuracy'])
After the model has been compiled, it is ready to be fit to the training data. This is achieved with an instantiated model using the fit method. Useful arguments when using the fit method are as follows:
- x: The array of training feature data to fit the model to.
- y: The array of training target data.
- epochs: The number of epochs to run the model for. An epoch is an iteration over the entire training dataset.
- batch_size: The number of training data samples to use per gradient update.
- validation_split: The proportion of the training data to hold out for validation; the loss and metrics are evaluated on this portion after each epoch.
- shuffle: Indicates whether to shuffle the training data before each epoch.
The fit method can be used on a model in the following way:
history = model.fit(x=X_train, y=y_train['y'], \
epochs=10, batch_size=32, \
validation_split=0.2, shuffle=False)
It is beneficial to save the output of calling the fit method of the model since it contains information on the model's performance throughout training, including the loss, which is evaluated after each epoch. If a validation split is defined, the loss is evaluated after each epoch on the validation split. Likewise, if any metrics are defined in training, they are also calculated after each epoch. It is useful to plot such loss and evaluation metrics to determine model performance as a function of the epoch. The model's loss as a function of the epoch can be visualized as follows:
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(history.history['loss'])
plt.show()
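If a validation split was supplied to the fit method, history.history will also contain a 'val_loss' key, so the training and validation losses can be compared on the same axes (a sketch, assuming the fit call shown earlier):
import matplotlib.pyplot as plt
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()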
Keras models can be evaluated by utilizing the evaluate method of the model instance. This method returns the loss and any metrics that were passed to the model for training. The method can be called as follows when evaluating an out-of-sample test dataset:
test_loss, test_accuracy = model.evaluate(X_test, y_test['y'])    # returns the loss followed by any compiled metrics
These are the basic steps that need to be followed to build, train, and evaluate models using the Keras package. From here, there are countless ways to build and evaluate a model, depending on the task you wish to accomplish. In the following activity, we will create an ANN to perform the same task that we completed in Chapter 1, Introduction to Machine Learning with Keras. In fact, we will recreate the logistic regression algorithm with an ANN, so we expect the two models to perform similarly.
Activity 2.01: Creating a Logistic Regression Model Using Keras
In this activity, we are going to create a basic model using the Keras library. We will perform the same classification task that we did in Chapter 1, Introduction to Machine Learning with Keras. We will use the same online shopping purchasing intention dataset and attempt to predict the same variable.
In the previous chapter, we used a logistic regression model to predict whether a user would purchase a product from a website when given various attributes about the online session's behavior and the attributes of the web page. In this activity, we will introduce the Keras library, though we'll continue to utilize the libraries we introduced previously, such as pandas, for easily loading in the data, and sklearn, for any data preprocessing and model evaluation metrics.
Note
Preprocessed datasets have been provided for you to use for this activity. You can download them from https://packt.live/2ApIBwT.
The steps to complete this activity are as follows:
- Load in the processed feature and target datasets.
- Split the feature and target data into training and test datasets. The model will be fit to the training dataset, and the test dataset will be used to evaluate the model.
- Instantiate a model of the Sequential class from the keras.models library.
- Add a single layer of the Dense class from the keras.layers package to the model instance, with a single node, and set the input dimension equal to the number of features in the feature dataset.
- Add a sigmoid activation function to the model.
- Compile the model instance by specifying the optimizer to use, the loss metric to evaluate, and any other metrics to evaluate after each epoch.
- Fit the model to the training data, specifying the number of epochs to run for and the validation split to use.
- Plot the loss and other evaluation metrics, evaluated on the training and validation datasets, as a function of the epoch.
- Evaluate the loss and other evaluation metrics on the test dataset.
After implementing these steps, you should get the following expected output:
2466/2466 [==============================] - 0s 15us/step
The loss on the test set is 0.3632 and the accuracy is 86.902%
Note
The solution for this activity can be found on page 356.
In this activity, we looked at some of the fundamental concepts of creating ANNs in Keras, including various layer types and activation functions. We used these components to create a simple logistic regression model with the Keras package, which gave us results similar to those of the logistic regression model we used in Chapter 1, Introduction to Machine Learning with Keras. We learned how to build the model with the Keras library, train it on a real-world dataset, and evaluate its performance on a test dataset to provide an unbiased estimate of how it performs.