Introduction
In the previous chapter, you learned about the mathematics of neural networks, including linear transformations with scalars, vectors, matrices, and tensors. Then, you implemented your first neural network using Keras by building a logistic regression model that classifies the users of a website into those who will make a purchase and those who will not.
In this chapter, you will extend your knowledge of building neural networks using Keras. This chapter covers the basics of deep learning and provides the foundations you need to build highly complex neural network architectures. We will start by extending the logistic regression model to a simple neural network with a single hidden layer and then proceed to more complicated neural networks with multiple hidden layers.
Along the way, you will learn about the basic concepts that underlie neural networks: forward propagation for making predictions, computing the loss, backpropagation for computing the derivatives of the loss with respect to the model parameters, and, finally, gradient descent for learning the optimal parameters of the model. You will also learn about the choices available when building and training a neural network, namely activation functions, loss functions, and optimizers.
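To make these four steps concrete before Keras automates them, here is a minimal sketch for a single sigmoid neuron, that is, the logistic regression model from the previous chapter. It is written in plain Python rather than Keras so that each step is visible. The function name `train_logistic`, the toy dataset, the learning rate, and the number of epochs are all illustrative assumptions, not values taken from this book.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=200):
    """Fit a one-feature logistic model with batch gradient descent.

    Hypothetical helper for illustration only; lr and epochs are
    arbitrary choices for this toy problem.
    """
    w, b = 0.0, 0.0          # model parameters, initialized to zero
    n = len(xs)
    eps = 1e-12              # guards against log(0)
    losses = []
    for _ in range(epochs):
        # 1. Forward propagation: compute a prediction for every example.
        preds = [sigmoid(w * x + b) for x in xs]

        # 2. Loss: mean binary cross-entropy between predictions and labels.
        loss = -sum(
            y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            for p, y in zip(preds, ys)
        ) / n
        losses.append(loss)

        # 3. Backpropagation: for sigmoid + cross-entropy, dLoss/dz = p - y,
        #    so the chain rule gives the parameter gradients below.
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n

        # 4. Gradient descent: step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


# Toy data (made up): negative inputs belong to class 0, positive to class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b, losses = train_logistic(xs, ys)
```

After training, `losses[-1]` is lower than `losses[0]`, and `sigmoid(w * 2.0 + b)` is above 0.5, so the model assigns the positive example to class 1. A network with hidden layers repeats the same loop; only the forward pass and the chain-rule bookkeeping in step 3 grow longer.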
Furthermore, you will learn how to evaluate your model and how to recognize issues such as overfitting and underfitting, including how they affect the performance of your model and how to detect them. You will learn about the drawbacks of evaluating a model on the same dataset that was used for training, as well as the alternative approach of holding back a part of the available dataset for evaluation purposes. Subsequently, you will learn how comparing the model's error rate on these two subsets of the dataset can be used to detect problems such as high bias and high variance in the model. Lastly, you will learn about a technique called early stopping that reduces overfitting and is likewise based on comparing the model's error rate on the two subsets of the dataset.
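The two ideas in this paragraph, holding back part of the dataset and stopping training once the held-out error stops improving, can be sketched in a few lines of plain Python. This is only a rough illustration under stated assumptions: the helpers `train_test_split` and `early_stopping_epoch` are hypothetical names written for this sketch, not Keras functions, the `patience` rule is one common way to define early stopping, and the validation losses are invented numbers rather than measured results.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold back a fraction for evaluation.

    Hypothetical helper for illustration; returns (train, test).
    """
    rng = random.Random(seed)
    shuffled = samples[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss) under a simple patience rule.

    Training is considered stopped once the held-out loss has failed to
    improve for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break                # held-out loss keeps rising: stop here
    return best_epoch, best_loss


# Hold back 25% of 20 placeholder samples for evaluation.
train, test = train_test_split(list(range(20)))

# Invented held-out loss curve: it improves, then rises as the model overfits.
val_curve = [0.70, 0.55, 0.48, 0.45, 0.47, 0.50, 0.56]
best_epoch, best_loss = early_stopping_epoch(val_curve, patience=2)
```

With this invented curve, the rule selects epoch 3 (counting from 0), where the held-out loss reaches its minimum of 0.45, and stops two epochs later. Keras offers an `EarlyStopping` callback that applies the same idea during training.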