Deep Learning for Beginners

Preparing Data

Now that you have prepared your system for learning about deep learning (see Chapter 2, Setup and Introduction to Deep Learning Frameworks), we will give you important guidelines about the kinds of data you will encounter frequently when practicing deep learning. While you are learning, well-prepared datasets let you focus on designing your models rather than on preparing your data. In practice, however, this is not a realistic expectation: ask any data scientist or machine learning professional, and they will tell you that an important part of modeling is knowing how to prepare your data. Knowing how to handle and prepare your data will save you many hours of work that you can spend fine-tuning your models instead. Time spent preparing your data is time well invested.

This chapter introduces the main concepts behind processing data to make it useful in deep learning. It covers the essentials of formatting outputs and inputs that are categorical or real-valued, as well as techniques for augmenting data and reducing its dimensionality. By the end of the chapter, you should be able to apply the most common data manipulation techniques, which can lead to successful choices of deep learning methodologies down the road.

Specifically, this chapter discusses the following:

  • Binary data and binary classification
  • Categorical data and multiple classes
  • Real-valued data and univariate regression
  • Altering the distribution of data
  • Data augmentation
  • Data dimensionality reduction
  • Ethical implications of manipulating data