
Introduction
The first project in this book is an analysis of automobile fuel economy data. The primary tool we will use to analyze this dataset is the R statistical programming language. R is often called the lingua franca of data science because it is one of the most widely used languages for statistics and data analysis. As the examples in this book show, R is an excellent tool for data manipulation, analysis, modeling, and visualization, and for writing scripts that get analytical tasks done.
The recipes in this chapter will roughly follow these five steps in the data science pipeline:
- Acquisition
- Exploration and understanding
- Munging, wrangling, and manipulation
- Analysis and modeling
- Communication and operationalization
The data science pipeline is the procedural backbone of data science. To get good at data science, you need experience working through this process end to end, swapping tools and methods along the way so that you always use the ones appropriate for the dataset you are analyzing.
The goal of this chapter is to guide you through an analysis project on automobile fuel economy using step-by-step examples that you can learn from and later apply to other datasets and analysis projects. Think of this chapter as a warm-up for the longer and more challenging chapters to come.