Practical Data Science Cookbook (Second Edition)

Introduction

The first project in this book is an analysis of automobile fuel economy data. The primary tool we will use to analyze this dataset is the R statistical programming language. R is often called the lingua franca of data science because it is one of the most widely used languages for statistics and data analysis. As you'll see from the examples in this book, R is an excellent tool for data manipulation, analysis, modeling, and visualization, and for writing scripts that get analytical tasks done.

The recipes in this chapter will roughly follow these five steps in the data science pipeline:

1. Acquisition
2. Exploration and understanding
3. Munging, wrangling, and manipulation
4. Analysis and modeling
5. Communication and operationalization

As a process, data science is built around the data science pipeline. To get good at data science, you need experience working through this pipeline, swapping tools and methods along the way so that you always use the ones appropriate for the dataset you are analyzing.

The goal of this chapter is to guide you through an analysis project on automobile fuel efficiency via step-by-step examples that you can learn from and then apply to other datasets and analysis projects in the future. Think of this chapter as a warm-up for the longer and more challenging chapters to come.
