Hands-On Big Data Analytics with PySpark

Getting Your Big Data into the Spark Environment Using RDDs

This chapter gives a brief overview of how to get your big data into the Spark environment using resilient distributed datasets (RDDs). We will use a variety of tools to interact with and modify this data so that useful insights can be extracted. We will first load the data into Spark RDDs and then carry out parallelization with Spark RDDs.

In this chapter, we will cover the following topics:

Loading data onto Spark RDDs
Parallelization with Spark RDDs
Basics of RDD operations
