
How to do it...
PySpark is not configured to work within Jupyter notebooks by default, but a slight tweak to the .bashrc script can remedy this. We will walk through the following steps in this section:
- Access the .bashrc script by executing the following command:
$ nano .bashrc
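If you prefer not to open an editor just to inspect the file, the end of the script can also be viewed directly; for example, the standard tail command prints its last few lines:

$ tail -n 10 ~/.bashrc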
- Scroll all the way to the end of the script; the most recent modification should be the PATH set by the Anaconda installation from the previous section. The PATH should appear as follows:
# added by Anaconda3 4.4.0 installer
export PATH="/home/asherif844/anaconda3/bin:$PATH"
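Before making further edits, it can be worth confirming that this PATH entry is active in your current shell; if it is, the following command should resolve python to a location underneath the anaconda3 directory shown above:

$ which python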
- Underneath the PATH added by the Anaconda installer, we can include a custom function that connects the Spark installation with the Jupyter notebook installation from Anaconda3. For the purposes of this chapter and the remaining chapters, we will name that function sparknotebook. The configuration for sparknotebook() should appear as follows:
function sparknotebook()
{
  # Point Spark at the local installation and run the workers with Python 3
  export SPARK_HOME=/home/asherif844/spark-2.2.0-bin-hadoop2.7
  export PYSPARK_PYTHON=python3
  # Use Jupyter as the PySpark driver and open it in notebook mode
  export PYSPARK_DRIVER_PYTHON=jupyter
  export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
  # Launch PySpark, which will now start inside a Jupyter notebook
  $SPARK_HOME/bin/pyspark
}
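Because the exports live inside a function, they are only applied when sparknotebook is actually called, rather than in every new terminal session. Note that the paths above reflect the author's home directory and Spark download location from the earlier recipes; adjust them to match your own environment. A quick sanity check, assuming the same layout, is to confirm that the pyspark launcher exists under the SPARK_HOME path used in the function:

$ ls /home/asherif844/spark-2.2.0-bin-hadoop2.7/bin/pyspark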
- Once saved, the updated .bashrc script should contain the Anaconda PATH export followed by the sparknotebook() function directly beneath it.
- Save and exit the .bashrc file. To make the shell pick up the update, it is recommended to execute the following command or simply restart the terminal application:
$ source .bashrc
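With the changes sourced, the new function should be available from the terminal. Calling it starts PySpark with Jupyter as the driver, so a notebook server should open in the browser:

$ sparknotebook

Inside a newly created notebook, a SparkSession (exposed as spark in Spark 2.x, along with the sc SparkContext) should then be available without any further setup.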