
Python graphics in Jupyter
How do Python graphics work in Jupyter?
I started a separate view for this, named Python graphics, to keep the work distinct.
If we build a sample dataset of baby names and the number of births in a year for each name, we can then plot the data.
The Python coding is simple:
    import pandas
    import matplotlib

    %matplotlib inline

    # define our two columns of data
    baby_name = ['Alice','Charles','Diane','Edward']
    number_births = [96, 155, 66, 272]

    # create a dataset from the two sets
    dataset = list(zip(baby_name,number_births))
    dataset

    # create a Python dataframe from the dataset
    df = pandas.DataFrame(data = dataset, columns=['Name', 'Number'])
    df

    # plot the data
    df['Number'].plot()
The steps for the script are as follows:
- Import the graphics library (and data library) we need
- Define our data
- Convert the data into a format that allows for an easy graphical display
- Plot the data
We would expect a resultant graph of the number of births by baby name.
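One detail worth noting: `df['Number'].plot()` draws a line against the row index (0 to 3), so the names themselves do not appear on the x axis. As a minimal sketch of one way to label the plot by name, assuming the same sample data, a bar chart can be keyed on the Name column. The `matplotlib.use("Agg")` line is only there so the sketch also runs outside Jupyter; in a notebook you would keep `%matplotlib inline` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas

# same sample data as the script above
baby_name = ['Alice', 'Charles', 'Diane', 'Edward']
number_births = [96, 155, 66, 272]
df = pandas.DataFrame({'Name': baby_name, 'Number': number_births})

# bar chart with one bar per name, using the Name column for the x axis
ax = df.plot(x='Name', y='Number', kind='bar', legend=False)
ax.set_ylabel('Number of births')

# collect the bar heights so we can check them against the input data
heights = [int(bar.get_height()) for bar in ax.patches]
```

This is an illustration rather than part of the original script; the line plot above remains the version used in the rest of this section.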
Taking the baby-names script and splitting it across cells of a Jupyter Notebook, we get something that looks like this:
- I have broken the script into different cells for easier readability. Having different cells also allows you to develop the script easily, step-by-step, where you can display the values computed so far to validate your results. I have done this in most of the cells by displaying the dataset and dataframe at the bottom of those cells.
- When we run this script (Cell | Run All), we can see the results at each step being displayed as the script progresses:
- And finally, we can see our plot of the births:
- I was curious about what metadata was stored for this script. Looking into the .ipynb file, you can see the expected values for the code cells.
- The tabular display of the dataframe is conveniently stored as HTML:
    ...
    {
     "cell_type": "code",
     "execution_count": 4,
     "metadata": {},
     "outputs": [
      {
       "data": {
        "text/html": [
         "<div>\n",
         "<style scoped>\n",
         " .dataframe tbody tr th:only-of-type {\n",
         " vertical-align: middle;\n",
         " }\n",
         "\n",
         " .dataframe tbody tr th {\n",
         " vertical-align: top;\n",
         " }\n",
         "\n",
         " .dataframe thead th {\n",
         " text-align: right;\n",
         " }\n",
         "</style>\n",
         "<table border=\"1\" class=\"dataframe\">\n",
         " <thead>\n",
         " <tr style=\"text-align: right;\">\n",
         " <th></th>\n",
         " <th>Name</th>\n",
         " <th>Number</th>\n",
         " </tr>\n",
    ...
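The stored markup closely resembles what pandas itself generates for a dataframe. The following is a minimal sketch, assuming the same sample data, that produces comparable HTML with the public `DataFrame.to_html()` method. The exact markup Jupyter saves (for example, the scoped style block) may differ depending on the pandas version, so treat this as an approximation of the stored output rather than a byte-for-byte match.

```python
import pandas

# same sample data as the script above
df = pandas.DataFrame({
    'Name': ['Alice', 'Charles', 'Diane', 'Edward'],
    'Number': [96, 155, 66, 272],
})

# render the dataframe as an HTML table string
html = df.to_html()

# show the beginning of the generated markup
print(html[:120])
```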
- The graphic output cell is stored like this:
    {
     "cell_type": "code",
     "execution_count": 5,
     "metadata": {},
     "outputs": [
      {
       "data": {
        "text/plain": [
         "<matplotlib.axes._subplots.AxesSubplot at 0x1822deb44a8>"
        ]
       },
       "execution_count": 5,
       "metadata": {},
       "output_type": "execute_result"
      },
      {
       "data": {
        "image/png": "iVBORw0... <about a hundred lines of base64 text> ...VTRitYII=\n",
        "text/plain": [
         "<matplotlib.figure.Figure at 0x1822e26a828>"
        ]
       },
    ...
Here, the image/png entry contains a long base64-encoded string representing the graphical image displayed on screen (I abbreviated it in the listing). So, the actual generated image is stored in the notebook file itself, as part of the cell's output.
So, rather than using a cache, Jupyter stores the output from the last time each cell was executed.
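Because the .ipynb file is plain JSON, the stored images can be recovered with nothing more than the standard library. The sketch below is illustrative only: it builds a tiny stand-in notebook in memory (the `notebook_text` value and the `extract_png_outputs` helper are hypothetical names, not part of Jupyter) and decodes any image/png output back into bytes. It also shows why the stored string begins with `iVBORw0`: that is the base64 encoding of the start of the standard PNG file signature.

```python
import base64
import json

# the fixed 8-byte signature that begins every PNG file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# hypothetical stand-in for the text of a saved .ipynb file;
# the "image" here is just the PNG signature, not a real plot
notebook_text = json.dumps({
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 5,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "image/png": base64.b64encode(PNG_SIGNATURE).decode("ascii"),
                        "text/plain": ["<placeholder figure repr>"],
                    },
                    "metadata": {},
                    "output_type": "display_data",
                }
            ],
            "source": ["df['Number'].plot()"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
})


def extract_png_outputs(nb_text):
    """Return the decoded bytes of every image/png output in a notebook."""
    notebook = json.loads(nb_text)
    images = []
    for cell in notebook.get("cells", []):
        # markdown cells have no "outputs" key, hence the default
        for output in cell.get("outputs", []):
            encoded = output.get("data", {}).get("image/png")
            if encoded is None:
                continue
            if isinstance(encoded, list):
                # tolerate a payload split across several strings
                encoded = "".join(encoded)
            images.append(base64.b64decode(encoded))
    return images


images = extract_png_outputs(notebook_text)
encoded_prefix = base64.b64encode(PNG_SIGNATURE).decode("ascii")
```

To try this on a real notebook, you could read the file's text with `open(...).read()` and pass it to the helper; each returned item could then be written to a .png file in binary mode.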