apply
The apply function allows us to work with both DataFrames and series. We'll start with an example that would work equally well with map, before moving on to examples that would only work with apply.
Using our iris DataFrame, let's make a new column based on petal width. We previously saw that the mean for the petal width was 1.3. Let's now create a new column in our DataFrame, wide petal, that contains binary values based on the value in the petal width column. If the petal width is equal to or wider than the median, we will code it with a 1, and if it is less than the median, we will code it 0. We'll do this using the apply function on the petal width column:
A few things happened here, so let's walk through them step by step. The first is that we were able to append a new column to the DataFrame simply by using the column selection syntax for a column name, which we want to create, in this case wide petal. We set that new column equal to the output of the apply function. Here, we ran apply on the petal width column that returned the corresponding values in the wide petal column. The apply function works by running through each value of the petal width column. If the value is greater than or equal to 1.3, the function returns 1, otherwise it returns 0. This type of transformation is a fairly common feature engineering transformation in machine learning, so it is good to be familiar with how to perform it.
Let's now take a look at using apply on a DataFrame rather than a single series. We'll now create a feature based on the petal area:
Notice that we called apply not on a series here, but on the entire DataFrame, and because apply was called on the entire DataFrame, we passed in axis=1 in order to tell pandas that we want to apply the function row-wise. If we passed in axis=0, then the function would operate column-wise. Here, each column is processed sequentially, and we choose to multiply the values from the petal length (cm) and petal width (cm) columns. The resultant series then becomes the petal area column in our DataFrame. This type of power and flexibility is what makes pandas an indispensable tool for data manipulation.