Creating bar charts
In this recipe, we will learn how to make bar plots, which are useful for visualizing summary data across categories, such as product sales or election results.
Getting ready
First, we need to load the citysales.csv example data file (you can download this file from the code download section of the book's companion website):
sales<-read.csv("citysales.csv",header=TRUE)
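If the data file is not at hand, a quick way to see what this recipe expects is to read a small stand-in file and inspect its structure. The following sketch assumes the layout implied by the code in this recipe: a City column followed by three product columns. The names ProductB and ProductC, the city labels, and all the values are made up for illustration and are not the contents of citysales.csv:

```r
# Hypothetical stand-in for citysales.csv; names and numbers are invented
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "City,ProductA,ProductB,ProductC",
  "CityA,23,11,12",
  "CityB,89,6,56",
  "CityC,24,7,13"
), tmp)

sales <- read.csv(tmp, header = TRUE)

# Check the column names and types before plotting
str(sales)
head(sales)
```

The product columns should be read in as numeric, since barplot() needs numeric heights.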
How to do it...
Just like the plot() function we used to make scatter plots and line graphs in the earlier recipes, the barplot() and dotchart() functions are part of the base graphics library in R. This means that we don't need to install any additional packages or libraries to use these functions.
We can make bar plots using the barplot() function as follows:
barplot(sales$ProductA, names.arg= sales$City, col="black")
By default, the bars are vertical. To make them horizontal, set the horiz argument to TRUE (it is FALSE by default):
barplot(sales$ProductA, names.arg= sales$City, horiz=TRUE, col="black")
How it works...
The first argument of the barplot() function is either a vector or a matrix of values to plot as bars, such as the sales figures in the examples above. The names.arg argument specifies the labels for the bars; in this recipe, we use it only when plotting a single series of bars. In the example with sales figures for multiple products (in the There's more... section), we do not specify names.arg. R automatically uses the product names (the column names of the matrix) as the group labels, so we supply the city names as the legend instead.
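One detail worth knowing is that barplot() invisibly returns the positions of the bar midpoints, which can be used to annotate the bars. The following is a minimal sketch using a made-up data frame in place of citysales.csv (the city labels and values are hypothetical):

```r
# Hypothetical data standing in for citysales.csv
sales <- data.frame(
  City     = c("CityA", "CityB", "CityC", "CityD", "CityE"),
  ProductA = c(23, 89, 24, 36, 3)
)

# names.arg labels each bar; the return value holds the bar midpoints
mids <- barplot(sales$ProductA, names.arg = sales$City, col = "black")

# Use the midpoints to write each value just above its bar
text(x = mids, y = sales$ProductA, labels = sales$ProductA,
     pos = 3, xpd = TRUE)
```

Setting xpd=TRUE lets a label extend beyond the plot region if it does not fit above the tallest bar.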
As with the other types of plots, the col argument specifies the color of the bars. This is a common convention throughout R: col sets the color of the main feature in most kinds of graphs.
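The col argument also accepts a vector of colors, in which case each bar in a single series gets its own color. As a small sketch, again with made-up data standing in for citysales.csv:

```r
# Hypothetical data standing in for citysales.csv
sales <- data.frame(
  City     = c("CityA", "CityB", "CityC", "CityD", "CityE"),
  ProductA = c(23, 89, 24, 36, 3)
)

# One color per bar, taken from the built-in heat.colors() palette
cols <- heat.colors(nrow(sales))
barplot(sales$ProductA, names.arg = sales$City, col = cols)
```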
There's more...
Bar plots are often used to compare groups of values across categories. For example, we can plot the sales in different cities for more than one product by passing a matrix of values and setting the beside argument:
barplot(as.matrix(sales[,2:4]), beside=TRUE, legend.text=sales$City, col=heat.colors(5), border="white")
You will notice that when plotting data for multiple products (columns), we used the square bracket notation in the form sales[,2:4]. In R, the square bracket notation is used to refer to specific rows and columns of a dataset. For example, sales[2,3] refers to the value in the second row and third column.
So, the notation takes the form sales[row,column]. If you want to refer to all the rows in a certain column, you can omit the row number. For example, to refer to all the rows in column 2, you would use sales[,2]. Similarly, for all the columns of row 3, you would use sales[3,].
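So, sales[,2:4] refers to all the data in columns 2 to 4, which is the product sales data. Because barplot() expects a vector or a matrix rather than a data frame, we convert this selection with as.matrix() before plotting.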
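The indexing forms described above can be tried on a small data frame. This sketch uses made-up values in the column layout this recipe assumes (City followed by three product columns); the names ProductB and ProductC and all the numbers are hypothetical:

```r
# Hypothetical data standing in for citysales.csv
sales <- data.frame(
  City     = c("CityA", "CityB", "CityC", "CityD", "CityE"),
  ProductA = c(23, 89, 24, 36, 3),
  ProductB = c(11, 6, 7, 34, 78),
  ProductC = c(12, 56, 13, 44, 14)
)

one_value <- sales[2, 3]     # row 2, column 3: a single value
one_col   <- sales[, 2]      # all rows of column 2: a numeric vector
one_row   <- sales[3, ]      # all columns of row 3: a one-row data frame
products  <- sales[, 2:4]    # all rows of columns 2 to 4: a data frame

# barplot() needs a vector or matrix, so convert before plotting
m <- as.matrix(products)
dim(m)
```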
The orientation of bars is set to vertical by default. It is controlled by the optional horiz (for horizontal) argument. If we do not use this argument in our barplot() function call, it is set to FALSE. To make the bars horizontal, we set horiz to TRUE.
The beside argument is used to specify whether we want the bars in a group of data to be stacked or adjacent to each other. By default, beside is set to FALSE, which produces a stacked bar graph. To make the bars adjacent, we set beside to TRUE.
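To see the effect of beside, it can help to draw both versions next to each other. The following is a sketch with made-up data in place of citysales.csv (the column names ProductB and ProductC, the city labels, and the values are hypothetical); it also captures the value that barplot() returns in each case:

```r
# Hypothetical data standing in for citysales.csv
sales <- data.frame(
  City     = c("CityA", "CityB", "CityC", "CityD", "CityE"),
  ProductA = c(23, 89, 24, 36, 3),
  ProductB = c(11, 6, 7, 34, 78),
  ProductC = c(12, 56, 13, 44, 14)
)
m <- as.matrix(sales[, 2:4])

# Draw two plots side by side, then restore the previous layout
op <- par(mfrow = c(1, 2))
stacked <- barplot(m, beside = FALSE, col = heat.colors(5),
                   main = "beside = FALSE")
grouped <- barplot(m, beside = TRUE, col = heat.colors(5),
                   main = "beside = TRUE")
par(op)
```

With stacked bars, there is one bar per product, so stacked holds one midpoint per column of m. With adjacent bars, there is one bar per city within each product, so grouped is a matrix of midpoints with the same dimensions as m.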
To change the color of the border around the bars, we use the border argument. The default border color is black. However, if you wish to use another color, say white, you can set it with border="white".
To make the same graph with horizontal bars, we will type:
barplot(as.matrix(sales[,2:4]), beside=TRUE, legend.text=sales$City, col=heat.colors(5), border="white", horiz=TRUE)
See also
Bar charts are explored in a lot more detail with some advanced recipes in Chapter 6, Creating Bar, Dot, and Pie Charts.