Practical Data Science Cookbook (Second Edition)

How to do it...

The following steps will walk you through the initial import of the data into the R environment:

  1. First, set the working directory to the location where we saved the vehicles.csv.zip file:
setwd("path")  

Replace path with the actual directory.

  1. We can load the data directly from a compressed (ZIP) file, as long as we know the name of the file inside the ZIP archive that we want to load:
vehicles <- read.csv(unz("vehicles.csv.zip", "vehicles.csv"),
                     stringsAsFactors = FALSE)
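If the name of the file inside the archive is not known, it can usually be looked up without extracting anything. The following is a minimal sketch, assuming the archive sits in the working directory; the helper name zip_member_names is hypothetical and not part of the recipe.

```r
# Hypothetical helper (not from the recipe): return the names of the files
# stored inside a ZIP archive, or an empty vector if the archive is missing.
zip_member_names <- function(zip_path) {
  if (!file.exists(zip_path)) {
    return(character(0))
  }
  # list = TRUE describes the archive contents without extracting them
  unzip(zip_path, list = TRUE)$Name
}

# Same archive name as in the recipe; prints character(0) if it is not found
print(zip_member_names("vehicles.csv.zip"))
```

Any name returned here can be passed as the second argument to unz().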
  1. To see whether this worked, let's display the first few rows of data using the head command:
head(vehicles) 

You should see the first few rows of the dataset printed on your screen.

Note that we could have used the tail command, which would have displayed the last few rows of the data frame instead of the first few rows.
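As a quick illustration of head and tail, here is a small sketch that uses R's built-in mtcars data set as a stand-in, since vehicles is only available after the download above. The n argument and the calls to dim and str are additions for illustration, not steps from the recipe.

```r
# mtcars ships with R; it stands in for the vehicles data frame here
first_rows <- head(mtcars, n = 3)  # first three rows
last_rows  <- tail(mtcars, n = 3)  # last three rows

print(first_rows)
print(last_rows)

# dim() reports rows and columns; str() summarizes each column's type
print(dim(mtcars))
str(mtcars)
```

The same calls work unchanged on vehicles once it has been loaded.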
  1. The variable labels for the vehicles.csv file are stored in a separate file, varlabels.txt. Note that we store them in an object named labels, which is also the name of a function in R. A quick look at the file shows that the variable names and their explanations are separated by -, so we will try to read the file using - as the separator:
labels <- read.table("varlabels.txt", sep = "-", header = FALSE)
## Error: line 11 did not have 2 elements
  1. This doesn't work! A closer look at the error shows that line 11 of varlabels.txt contains two - symbols, so it is broken into three parts rather than two, unlike the other rows. We need to change our file-reading approach so that hyphenated words are left intact, which we do by splitting each line on a hyphen surrounded by spaces:
labels <- do.call(rbind, strsplit(readLines("varlabels.txt"), " - ")) 
  1. To check whether it works, we use the head function again:
head(labels) 

     [,1]         [,2]
[1,] "atvtype"    "type of alternative fuel or advanced technology vehicle"
[2,] "barrels08"  "annual petroleum consumption in barrels for fuelType1 (1)"
[3,] "barrelsA08" "annual petroleum consumption in barrels for fuelType2 (1)"
[4,] "charge120"  "time to charge an electric vehicle in hours at 120 V"
[5,] "charge240"  "time to charge an electric vehicle in hours at 240 V"
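The difference between the two separators used in the steps above can be reproduced on a few in-memory lines. This is a sketch under stated assumptions: the three lines below are made-up examples that mimic the "name - description" layout, not the actual contents of varlabels.txt.

```r
# Hypothetical lines in the "name - description" layout (not the real file)
lines <- c(
  "alpha - first example variable",
  "beta - a hyphen-containing description",
  "gamma - third example variable"
)

# Splitting on a bare hyphen breaks the second line into three pieces
pieces_bare <- lengths(strsplit(lines, "-", fixed = TRUE))
print(pieces_bare)

# Splitting on " - " (hyphen with surrounding spaces) leaves the
# hyphenated word intact, so every line yields exactly two pieces
parts <- strsplit(lines, " - ", fixed = TRUE)
print(lengths(parts))

# Stack the pieces row by row into a two-column character matrix
label_matrix <- do.call(rbind, parts)
print(label_matrix)
```

Note that this approach still assumes " - " appears exactly once per line; a description that itself contains a spaced hyphen would again produce more than two pieces.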