Tuesday, May 11, 2010

Reading in data in R

Reading in data can be a chore in R if you haven't done it before. In this post, I describe how to read data into R's most common data object -- the data frame.

## ----------------------------------------------------##
## The most common data object in R is the data frame. ##
## The goal here is to read data from some external ##
## source into an R data frame. ##
## ----------------------------------------------------##

## --------------------------------------------------- ##
## The standard function for space- or tab-delimited ##
## text files is read.table(). ##
## ##
## To read in the data, you will want to save the file ##
## to a known directory. For me, this is "C:/R/" ##
## ##
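## Time-saving note: R treats a backslash in a string ##
## as an escape character, so a Windows path must use ##
## either forward slashes ("C:/R/data.txt") or doubled ##
## backslashes ("C:\\R\\data.txt"). The doubled forward ##
## slashes I tend to type ("C://R//data.txt") are ##
## generally tolerated on Windows but are not required. ##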
## --------------------------------------------------- ##
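
As a small illustration of the path note above, the sketch below writes the same Windows path three ways and checks that they agree. The file name traffic.txt matches this post; the variable names are made up for the example, and nothing here touches the disk, so it runs even if C:/R/ does not exist.

```r
## Three ways to write the same Windows path in R
forward.path = "C:/R/traffic.txt"                 ## single forward slashes
escaped.path = "C:\\R\\traffic.txt"               ## each backslash typed twice
built.path = file.path("C:", "R", "traffic.txt")  ## let R add the separators

## The doubled backslash is only how it is typed; the string holds one
cat(escaped.path, "\n")

## Converting backslashes to forward slashes gives the first form
converted.path = gsub("\\", "/", escaped.path, fixed = TRUE)
identical(converted.path, forward.path)
identical(built.path, forward.path)
```

Using file.path() is handy when the directory is stored in a variable, since you never have to think about which slash to type.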

## By default, read.table() assumes there is no header row ##
traffic.df = read.table("C://R//traffic.txt")
traffic.df ## Display the data frame

## Tell R that the first row holds the variable names ##
traffic.df = read.table("C://R//traffic.txt", header = TRUE)
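
To see what the header argument changes without depending on C:/R/, here is a minimal sketch that writes the first three rows of the traffic data (listed at the end of this post) to a temporary file and reads it back both ways. The names demo.file, no.header.df, and with.header.df are made up for this illustration.

```r
## Write a small space-delimited file to a temporary location
demo.file = tempfile(fileext = ".txt")
writeLines(c("density speed",
             "20.4 38.8",
             "27.4 31.5",
             "106.2 10.6"), demo.file)

## Without the header argument, the names are read as a row of data
no.header.df = read.table(demo.file)
names(no.header.df)   ## V1 and V2
nrow(no.header.df)    ## 4 rows, the first holding "density" and "speed"

## With header = TRUE the first row becomes the variable names
with.header.df = read.table(demo.file, header = TRUE)
names(with.header.df) ## density and speed
str(with.header.df)   ## both columns are numeric

unlink(demo.file)     ## remove the temporary file
```

Running str() right after an import is a quick way to catch a header that was swallowed as data, because the columns will not come back as numbers.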

## --------------------------------------------------- ##
## A common difficulty is that read.table() does not ##
## read Excel workbook files. For this, there ##
## are a couple of workarounds. My favorite is to ##
## use read.csv(). ##
## --------------------------------------------------- ##

## First, save your Excel file as type "csv"
## Second, use a command like the following:

## header = TRUE is already the default for read.csv()
insulin.df = read.csv("C://R//insulin.csv", header = TRUE)
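
A CSV import can be sketched the same way. Since the insulin data are not included in this post, the example below reuses three rows of the traffic data; demo.csv, csv.df, and same.df are names made up here.

```r
## Write a small comma-separated file to a temporary location
demo.csv = tempfile(fileext = ".csv")
writeLines(c("density,speed",
             "20.4,38.8",
             "27.4,31.5",
             "106.2,10.6"), demo.csv)

## read.csv() is read.table() with sep = "," and header = TRUE
## (among a few other defaults) already filled in
csv.df = read.csv(demo.csv)
csv.df

## The same file through read.table() needs both options spelled out
same.df = read.table(demo.csv, header = TRUE, sep = ",")
all.equal(csv.df, same.df)

unlink(demo.csv)      ## remove the temporary file
```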

##----------------------------------------------------##
## Suppose you want to read in a Stata .dta file. ##
## The command is read.dta(). ##
## ##
## This requires loading a package. ##
## In this case, the "foreign" package. ##
## ##
## Because "foreign" usually ships with R but is not ##
## loaded by default, you'll have to tell R to use it ##
## with the command library(). ##
##----------------------------------------------------##

library(foreign)

caschool.df = read.dta("C://R//caschool.dta")
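
If you do not have a .dta file handy, the foreign package can also write one, which allows a round-trip sketch. It assumes foreign is installed (it normally comes with R) and again borrows three rows of the traffic data; small.df, demo.dta, and roundtrip.df are illustrative names. As far as I know, read.dta() only understands files saved by Stata versions 5 through 12, so newer files may need to be re-saved in an older format first.

```r
library(foreign)

## A tiny data frame built from the first rows of the traffic data
small.df = data.frame(density = c(20.4, 27.4, 106.2),
                      speed = c(38.8, 31.5, 10.6))

## Write a Stata file to a temporary location, then read it back
demo.dta = tempfile(fileext = ".dta")
write.dta(small.df, demo.dta)
roundtrip.df = read.dta(demo.dta)

## Check that the names and values survived the trip
names(roundtrip.df)
all.equal(roundtrip.df$speed, small.df$speed)

unlink(demo.dta)      ## remove the temporary file
```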

------------------------------------------------------
If you came across this page looking for advice on how to read data into R, you likely already have a data set that you want to use. If you would rather use the data referenced in this post, the traffic data set is below (just copy and paste it into Notepad, and save it as traffic.txt in your favorite directory):


density speed
20.4 38.8
27.4 31.5
106.2 10.6
80.4 16.1
141.3 7.7
130.9 8.3
121.7 8.5
106.5 11.1
130.5 8.6
101.1 11.1
123.9 9.8
144.2 7.8
29.5 31.8
30.8 31.6
26.5 34.0
35.7 28.9
30.0 28.8
106.2 10.5
97.0 12.3
90.1 13.2
106.7 11.4
99.3 11.2
107.2 10.3
109.1 11.4
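
If you would rather skip Notepad altogether, one option is to hand the pasted text straight to read.table() through its text argument. This is only a sketch, not the approach used earlier in the post, and traffic.text is a name made up here.

```r
## The traffic data pasted in as one multi-line string
traffic.text = "density speed
20.4 38.8
27.4 31.5
106.2 10.6
80.4 16.1
141.3 7.7
130.9 8.3
121.7 8.5
106.5 11.1
130.5 8.6
101.1 11.1
123.9 9.8
144.2 7.8
29.5 31.8
30.8 31.6
26.5 34.0
35.7 28.9
30.0 28.8
106.2 10.5
97.0 12.3
90.1 13.2
106.7 11.4
99.3 11.2
107.2 10.3
109.1 11.4"

## text = is read exactly as if it were the contents of a file
traffic.df = read.table(text = traffic.text, header = TRUE)
dim(traffic.df)       ## 24 rows, 2 columns
head(traffic.df)      ## first six rows
```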
