3. dataframes in R
What is a dataframe?
default R objects for holding data
can mix numeric and text data
ordered/unordered factors
many statistical functions require dataframe inputs
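A minimal sketch of the points above: one dataframe holding numeric columns, a text column and an ordered factor, then passed to a modelling function. The column names and values are invented for illustration.

```r
# Hypothetical example data; names and values are illustrative only.
df <- data.frame(
  height = c(1.62, 1.75, 1.80),                  # numeric column
  weight = c(55, 72, 80),                        # numeric column
  name   = c("Ann", "Bob", "Cat"),               # text column
  size   = factor(c("small", "medium", "large"),
                  levels = c("small", "medium", "large"),
                  ordered = TRUE),               # ordered factor
  stringsAsFactors = FALSE                       # keep 'name' as character
)

str(df)                                  # each column has its own type

# Many modelling functions take the dataframe through a 'data' argument.
fit <- lm(weight ~ height, data = df)
```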
4. dataframes in R
Problems:
print! (a large dataframe floods the console)
slow searching
verbose syntax
no built-in methods for aggregation
Which is most annoying depends on who you are...
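To make "verbose syntax" and the aggregation point concrete, here is a small base R sketch on invented data. It assumes that "no built-in methods" means dataframes have no aggregation syntax of their own, so a helper function such as tapply() is needed.

```r
# Hypothetical example data.
df <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1990, 1991, 1990, 1991),
  value   = c(10, 12, 7, 9)
)

# Verbose syntax: the dataframe name is repeated for every column used.
sub <- df[df$country == "A" & df$year > 1990, ]

# Group-wise mean via a helper function rather than the dataframe itself.
means <- tapply(df$value, df$country, mean)
```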
5. Constructing data.tables
myDT <- data.table(
number=1:3,
letter=c('a','b','c')
) # like data.frame constructor
myDT2 <- as.data.table(myDF) # conversion from an existing data.frame
The data.table class inherits from data.frame, so data.tables can (mostly)
be used exactly like dataframes and should not break existing code.
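A short sketch of what the inheritance means in practice, and of one case behind the "mostly". It assumes the data.table package is installed; myDF is a made-up dataframe.

```r
library(data.table)

# Hypothetical dataframe to convert.
myDF  <- data.frame(number = 1:3, letter = c("a", "b", "c"))
myDT2 <- as.data.table(myDF)

class(myDT2)                    # both "data.table" and "data.frame"
inherits(myDT2, "data.frame")   # TRUE, so dataframe functions accept it
nrow(myDT2)                     # ordinary dataframe functions still work

# One difference: a single index selects a column in a dataframe,
# but a row in a data.table.
myDF[2]     # second column, as a one-column dataframe
myDT2[2]    # second row, with all columns
```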
14. Fast insertion
A new column can be inserted by:
E[,country_t := paste0(country,year)]
head(E[,country_t])
## [1] "Afghanistan1990" "Afghanistan1991" "Afghanistan1992" "Afghanistan1993"
## [5] "Afghanistan1994" "Afghanistan1995"
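The table E is not defined on this slide, so the sketch below uses a small stand-in with invented rows to show the same insertion. Note that := changes the table in place, so no reassignment with <- is needed.

```r
library(data.table)

# Hypothetical stand-in for the table E used on the slide.
E <- data.table(
  country = c("Afghanistan", "Afghanistan", "Albania"),
  year    = c(1990L, 1991L, 1990L)
)

# := adds the column by reference; E itself is modified.
E[, country_t := paste0(country, year)]

names(E)             # now includes "country_t"
head(E[, country_t]) # the new column as a character vector
```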
19. Summary
more compact
faster (sometimes a lot)
less memory
great for aggregation/exploratory data crunching
But:
- a few traps for the unwary
Good package vignettes & FAQ
20. Related
aggregate in base R
plyr: use of ddply
sqldf: good if you know SQL
RSQLite: ditto
other:
- RODBC etc: talk to databases
- dplyr: nascent, by Hadley, internal & external
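For comparison, a sketch of the same group-wise mean written with base R aggregate and with data.table. The data is invented and the data.table package is assumed to be installed.

```r
library(data.table)

# Hypothetical example data.
df <- data.frame(
  country = c("A", "A", "B", "B"),
  value   = c(10, 12, 7, 9)
)

# Base R: formula interface of aggregate().
base_means <- aggregate(value ~ country, data = df, FUN = mean)

# data.table: compute in j, group with by.
DT <- as.data.table(df)
dt_means <- DT[, .(value = mean(value)), by = country]
```

Both calls return one row per country; the data.table form keeps the expression and the grouping inside a single pair of brackets.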