ruk·si

Data Frames

Updated at 2017-10-13 17:56

Data frames are like database tables: they keep related data together. In R they behave much like matrices, so most operations that work on matrices also work on data frames. The most notable difference is that each column of a data frame can hold a different type of value.

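weights <- c(300, 200, 100, 250, 150)
prices <- c(9000, 5000, 12000, 7500, 18000)
types <- c("gold", "silver", "gems", "gold", "gems")
treasure <- data.frame(weights, prices, types)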
print(treasure)
#   weights prices  types
# 1     300   9000   gold
# 2     200   5000 silver
# 3     100  12000   gems
# 4     250   7500   gold
# 5     150  18000   gems
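To check that each column keeps its own type, here is a minimal sketch that rebuilds the treasure data frame from the values printed above. The stringsAsFactors = FALSE argument is an addition here so that types stays a character column on R versions before 4.0 too; col.classes is just an illustrative name.

```r
treasure <- data.frame(
  weights = c(300, 200, 100, 250, 150),
  prices  = c(9000, 5000, 12000, 7500, 18000),
  types   = c("gold", "silver", "gems", "gold", "gems"),
  stringsAsFactors = FALSE
)

# Each column has its own class.
class(treasure$weights)
# [1] "numeric"
class(treasure$types)
# [1] "character"

# All column classes at once, as a named character vector.
col.classes <- sapply(treasure, class)

# Matrix-style helpers work on data frames too: 5 rows, 3 columns.
dim(treasure)
# [1] 5 3
```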

Data frame columns have names, which you can use to access a column. You can also access a column by its position.

treasure[[2]]
# [1]  9000  5000 12000  7500 18000

names(treasure)
# [1] "weights" "prices"  "types"

treasure[["prices"]]
# [1]  9000  5000 12000  7500 18000

treasure$prices
# [1]  9000  5000 12000  7500 18000

mean(treasure$prices)
# [1] 10300
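Because data frames behave much like matrices, the [row, column] indexing form works as well. A small sketch, assuming the same treasure values as above; the variable names second, numeric.cols and col.means are only for illustration.

```r
treasure <- data.frame(
  weights = c(300, 200, 100, 250, 150),
  prices  = c(9000, 5000, 12000, 7500, 18000),
  types   = c("gold", "silver", "gems", "gold", "gems"),
  stringsAsFactors = FALSE
)

# Single cell: row 2 of the prices column.
treasure[2, "prices"]
# [1] 5000

# A whole row comes back as a one-row data frame.
second <- treasure[2, ]

# Several columns at once; the result is still a data frame.
numeric.cols <- treasure[, c("weights", "prices")]

# Column-wise means of the numeric columns.
col.means <- colMeans(numeric.cols)
print(col.means)
# weights  prices
#     200   10300
```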

You can filter rows by the values in a column.

high.value <- subset(treasure, prices > 8000)
print(high.value)
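The same filter can be written with a logical index instead of subset. A minimal sketch, again assuming the treasure values shown earlier; heavy.gems is a hypothetical second filter added to show a combined condition.

```r
treasure <- data.frame(
  weights = c(300, 200, 100, 250, 150),
  prices  = c(9000, 5000, 12000, 7500, 18000),
  types   = c("gold", "silver", "gems", "gold", "gems"),
  stringsAsFactors = FALSE
)

# Keep the rows where the condition is TRUE; the empty column slot keeps all columns.
high.value <- treasure[treasure$prices > 8000, ]
print(high.value)
#   weights prices types
# 1     300   9000  gold
# 3     100  12000  gems
# 5     150  18000  gems

# Conditions can be combined with & (and) and | (or).
heavy.gems <- subset(treasure, types == "gems" & weights > 120)
```

Note that the original row names (1, 3, 5) are kept after filtering.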

You can load files with the read.* family of functions.

# CSV file
read.csv("targets.csv")
#          Port Population Worth
# 1   Cartagena      35000 10000
# 2 Porto Bello      49000 15000
# 3      Havana     140000 50000
# 4 Panama City     105000 35000

# Tab-delimited text file
read.delim("targets.txt")
#          Port Population Worth
# 1   Cartagena      35000 10000
# 2 Porto Bello      49000 15000
# 3      Havana     140000 50000
# 4 Panama City     105000 35000

# Another way to read tab-delimited files.
read.table("infantry.txt", sep="\t")

# Treat the first row as column headers.
read.table("infantry.txt", sep="\t", header=TRUE)
#          Port Infantry
# 1 Porto Bello      700
# 2   Cartagena      500
# 3 Panama City     1500
# 4      Havana     2000
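To try the readers without the original files, here is a self-contained sketch that writes the targets table to temporary files first. The temporary paths and the variable names (csv.path, txt.path, from.csv and so on) are assumptions for illustration; the data is copied from the output above.

```r
targets <- data.frame(
  Port       = c("Cartagena", "Porto Bello", "Havana", "Panama City"),
  Population = c(35000, 49000, 140000, 105000),
  Worth      = c(10000, 15000, 50000, 35000),
  stringsAsFactors = FALSE
)

# Write a CSV copy and a tab-delimited copy to temporary files.
csv.path <- tempfile(fileext = ".csv")
txt.path <- tempfile(fileext = ".txt")
write.csv(targets, csv.path, row.names = FALSE)
write.table(targets, txt.path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.csv and read.delim expect a header row by default.
from.csv   <- read.csv(csv.path, stringsAsFactors = FALSE)
from.delim <- read.delim(txt.path, stringsAsFactors = FALSE)
from.table <- read.table(txt.path, sep = "\t", header = TRUE,
                         stringsAsFactors = FALSE)

# Without header = TRUE, read.table treats the header line as data
# and names the columns V1, V2, V3.
no.header <- read.table(txt.path, sep = "\t", stringsAsFactors = FALSE)
names(no.header)
# [1] "V1" "V2" "V3"
nrow(no.header)
# [1] 5
```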

Combining data frames: you can add one to another as columns with cbind or as rows with rbind.

df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
cbind(df, data.frame(z = 3:1)) # Number of rows must match.
#   x y z
# 1 1 a 3
# 2 2 b 2
# 3 3 c 1
rbind(df, data.frame(x = 10, y = "z")) # Column names must match.
#    x y
# 1  1 a
# 2  2 b
# 3  3 c
# 4 10 z
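A short sketch of what "must match" means in practice for rbind: columns are matched by name, so order does not matter, but different names raise an error. The names reordered and msg are illustrative only.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Columns are matched by name, so a different column order is fine.
reordered <- rbind(df, data.frame(y = "z", x = 10, stringsAsFactors = FALSE))
reordered$x
# [1]  1  2  3 10

# Mismatched column names raise an error; tryCatch captures its message.
msg <- tryCatch(
  rbind(df, data.frame(a = 10, b = "z")),
  error = function(e) conditionMessage(e)
)
print(msg)
```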

Merging data frames: by default, merge joins on the columns that share a name in both data frames.

targets <- read.csv("targets.csv")
infantry <- read.table("infantry.txt", sep="\t", header=TRUE)
merge(x = targets, y = infantry)

#          Port Population Worth Infantry
# 1   Cartagena      35000 10000      500
# 2      Havana     140000 50000     2000
# 3 Panama City     105000 35000     1500
# 4 Porto Bello      49000 15000      700
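The join column can also be named explicitly, and unmatched rows can be kept. A minimal sketch using in-memory copies of the two tables; Havana is deliberately left out of infantry here (an assumption for illustration, not the source data) to show the difference.

```r
targets <- data.frame(
  Port       = c("Cartagena", "Porto Bello", "Havana", "Panama City"),
  Population = c(35000, 49000, 140000, 105000),
  Worth      = c(10000, 15000, 50000, 35000),
  stringsAsFactors = FALSE
)
# Hypothetical variant of the infantry table with no Havana row.
infantry <- data.frame(
  Port     = c("Porto Bello", "Cartagena", "Panama City"),
  Infantry = c(700, 500, 1500),
  stringsAsFactors = FALSE
)

# Default behaviour: only ports present in both tables are kept.
inner <- merge(targets, infantry, by = "Port")
nrow(inner)
# [1] 3

# all.x = TRUE keeps every row of the first table and fills gaps with NA.
left <- merge(targets, infantry, by = "Port", all.x = TRUE)
left$Infantry[left$Port == "Havana"]
# [1] NA
```

As in the output above, merge sorts the result by the join column by default.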

Remember to convert time data to Date objects. This helps when creating graphs and calculating time differences.

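# Assumes targets.csv has a date column with text such as "13-Oct-17".
# %b matches abbreviated month names in the current locale.
targets <- read.csv("targets.csv")
targets$date <- as.Date(targets$date, "%d-%b-%y")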
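Once the column is a Date, differences and reformatting come for free. A small sketch with made-up dates; it uses numeric months (%m) rather than %b so that it does not depend on the locale, and the date values are hypothetical.

```r
targets <- data.frame(
  Port = c("Cartagena", "Havana"),
  date = c("01-10-17", "13-10-17"),  # hypothetical values, day-month-year
  stringsAsFactors = FALSE
)

# Convert the text column to Date objects.
targets$date <- as.Date(targets$date, "%d-%m-%y")
class(targets$date)
# [1] "Date"

# Subtracting two dates gives a time difference in days.
gap <- targets$date[2] - targets$date[1]
as.numeric(gap, units = "days")
# [1] 12

# Dates can be printed in any format.
format(targets$date[1], "%Y-%m-%d")
# [1] "2017-10-01"
```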

Sources

  • Google Developers R Programming Videos
  • Try R