📊 R - Basics
Updated at 2017-10-13 17:40
This note is mostly about R syntax and how to use R. R is a programming language and environment for statistical computing.
Use an IDE such as RStudio. It will make your life a lot easier.
How to get started on Mac.
# create file test.R
write('{"epoch" : 1, "loss" : 0.9}', stdout())
write('{"epoch" : 2, "loss" : 0.5}', stdout())
write('{"epoch" : 3, "loss" : 0.1}', stdout())
# In a terminal: install R with Homebrew, then run the script.
brew install r
Rscript test.R
Every R installation comes with the datasets package. It contains 100 or so helpful example datasets.
# Speed (mph) and stopping distances (ft) of cars in the 1920s.
head(cars)
# speed dist
# 1 4 2
# 2 4 10
# 3 7 4
# 4 7 22
# 5 8 16
# 6 9 10
# Edgar Anderson's Iris Data
# 150 rows of 5 variables each
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
# Air quality in New York, May to September in 1973.
head(airquality)
# Ozone Solar.R Wind Temp Month Day
# 1 41 190 7.4 67 5 1
# 2 36 118 8.0 72 5 2
# 3 12 149 12.6 74 5 3
# 4 18 313 11.5 62 5 4
# 5 NA NA 14.3 56 5 5
# 6 28 NA 14.9 66 5 6
# A time series of 468 CO2 observations in Mauna Loa; monthly from 1959 to 1997.
head(co2)
# [1] 315.42 316.31 316.50 317.56 318.13 318.00
# Topographic information on Auckland's Maunga Whau volcano
# A matrix with 87 rows and 61 columns
head(volcano)
# ...
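Before printing a dataset it can help to check its shape first. The following is a minimal sketch that uses only base R functions (`nrow`, `dim`, `length`) on the built-in datasets above; the variable names are made up for illustration.

```r
# Inspect the shape of the built-in datasets shown above.
cars_rows <- nrow(cars)        # number of observations in cars
iris_dim <- dim(iris)          # rows and columns of iris
co2_length <- length(co2)      # co2 is a time series, so it has a length
volcano_dim <- dim(volcano)    # volcano is a plain numeric matrix

print(cars_rows)
print(iris_dim)
print(co2_length)
print(volcano_dim)

# To list every dataset that ships with the datasets package, run:
# data(package = "datasets")
```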
Summary Statistics
You can get summaries for most variable types.
targets <- read.csv("targets.csv")
str(targets) # Compact overview: column types and first values.
summary(targets) # Summary with min, max, median, mean, quartiles and NA count.
table(targets) # Counts of each combination of existing values.
unique(targets) # Single instance of each existing row.
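Since `targets.csv` is not included with this note, the same functions can be tried on datasets that ship with R. This is a rough sketch, assuming the built-in `airquality` and `iris` datasets as stand-ins; the variable names are illustrative only.

```r
# airquality and iris ship with R, so this runs without targets.csv.
str(airquality)              # column types and the first few values
summary(airquality$Ozone)    # min, quartiles, median, mean, max and NA count

species_counts <- table(iris$Species)    # how often each species occurs
species_levels <- unique(iris$Species)   # each species listed once

# Count the missing values directly.
ozone_missing <- sum(is.na(airquality$Ozone))

print(species_counts)
print(species_levels)
print(ozone_missing)
```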
Mean: add the values together and divide by the count.
# Count of limbs each crew member has.
limbs <- c(4, 3, 4, 3, 2, 4)
names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook', 'Scooter', 'Dan')
# Average limb count a.k.a. mean.
mean(limbs)
# Generating a bar plot with average line.
barplot(limbs)
abline(h = mean(limbs))
Median: sort the values and choose the middle one. With an even count, take the average of the two middle values.
abline(h = median(limbs))
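To make the two definitions concrete, here is a small sketch that computes the mean and median of the limb counts by hand and compares them with the built-in functions. The `manual_*` names are made up for this example.

```r
limbs <- c(4, 3, 4, 3, 2, 4)

# Mean: total divided by the number of values.
manual_mean <- sum(limbs) / length(limbs)

# Median: sort, then take the middle value. With an even count,
# average the two middle values.
sorted <- sort(limbs)
n <- length(sorted)
manual_median <- if (n %% 2 == 1) {
  sorted[(n + 1) / 2]
} else {
  (sorted[n / 2] + sorted[n / 2 + 1]) / 2
}

# Both should agree with the built-in functions.
print(isTRUE(all.equal(manual_mean, mean(limbs))))
print(manual_median == median(limbs))
```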
Standard deviation: describes how far the values in a data set typically spread around the mean.
# Loot amounts from raids.
pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)
barplot(pounds)
meanValue <- mean(pounds)
abline(h = meanValue)
# What is a normal "loot" amount?
# Use the standard deviation to see the normal range.
deviation <- sd(pounds)
abline(h = meanValue + deviation)
abline(h = meanValue - deviation)
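The two lines drawn above mark one standard deviation on either side of the mean. As a sketch of what `sd` computes, the sample standard deviation can be written out by hand and then used to pick the raids inside that band. The names `manual_sd`, `is_typical` and `typical_raids` are illustrative.

```r
pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)

# Sample standard deviation: square root of the summed squared
# differences from the mean, divided by n - 1.
mean_value <- mean(pounds)
manual_sd <- sqrt(sum((pounds - mean_value)^2) / (length(pounds) - 1))

# Should match the built-in sd().
print(isTRUE(all.equal(manual_sd, sd(pounds))))

# Raids whose loot lies within one standard deviation of the mean.
is_typical <- abs(pounds - mean_value) <= manual_sd
typical_raids <- pounds[is_typical]
print(typical_raids)
```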
Apply
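Apply runs a function over the rows (margin 1) or columns (margin 2) of a matrix or data frame. Tapply runs a function on groups of a vector, where the groups are defined by one or more factors.
targets <- read.csv("targets.csv")
# Sum of each column (margin 2).
apply(targets, 2, sum)
# Sum of each row (margin 1), ignoring NA values.
apply(targets, 1, sum, na.rm=TRUE)
# Standard error of the mean for each column.
apply(targets, 2, function(x) {
  sd(x) / sqrt(length(x))
})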
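products <- read.csv("products.csv")
# Mean total price for each condition.
tapply(products$totalPrice, products$condition, mean)
# ==>
# Mean total price for each wheel count.
tapply(products$totalPrice, products$wheels, mean)
# ==>
# Number of products for each combination of condition and wheels.
tapply(products$totalPrice, products[ ,c("condition","wheels")], length)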
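The difference between the two margins, and between `apply` and `tapply`, is easier to see on data that needs no CSV file. A minimal sketch, assuming a small made-up matrix `m` and the built-in `iris` dataset:

```r
# A small hypothetical matrix with 2 rows and 3 columns.
m <- matrix(1:6, nrow = 2)
print(m)

row_sums <- apply(m, 1, sum)   # margin 1: one result per row
col_sums <- apply(m, 2, sum)   # margin 2: one result per column
print(row_sums)
print(col_sums)

# tapply groups a vector by a factor and applies a function per group.
mean_sepal <- tapply(iris$Sepal.Length, iris$Species, mean)
print(mean_sepal)
```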
With
With is useful for one-off calculations on a dataset.
products <- read.csv("products.csv")
# With products, calculate total price minus shipping cost.
priceProfit <- with(products, totalPrice - shippingCost)
Within is useful for adding new variables to a dataset.
products <- read.csv("products.csv")
pr <- within(products, {
  priceProfit <- totalPrice - shippingCost
})
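The practical difference is in what each function returns. Here is a small sketch with a made-up data frame standing in for `products.csv`; the values and the `price_profit` and `products_with_profit` names are assumptions for illustration only.

```r
# Hypothetical stand-in for products.csv.
products <- data.frame(
  totalPrice = c(120, 80, 45),
  shippingCost = c(10, 5, 0)
)

# with() returns the computed vector and leaves products untouched.
price_profit <- with(products, totalPrice - shippingCost)
print(price_profit)

# within() returns a copy of the data frame with the new column added.
products_with_profit <- within(products, {
  priceProfit <- totalPrice - shippingCost
})

print(names(products))               # original columns only
print(names(products_with_profit))   # original columns plus priceProfit
```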
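Aggregate
Aggregate splits a dataset into groups and computes a summary for each group.
products <- read.csv("products.csv")
# Mean total price for each combination of wheels and cond.
aggregate(totalPrice ~ wheels + cond, products, mean)
# Dot . means all remaining columns; each one is averaged per group.
aggregate(. ~ wheels + cond, products, mean)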
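The formula interface can be tried without the CSV file. The following is a sketch under the assumption of a tiny made-up data frame with the same column names as the calls above; the values are invented.

```r
# Hypothetical stand-in for products.csv.
products <- data.frame(
  wheels = c(2, 2, 4, 4),
  cond = c("new", "used", "new", "new"),
  totalPrice = c(100, 60, 300, 500),
  shippingCost = c(10, 10, 40, 60)
)

# Mean totalPrice for each combination of wheels and cond.
mean_price <- aggregate(totalPrice ~ wheels + cond, products, mean)
print(mean_price)

# The dot stands for every column not named on the right-hand side,
# so both totalPrice and shippingCost are averaged per group.
mean_all <- aggregate(. ~ wheels + cond, products, mean)
print(mean_all)
```

Each result has one row per combination of `wheels` and `cond` that actually occurs in the data.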
Sources
- Google Developers R Programming Videos
- Try R