ruk·si

R
Correlation

Updated at 2012-12-29 08:32

This note is about finding correlations in data using R. Correlation is a statistical relationship between two variables or data sets. For example, there is a correlation between demand for a product and its price.

Finding correlation in a plot.

# Loading data.
piracy <- read.csv("piracy.csv")
# sep must be a single character; gdp.txt is assumed to be tab-separated.
gdp <- read.table("gdp.txt", sep="\t", header=TRUE)
countries <- merge(x = gdp, y = piracy)

# Plot country GDP against piracy rate to look for a correlation.
# Points that fall from left to right mean a negative correlation.
# Points that rise from left to right mean a positive correlation.
plot( countries$GDP, countries$PiracyRate )
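
As a minimal sketch of what merge() does in the loading step, assuming both files share a Country column. The data frames, column names and values below are made up for illustration and are not taken from the actual files.

```r
# Hypothetical stand-ins for gdp.txt and piracy.csv.
gdp <- data.frame(Country = c("A", "B", "C"), GDP = c(1000, 5000, 20000))
piracy <- data.frame(Country = c("B", "C", "D"), Piracy = c(80, 40, 60))

# merge() joins on the columns the two data frames have in common
# (here: Country) and by default keeps only rows present in both.
countries <- merge(x = gdp, y = piracy)
print(countries)
```

Countries that appear in only one of the inputs are dropped by default; all = TRUE would keep them with NA in the missing columns.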

Correlation between two vectors.

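# Correlation coefficient between two numeric vectors.
cor(countries$GDP, countries$Piracy)
# Correlation matrix for all columns of a numeric matrix or data frame `a`.
cor(a)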

# Test correlation.
cor.test(countries$GDP, countries$Piracy)

# ...
# t = -14.8371, df = 107, p-value < 2.2e-16
# ...

# If the p-value is below 0.05, the result is conventionally considered statistically significant.
# The negative t value gives the direction, so there is a significant
# negative correlation between GDP and piracy in our data.
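
The values printed by cor.test() can also be read programmatically. The following is a small sketch on synthetic data (the vectors x and y are invented here, not the GDP and piracy data), showing how the coefficient, t value and p-value relate.

```r
# Synthetic data: y falls as x rises, plus some noise.
set.seed(42)
x <- 1:50
y <- 100 - 2 * x + rnorm(50, sd = 5)

# Pearson correlation coefficient, always between -1 and 1.
r <- cor(x, y)

# cor.test() returns a list whose parts can be read by name.
result <- cor.test(x, y)
result$estimate   # the same coefficient as cor(x, y)
result$statistic  # t value; its sign matches the direction of the correlation
result$p.value    # chance of a correlation at least this strong if the true one were zero
```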

Linear model lm().

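# Fit a linear model of piracy rate as a function of GDP.
line <- lm(countries$Piracy ~ countries$GDP)
# Draw the fitted line on the existing scatter plot.
abline(line)
# The fitted coefficients now let us estimate piracy rate from a known GDP.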
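
One way the prediction step might look, sketched on synthetic data (the data frame d and its values are assumptions for illustration). Writing the formula with bare column names and data = lets predict() match the columns of newdata, which it cannot do when the formula refers to countries$GDP directly.

```r
# Synthetic data: piracy rate drops as GDP grows, plus noise.
set.seed(1)
d <- data.frame(GDP = seq(1000, 50000, length.out = 40))
d$Piracy <- 90 - 0.001 * d$GDP + rnorm(40, sd = 3)

# Fit the model using column names looked up in d.
fit <- lm(Piracy ~ GDP, data = d)
coef(fit)  # intercept and slope of the fitted line

# Predict piracy rate for GDP values that were not in the data.
new <- data.frame(GDP = c(10000, 30000))
pred <- predict(fit, newdata = new)
pred
```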

Generalized Linear Model glm().

# glm()
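
As a hedged illustration of glm(), here is a logistic regression on synthetic data. The variables x and y and the choice of family = binomial are assumptions made for this sketch; glm() supports other families as well.

```r
# Synthetic binary outcome whose probability rises with x.
set.seed(7)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 2 * x))
d <- data.frame(x = x, y = y)

# family = binomial fits a logistic regression.
model <- glm(y ~ x, data = d, family = binomial)
coef(model)  # coefficients on the log-odds scale

# type = "response" returns predicted probabilities instead of log-odds.
prob <- predict(model, newdata = data.frame(x = c(-1, 1)), type = "response")
prob
```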

Others.

# Analysis of Variance
aov()
anova()

# T test
# http://en.wikipedia.org/wiki/Student%27s_t-test
t.test()

# Test of Equal or Given Proportions
prop.test()

# Binomial test
# http://en.wikipedia.org/wiki/Binomial_test
binom.test()

# Chi-squared test
# http://en.wikipedia.org/wiki/Chi-squared_test
chisq.test(matrix1)

# Fisher's exact test
# http://en.wikipedia.org/wiki/Fisher%27s_exact_test
fisher.test()

# Friedman test
# http://en.wikipedia.org/wiki/Friedman_test
friedman.test()
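
Two of the tests listed above, sketched with made-up inputs to show what arguments they expect. The samples a and b and the counts matrix are invented for illustration.

```r
# T test: compare the means of two samples (Welch test by default).
set.seed(3)
a <- rnorm(30, mean = 10, sd = 2)
b <- rnorm(30, mean = 13, sd = 2)
tt <- t.test(a, b)
tt$p.value

# Chi-squared test: takes a contingency table of counts.
# matrix() fills column by column, so g1 has 30 yes / 15 no, g2 has 10 yes / 25 no.
counts <- matrix(c(30, 10, 15, 25), nrow = 2,
                 dimnames = list(Group = c("g1", "g2"),
                                 Outcome = c("yes", "no")))
ct <- chisq.test(counts)
ct$p.value
```

Both functions return a list with a p.value element, like cor.test() does.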