R - Correlation
Updated at 2012-12-29 08:32
This note is about finding correlations in data using R. Correlation is a statistical relationship between two variables or data sets. For example, the demand for a product is correlated with its price.
Finding correlation in a plot.
# Loading data.
piracy <- read.csv("piracy.csv")
gdp <- read.table("gdp.txt", sep=" ", header=TRUE)
countries <- merge(x = gdp, y = piracy)
# Plot country GDP against piracy rate to look for a correlation.
# Points that fall from left to right suggest a negative correlation.
# Points that rise from left to right suggest a positive correlation.
plot(countries$GDP, countries$Piracy)
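The merge() step above joins the two tables on the column names they share. The sketch below shows that behaviour on two tiny made-up data frames; the names gdp_demo, piracy_demo and the "Country" column are assumptions for illustration, not the contents of the real files.

```r
# Two small hypothetical data frames sharing a "Country" column.
gdp_demo <- data.frame(Country = c("A", "B", "C"),
                       GDP = c(1000, 20000, 45000))
piracy_demo <- data.frame(Country = c("B", "C", "D"),
                          Piracy = c(60, 25, 80))

# merge() joins on the columns the two frames have in common ("Country" here)
# and, by default, keeps only the rows that appear in both.
merged_demo <- merge(x = gdp_demo, y = piracy_demo)
print(merged_demo)

# all = TRUE keeps unmatched rows as well and fills the gaps with NA.
merged_all <- merge(x = gdp_demo, y = piracy_demo, all = TRUE)
print(merged_all)
```

If the shared column had different names in the two files, the by.x and by.y arguments of merge() could name them explicitly.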
Correlation between two vectors.
# Show the correlation coefficient (a value between -1 and 1).
cor(countries$GDP, countries$Piracy)
# Test whether the correlation is statistically significant.
cor.test(countries$GDP, countries$Piracy)
# ...
# t = -14.8371, df = 107, p-value < 2.2e-16
# ...
# If the p-value is below 0.05, the result is usually considered statistically significant.
# The negative t statistic together with the tiny p-value indicates a significant
# negative correlation between GDP and piracy in our data.
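To try cor() and cor.test() without the data files, here is a small sketch on simulated data. The vectors x and y and the noise level are made up for illustration; only the function calls mirror the ones above.

```r
set.seed(42)                       # make the simulated data reproducible
x <- 1:50
y <- -2 * x + rnorm(50, sd = 10)   # y falls as x rises, plus some noise

r <- cor(x, y)                     # Pearson correlation coefficient
result <- cor.test(x, y)           # null hypothesis: the true correlation is 0

print(r)
print(result$estimate)   # the same coefficient as cor(x, y)
print(result$p.value)    # a small value means the correlation is unlikely to be chance
print(result$conf.int)   # 95% confidence interval for the correlation
```

The object returned by cor.test() is a list, so individual pieces such as the p-value can be pulled out with $ instead of being read off the printed summary.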
Linear model lm().
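# Fit a linear model of piracy as a function of GDP.
line <- lm(countries$Piracy ~ countries$GDP)
# Draw the fitted line on top of the existing plot.
abline(line)
# With the fitted model we can estimate the piracy rate for a known GDP.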
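The prediction step can be sketched with predict(). The example below uses simulated numbers, and the data frame demo with its GDP and Piracy columns is an assumption for illustration. It also writes the formula with plain column names plus data =, which lets predict() look up GDP in the newdata argument.

```r
set.seed(1)
# Simulated data: piracy falls roughly linearly as GDP rises.
demo <- data.frame(GDP = seq(1000, 50000, length.out = 40))
demo$Piracy <- 80 - 0.001 * demo$GDP + rnorm(40, sd = 5)

# Fit the model using column names and a data argument.
lm_fit <- lm(Piracy ~ GDP, data = demo)
print(coef(lm_fit))   # intercept and slope of the fitted line

# Estimate the piracy rate for two GDP values that need not be in the data.
pred <- predict(lm_fit, newdata = data.frame(GDP = c(10000, 30000)))
print(pred)
```

The newdata frame must use the same column name as the formula (GDP here) for the lookup to work.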
Generalized Linear Model glm().
# glm()
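As a minimal sketch of what a glm() call can look like, the example below fits a logistic regression to simulated data. Everything in it (the variables gdp_k and high_piracy, and the choice of a binomial family) is an assumption for illustration; the note itself does not say which model was intended.

```r
set.seed(7)
gdp_k <- runif(200, min = 1, max = 50)        # simulated GDP, in thousands
p_high <- plogis(3 - 0.15 * gdp_k)            # true probability falls with GDP
high_piracy <- rbinom(200, size = 1, prob = p_high)   # 1 = "high piracy"
sim <- data.frame(gdp_k, high_piracy)

# family = binomial makes this a logistic regression for a yes/no outcome.
glm_fit <- glm(high_piracy ~ gdp_k, data = sim, family = binomial)
print(coef(glm_fit))

# type = "response" returns probabilities instead of log-odds.
probs <- predict(glm_fit, newdata = data.frame(gdp_k = c(5, 40)),
                 type = "response")
print(probs)
```

Other families, such as poisson for counts, follow the same calling pattern.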
Others.
# Analysis of Variance
aov()
anova()
# T test
# http://en.wikipedia.org/wiki/Student%27s_t-test
t.test()
# Test of Equal or Given Proportions
prop.test()
# Binomial test
# http://en.wikipedia.org/wiki/Binomial_test
binom.test()
# Chi-squared test
# http://en.wikipedia.org/wiki/Chi-squared_test
chisq.test(matrix1)
# Fisher's exact test
# http://en.wikipedia.org/wiki/Fisher%27s_exact_test
fisher.test()
# Friedman test
# http://en.wikipedia.org/wiki/Friedman_test
friedman.test()
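The calls above are listed without arguments. As a rough sketch of how two of them are typically used, the example below runs t.test() on R's built-in sleep data set and chisq.test() on a small made-up table of counts (the matrix counts is hypothetical).

```r
# T test: compare the mean of "extra" between the two groups in the
# built-in sleep data set.
tt <- t.test(extra ~ group, data = sleep)
print(tt$p.value)

# Chi-squared test: check whether rows and columns of a 2x2 table of
# counts are independent. The counts are made up for illustration.
counts <- matrix(c(30, 10, 15, 25), nrow = 2)
cs <- chisq.test(counts)
print(cs$p.value)
```

Like cor.test(), both functions return a list whose p.value element can be compared against the chosen significance level.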