
# 📊 R - Correlation

Updated at 2012-12-29 10:32

This note is about finding correlations in data using R. Correlation is a statistical relationship between two variables or data sets. For example, the demand for a product is typically correlated with its price.
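To make the definition concrete, here is a minimal sketch that computes Pearson's correlation coefficient by hand and compares it with R's built-in `cor()`, which uses the Pearson method by default. The `price` and `demand` numbers are made up for illustration and are not from any real data set.

```r
# Hypothetical toy data: as price goes up, demand goes down.
price  <- c(10, 12, 15, 18, 20)
demand <- c(100, 90, 75, 60, 52)

# Pearson's r: covariation of the two variables divided by the
# product of their individual variations.
dx <- price - mean(price)
dy <- demand - mean(demand)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The built-in function should give the same value.
r_builtin <- cor(price, demand)

print(c(manual = r_manual, builtin = r_builtin))
```

A value close to -1 means a strong negative linear relationship, a value close to 1 a strong positive one, and a value near 0 little or no linear relationship.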

Finding correlation in a plot.

``````
# Load the data: merge() joins the two data frames on their shared columns.
countries <- merge(x = gdp, y = piracy)

# Scatter plot of piracy rate against country GDP.
# Points trending downward suggest a negative correlation.
# Points trending upward suggest a positive correlation.
plot(countries$GDP, countries$Piracy)
``````
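The `gdp` and `piracy` data frames are not shown in this note, so the following is only a sketch with made-up stand-ins. The `Country` key column and all values are assumptions, chosen to show how `merge()` joins on the shared column before plotting.

```r
# Hypothetical toy data; the real `gdp` and `piracy` data frames are not shown here.
gdp <- data.frame(
  Country = c("A", "B", "C", "D"),
  GDP     = c(1000, 5000, 20000, 40000)
)
piracy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Piracy  = c(90, 75, 40, 20)
)

# merge() joins on the columns both data frames have in common (here: Country).
countries <- merge(x = gdp, y = piracy)
print(countries)

# Scatter plot with labelled axes; the points fall from left to right,
# which hints at a negative correlation.
plot(countries$GDP, countries$Piracy, xlab = "GDP", ylab = "Piracy rate")
```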

Correlation between two vectors.

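``````
# Correlation matrix for all columns of `a` (a numeric matrix or data frame).
cor(a)

# Test whether the correlation differs significantly from zero.
cor.test(countries$GDP, countries$Piracy)

# ...
# t = -14.8371, df = 107, p-value < 2.2e-16
# ...

# A p-value below 0.05 is conventionally considered statistically significant.
# The negative t statistic means the correlation is negative, so in our data
# GDP and piracy are negatively correlated.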
``````
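The object returned by `cor.test()` can also be inspected programmatically instead of reading the printed summary. The sketch below uses simulated data (an assumption, since the original data set is not included) with a built-in negative trend.

```r
# Simulated stand-in data: piracy falls as GDP rises, plus random noise.
set.seed(1)
gdp    <- seq(1000, 40000, length.out = 50)
piracy <- 90 - 0.0015 * gdp + rnorm(50, sd = 5)

# Correlation coefficient between the two vectors (Pearson by default).
r <- cor(gdp, piracy)

# cor.test() returns a list-like object; the pieces can be pulled out by name.
result <- cor.test(gdp, piracy)
print(result$estimate)   # the correlation coefficient
print(result$p.value)    # p-value for the null hypothesis of zero correlation
print(result$conf.int)   # 95% confidence interval for the coefficient
```

If the relationship is monotonic but not linear, a rank-based variant such as `cor(gdp, piracy, method = "spearman")` may be a better fit.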

Linear model `lm()`.

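``````
# Fit a linear model of piracy as a function of GDP.
line <- lm(Piracy ~ GDP, data = countries)
# Draw the fitted line on the existing scatter plot.
abline(line)
# With the fitted model we can estimate the piracy rate for a given GDP.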
``````
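As a rough sketch of the prediction step, `predict()` can be applied to a fitted model with new GDP values. The data frame below is a made-up stand-in for `countries`, and the GDP value of 15000 is an arbitrary example.

```r
# Hypothetical toy data standing in for the merged `countries` data frame.
countries <- data.frame(
  GDP    = c(1000, 5000, 10000, 20000, 40000),
  Piracy = c(89, 79, 71, 49, 11)
)

# Using the formula with `data =` lets predict() match columns in new data by name.
fit <- lm(Piracy ~ GDP, data = countries)
print(coef(fit))  # intercept and slope of the fitted line

# Scatter plot with the fitted regression line on top.
plot(countries$GDP, countries$Piracy, xlab = "GDP", ylab = "Piracy rate")
abline(fit)

# Estimate the piracy rate for a country with a GDP of 15000.
new_country <- data.frame(GDP = 15000)
predicted <- predict(fit, newdata = new_country)
print(predicted)
```

Predictions are only trustworthy inside the range of GDP values the model was fitted on.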

Generalized Linear Model `glm()`.

``````
# glm()
``````
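The note leaves `glm()` as a placeholder. One common use is logistic regression, where the outcome is binary. The sketch below is an illustration under assumptions: the `HighPiracy` flag, the `GDPk` column (GDP in thousands) and the simulated values are all hypothetical.

```r
# Simulated stand-in data: the chance of a "high piracy" flag drops as GDP rises.
set.seed(42)
gdp_k       <- runif(100, min = 1, max = 40)  # GDP in thousands (hypothetical)
high_piracy <- rbinom(100, size = 1, prob = plogis(2 - 0.15 * gdp_k))
countries   <- data.frame(GDPk = gdp_k, HighPiracy = high_piracy)

# Logistic regression: family = binomial models a 0/1 outcome.
fit <- glm(HighPiracy ~ GDPk, family = binomial, data = countries)
print(summary(fit)$coefficients)

# Predicted probability of high piracy for a GDP of 15 (thousand).
prob <- predict(fit, newdata = data.frame(GDPk = 15), type = "response")
print(prob)
```

With `type = "response"` the prediction is returned on the probability scale rather than as log-odds.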

Other statistical tests.

``````
# Analysis of Variance
aov()
anova()

# T test
# http://en.wikipedia.org/wiki/Student%27s_t-test
t.test()

# Test of Equal or Given Proportions
prop.test()

# Binomial test
# http://en.wikipedia.org/wiki/Binomial_test
binom.test()

# Chi-squared test
# http://en.wikipedia.org/wiki/Chi-squared_test
chisq.test(matrix1)

# Fisher's exact test
# http://en.wikipedia.org/wiki/Fisher%27s_exact_test
fisher.test()

# Friedman test
# http://en.wikipedia.org/wiki/Friedman_test
friedman.test()
``````
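The functions listed above all need arguments to run. Here is a sketch of minimal calls for each, using small made-up inputs. All variable names and numbers are assumptions for illustration only.

```r
# Two simulated groups with different means.
set.seed(7)
group_a <- rnorm(30, mean = 10)
group_b <- rnorm(30, mean = 12)
scores  <- data.frame(
  value = c(group_a, group_b),
  group = rep(c("a", "b"), each = 30)
)

# Analysis of variance: does the mean of `value` differ between groups?
fit_aov <- aov(value ~ group, data = scores)
print(summary(fit_aov))
print(anova(lm(value ~ group, data = scores)))  # same table from a linear model

# Two-sample t test (Welch's variant by default).
tt <- t.test(group_a, group_b)

# Test of equal proportions: 45 of 100 successes versus 60 of 100.
pt <- prop.test(x = c(45, 60), n = c(100, 100))

# Binomial test: 7 successes in 10 trials against a success probability of 0.5.
bt <- binom.test(7, 10, p = 0.5)

# Chi-squared and Fisher's exact test on a 2x2 contingency table.
matrix1 <- matrix(c(12, 5, 7, 9), nrow = 2)
ct <- chisq.test(matrix1)
ft <- fisher.test(matrix1)

# Friedman test: rows are blocks (e.g. judges), columns are treatments.
ratings <- matrix(c(1, 2, 3,
                    2, 3, 1,
                    1, 3, 2,
                    1, 2, 3), nrow = 4, byrow = TRUE)
fr <- friedman.test(ratings)

# Each result stores its p-value under the same name.
p_values <- c(t = tt$p.value, prop = pt$p.value, binom = bt$p.value,
              chisq = ct$p.value, fisher = ft$p.value, friedman = fr$p.value)
print(p_values)
```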