📊 Statistics

Updated at 2016-01-02 02:33

You can always take a more practical approach to statistics: simulate everything. It is less math-heavy, more programmer-friendly and, above all, a lot more fun, because you solve the problem with your own skills instead of Googling for the answer.

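```python
# You toss a coin 30 times and see 22 heads. Is the coin fair?

# Null hypothesis: the coin is fair. Simulate many experiments of 30 fair
# tosses and count how often at least 22 heads come up by chance.
import numpy.random as nprnd

N_EXPERIMENTS = 10000
M = 0
for i in range(N_EXPERIMENTS):
    trials = nprnd.randint(2, size=30)  # 30 tosses: 1 = heads, 0 = tails
    if trials.sum() >= 22:
        M += 1
p = M / float(N_EXPERIMENTS)
print(p)  # => e.g. 0.0087 (varies from run to run)
# Fewer than 1% of fair-coin experiments give 22+ heads, well below the usual
# 5% threshold, so we reject the null hypothesis: the coin is probably not fair.
```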
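If you want to check the simulation against the math, the same tail probability can be computed exactly from the binomial distribution: P(X ≥ 22) is the sum of C(30, k) / 2^30 for k = 22..30. A minimal sketch using only the standard library (`math.comb` needs Python 3.8+); `tail_probability` is a hypothetical helper name, not from the source.

```python
from math import comb

def tail_probability(n_tosses, min_heads):
    """Exact P(at least min_heads heads in n_tosses tosses of a fair coin)."""
    # Count the outcomes with min_heads..n_tosses heads, divide by all 2^n outcomes.
    favorable = sum(comb(n_tosses, k) for k in range(min_heads, n_tosses + 1))
    return favorable / 2 ** n_tosses

p_exact = tail_probability(30, 22)
print(round(p_exact, 4))  # => 0.0081
```

The exact value (about 0.0081) agrees with the simulated estimate to within simulation noise.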

```python
# Yertle stacks turtles on top of each other.
# Recorded stack heights:
# 48, 24, 32, 61, 51, 12, 32, 18, 19, 24, 21, 41, 29, 21, 25, 23, 42, 18, 23, 13
# How high are his stacks on average, and how certain is that estimate?
# Bootstrap: resample the data with replacement many times and
# look at the spread of the resampled means.
import numpy as np
import numpy.random as nprnd

N = [48, 24, 32, 61, 51, 12, 32, 18, 19, 24, 21, 41, 29, 21, 25, 23, 42, 18, 23, 13]
xbar = []
for i in range(10000):
    # Draw len(N) heights from N with replacement.
    n = nprnd.choice(N, size=len(N), replace=True)
    xbar.append(np.mean(n))
print(np.mean(xbar), np.std(xbar))
# => e.g. 28.8561593841 2.8927852906 (varies slightly from run to run)
# Mean stack height is about 29 +- 3 turtles (bootstrap standard error).
```

```python
# Data set A has an average of 73.5; data set B has an average of 66.9.
# The difference is 6.6. Is it statistically significant?
```
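One simulation-based way to answer this is a permutation (shuffling) test: if the labels A and B don't matter, randomly reassigning the pooled values to two groups of the original sizes should often produce a difference as large as the observed one. Below is a minimal standard-library sketch; the function name `permutation_test`, the iteration count and the seed are assumptions, and because the raw data behind 73.5 and 66.9 is not listed here, the example values are hypothetical.

```python
import random
from statistics import mean

def permutation_test(a, b, n_iter=10000, seed=0):
    """One-sided p-value estimate for mean(a) - mean(b), by shuffling labels."""
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            count += 1
    # Fraction of shuffles at least as extreme as the real split.
    return count / n_iter

# Hypothetical values, for illustration only (not the data behind 73.5 / 66.9).
group_a = [84, 72, 57, 46, 63, 76, 99, 91]
group_b = [81, 69, 74, 61, 56, 87, 69, 65, 66, 44, 62, 69]
print(permutation_test(group_a, group_b))
```

The test is one-sided: it asks how often chance alone puts A ahead of B by at least the observed margin. A small result (for example below 0.05) suggests the difference is unlikely to be due to chance.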

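```python
# Cross-validation: which model fits a set of XY points best?
# 1. Randomly split the XY points into two halves, A and B.
# 2. Fit the model to A and compute its RMS error on B.
# 3. Fit the model to B and compute its RMS error on A (the models are swapped).
# 4. Average the two RMS errors; repeat for each candidate model
#    and pick the one with the lowest average.
```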
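The steps above describe 2-fold cross-validation. A minimal standard-library sketch, assuming just two candidate models (a constant and a least-squares straight line) and hypothetical noisy-line data; `fit_constant`, `fit_line`, `rms` and `two_fold_cv` are illustrative names, not from the source.

```python
import math
import random

def fit_constant(points):
    """Degree-0 model: predict the mean y for every x."""
    c = sum(y for _, y in points) / len(points)
    return lambda x: c

def fit_line(points):
    """Degree-1 model: ordinary least-squares line y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def rms(model, points):
    """Root-mean-square error of the model's predictions on points."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in points) / len(points))

def two_fold_cv(points, fit, seed=0):
    """Split points randomly into halves A and B, fit on each, test on the other."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    a, b = shuffled[:half], shuffled[half:]
    model_a, model_b = fit(a), fit(b)
    # Swap: score each model on the half it was NOT fitted to.
    return (rms(model_a, b) + rms(model_b, a)) / 2

# Hypothetical data: a noisy straight line, for illustration only.
rng = random.Random(42)
points = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1)) for x in range(20)]
for name, fitter in [("constant", fit_constant), ("line", fit_line)]:
    print(name, two_fold_cv(points, fitter))
```

Scoring each model only on points it never saw is what keeps an overly flexible model from looking good just because it memorized its own half.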

TODO:

Bayesian Methods for Hackers
Statistical Thinking for Data Science
Statistics Is Easy! by Shasha and Wilson
https://en.wikipedia.org/wiki/Nelson_rules

Sources

Statistics for Hackers (talk by Jake VanderPlas)