📊 Statistics
Updated at 2016-01-02 02:33
You can always take a more practical approach to statistics by simulating everything. It is less math-heavy, more programmer-friendly, and above all a lot more fun, because you solve the problem with your own skills instead of Googling for the solution.
# You toss a coin 30 times and see 22 heads. Is the coin fair?
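# Null hypothesis: the coin is fair, so each toss lands heads with probability 0.5.
import numpy.random as nprnd

M = 0  # number of simulated experiments with at least 22 heads
for i in range(10000):
    trials = nprnd.randint(2, size=30)  # 30 fair tosses: 0 = tails, 1 = heads
    if trials.sum() >= 22:
        M = M + 1
p = M / 10000.0  # fraction of experiments at least as extreme as the observation
print(p)  # => 0.0087 (varies slightly from run to run)
# A fair coin produces 22 or more heads with probability of only around 0.008,
# so we reject the null hypothesis: the coin is probably not fair.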
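As a sanity check on the simulation, the same tail probability can be computed exactly by counting outcomes. This is only a cross-check, not a replacement for the simulation; it assumes Python 3.8+ for `math.comb`, and the names `n`, `k` and `tail` are mine.

```python
from math import comb

n, k = 30, 22  # 30 tosses, at least 22 heads

# Under a fair coin every one of the 2**n toss sequences is equally likely,
# so the tail probability is (sequences with >= k heads) / 2**n.
tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
print(tail)  # about 0.00806
```

The simulated value should scatter around this number, getting closer as the number of simulated experiments grows.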
# Yertle stacks turtles on top of each other.
# Recorded stacks:
# 48, 24, 32, 61, 51, 12, 32, 18, 19, 24, 21, 41, 29, 21, 25, 23, 42, 18, 23, 13
# How high can he stack the turtles?
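import numpy as np
import numpy.random as nprnd

N = [48, 24, 32, 61, 51, 12, 32, 18, 19, 24, 21, 41, 29, 21, 25, 23, 42, 18, 23, 13]
xbar = []  # mean of each bootstrap resample
for i in range(10000):
    # Bootstrap: draw len(N) stacks from N at random, with replacement
    n = []
    for x in range(len(N)):
        sample = N[nprnd.randint(len(N))]
        n.append(sample)
    xbar.append(np.mean(n))
print(np.mean(xbar), np.std(xbar))
# => 28.8561593841 2.8927852906 (varies slightly from run to run)
# Estimated mean stack height is 29 +- 3 turtles (the +- is the bootstrap standard error).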
# We have a data set of As with average of 73.5
# We have a data set of Bs with average of 66.9
# => The difference is 6.6. Is it statistically significant?
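One simulation-style way to answer this is a shuffle (permutation) test: if the A/B labels do not matter, shuffling them should produce a difference this large fairly often. The sketch below is a minimal version under assumptions: the notes do not include the raw data, so the scores are illustrative values picked only so that the means come out at 73.5 and 66.9, and `shuffle_test` is a helper name of my own.

```python
import numpy as np

# Illustrative scores (NOT from the notes), chosen so the means match 73.5 and 66.9.
a = np.array([84, 72, 57, 46, 63, 76, 99, 91])
b = np.array([81, 69, 74, 61, 56, 87, 69, 65, 66, 44, 62, 69])

def shuffle_test(a, b, n_iter=10000, seed=0):
    """Return the observed mean difference and a one-sided shuffle-test p-value."""
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_iter):
        # Randomly reassign the labels: the first len(a) values play the role of A.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if diff >= observed:
            count += 1
    return observed, count / n_iter

observed, p = shuffle_test(a, b)
print(observed, p)
```

If `p` comes out above the usual 0.05 threshold, a difference of 6.6 is the kind of gap that random labeling alone produces often enough, so it would not count as statistically significant for data like this.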
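# Cross-validation
# Start with a set A of XY coordinate points
# Take random points from the set to create a second set B
# Find the best model for A and compute its RMS error
# Find the best model for B and compute its RMS error
# Swap the models (score A's model on B and B's model on A), then check the average RMS error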
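The cross-validation recipe above could look roughly like the sketch below. Everything concrete in it is an assumption for illustration: the XY points are synthetic (a noisy quadratic), "model" is taken to mean a polynomial fitted with `np.polyfit`, and `rms` and `cross_val_rms` are helper names of my own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical XY points: a quadratic trend plus noise.
x = rng.uniform(-3, 3, size=40)
y = 1.0 + 0.5 * x - 0.8 * x ** 2 + rng.normal(scale=1.0, size=x.size)

# Randomly split the points into two halves, A and B.
idx = rng.permutation(x.size)
ia, ib = idx[:20], idx[20:]

def rms(coeffs, xs, ys):
    """Root-mean-square error of a fitted polynomial on the points (xs, ys)."""
    return np.sqrt(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

def cross_val_rms(degree):
    """Fit on each half, score on the other half, and average the two RMS errors."""
    fit_a = np.polyfit(x[ia], y[ia], degree)
    fit_b = np.polyfit(x[ib], y[ib], degree)
    return 0.5 * (rms(fit_a, x[ib], y[ib]) + rms(fit_b, x[ia], y[ia]))

scores = {d: cross_val_rms(d) for d in range(1, 6)}
best = min(scores, key=scores.get)
print(scores)
print("degree with lowest cross-validated RMS:", best)
```

Scoring each model on points it was not fitted to is what keeps an overly flexible polynomial from winning just by chasing the noise.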
TODO:
- Bayesian Methods for Hackers
- Statistical Thinking for Data Science
- Statistics is Easy! by Shasha and Wilson
- https://en.wikipedia.org/wiki/Nelson_rules