Alex D'Amour
October 31, 2014
Standard deviation is a property of any RV.
Standard error is the standard deviation of an estimator. Confidence intervals are based on standard errors.
\[ X_i \sim N(\mu, \sigma^2); \quad i = 1, \cdots, N. \]
SD of each \( X_i \): \( \sigma \).
SE of the sample mean estimator of \( \mu \): \[ SE(\bar X) = \sigma/\sqrt{N}. \]
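A minimal simulation sketch of the distinction: the SD of the raw draws stays near \( \sigma \), while the SD of the sample-mean estimator shrinks like \( \sigma/\sqrt{N} \). Values of `mu`, `sigma`, and `N` are illustrative.

```python
import numpy as np

# Simulate many replications of N draws from N(mu, sigma^2) and check
# that the SD of the sample means matches the standard error sigma/sqrt(N).
rng = np.random.default_rng(0)
mu, sigma, N = 0.0, 2.0, 100

# 10,000 replications of the sample mean estimator
means = rng.normal(mu, sigma, size=(10_000, N)).mean(axis=1)

print(np.std(means))       # empirical SE of the sample mean
print(sigma / np.sqrt(N))  # theoretical SE: sigma / sqrt(N) = 0.2
```

The empirical and theoretical values agree closely; increasing `N` tightens the estimator without changing the SD of the individual \( X_i \).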
| | Observed | Unobserved |
|---|---|---|
| Observable | \( Y, X, X^{rep} \) | \( Y^{rep}, Z^{rep} \) |
| Unobservable | | \( \theta \) |
Goal: Use observed quantities to learn about unobserved quantities and make decisions.
Mathematically, apply procedures to the observed data, \( g(Y, X, X^{rep}) \).
Model-based approach is the traditional statistical approach based on probability models of data generation. Derive procedures based on optimality or robustness with respect to the data generation model.
Algorithmic approaches treat data generation as a black box. Want to replicate relationships and patterns found in the data. Often based on geometry of raw data.
Common ground: Prediction. Both model-based and algorithmic approaches give estimates for \( Y^{rep} \).
Always go with what predicts better in testing?
Science or Engineering? Do you care about the system's inner workings?
Predict alike or predict different? Do your conclusions need to be portable to slightly different data streams? What about \( Z^{rep} \)?
Uncertainty? Model-based approaches give uncertainty estimates. Many (but not all) algorithmic approaches do not.
Can often find equivalent statistical interpretation of algorithmic approaches.
In traditional modeling, the flow usually goes like:
Model \( \rightarrow \) Estimation \( \rightarrow \) Prediction or Decision
For example,
\( X_i \sim N(\mu, \sigma^2) \) \( \rightarrow \) \( \bar X \) \( \rightarrow \) \( 0 \in \bar X \pm 2.54\sigma/\sqrt{N} \)?
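The flow above can be sketched end to end: posit the normal model, estimate \( \mu \) by \( \bar X \), then decide whether 0 falls in the interval. This assumes \( \sigma \) is known; the multiplier 2.54 is taken from the example above, and the true mean and seed are illustrative.

```python
import numpy as np

# Model -> Estimation -> Decision, in three lines of logic:
# model X_i ~ N(mu, sigma^2) with sigma known, estimate mu by the
# sample mean, then decide whether 0 lies in the interval.
rng = np.random.default_rng(1)
sigma, N = 1.0, 50
x = rng.normal(0.3, sigma, size=N)  # data generated with true mu = 0.3

xbar = x.mean()
half_width = 2.54 * sigma / np.sqrt(N)  # multiplier from the example above
decision = xbar - half_width <= 0 <= xbar + half_width
print(xbar, decision)
```

The decision step is just an interval check; the model and estimator carry all of the statistical content.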
Algorithmic approaches compress to one step. SVM?
Principles do cross over between model-based inference and algorithmic approaches. In my opinion, easier to interpret in model-based approach.
Bias/Variance Tradeoff. Simple/inflexible vs complex/sensitive. Which variation is important? How much can we explain with the given data size?
Regularization/Shrinkage. Pull raw estimates toward a value. Alternatively, turn \( p \) into \( N \). Sharing information between units, for example by assuming they all come from the same distribution.
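A minimal sketch of shrinkage: pull each unit's raw estimate toward the grand mean. The weight `w` here is fixed for illustration; a hierarchical model would choose it from the between-unit and within-unit variances.

```python
import numpy as np

# Shrink noisy per-unit estimates toward a common value (the grand mean).
raw = np.array([9.0, 1.0, 6.0, 4.0])  # raw per-unit estimates
grand_mean = raw.mean()               # pooled estimate: 5.0

w = 0.7  # illustrative weight: how much to trust each raw estimate
shrunk = w * raw + (1 - w) * grand_mean

print(shrunk)  # each estimate pulled partway toward 5.0
```

Units with extreme raw estimates move the most, which is exactly the "sharing information between units" effect: each unit borrows strength from the assumption of a common distribution.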
Calibration. Implied probabilities match actual probabilities.
Which poll do you trust?
| Sample Size | Democrat | Republican | p-value |
|---|---|---|---|
| 100 | 75% | 25% | \( 3.8 \times 10^{-9} \) |
| 10,000 | 52.5% | 47.5% | \( 2.8 \times 10^{-8} \) |
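Both polls reject a 50/50 split decisively, so the p-values alone don't separate them. The standard errors do: a sketch computing \( \sqrt{\hat p(1-\hat p)/n} \) for each poll shows how much tighter the large poll's interval is.

```python
import math

# Both polls have tiny p-values, but their standard errors
# (and hence confidence intervals) are very different.
def poll_se(p_hat, n):
    """Standard error of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

for p_hat, n in [(0.75, 100), (0.525, 10_000)]:
    se = poll_se(p_hat, n)
    lo, hi = p_hat - 2 * se, p_hat + 2 * se
    print(f"n={n}: estimate {p_hat:.3f} +/- {2*se:.3f}  ({lo:.3f}, {hi:.3f})")
```

The small poll's interval is roughly \( \pm 8.7 \) points, while the large poll pins the Democrat share within about \( \pm 1 \) point of 52.5%, which is why the larger poll is the one to trust for the margin.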