Alex D'Amour
October 31, 2014
Standard deviation is a property of any RV.
Standard error is the standard deviation of an estimator. Confidence intervals are based on standard errors.
\[ X_i \sim N(\mu, \sigma^2); \quad i = 1, \cdots, N. \]
SD of each \( X_i \): \( \sigma \).
SE of the sample mean estimator of \( \mu \): \[ SE(\bar X) = \sigma/\sqrt{N}. \]
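A minimal simulation sketch of the distinction: the SD of the raw draws stays near \( \sigma \), while the SD of the sample-mean estimator shrinks like \( \sigma/\sqrt{N} \). Values of `mu`, `sigma`, and `N` are illustrative.

```python
import numpy as np

# Simulate many replications of N draws from N(mu, sigma^2) and check
# that the SD of the sample means matches the standard error sigma/sqrt(N).
rng = np.random.default_rng(0)
mu, sigma, N = 0.0, 2.0, 100

# 10,000 replications of the sample mean estimator
means = rng.normal(mu, sigma, size=(10_000, N)).mean(axis=1)

print(np.std(means))       # empirical SE of the sample mean
print(sigma / np.sqrt(N))  # theoretical SE: sigma / sqrt(N) = 0.2
```

The empirical and theoretical values agree closely; increasing `N` tightens the estimator without changing the SD of the individual \( X_i \).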
| | Observed | Unobserved |
|---|---|---|
| Observable | \( Y, X, X^{rep} \) | \( Y^{rep}, Z^{rep} \) |
| Unobservable | | \( \theta \) |
Goal: Use observed quantities to learn about unobserved quantities and make decisions.
Mathematically, apply procedures to the observed data, \( g(Y, X, X^{rep}) \).
Model-based approach is the traditional statistical approach based on probability models of data generation. Derive procedures based on optimality or robustness with respect to the data generation model.
Algorithmic approaches treat data generation as a black box. Want to replicate relationships and patterns found in the data. Often based on geometry of raw data.
Common ground: Prediction. Both model-based and algorithmic approaches give estimates for \( Y^{rep} \).
Always go with what predicts better in testing?
Science or Engineering? Do you care about the system's inner workings?
Predict alike or predict different? Do your conclusions need to be portable to slightly different data streams? What about \( Z^{rep} \)?
Uncertainty? Model-based approaches give uncertainty estimates. Many (but not all) algorithmic approaches do not.
Can often find equivalent statistical interpretation of algorithmic approaches.
In traditional modeling, the flow usually goes like:
Model \( \rightarrow \) Estimation \( \rightarrow \) Prediction or Decision
For example,
\( X_i \sim N(\mu, \sigma^2) \) \( \rightarrow \) \( \bar X \) \( \rightarrow \) \( 0 \in \bar X \pm 2.54\sigma/\sqrt{N} \)?
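The flow above can be sketched end to end: posit the normal model, estimate \( \mu \) by \( \bar X \), then decide whether 0 falls in the interval. This assumes \( \sigma \) is known; the multiplier 2.54 is taken from the example above, and the true mean and seed are illustrative.

```python
import numpy as np

# Model -> Estimation -> Decision, in three lines of logic:
# model X_i ~ N(mu, sigma^2) with sigma known, estimate mu by the
# sample mean, then decide whether 0 lies in the interval.
rng = np.random.default_rng(1)
sigma, N = 1.0, 50
x = rng.normal(0.3, sigma, size=N)  # data generated with true mu = 0.3

xbar = x.mean()
half_width = 2.54 * sigma / np.sqrt(N)  # multiplier from the example above
decision = xbar - half_width <= 0 <= xbar + half_width
print(xbar, decision)
```

The decision step is just an interval check; the model and estimator carry all of the statistical content.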
Algorithmic approaches compress to one step. SVM?
Principles do cross over between model-based inference and algorithmic approaches. In my opinion, easier to interpret in model-based approach.
Bias/Variance Tradeoff. Simple/inflexible vs complex/sensitive. Which variation is important? How much can we explain with the given data size?
Regularization/Shrinkage. Pull raw estimates toward a value. Alternatively, turn \( p \) into \( N \). Sharing information between units, for example by assuming they all come from the same distribution.
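A minimal sketch of shrinkage: pull each unit's raw estimate toward the grand mean. The weight `w` here is fixed for illustration; a hierarchical model would choose it from the between-unit and within-unit variances.

```python
import numpy as np

# Shrink noisy per-unit estimates toward a common value (the grand mean).
raw = np.array([9.0, 1.0, 6.0, 4.0])  # raw per-unit estimates
grand_mean = raw.mean()               # pooled estimate: 5.0

w = 0.7  # illustrative weight: how much to trust each raw estimate
shrunk = w * raw + (1 - w) * grand_mean

print(shrunk)  # each estimate pulled partway toward 5.0
```

Units with extreme raw estimates move the most, which is exactly the "sharing information between units" effect: each unit borrows strength from the assumption of a common distribution.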
Calibration. Implied probabilities match actual probabilities.
Which poll do you trust?
| Sample Size | Democrat | Republican | p-value |
|---|---|---|---|
| 100 | 75% | 25% | \( 3.8 \times 10^{-9} \) |
| 10,000 | 52.5% | 47.5% | \( 2.8 \times 10^{-8} \) |
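Both polls reject a 50/50 split decisively, so the p-values alone don't separate them. The standard errors do: a sketch computing \( \sqrt{\hat p(1-\hat p)/n} \) for each poll shows how much tighter the large poll's interval is.

```python
import math

# Both polls have tiny p-values, but their standard errors
# (and hence confidence intervals) are very different.
def poll_se(p_hat, n):
    """Standard error of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

for p_hat, n in [(0.75, 100), (0.525, 10_000)]:
    se = poll_se(p_hat, n)
    lo, hi = p_hat - 2 * se, p_hat + 2 * se
    print(f"n={n}: estimate {p_hat:.3f} +/- {2*se:.3f}  ({lo:.3f}, {hi:.3f})")
```

The small poll's interval is roughly \( \pm 8.7 \) points, while the large poll pins the Democrat share within about \( \pm 1 \) point of 52.5%, which is why the larger poll is the one to trust for the margin.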