top of page

Missing Data

Skip the imputation

When the goal is prediction, missing data should not be imputed. This is counterintuitive, but imputing values solves the wrong problem: it is an inferential fix applied to a predictive task. Prediction only requires what is observed, and optimal strategies condition directly on observed data patterns rather than reconstructing what is missing. Here is the idea.

Prediction Stability

Are those predictions reproducible?

When predictive models guide decisions, accuracy is not enough—stability is essential. Standard metrics obscure this, masking the fact that overparameterized models can produce different predictions for the same patient due solely to randomness in optimization. As a result, models with identical performance can yield inconsistent, and potentially conflicting, treatment decisions. Here is the idea.

Regularization

Empirical bayes regularization

Regularization usually shrinks coefficients toward zero. This is convenient, but rarely scientific. When prior models are available, shrinkage should reflect existing knowledge. An empirical Bayes approach provides a principled way to pull estimates toward what is already known, improving prediction without ignoring the literature. Here is the idea.

Surroget Markers

Imperfect surroget markers make prediction modeling hard

Regularization usually shrinks coefficients toward zero. This is convenient, but rarely scientific. When prior models are available, shrinkage should reflect existing knowledge. An empirical Bayes approach provides a principled way to pull estimates toward what is already known, improving prediction without ignoring the literature. Here is the idea.

bottom of page