What is Statistical Evidence?

Three Questions of Statistical Inference

Data. Data are statistical evidence.

 

The degree to which the data support one hypothesis over another is called the strength of statistical evidence. Virtually all assessments of the strength of the evidence in a given set of data are model dependent. Because of this, a proper comparison of evidential assessments can only occur when the underlying model is the same. This may seem obvious, but it is sometimes hard to disentangle an evidential metric from its modeling strategy.

Statistical inference consists of assessing the evidence (analyzing data), determining belief (do the results make sense?), and choosing an action (publish, collect more data, pretend it never happened). Each of these steps is an equally important part of the scientific process. Royall (1997) framed this process as answering a series of questions:

  1. What do these data say?

  2. What should I believe, now that I have seen these data?

  3. What should I do, now that I have seen these data?

Royall's point was that the answers to the last two questions never substitute for the answer to the first. It is true that the answers to the last two questions depend on the answer to the first, but they also depend on additional factors such as prior beliefs and potential gains or losses which may, or may not, overwhelm the evidence at hand. And while there are formal mathematical frameworks for answering the second question (Bayesian Inference) and the third question (Decision Theory), there is no such framework for answering the first.
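To make the separation concrete, here is a minimal sketch of how the three answers can diverge. Everything in it is an illustrative assumption rather than part of Royall's text: the binomial model, the two simple hypotheses, the data, the prior odds, and the losses are all hypothetical, and the likelihood ratio is used for the first question only as one candidate measure.

```python
from math import comb


def binomial_likelihood(p, successes, trials):
    """Probability of the observed count under success probability p."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


# Hypothetical data and hypotheses (assumptions for illustration only).
successes, trials = 14, 20
p_h1, p_h0 = 0.7, 0.5

# Question 1: what do these data say?
# Assumed measure: the likelihood ratio for H1 versus H0.
likelihood_ratio = binomial_likelihood(p_h1, successes, trials) / binomial_likelihood(
    p_h0, successes, trials
)

# Question 2: what should I believe?
# Bayes' rule combines the evidence with (hypothetical) prior odds of 1:4.
prior_odds_h1 = 0.25
posterior_odds_h1 = prior_odds_h1 * likelihood_ratio
posterior_prob_h1 = posterior_odds_h1 / (1 + posterior_odds_h1)

# Question 3: what should I do?
# Minimize expected loss under (hypothetical) costs of each kind of error.
loss_act_if_h0_true = 10.0   # cost of acting on H1 when H0 is true
loss_wait_if_h1_true = 1.0   # cost of waiting when H1 is true
expected_loss_act = (1 - posterior_prob_h1) * loss_act_if_h0_true
expected_loss_wait = posterior_prob_h1 * loss_wait_if_h1_true
action = "act" if expected_loss_act < expected_loss_wait else "wait"

print(f"likelihood ratio (H1 vs H0): {likelihood_ratio:.2f}")
print(f"posterior P(H1): {posterior_prob_h1:.3f}")
print(f"action: {action}")
```

With these assumed inputs the data favor H1, the posterior probability of H1 is only modestly above one half because of the skeptical prior, and the lopsided losses still recommend waiting. The evidence itself is the same number throughout; only the added ingredients change the belief and the action.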

The problem, of course, is that the majority of scientific activity - reporting and interpreting data as scientific evidence - falls under the first question. Although there is no generally accepted mathematical framework for conducting an evidential analysis, there are three key metrics that every framework should incorporate: 

  1. A measure of the strength of evidence in a given body of observations

  2. The probability that the measure (#1) will be misleading in a given setting

  3. The probability that an observed measure - one computed from observed data - is mistaken

 

The first is the scale of evidence, the second is the error rate, and the third is the false discovery rate. Once these metrics are clearly defined, a comparison of evidential frameworks easily follows.
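As a rough illustration of the three metrics, the sketch below computes each one for a simple binomial setting. It assumes the likelihood ratio as the measure of evidence, a benchmark of k = 8 for calling the evidence strong, and a 50/50 prior for the third metric; the model, hypotheses, data, benchmark, and prior are all hypothetical choices made for the example, not definitions taken from the text above.

```python
from math import comb


def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def likelihood_ratio(x, n, p1, p0):
    """Likelihood ratio for H1: p = p1 versus H0: p = p0."""
    return binom_pmf(x, n, p1) / binom_pmf(x, n, p0)


# Hypothetical setting (assumptions for illustration only).
n, p0, p1 = 20, 0.5, 0.7
k = 8              # assumed benchmark for "strong" evidence favoring H1
observed_x = 15    # hypothetical observed number of successes

# Metric 1: strength of evidence in the observed data.
observed_lr = likelihood_ratio(observed_x, n, p1, p0)

# Metric 2: probability, before seeing data, that the measure is misleading,
# here the chance of strong evidence for H1 when H0 is actually true.
prob_misleading = sum(
    binom_pmf(x, n, p0)
    for x in range(n + 1)
    if likelihood_ratio(x, n, p1, p0) >= k
)

# Metric 3: probability that the observed evidence is mistaken, i.e. that H0
# is true given the observed data. This needs a prior; 0.5 is an assumption.
prior_prob_h0 = 0.5
posterior_odds_h1 = ((1 - prior_prob_h0) / prior_prob_h0) * observed_lr
false_discovery_rate = 1 / (1 + posterior_odds_h1)

print(f"strength of evidence (LR): {observed_lr:.2f}")
print(f"probability of misleading evidence: {prob_misleading:.4f}")
print(f"false discovery rate: {false_discovery_rate:.4f}")
```

Note that the second metric is a property of the study design (it is computed without reference to the observed count), while the first and third are properties of the data actually observed. Keeping them as three separate quantities is the point of the example.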

What is the goal?