
Multiple Testing

If you see an error in the article, please comment or drop me an email.

Multiple Testing is all about controlling errors that arise just by chance. For instance, when running 20 hypothesis tests at an alpha level of .05, we expect one false positive purely by chance (20 × 0.05 = 1).
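
To see this concretely, here is a minimal R sketch (the simulation setup is my own, purely for illustration) that runs 20 t-tests on pure noise and counts how many come out "significant" at alpha = .05:

```r
set.seed(42)
m     <- 20
alpha <- 0.05

# run m one-sample t-tests on pure noise, so every null hypothesis is true
pvals <- replicate(m, t.test(rnorm(30))$p.value)

sum(pvals < alpha)  # number of false positives; expected value is m * alpha = 1
```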

|                          | H_0 is true | H_a is true | Total |
|--------------------------|-------------|-------------|-------|
| Declared significant     | V           | S           | R     |
| Declared non-significant | U           | T           | m - R |
| Total                    | m_0         | m - m_0     | m     |

Note that only R (the number of results declared significant) and m (the total number of tests) are observable; the individual cell counts V, S, U, and T are not.

False positives and false negatives

  • Type I errors are also called False Positives, since they falsely claim a significant (= positive) result. In the table, V counts the tests declared significant (H_0 rejected) even though H_0 is actually true.

  • Type II errors are also called False Negatives, since they falsely claim a non-significant (= negative) result. In the table, T counts the tests declared non-significant even though H_a is actually true.

  • Results declared significant are called discoveries.

  • The proportion of false discoveries among all discoveries is V/R.

  • The expected value of that proportion (taking V/R = 0 when R = 0) is called the False Discovery Rate (FDR).

  • The False Positive Rate is the expected proportion of true nulls declared significant, E[V/m_0]; it is closely related to the Type I error rate.

  • The probability of having at least one false positive, P(V >= 1), is called the Family-Wise Error Rate (FWER).
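
To make these quantities concrete, here is a small simulation sketch in R (the 800/200 split between true nulls and true effects is my own assumption) that computes V, R, and the false discovery proportion when no correction is applied:

```r
set.seed(1)
m <- 1000; m0 <- 800; alpha <- 0.05

# the first m0 tests have a true null (mean 0); the rest have a real effect (mean 1)
null_true <- c(rep(TRUE, m0), rep(FALSE, m - m0))
pvals <- sapply(null_true, function(is_null)
  t.test(rnorm(30, mean = if (is_null) 0 else 1))$p.value)

significant <- pvals < alpha          # declared significant, no correction
V <- sum(significant & null_true)     # false positives
R <- sum(significant)                 # discoveries
c(V = V, R = R, FDP = V / R, FPR = V / m0)
```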

The Bonferroni Correction

Given that we…

  • perform m tests

  • want to control the FWER at level alpha: P(V >= 1) < alpha

Therefore, we…

  • reduce alpha by dividing it by the number of tests m: alpha_fwer = alpha/m
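
As a minimal sketch (toy data with every null true, so any discovery is a false positive), the correction can be applied by hand or via p.adjust():

```r
set.seed(1)
m <- 1000; alpha <- 0.05
pvals <- replicate(m, t.test(rnorm(30))$p.value)  # every null hypothesis is true

alpha_fwer <- alpha / m     # Bonferroni-adjusted threshold
sum(pvals < alpha_fwer)     # almost surely 0: the FWER is controlled
sum(pvals < alpha)          # uncorrected: roughly m * alpha = 50 false positives

# equivalently, inflate the p-values instead of shrinking the threshold
sum(p.adjust(pvals, method = "bonferroni") < alpha)
```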

What are the drawbacks of this method?

It might be too conservative (high false negative rate).

The Benjamini-Hochberg Method

Given that we…

  • perform m tests

  • want to control the False Discovery Rate

Therefore, we…

  • calculate p-values as usual

  • order the p-values from smallest to largest

  • find the largest i for which the ordered p-value satisfies p_(i) <= alpha * (i/m), and call the i smallest p-values significant
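
Here is one way the step-up rule might be written out by hand in R (the bh_significant helper and the toy p-values are mine, for illustration); p.adjust() gives the same answer:

```r
bh_significant <- function(pvals, alpha = 0.05) {
  m   <- length(pvals)
  ord <- order(pvals)                              # ranks, smallest p first
  passes <- pvals[ord] <= alpha * seq_len(m) / m   # p_(i) <= alpha * i / m
  k <- if (any(passes)) max(which(passes)) else 0  # largest rank that passes
  significant <- logical(m)
  significant[ord[seq_len(k)]] <- TRUE             # all ranks up to k are significant
  significant                                      # returned in the original order
}

pvals <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205)
bh_significant(pvals)                          # TRUE for the two smallest p-values
which(p.adjust(pvals, method = "BH") <= 0.05)  # same result via p.adjust
```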

What are the drawbacks of this method?

It might let more false positives through and it may behave strangely if the tests aren’t independent.

Using R’s p.adjust

The p.adjust() function adjusts a vector of p-values for multiple comparisons; you pick the correction via the method argument (e.g. method = "bonferroni" or method = "BH").
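
For example, with a toy vector of p-values:

```r
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205)

p.adjust(pvals, method = "bonferroni")  # each p-value multiplied by m, capped at 1
p.adjust(pvals, method = "BH")          # Benjamini-Hochberg adjustment
p.adjust(pvals, method = "BY")          # Benjamini-Yekutieli, valid under dependence

# declare significant wherever the adjusted p-value stays below alpha
sum(p.adjust(pvals, method = "BH") < 0.05)
```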

Other Methods

Multiple testing is an entire subfield of statistical inference. Usually a basic Bonferroni or BH correction is good enough to eliminate false positives, but strong dependence between tests can cause problems. In that case, consider the Benjamini-Yekutieli (BY) method, which controls the FDR under dependence.