Analysis of Variance (ANOVA)
If you see an error in the article, please comment or drop me an email.
Three Conditions for using ANOVA
 Homogeneity of variances in each group
# Standard deviations of the six groups
sd_1 <- 64.43
sd_2 <- 38.63
sd_3 <- 52.24
sd_4 <- 64.90
sd_5 <- 54.13
sd_6 <- 48.84
sds <- c(sd_1, sd_2, sd_3, sd_4, sd_5, sd_6)
# Rule of thumb: ratio of the smallest to the largest standard deviation
sds_ratio <- round(min(sds) / max(sds), 2)
print(sds_ratio)
## [1] 0.6
ifelse(sds_ratio >= .5 && sds_ratio <= 2, "variances are equal", "variances are unequal")
## [1] "variances are equal"

Nearly normal distribution in each group

Independence of observations
See more at https://statistics.laerd.com/statistical-guides/one-way-anova-statistical-guide-2.php
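The three conditions can also be checked directly on raw data. The following is only a minimal sketch: it assumes R's built-in PlantGrowth data set as a stand-in (it is not the data behind the standard deviations above), and it adds Bartlett's test and the Shapiro-Wilk test as formal complements to the rule of thumb.

```r
# Assumption: the built-in PlantGrowth data set (30 plants, 3 groups) stands in
# for your own data; it is not the source of the numbers used in this article.
data(PlantGrowth)

# 1. Homogeneity of variances: compare the group standard deviations
sds <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
sds_ratio <- min(sds) / max(sds)
print(round(sds, 2))
print(round(sds_ratio, 2))

# A formal alternative to the rule of thumb: Bartlett's test
bartlett_p <- bartlett.test(weight ~ group, data = PlantGrowth)$p.value
print(bartlett_p)

# 2. Nearly normal distribution in each group: Shapiro-Wilk test per group
shapiro_p <- tapply(PlantGrowth$weight, PlantGrowth$group,
                    function(x) shapiro.test(x)$p.value)
print(round(shapiro_p, 3))

# 3. Independence cannot be verified from the numbers alone; it follows from
#    the study design (e.g. random sampling, no repeated measurements).
```

Small p-values in the Bartlett or Shapiro-Wilk tests would speak against the respective condition; with small groups, a look at boxplots or normal probability plots is a useful complement.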
Used parameters in ANOVA
MSG – Mean square between groups –> describes variability between groups
MSE – Mean square error –> describes variability within groups
F = MSG/MSE = ratio of variability in the sample means relative to the variability within the groups
The F-statistic comes with two degrees of freedom: df_g (group) and df_e (error)
df_t = n – 1 (total sample size minus one)
df_g = k – 1 (number of groups minus one)
df_e = n – k (combined sample size minus number of groups)
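To see how these parameters fit together, they can be computed by hand and compared with R's own ANOVA table. This is a sketch under the assumption that the built-in PlantGrowth data set is an acceptable stand-in; the variable names mirror the ones used in this article.

```r
# Assumption: PlantGrowth (built into R) is used purely for illustration.
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

n <- length(y)    # total sample size
k <- nlevels(g)   # number of groups

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Sums of squares: between groups, within groups (error), and total
SSG <- sum(group_sizes * (group_means - grand_mean)^2)
SSE <- sum((y - ave(y, g))^2)   # ave() returns each observation's group mean
SST <- sum((y - grand_mean)^2)  # equals SSG + SSE

df_g <- k - 1
df_e <- n - k

MSG <- SSG / df_g
MSE <- SSE / df_e
f_statistic <- MSG / MSE

# Cross-check against R's own ANOVA table
anova_table <- anova(lm(weight ~ group, data = PlantGrowth))
print(c(manual = f_statistic, builtin = anova_table$`F value`[1]))
```

The two F values should agree, which confirms that F is nothing more than the ratio of the two mean squares.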
Example of ANOVA
# HYPOTHESES
# H_0 : mu_a = mu_b = mu_c
# H_A : mu_... != mu_... for at least one pair of groups
# This is necessarily one-sided, as the F distribution takes only non-negative values
n <- 999 # total sample size
k <- 3 # number of groups
df_t <- n - 1
df_g <- k - 1
df_e <- (n - 1) - (k - 1) # = n - k
SSG <- 8888 # sum of squares between groups
SSE <- 7777 # sum of squares error (within groups)
MSG <- SSG / df_g
MSE <- SSE / df_e
f_statistic <- MSG / MSE
p_value <- pf(f_statistic, df_g, df_e, lower.tail = FALSE)
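The example stops at the p-value, so here is one possible way to finish the test. It reuses the placeholder numbers from above; the significance level of 0.05 is an assumption, since the example does not state one.

```r
# Placeholder values taken from the example above
n <- 999
k <- 3
SSG <- 8888
SSE <- 7777
df_g <- k - 1
df_e <- n - k
f_statistic <- (SSG / df_g) / (SSE / df_e)
p_value <- pf(f_statistic, df_g, df_e, lower.tail = FALSE)

alpha <- 0.05 # assumption: the example does not state a significance level

# Two equivalent ways to reach the decision
f_critical     <- qf(1 - alpha, df_g, df_e) # critical value of the F distribution
reject_by_p    <- p_value < alpha
reject_by_crit <- f_statistic > f_critical

print(c(F = f_statistic, F_critical = f_critical))
print(reject_by_p)
```

Rejecting H_0 only says that at least one group mean differs from the others; it does not say which one.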
Multiple comparisons
Why use multiple comparisons?

To check which means are different

To control the Type 1 Error Rate
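The second point can be illustrated with a short calculation. This is a simplified sketch: it assumes a per-test significance level of 0.05 and treats the pairwise tests as independent, which is not exactly true in practice but shows the direction of the effect.

```r
alpha <- 0.05     # assumed per-test significance level
k <- 3            # number of groups
m <- choose(k, 2) # number of pairwise comparisons, k * (k - 1) / 2

# Probability of at least one false positive across m tests,
# assuming (for simplicity) that the tests are independent
fwer_uncorrected <- 1 - (1 - alpha)^m

# Same calculation with the Bonferroni-corrected level alpha / m
fwer_bonferroni <- 1 - (1 - alpha / m)^m

print(round(c(uncorrected = fwer_uncorrected, bonferroni = fwer_bonferroni), 4))
```

Without a correction, the chance of at least one Type 1 error grows with every additional comparison; dividing alpha by the number of comparisons keeps it at or below the original level.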
# HYPOTHESES
# H_0 : mu_lower - mu_middle = 0
# H_A : mu_lower - mu_middle != 0
# This is a two-sided test
# GIVEN DATA
n_1 <- 41 # sample size of lower class
n_2 <- 331 # sample size of middle class
n <- 792 # total sample size
k <- 3 # number of groups
df <- n - k # error degrees of freedom (df_e), because SE is based on MSE
null_value <- 0
mean_1 <- 5.07
mean_2 <- 6.76
alpha <- .05 # significance level
sides <- 2
MSE <- 3.628
# Standard error for multiple pairwise comparisons
SE <- sqrt((MSE / n_1) + (MSE / n_2))
t_statistic <- (abs(mean_1 - mean_2) - null_value) / SE
# Calculate the number of comparisons
nb_comp <- (k * (k - 1)) / 2
# Bonferroni correction of the significance level
alpha <- alpha / nb_comp
# Two-sided p-value, to be compared with the corrected alpha
p_value <- pt(t_statistic, df = df, lower.tail = FALSE) * sides
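When raw data are available, R can run all pairwise comparisons at once. The sketch below assumes the built-in PlantGrowth data set as a stand-in for the summary statistics used above. Note that `pairwise.t.test` applies the Bonferroni correction to the p-values (multiplying them by the number of comparisons) rather than dividing alpha, which leads to the same decisions.

```r
# Assumption: PlantGrowth (built into R) replaces the article's summary statistics.
data(PlantGrowth)

# Pairwise t tests with a pooled standard deviation (the analogue of using MSE);
# the p-values are Bonferroni-adjusted and compared with the uncorrected alpha
bonf <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                        p.adjust.method = "bonferroni", pool.sd = TRUE)
print(bonf$p.value)

# The same adjustment applied by hand to the unadjusted p-values
raw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                       p.adjust.method = "none", pool.sd = TRUE)
raw_p  <- raw$p.value[!is.na(raw$p.value)]
manual <- p.adjust(raw_p, method = "bonferroni")
print(manual)
```

The manually adjusted values should match the ones reported by `pairwise.t.test`, each capped at 1.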