# Analysis of Variance (ANOVA)

If you see an error in the article, please comment or drop me an email.

# Three Conditions for using ANOVA

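1. Homogeneity of variances: the groups should have roughly equal variability. A common rule of thumb is that the largest group standard deviation is at most twice the smallest, i.e. min(sd) / max(sd) >= 0.5.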
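```r
# Standard deviations of the six groups
sd_1 <- 64.43
sd_2 <- 38.63
sd_3 <- 52.24
sd_4 <- 64.90
sd_5 <- 54.13
sd_6 <- 48.84
sds <- c(sd_1, sd_2, sd_3, sd_4, sd_5, sd_6)

# Ratio of the smallest to the largest SD (always <= 1)
sds_ratio <- round(min(sds) / max(sds), 2)
print(sds_ratio)
## [1] 0.6

# Rule of thumb: a ratio of at least 0.5 means the largest SD
# is at most twice the smallest
if (sds_ratio >= 0.5) "variances are roughly equal" else "variances are unequal"
## [1] "variances are roughly equal"
```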
2. Nearly normal distribution within each group

3. Independence of observations, both within and between groups
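The SD check above works from summary statistics. With raw data, the conditions might be checked as in the following minimal R sketch. It assumes R's built-in `PlantGrowth` dataset (plant weights in three groups) purely as example data, and uses the Shapiro-Wilk test (`shapiro.test`) as one possible normality check; that test is a swapped-in choice, not something the conditions above prescribe, and per-group histograms or normal probability plots work just as well.

```r
# Example data (assumption): R's built-in PlantGrowth, groups ctrl, trt1, trt2
data("PlantGrowth", package = "datasets")
groups <- split(PlantGrowth$weight, PlantGrowth$group)  # one vector per group

# 1. Roughly equal variability: smallest SD over largest SD
group_sds <- sapply(groups, sd)
sd_ratio <- min(group_sds) / max(group_sds)
print(round(group_sds, 2))
print(round(sd_ratio, 2))
equal_var_ok <- sd_ratio >= 0.5  # same rule of thumb as above

# 2. Near normality: Shapiro-Wilk p-value within each group
#    (small p-values suggest a departure from normality)
shapiro_p <- sapply(groups, function(x) shapiro.test(x)$p.value)
print(round(shapiro_p, 3))

# 3. Independence cannot be tested from the numbers alone;
#    it depends on how the data were collected (e.g. random sampling)
```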

# Parameters used in ANOVA

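MSG (mean square between groups) = SSG / df_g, where SSG is the sum of squares between groups: describes the variability between groups

MSE (mean square error) = SSE / df_e, where SSE is the sum of squares within groups: describes the variability within groups

F = MSG / MSE: the ratio of the variability between the group means to the variability within the groups. A large F means the group means differ more than within-group variability alone would explain.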

The F statistic has two degrees of freedom: df_g (groups) and df_e (error).

df_t = n - 1 (total sample size minus one)

df_g = k - 1 (number of groups minus one)

df_e = n - k (total sample size minus number of groups)

These add up: df_t = df_g + df_e.
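To see how the sums of squares, mean squares, F statistic and degrees of freedom fit together, here is a minimal R sketch that computes them by hand from raw data and cross-checks them against the table from R's `anova(lm(...))`. The built-in `PlantGrowth` dataset is assumed only as example data.

```r
# Example data (assumption): R's built-in PlantGrowth
data("PlantGrowth", package = "datasets")
y <- PlantGrowth$weight
groups <- split(y, PlantGrowth$group)  # named list, one vector per group

n <- length(y)       # combined sample size
k <- length(groups)  # number of groups
grand_mean <- mean(y)
group_means <- sapply(groups, mean)
group_n <- sapply(groups, length)

# Sums of squares: between groups (SSG), within groups (SSE), total (SST)
SSG <- sum(group_n * (group_means - grand_mean)^2)
SSE <- sum(sapply(groups, function(x) sum((x - mean(x))^2)))
SST <- sum((y - grand_mean)^2)  # equals SSG + SSE

# Degrees of freedom
df_t <- n - 1
df_g <- k - 1
df_e <- n - k

MSG <- SSG / df_g
MSE <- SSE / df_e
f_statistic <- MSG / MSE
p_value <- pf(f_statistic, df_g, df_e, lower.tail = FALSE)

# Cross-check against R's own ANOVA table
tab <- anova(lm(weight ~ group, data = PlantGrowth))
print(tab)
```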

# Example of ANOVA

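```r
# HYPOTHESES
# H_0: mu_a = mu_b = mu_c
# H_A: at least one mean differs from the others
# The F test is one-sided (upper tail only): F is never negative,
# and only large values of F are evidence against H_0

n <- 999  # combined sample size (total number of observations)
k <- 3    # number of groups

df_t <- n - 1
df_g <- k - 1
df_e <- n - k  # equivalently df_t - df_g

# Placeholder sums of squares
SSG <- 8888  # sum of squares between groups
SSE <- 7777  # sum of squares within groups (error)

MSG <- SSG / df_g
MSE <- SSE / df_e

f_statistic <- MSG / MSE

# Upper-tail probability of the F(df_g, df_e) distribution
p_value <- pf(f_statistic, df_g, df_e, lower.tail = FALSE)
print(p_value)
```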
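The p-value from the example can equivalently be turned into a decision by comparing F with a critical value of the F distribution. A minimal sketch, reusing the placeholder numbers above and assuming a significance level of 0.05 (the example does not state one):

```r
# Placeholder numbers from the example above (not real data)
n <- 999
k <- 3
SSG <- 8888
SSE <- 7777
alpha <- 0.05  # assumed significance level

df_g <- k - 1
df_e <- n - k
f_statistic <- (SSG / df_g) / (SSE / df_e)
p_value <- pf(f_statistic, df_g, df_e, lower.tail = FALSE)

# Critical value: the (1 - alpha) quantile of F(df_g, df_e)
f_critical <- qf(1 - alpha, df_g, df_e)

# Both rules give the same decision: F > F_crit exactly when p < alpha
reject_by_p <- p_value < alpha
reject_by_f <- f_statistic > f_critical
```

Because `pf(..., lower.tail = FALSE)` decreases as F grows, the p-value drops below alpha at exactly the point where F passes the critical value, so either rule can be reported.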

# Multiple comparisons

### Why use multiple comparisons?

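1. To find out which means differ: ANOVA only tells us that at least one group mean is different, not which one

2. To control the overall Type 1 error rate: every additional pairwise test run at level alpha raises the chance of at least one false rejection, so the significance level is adjusted (e.g. with the Bonferroni correction)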
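The inflation of the Type 1 error rate can be made concrete: if m independent tests are each run at level alpha while every null hypothesis is true, the chance of at least one false rejection is 1 - (1 - alpha)^m. The short R sketch below assumes independent tests (pairwise comparisons that share groups are not strictly independent, so treat this as an approximation) and compares the uncorrected rate with the Bonferroni-corrected one for three groups.

```r
# Probability of at least one false rejection among m tests,
# assuming independent tests and all null hypotheses true
fwer <- function(alpha, m) 1 - (1 - alpha)^m

k <- 3                    # number of groups
m <- k * (k - 1) / 2      # number of pairwise comparisons (3 here)
alpha <- 0.05

fwer_uncorrected <- fwer(alpha, m)      # 1 - 0.95^3 = 0.142625
fwer_bonferroni  <- fwer(alpha / m, m)  # at most alpha
```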

```r
# HYPOTHESES
# H_0: mu_lower - mu_middle = 0
# H_A: mu_lower - mu_middle != 0
# This is a two-sided test

# GIVEN DATA
n_1 <- 41     # sample size of lower class
n_2 <- 331    # sample size of middle class
n <- 792      # total sample size (all groups)
k <- 3        # number of groups
df <- n - k   # error degrees of freedom from the ANOVA (df_e)
null_value <- 0
mean_1 <- 5.07
mean_2 <- 6.76
alpha <- 0.05 # significance level
sides <- 2

MSE <- 3.628  # mean square error from the ANOVA table

# Standard error for pairwise comparisons: the pooled MSE
# replaces the individual group variances
SE <- sqrt((MSE / n_1) + (MSE / n_2))

t_statistic <- ((mean_1 - mean_2) - null_value) / SE

# Number of pairwise comparisons among k groups
nb_comp <- (k * (k - 1)) / 2

# Bonferroni correction of the significance level
alpha_bonf <- alpha / nb_comp

# Two-sided p-value
p_value <- sides * pt(abs(t_statistic), df = df, lower.tail = FALSE)

# Reject H_0 if the p-value is below the corrected significance level
p_value < alpha_bonf
```
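In practice, R can run all pairwise comparisons at once with `pairwise.t.test()`, which by default pools the standard deviation across groups (the same idea as using MSE above) and can apply the Bonferroni correction. Note that it adjusts the p-values (multiplying each by the number of comparisons, capped at 1) instead of dividing alpha, which leads to the same decisions. A minimal sketch on the built-in `PlantGrowth` data, assumed only for illustration, that reproduces one adjusted p-value by hand:

```r
# Example data (assumption): R's built-in PlantGrowth (ctrl, trt1, trt2)
data("PlantGrowth", package = "datasets")
y <- PlantGrowth$weight
g <- PlantGrowth$group

# All pairwise t tests with pooled SD and Bonferroni-adjusted p-values
res <- pairwise.t.test(y, g, p.adjust.method = "bonferroni")
print(res)

# Reproduce the ctrl vs trt1 entry by hand, following the steps above
groups <- split(y, g)
means <- sapply(groups, mean)
sizes <- sapply(groups, length)
k <- length(groups)
df_e <- length(y) - k
MSE <- sum(sapply(groups, function(x) sum((x - mean(x))^2))) / df_e

SE <- sqrt(MSE / sizes[["ctrl"]] + MSE / sizes[["trt1"]])
t_statistic <- (means[["ctrl"]] - means[["trt1"]]) / SE
p_raw <- 2 * pt(abs(t_statistic), df = df_e, lower.tail = FALSE)

nb_comp <- k * (k - 1) / 2
p_bonf <- min(1, p_raw * nb_comp)  # Bonferroni-adjusted p-value
```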