I am not an engineer
Inference for Multiple Linear Regression

If you see an error in the article, please comment or drop me an email.

#Load the data
cognitive <- read.csv("http://bit.ly/dasi_cognitive")

Let us start with the full model, which includes all explanatory variables:

#Fit the full model and show the summary
cog_full <- lm(kid_score ~ mom_hs + mom_iq + mom_work + mom_age, data = cognitive)
cog_full_summary <- summary(cog_full)
print(cog_full_summary)
## 
## Call:
## lm(formula = kid_score ~ mom_hs + mom_iq + mom_work + mom_age, 
##     data = cognitive)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -54.045 -12.918   1.992  11.563  49.267 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 19.59241    9.21906   2.125   0.0341 *  
## mom_hsyes    5.09482    2.31450   2.201   0.0282 *  
## mom_iq       0.56147    0.06064   9.259   <2e-16 ***
## mom_workyes  2.53718    2.35067   1.079   0.2810    
## mom_age      0.21802    0.33074   0.659   0.5101    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.14 on 429 degrees of freedom
## Multiple R-squared:  0.2171, Adjusted R-squared:  0.2098 
## F-statistic: 29.74 on 4 and 429 DF,  p-value: < 2.2e-16

Hypothesis testing for models

Null hypothesis: beta_1 = beta_2 = ... = beta_k = 0

Alternative hypothesis: at least one beta_i != 0

How to interpret the result of the F-Test

If the p-value of the F-test is below the significance level (here .05), we reject the null hypothesis and the model as a whole is significant. However, this does not mean that the model fits the data well. It only means that at least one of the betas is non-zero. In the output above, the F-statistic is 29.74 on 4 and 429 DF with a p-value below 2.2e-16, so the full model is significant.

If the p-value exceeds the significance level, we fail to reject the null hypothesis: the combination of variables in the model does not yield a useful model. Individual variables, even among those included in the model, might still be good predictors of y on their own.
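
As a rough sketch, the p-value of the F-test can be recomputed by hand from the F-statistic and its two degrees of freedom. The numbers below are copied from the last line of the summary output above; the variable names are my own and only serve as illustration.

```r
# Values copied from the last line of the summary output above
f_value <- 29.74   # F-statistic
df_num  <- 4       # numerator DF: number of predictors k
df_den  <- 429     # denominator DF: n - k - 1

# Upper-tail probability of the F distribution = p-value of the F-test
f_p_value <- pf(f_value, df_num, df_den, lower.tail = FALSE)

# TRUE means we reject the null hypothesis at the 5% level
reject_null <- f_p_value < 0.05

print(f_p_value)
print(reject_null)
```

With the fitted model at hand, the same three numbers should be available in cog_full_summary$fstatistic, so they do not have to be typed in manually.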

Hypothesis testing for slopes

Null hypothesis: beta_1 = 0 when all other variables are included in the model

Alternative hypothesis: beta_1 != 0 when all other variables are included in the model
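
To make the test for a single slope more concrete, here is a minimal sketch that recomputes the t value and the p-value for mom_hs by hand. The estimate, the standard error and the degrees of freedom are copied from the summary output above; the variable names are my own.

```r
# Values copied from the mom_hsyes row of the coefficient table above
estimate  <- 5.09482
std_error <- 2.31450
df_resid  <- 429   # residual degrees of freedom from the summary output

# t-statistic: how many standard errors the estimate lies away from 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
t_p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

print(round(t_value, 3))
print(round(t_p_value, 4))
```

Up to rounding, the two results should agree with the "t value" and "Pr(>|t|)" columns of the summary.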

The difference in DFs between SLR and MLR

Note that the degrees of freedom (DF) are calculated differently in multiple linear regression (MLR) than in simple linear regression (SLR). With n observations and k predictors:

MLR: df = n - k - 1

SLR: df = n - 1 - 1
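
The two formulas can be checked against the output above with a small sketch. The helper residual_df is hypothetical (it is not part of R), and n = 434 is not printed in the summary; I infer it from the 429 residual degrees of freedom and the 4 predictors.

```r
# Hypothetical helper: residual degrees of freedom for k predictors
residual_df <- function(n, k) {
  n - k - 1
}

n <- 434  # sample size, inferred from the output: 429 + 4 + 1

df_mlr <- residual_df(n, k = 4)  # full model: mom_hs, mom_iq, mom_work, mom_age
df_slr <- residual_df(n, k = 1)  # a model with a single predictor

print(df_mlr)
print(df_slr)
```

The value for the full model should match the "429 degrees of freedom" reported in the summary.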