I am not an engineer
Must read: Is there a real future in data analysis for self-learners without a math degree?

Inference for Multiple Linear Regression

If you see an error in the article, please comment or drop me an email. Load the data: cognitive <- read.csv("http://bit.ly/dasi_cognitive"). Let us start with the full model, thus including all variables: cog_full <- lm(kid_score ~ mom_hs + mom_iq + mom_work + …
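A minimal sketch of inference for a fitted multiple regression. It uses R's built-in mtcars data rather than the article's cognitive dataset, so the variables here are illustrative, not the article's model:

```r
# Fit a multiple linear regression on built-in data (illustrative only)
fit <- lm(mpg ~ wt + hp + am, data = mtcars)

summary(fit)                # t-statistics and p-values for each slope
confint(fit, level = 0.95)  # 95% confidence intervals for the coefficients
```

summary() tests each slope against zero given the other predictors; confint() gives the corresponding interval estimates.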

Multiple Linear Regression

Conditions for multiple linear regression: a linear relationship between each (numerical) explanatory variable and the response – checked using scatterplots of y vs. each x and plots of residuals vs. each x; nearly normal residuals with mean 0 – checked using a …
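The condition checks above can be sketched with standard diagnostic plots; the model and data here are placeholders, assuming R's built-in mtcars:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residuals vs. each explanatory variable: look for no pattern
plot(mtcars$wt, resid(fit), main = "Residuals vs wt")
plot(mtcars$hp, resid(fit), main = "Residuals vs hp")

# Nearly normal residuals with mean 0
hist(resid(fit))
qqnorm(resid(fit)); qqline(resid(fit))
```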

Model Selection

Scott Zeger: “a model is a lens through which to look at your data”. George Box: “All models are wrong, some are useful.” Collinearity and parsimony. Collinearity: a high correlation between two independent variables such that the two variables contribute …
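A quick way to screen for collinearity is a correlation matrix of the candidate predictors; this sketch assumes R's built-in mtcars, not the article's data:

```r
# Pairwise correlations between candidate predictors;
# wt and disp capture overlapping information about a car's size
round(cor(mtcars[, c("wt", "disp", "hp")]), 2)
```

A correlation near 1 between two predictors suggests they contribute largely redundant information to the model.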

Linear Regression Intro

The basics of linear regression. Linear regression is one form of regression among others; it is probably the most intuitive and easiest one. The reason for regression is to 1) predict values for which there are no observed values and …
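The prediction use case above can be sketched in a few lines, assuming R's built-in cars data (stopping distance vs. speed):

```r
# Fit a simple linear regression
fit <- lm(dist ~ speed, data = cars)
coef(fit)  # intercept and slope

# Predict the response for a chosen speed value
predict(fit, newdata = data.frame(speed = 21))
```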

Comparing Categorical Variables

Introduction: when do we test for goodness of fit (GOF)? A goodness-of-fit test is a one-variable Chi-square test. The goal of a Chi-square goodness-of-fit test is to determine whether a set of frequencies or proportions is similar to and …
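A one-variable Chi-square GOF test can be run with chisq.test; the counts here are hypothetical die rolls tested against a uniform expectation:

```r
# Hypothetical counts of 150 die rolls, one cell per face
observed <- c(22, 21, 22, 27, 22, 36)

# Test against the null of equal proportions (a fair die)
chisq.test(observed, p = rep(1/6, 6))
```

The test compares observed and expected counts across the six categories, with 6 − 1 = 5 degrees of freedom.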

Analysis of Variance (ANOVA)

Three conditions for using ANOVA: homogeneity of variances in each group. sd_1 <- 64.43 sd_2 <- 38.63 sd_3 <- 52.24 sd_4 <- 64.90 sd_5 <- 54.13 sd_6 <- 48.84 sds <- c(sd_1, sd_2, sd_3, sd_4, sd_5, sd_6) sds_ratio <- round(min(sds)/max(sds), 2) print(sds_ratio) ## [1] 0.6 ifelse(sds_ratio >= .5 …
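The inline snippet above can be run as follows; the completion of the truncated ifelse call (the two message strings and the 0.5 rule of thumb) is an assumption, not the article's exact code:

```r
# Standard deviations of the six groups
sd_1 <- 64.43; sd_2 <- 38.63; sd_3 <- 52.24
sd_4 <- 64.90; sd_5 <- 54.13; sd_6 <- 48.84
sds <- c(sd_1, sd_2, sd_3, sd_4, sd_5, sd_6)

# Ratio of smallest to largest standard deviation
sds_ratio <- round(min(sds) / max(sds), 2)
print(sds_ratio)  # 0.6

# Assumed rule of thumb: a ratio >= 0.5 suggests roughly equal variances
ifelse(sds_ratio >= .5, "roughly equal variances", "variances differ too much")
```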

Proportions

Conditions for near normality of the distribution of sample proportions: 1) observations are independent; 2) sample size: np >= 10 and n(1 - p) >= 10. Proportion inference in a nutshell. Let's say we are interested in the proportion …
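The success-failure condition above can be wrapped in a small helper; check_sf and the example values are hypothetical:

```r
# Hypothetical helper: success-failure condition for a sample proportion
check_sf <- function(n, p) (n * p >= 10) && (n * (1 - p) >= 10)

check_sf(100, 0.08)  # FALSE: n*p = 8, below 10
check_sf(200, 0.08)  # TRUE:  n*p = 16 and n*(1-p) = 184
```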

Bootstrapping

The basic bootstrap principle uses observed data to construct an estimated population distribution using random sampling with replacement. Sample –> samplings –> estimated distribution. Steps of bootstrapping: 1) take a bootstrap sample (a random sample with replacement, of the same size …
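The steps above can be sketched as a bootstrap of the sample mean; the data vector and the choice of 2000 resamples are illustrative:

```r
set.seed(1)
x <- c(12, 15, 9, 20, 14, 17, 11, 16)  # a small hypothetical sample

# Resample with replacement, same size as the sample, and record each mean
boot_means <- replicate(2000, mean(sample(x, length(x), replace = TRUE)))

# 95% percentile bootstrap interval for the mean
quantile(boot_means, c(0.025, 0.975))
```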

Power and sample size

You can calculate the required sample size for a targeted level of power… …or the obtained power with a given sample size. Calculate the obtained power with a given sample size: 1) Hypotheses: H_0: mu_diff = 0, H_A: …
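Both directions of the calculation can be done with base R's power.t.test; the effect size, standard deviation, and sample size here are placeholder values:

```r
# Obtained power for a given sample size (two-sample t-test, n per group)
power.t.test(n = 30, delta = 5, sd = 10, sig.level = 0.05)$power

# Required n per group for a targeted 80% power
power.t.test(power = 0.8, delta = 5, sd = 10, sig.level = 0.05)$n
```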

R Programming

This is a set of flashcards based on the introductory course on R Programming offered by Johns Hopkins University on Coursera. See the flashcards on Studyblue… First week, Second week, Third week, Fourth week… or get them in CSV and RDA format here on GitHub.