Must read: Is there a real future in data analysis for self-learners without a math degree?

## Fitting a function

If you see an error in the article, please comment or drop me an email.

How to fit functions using linear models:

$Y_i = \beta_0 + \beta_1 X_i + \sum_{k=1}^d (X_i - \xi_k)_+ \gamma_k + \epsilon_i$

where $(a)_+ = a$ if $a > 0$ and $0$ otherwise, and $\xi_1, \dots, \xi_d$ are the knot points.

Simulated example (source: https://github.com/DataScienceSpecialization/courses). Separate the $n$ values into $d + 1$ spans, one more than the number of knots $d$. Create a basis: a …
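The model above is ordinary least squares on an expanded basis: an intercept, $X_i$, and one hinge term $(X_i - \xi_k)_+$ per knot. Below is a minimal Python sketch of that idea using only the standard library. The function names (`hinge`, `fit_linear_spline`, etc.) are hypothetical, and the normal equations are solved by hand purely for illustration; in R you would build the hinge columns and pass them to `lm()`.

```python
def hinge(x, knot):
    """(x - knot)_+ : zero left of the knot, linear to the right of it."""
    return max(x - knot, 0.0)


def spline_design_row(x, knots):
    """One row of the design matrix: [1, x, (x - xi_1)_+, ..., (x - xi_d)_+]."""
    return [1.0, x] + [hinge(x, k) for k in knots]


def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def fit_linear_spline(xs, ys, knots):
    """Least-squares [beta0, beta1, gamma_1, ..., gamma_d] via the normal equations X'X b = X'y."""
    rows = [spline_design_row(x, knots) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(x, coefs, knots):
    """Fitted value at x: dot product of the coefficients with the basis row."""
    return sum(c * v for c, v in zip(coefs, spline_design_row(x, knots)))
```

Fitting noiseless data generated from $1 + 2x - 3(x - 5)_+$ for $x = 0, \dots, 10$ with a single knot at 5 should recover coefficients close to 1, 2 and -3.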

## Logistic Regression

Logistic regression is a generalized linear model in which the outcome is a categorical variable. It can be binomial (the dependent variable has two categories), ordinal (the categories are ordered) or multinomial (more than two unordered categories).

Binary Generalized Linear Models

Binary …
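For a binary outcome, logistic regression models the log-odds of $P(Y = 1)$ as a linear function of the predictors. The sketch below (Python, standard library) shows the logit link, its inverse, and why $e^{\beta_1}$ is read as an odds ratio. The coefficient values are made up for illustration, not estimates from any dataset.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Map a linear predictor eta back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-eta))


# Hypothetical coefficients for a one-predictor model:
# logit(P(Y = 1 | x)) = beta0 + beta1 * x
beta0, beta1 = -1.5, 0.8


def predict_prob(x):
    """Predicted P(Y = 1) at predictor value x."""
    return inv_logit(beta0 + beta1 * x)


# exp(beta1) is the multiplicative change in the odds of Y = 1
# for a one-unit increase in x.
odds_ratio = math.exp(beta1)
```

Because the odds equal $e^{\beta_0 + \beta_1 x}$, the ratio of the odds at $x + 1$ and at $x$ is $e^{\beta_1}$ regardless of $x$.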

## Poisson regression

In statistics, Poisson regression is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable $Y$ has a Poisson distribution, and assumes the logarithm of its …
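With the usual log link, the model is $\log E[Y \mid x] = \beta_0 + \beta_1 x$. Here is a minimal Python sketch, with hypothetical coefficients, showing the expected count, the Poisson log-likelihood that model fitting maximizes, and the rate ratio $e^{\beta_1}$.

```python
import math

# Hypothetical coefficients: log(E[Y | x]) = beta0 + beta1 * x
beta0, beta1 = 0.5, 0.3


def expected_count(x):
    """Mean of the Poisson response at predictor value x (inverse of the log link)."""
    return math.exp(beta0 + beta1 * x)


def poisson_log_likelihood(ys, xs):
    """Sum of log P(Y = y) with Y ~ Poisson(mu = expected_count(x))."""
    total = 0.0
    for y, x in zip(ys, xs):
        mu = expected_count(x)
        # log P(Y = y) = y * log(mu) - mu - log(y!)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


# exp(beta1): multiplicative change in the expected count per one-unit increase in x
rate_ratio = math.exp(beta1)
```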

## Inference for Multiple Linear Regression

```r
# Load the data
cognitive <- read.csv("http://bit.ly/dasi_cognitive")
```

Let us start with the full model, i.e. the model including all variables:

```r
# Fit the full model and show the summary
cog_full <- lm(kid_score ~ mom_hs + mom_iq + mom_work +
```

…
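Each coefficient in a multiple regression summary is tested against $H_0: \beta_i = 0$ with $T = (b_i - 0)/SE_{b_i}$ on $n - k - 1$ degrees of freedom, where $k$ is the number of predictors. A small Python sketch of that arithmetic follows. The function names are hypothetical, and the critical value $t^\star$ is passed in because the Python standard library has no t-distribution quantile function.

```python
def slope_t_stat(estimate, std_error, null_value=0.0):
    """T score for H0: beta_i = null_value, given the estimate b_i and its standard error."""
    return (estimate - null_value) / std_error


def residual_df(n, k):
    """Degrees of freedom n - k - 1 for a model with k predictors plus an intercept."""
    return n - k - 1


def slope_conf_int(estimate, std_error, t_star):
    """b_i +/- t* x SE_b_i; t_star should come from a t table with residual_df(n, k) df."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin
```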

## Multiple Linear Regression

Conditions for multiple linear regression:

- linear relationship between each (numerical) explanatory variable and the response, checked using scatterplots of y vs. each x and residual plots (residuals vs. each x)
- nearly normal residuals with mean 0, checked using a …
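The residual-based checks start from $e_i = y_i - \hat{y}_i$. A minimal Python sketch (helper names are hypothetical) that computes the residuals, their mean, and the sorted (x, residual) points you would plot for one predictor; drawing the plots is left to a plotting library.

```python
def residuals(observed, fitted):
    """e_i = y_i - y_hat_i for each observation."""
    return [y - y_hat for y, y_hat in zip(observed, fitted)]


def mean(values):
    """Arithmetic mean; for a least-squares fit with an intercept, the residual mean is ~0."""
    return sum(values) / len(values)


def residual_plot_points(x_values, resid):
    """(x, e) pairs sorted by x: the points of a residuals-vs-x plot for one predictor."""
    return sorted(zip(x_values, resid))
```

Repeat `residual_plot_points` for each numerical explanatory variable; a random scatter around zero supports the linearity condition.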

## Model Selection

Scott Zeger: “A model is a lens through which to look at your data.” George Box: “All models are wrong, but some are useful.”

Collinearity and parsimony

Collinearity: a high correlation between two independent variables such that the two variables contribute …
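A quick way to spot collinearity is to compute pairwise correlations between the explanatory variables. Below is a Python sketch. The 0.8 cutoff is an arbitrary assumption for illustration, not a rule from the original post, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for every predictor pair with |r| >= threshold (assumed cutoff)."""
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged
```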

## Linear Regression Intro

The basics of linear regression

Linear regression is one of several forms of regression, and probably the most intuitive and simplest one. We use regression to 1) predict values for which there are no observed values and …
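The least-squares line has slope $b_1 = r \, s_y / s_x$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$. Here is a minimal Python sketch that computes the same slope in the equivalent form $S_{xy}/S_{xx}$ and then uses the line to predict a value that was not observed. Function names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Intercept b0 and slope b1 of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx         # equals r * s_y / s_x
    b0 = y_bar - b1 * x_bar  # the line passes through (x_bar, y_bar)
    return b0, b1


def predict(x, b0, b1):
    """Predicted response at x from the fitted line."""
    return b0 + b1 * x
```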

## Comparing Categorical Variables

Introduction

When do we test for goodness of fit (GOF)? A goodness-of-fit test is a one-variable chi-square test. The goal of a chi-square goodness-of-fit test is to determine whether a set of frequencies or proportions is similar to and …
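The chi-square goodness-of-fit statistic is $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$ with $k - 1$ degrees of freedom for $k$ categories, where the expected counts $E_i$ come from the hypothesized proportions. Here is a Python sketch with a hypothetical function name. The p-value lookup against the chi-square distribution is left out because the standard library does not provide it.

```python
def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if abs(sum(expected_props) - 1) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in expected_props]  # E_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df
```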

## Analysis of Variance (ANOVA)

Three conditions for using ANOVA

Homogeneity of variances in each group:

```r
# Standard deviations of the six groups
sd_1 <- 64.43
sd_2 <- 38.63
sd_3 <- 52.24
sd_4 <- 64.90
sd_5 <- 54.13
sd_6 <- 48.84
sds <- c(sd_1, sd_2, sd_3, sd_4, sd_5, sd_6)

# Ratio of the smallest to the largest standard deviation
sds_ratio <- round(min(sds) / max(sds), 2)
print(sds_ratio)
## [1] 0.6
ifelse(sds_ratio >= .5
```

…
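Once the conditions hold, one-way ANOVA compares between-group and within-group variability through $F = MSG / MSE$, with $k - 1$ and $n - k$ degrees of freedom for $k$ groups and $n$ observations. Here is a minimal Python sketch with a hypothetical function name. The p-value would still come from an F table or a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F, df_group, df_error) for a list of groups of numeric observations."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Sum of squares between groups (SSG) and within groups (SSE)
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_group, df_error = k - 1, n - k
    msg, mse = ssg / df_group, sse / df_error
    return msg / mse, df_group, df_error
```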

## Proportions

Conditions for near normality of the distribution of sample proportions:

1. Observations are independent.
2. Sample size: $np \geq 10$ and $n(1 - p) \geq 10$.

Proportion inference in a nutshell

Let’s say we are interested in the proportion …
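Here is a Python sketch of the success-failure condition and a normal-approximation confidence interval $\hat{p} \pm z^\star \sqrt{\hat{p}(1 - \hat{p})/n}$. The default $z^\star = 1.96$ (95% confidence) is an assumption, and the function names are hypothetical. When checking the condition for a confidence interval, $\hat{p}$ is typically substituted for $p$.

```python
import math


def success_failure_ok(n, p):
    """Success-failure condition: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


def proportion_ci(successes, n, z_star=1.96):
    """Normal-approximation confidence interval for a single proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    return p_hat - z_star * se, p_hat + z_star * se
```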