Distributions and probability
If you see an error in the article, please comment or drop me an email.
Normal distribution
The normal model assumes a randomized sample of independent and identically distributed variables.
The normal model can be used for a sampling distribution if

the sample size is greater than 30

the observations are independent

the sample is randomized
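To see why the sample-size condition matters, here is a minimal simulation sketch. The exponential population, the sample size of 40 and the 5000 repetitions are assumptions chosen for illustration, not values from the article. It draws many random samples from a clearly skewed population and checks that the sample means centre on the population mean with a spread close to sigma / sqrt(n).

```r
# Illustrative simulation: population, n and reps are assumed values
set.seed(42)
n    <- 40      # sample size, above the rule-of-thumb threshold of 30
reps <- 5000    # number of independent random samples

# Each sample is drawn from a skewed exponential population (mean 1, sd 1)
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)   # should be close to the population mean, 1
sd(sample_means)     # should be close to the standard error below
1 / sqrt(n)          # theoretical standard error: sigma / sqrt(n)
```

A histogram of `sample_means` should look roughly bell-shaped even though the underlying population is not.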
F distribution
Ratio of two mean squares: the mean of n1 squared independent standard normals divided by the mean of n2 squared independent standard normals.
MSG – Mean square between groups –> describes variability between groups
MSE – Mean square error –> describes variability within groups
F = MSG/MSE = ratio of variability in the sample means relative to the variability within the groups
The F statistic comes with two degrees of freedom: df1 and df2
df1 = k – 1 (number of groups minus one)
df2 = n – k (combined sample size minus number of groups)
f_check <- function () {
  # Prints the three conditions to verify before running an ANOVA
  print("ANOVA CONDITIONS:")
  print("1) Observations are independent within and across groups -> random sample of 10% or less of the population")
  print("2) Data within each group are nearly normal -> normal probability plot for each group")
  print("3) Variability across the groups is about equal -> compare sds/vars of the different groups")
}
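The F statistic and its two degrees of freedom can be worked out by hand, as in the following sketch. The data set (three groups of four observations) and all variable names are made up for illustration. The result is cross-checked against R's built-in `anova()`.

```r
# Hypothetical data: three groups with four observations each
values <- c(4, 5, 6, 5,  7, 8, 6, 7,  9, 10, 8, 9)
group  <- factor(rep(c("a", "b", "c"), each = 4))

k   <- nlevels(group)    # number of groups
n   <- length(values)    # combined sample size
df1 <- k - 1
df2 <- n - k

grand_mean  <- mean(values)
group_means <- tapply(values, group, mean)
group_sizes <- tapply(values, group, length)

# Sum of squares between groups and within groups
ssg <- sum(group_sizes * (group_means - grand_mean)^2)
sse <- sum((values - group_means[as.character(group)])^2)

msg    <- ssg / df1      # mean square between groups
mse    <- sse / df2      # mean square error
f_stat <- msg / mse      # 24 for this data set

# Upper-tail probability of the F distribution with df1 and df2
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Cross-check against the built-in ANOVA table
anova(lm(values ~ group))
```

The manual `f_stat` should match the "F value" column of the ANOVA table.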
Geometric distribution
The probability that the first success occurs on the nth Bernoulli trial: (1-p)^(n-1) * p
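As a quick sketch, the geometric probability can be computed by hand and compared with R's `dgeom()`. The values of `p` and `n` are assumptions for illustration. Note that `dgeom()` is parameterised by the number of failures before the first success, so the trial number has to be shifted by one.

```r
p <- 0.2   # assumed probability of success on each trial
n <- 4     # trial on which the first success occurs

# n - 1 failures followed by one success
manual  <- (1 - p)^(n - 1) * p

# dgeom counts failures before the first success, hence n - 1
builtin <- dgeom(n - 1, prob = p)

all.equal(manual, builtin)
```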
Binomial distribution
Exactly k successes in n trials
Check whether the distribution is binomial
binom_check <- function () {
  # Prints the four conditions of a binomial distribution
  print("1) The trials are independent")
  print("2) The number of trials *n* is fixed")
  print("3) Each trial outcome can be classified as a *success* or *failure*")
  print("4) The probability of success *p* is the same for each trial")
}
Obtain the probability of k successes in n trials at probability p:
binom_probability <- function (n=0,k=0,p=0) {
  # With no arguments, print a usage hint instead of a result
  if (sum(c(n,k,p))==0) {
    print("You need to specify n trials, k successes and p probability")
    print("FORMULA : choose(n,k)*p^k*(1-p)^(n-k)")
  } else {
    choose(n,k)*p^k*(1-p)^(n-k)
  }
}
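The binomial formula can be checked against R's built-in `dbinom()`, as in this minimal sketch. The values of `n`, `k` and `p` are assumptions for illustration.

```r
n <- 10    # assumed number of trials
k <- 3     # assumed number of successes
p <- 0.25  # assumed probability of success

# Probability of exactly k successes, computed from the formula
manual  <- choose(n, k) * p^k * (1 - p)^(n - k)

# Same quantity from the built-in density function
builtin <- dbinom(k, size = n, prob = p)
all.equal(manual, builtin)

# pbinom gives the cumulative probability of k or fewer successes
at_most_k <- pbinom(k, size = n, prob = p)
```

`dbinom()` answers "exactly k", while `pbinom()` answers "k or fewer".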
Obtain the mean of a binomial distribution:
binom_mean <- function (n=0,p=0) {
  # With no arguments, print a usage hint instead of a result
  if (sum(c(n,p))==0) {
    print("You need to specify n trials and p probability.")
    print("FORMULA : Mean = n * p")
  } else {
    print(n*p)
  }
}
Obtain the standard deviation of a binomial distribution:
binom_sd <- function (n=0,p=0) {
  # With no arguments, print a usage hint instead of a result
  if (sum(c(n,p))==0) {
    print("You need to specify n trials and p probability.")
    print("FORMULA : sigma = sqrt(n*p*(1-p))")
  } else {
    print(sqrt(n*p*(1-p)))
  }
}
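The closed-form mean and standard deviation can be verified directly from the probability mass function. This is a small sketch with assumed values for `n` and `p`. It sums over every possible number of successes rather than relying on the shortcut formulas.

```r
n <- 20    # assumed number of trials
p <- 0.3   # assumed probability of success

k   <- 0:n                       # every possible number of successes
pmf <- dbinom(k, size = n, prob = p)

# Mean and standard deviation from the definition of expectation
mu    <- sum(k * pmf)
sigma <- sqrt(sum((k - mu)^2 * pmf))

all.equal(mu, n * p)                       # mean = n * p
all.equal(sigma, sqrt(n * p * (1 - p)))    # sigma = sqrt(n * p * (1 - p))
```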
Negative binomial distribution
Check whether the distribution is negative binomial
nbinom_check <- function () {
  # Prints the four conditions of a negative binomial distribution
  print("1) The trials are independent")
  print("2) Each trial outcome can be classified as a *success* or *failure*")
  print("3) The probability of success *p* is the same for each trial")
  print("4) The last trial is a success")
}
Obtain the probability of the kth success on the nth trial at probability p:
nbinom_probability <- function (n=0,k=0,p=0) {
  # With no arguments, print a usage hint instead of a result
  if (sum(c(n,k,p))==0) {
    print("You need to specify k successes at the nth trial, at p probability")
    print("FORMULA : choose(n-1,k-1)*p^k*(1-p)^(n-k)")
  } else {
    print(choose(n-1,k-1)*p^k*(1-p)^(n-k))
  }
}
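The negative binomial formula can also be compared with R's `dnbinom()`, keeping the different parameterisation in mind. In this sketch the values of `n`, `k` and `p` are assumptions for illustration. `dnbinom()` takes the number of failures before the kth success (here `n - k`) rather than the trial number.

```r
n <- 10    # assumed trial on which the kth success occurs
k <- 3     # assumed number of successes
p <- 0.25  # assumed probability of success

# Probability that the kth success falls on the nth trial, from the formula
manual  <- choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)

# dnbinom expects the number of failures (n - k) and the target successes (size)
builtin <- dnbinom(n - k, size = k, prob = p)

all.equal(manual, builtin)
```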
Poisson distribution
The Poisson distribution is useful for estimating the number of events in a large population over a unit of time.
Check whether it is a Poisson distribution:
pois_check <- function () {
  # Prints the three conditions of a Poisson distribution
  print("1) We are looking for the number of events (=successes)")
  print("2) The population is large")
  print("3) Events occur independently from each other")
}
Obtain Poisson probability:
pois_probability <- function (lambda=0,k=0) {
  # With no arguments, print a usage hint instead of a result
  if (sum(c(lambda,k))==0) {
    print("You need to specify lambda and k successes")
    print("Note the difference between pois_probability and ppois! pois_probability provides probability for EXACTLY k successes, whereas ppois provides probability of k or fewer successes (=chunk of the distribution)")
    print("FORMULA : ((lambda^k)*exp(1)^(-1*lambda))/factorial(k)")
  } else {
    print(((lambda^k)*exp(1)^(-1*lambda))/factorial(k))
  }
}
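The difference between the exact and the cumulative Poisson probability can be illustrated with R's built-in functions. This is a minimal sketch and the values of `lambda` and `k` are assumptions. `dpois()` matches the formula for exactly k events, while `ppois()` adds up the probabilities for 0 through k events.

```r
lambda <- 2   # assumed average number of events per unit of time
k      <- 3   # assumed number of events of interest

# Probability of exactly k events, from the formula
manual <- lambda^k * exp(-lambda) / factorial(k)

# Built-in equivalent for exactly k events
exact <- dpois(k, lambda)
all.equal(manual, exact)

# Cumulative probability of k or fewer events
cumulative <- ppois(k, lambda)
all.equal(cumulative, sum(dpois(0:k, lambda)))
```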