import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
import seaborn as sns
A principal at a certain school claims that the students in his school have above-average intelligence. A random sample of 30 students has a mean IQ score of 112.5. Is there sufficient evidence to support the principal's claim? The population mean IQ is 100 with a standard deviation of 15.
Note that although this problem does not sound like an example of comparing group means, it is indeed the same. Instead of comparing two sample datasets, we are comparing one sample dataset to the population mean.
Some additional questions can also be asked to understand the impact of the various quantities involved.
population_mean = 100
population_std = 15
n_sample = 30
This will be considered an 'average' class, i.e. one that does not have the above-average intelligence the principal claims.
avg_class = np.random.normal(loc=population_mean,scale=population_std,size=n_sample).astype(int)
print("A typical class I.Q.:",avg_class)
given_class = np.random.normal(loc=112.5,scale=population_std,size=n_sample).astype(int)
print("Given class I.Q.:",given_class)
plt.figure(figsize=(7,5))
sns.kdeplot(avg_class,fill=True)
sns.kdeplot(given_class,fill=True)
plt.legend(['Average class','Given class'],fontsize=14)
plt.vlines(x=avg_class.mean(),ymin=0,ymax=0.025,color='blue',linestyle='--')
plt.vlines(x=given_class.mean(),ymin=0,ymax=0.025,color='brown',linestyle='--')
plt.show()
std_err = population_std/np.sqrt(n_sample)
z_stat = (given_class.mean()-population_mean)/std_err
print("Standard error of the mean:",std_err)
print("Z-statistic:",z_stat)
alpha = 0.05
rejection_threshold = st.norm.ppf(1-alpha)
if z_stat>rejection_threshold:
    print("We reject the NULL hypothesis. The class I.Q. is indeed above average.")
else:
    print("We cannot reject the NULL hypothesis that the class average is the same as the population average.")
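Equivalently, the decision can be read off a p-value. A minimal self-contained sketch of the one-tailed z-test, using the stated numbers (the reported sample mean of 112.5 rather than a simulated class):

```python
import numpy as np
import scipy.stats as st

population_mean, population_std, n_sample = 100, 15, 30
sample_mean = 112.5  # mean score reported for the given class

std_err = population_std / np.sqrt(n_sample)
z_stat = (sample_mean - population_mean) / std_err
p_value = st.norm.sf(z_stat)  # one-tailed: P(Z > z_stat) under the null
print("z = %.3f, p = %.2e" % (z_stat, p_value))
```

A p-value this far below alpha = 0.05 leads to the same conclusion as the critical-value comparison above.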
def hypothesis_testing(n_sample=30,population_mean=100,population_std=15,alpha=0.05):
    """
    Tests the hypothesis of above-average I.Q. and reports the conclusion
    """
    given_class = np.random.normal(loc=112.5,scale=population_std,size=n_sample).astype(int)
    std_err = population_std/np.sqrt(n_sample)
    z_stat = (given_class.mean()-population_mean)/std_err
    rejection_threshold = st.norm.ppf(1-alpha)
    if z_stat>rejection_threshold:
        print("We reject the NULL hypothesis. The class I.Q. is indeed above average.")
    else:
        print("We cannot reject the NULL hypothesis that the class average is the same as the population average.")
hypothesis_testing()
hypothesis_testing(population_mean=107)
hypothesis_testing(population_std=10)
hypothesis_testing(population_std=40)
hypothesis_testing(n_sample=4)
hypothesis_testing(n_sample=100)
hypothesis_testing(alpha=0.01)
hypothesis_testing(alpha=0.001)
The Student’s t-Test is a statistical hypothesis test for testing whether two samples are expected to have been drawn from the same population.
It is named for the pseudonym “Student” used by William Gosset, who developed the test.
The test works by comparing the means of the two samples to see whether they are significantly different from each other. It does this by calculating the standard error of the difference between the means, which indicates how likely the observed difference is if the two samples come from populations with the same mean (the null hypothesis).
The t-statistic calculated by the test can be interpreted by comparing it to critical values from the t-distribution. The critical value can be calculated using the degrees of freedom and a significance level with the percent point function (PPF).
We can interpret the statistic value in a two-tailed test, meaning that if we reject the null hypothesis, it could be because the first mean is smaller or greater than the second mean.
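One detail worth noting: the critical value depends on whether the test is one- or two-tailed. For a two-tailed test at level alpha, the cut-off on |t| comes from the 1 − alpha/2 quantile of the t-distribution. A quick illustration (df = 38 is an arbitrary example value):

```python
import scipy.stats as st

alpha, df = 0.05, 38
cv_one_tailed = st.t.ppf(1 - alpha, df)      # upper-tail cut-off
cv_two_tailed = st.t.ppf(1 - alpha / 2, df)  # cut-off on |t| for a two-tailed test
print("one-tailed: %.3f, two-tailed: %.3f" % (cv_one_tailed, cv_two_tailed))
```

The two-tailed cut-off is always larger, since the same total tail probability alpha is split between the two tails.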
def independent_ttest(data1, data2, alpha=0.05):
    """
    Student's t-test for independent groups
    Arguments:
        data1: First group data as a NumPy array
        data2: Second group data as a NumPy array
        alpha: Significance level
    Returns:
        t_stat: Computed t-statistic
        df: Degrees of freedom
        cv: Critical value (two-tailed)
        p: Two-tailed p-value (of the NULL hypothesis)
    """
    # calculate means
    mean1, mean2 = np.mean(data1), np.mean(data2)
    # calculate standard errors of the means
    se1, se2 = st.sem(data1), st.sem(data2)
    # standard error of the difference between the sample means
    sed = np.sqrt(se1**2.0 + se2**2.0)
    # calculate the t-statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = len(data1) + len(data2) - 2
    # calculate the two-tailed critical value
    cv = st.t.ppf(1.0 - alpha / 2.0, df)
    # calculate the two-tailed p-value
    p = (1.0 - st.t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p
n_sample = 20
data1 = 5 * np.random.randn(n_sample) + 50
data2 = 5 * np.random.randn(n_sample) + 51
plt.figure(figsize=(7,5))
sns.kdeplot(data1,fill=True)
sns.kdeplot(data2,fill=True)
plt.legend(['data1','data2'],fontsize=14)
plt.vlines(x=data1.mean(),ymin=0,ymax=0.09,color='blue',linestyle='--')
plt.vlines(x=data2.mean(),ymin=0,ymax=0.09,color='brown',linestyle='--')
plt.show()
# calculate the t test
alpha = 0.05
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))
print()
# interpret via critical value
if abs(t_stat) <= cv:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
# interpret via p-value
if p > alpha:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
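As a sanity check, SciPy's built-in st.ttest_ind (which assumes equal variances by default) should agree with the hand-rolled computation: for equal sample sizes, the pooled and unpooled standard-error formulas coincide exactly. A self-contained sketch on freshly generated data:

```python
import numpy as np
import scipy.stats as st

np.random.seed(0)
data1 = 5 * np.random.randn(20) + 50
data2 = 5 * np.random.randn(20) + 51

# built-in equal-variance t-test
t_builtin, p_builtin = st.ttest_ind(data1, data2)

# manual computation mirroring independent_ttest above
sed = np.sqrt(st.sem(data1) ** 2 + st.sem(data2) ** 2)
t_manual = (np.mean(data1) - np.mean(data2)) / sed
print("built-in t = %.3f, manual t = %.3f" % (t_builtin, t_manual))
```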
The power of a binary hypothesis test is the probability that the test rejects the null hypothesis ($H_0$) when a specific alternative hypothesis ($H_a$) is true.
The statistical power ranges from 0 to 1, and as statistical power increases, the probability of making a type II error (wrongly failing to reject the null hypothesis) decreases. For a type II error probability of $\beta$, the corresponding statistical power is 1 − $\beta$.
For example, if experiment 1 has a statistical power of 0.7 and experiment 2 has a statistical power of 0.95, then experiment 1 is more likely to commit a type II error than experiment 2, and experiment 2 is more reliable than experiment 1 due to its lower probability of a type II error.
It can be equivalently thought of as the probability of accepting the alternative hypothesis ($H_a$) when it is true—that is, the ability of a test to detect a specific effect, if that specific effect actually exists.
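Statistical power can be estimated by simulation: generate many pairs of samples under the alternative hypothesis and count how often the test rejects. A minimal sketch (the effect size and per-group sample size mirror the example below; the trial count of 2000 is an arbitrary choice):

```python
import numpy as np
import scipy.stats as st

np.random.seed(42)
n_sample, n_trials, alpha = 200, 2000, 0.05

rejections = 0
for _ in range(n_trials):
    a = 5 * np.random.randn(n_sample) + 50
    b = 5 * np.random.randn(n_sample) + 51  # true mean difference of 1
    _, p = st.ttest_ind(a, b)
    if p < alpha:
        rejections += 1

# fraction of simulated experiments that correctly reject the null
power = rejections / n_trials
print("Estimated power: %.2f" % power)
```

With this effect size (Cohen's d of 0.2) and 200 samples per group, the estimated power comes out near 0.5, i.e. roughly half of such experiments would fail to detect the real difference.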
n_sample = 200
data1 = 5 * np.random.randn(n_sample) + 50
data2 = 5 * np.random.randn(n_sample) + 51
plt.figure(figsize=(7,5))
sns.kdeplot(data1,fill=True)
sns.kdeplot(data2,fill=True)
plt.legend(['data1','data2'],fontsize=14)
plt.vlines(x=data1.mean(),ymin=0,ymax=0.09,color='blue',linestyle='--')
plt.vlines(x=data2.mean(),ymin=0,ymax=0.09,color='brown',linestyle='--')
plt.show()
# calculate the t test
alpha = 0.05
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))
print()
# interpret via critical value
if abs(t_stat) <= cv:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
# interpret via p-value
if p > alpha:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
SciPy's f_oneway() method runs a one-way ANOVA test on the supplied groups, testing the null hypothesis that they all have the same population mean.
n_sample = 50
group1 = 5 * np.random.randn(n_sample) + 50
group2 = 5 * np.random.randn(n_sample) + 52
plt.figure(figsize=(7,5))
sns.kdeplot(group1,fill=True)
sns.kdeplot(group2,fill=True)
plt.legend(['group1','group2'],fontsize=14)
plt.vlines(x=group1.mean(),ymin=0,ymax=0.09,color='blue',linestyle='--')
plt.vlines(x=group2.mean(),ymin=0,ymax=0.09,color='brown',linestyle='--')
plt.show()
plt.figure(figsize=(7,5))
ax = sns.boxplot(data=[group1,group2])
sns.swarmplot(data=[group1,group2],color='.2')
ax.set_xticklabels(['group1','group2'],fontsize=14)
plt.show()
f,p=st.f_oneway(group1,group2)
print("The F-statistic obtained running ANOVA on the data groups:",f)
print("The p-value of the ANOVA test:",p)
if p>0.05:
    print("\nANOVA fails to reject the hypothesis of equal means")
else:
    print("\nWe reject the hypothesis of equal means as per the ANOVA test result")
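With only two groups, one-way ANOVA is equivalent to the equal-variance t-test: the F-statistic is the square of the t-statistic, and the p-values match. A quick self-contained check:

```python
import numpy as np
import scipy.stats as st

np.random.seed(1)
g1 = 5 * np.random.randn(50) + 50
g2 = 5 * np.random.randn(50) + 52

f, p_anova = st.f_oneway(g1, g2)   # one-way ANOVA on two groups
t, p_ttest = st.ttest_ind(g1, g2)  # equal-variance t-test on the same data
print("F = %.3f, t^2 = %.3f" % (f, t ** 2))
```

ANOVA earns its keep with three or more groups, where pairwise t-tests would require many separate comparisons.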
def multi_anova(groups,alpha=0.05):
    """
    Pairwise one-way ANOVA between all pairs of groups
    groups: A dictionary object of trial groups
    alpha: Significance level
    """
    from itertools import combinations
    list_anova = list(combinations(list(groups.keys()),2))
    for comb in list_anova:
        _,p = st.f_oneway(groups[comb[0]],groups[comb[1]])
        if p>alpha:
            print("\nANOVA fails to reject the hypothesis of equal means for {} and {}".format(comb[0],comb[1]))
        else:
            print("\nWe reject the hypothesis of equal means for {} and {} as per the ANOVA test result".format(comb[0],comb[1]))
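One caveat with pairwise testing: running every combination inflates the family-wise error rate. A common (if conservative) remedy is the Bonferroni correction, which divides alpha by the number of comparisons. A hedged sketch of how this could look (multi_anova_bonferroni and the demo data are illustrative, not part of the study below):

```python
import numpy as np
import scipy.stats as st
from itertools import combinations

def multi_anova_bonferroni(groups, alpha=0.05):
    """Pairwise one-way ANOVA with a Bonferroni-corrected significance level."""
    pairs = list(combinations(groups.keys(), 2))
    alpha_corrected = alpha / len(pairs)  # Bonferroni: divide alpha by the number of tests
    results = {}
    for a, b in pairs:
        _, p = st.f_oneway(groups[a], groups[b])
        results[(a, b)] = bool(p < alpha_corrected)  # True means "reject equal means"
    return results

np.random.seed(7)
demo = {'A': np.random.randn(30),        # baseline
        'B': np.random.randn(30) + 0.1,  # tiny shift, unlikely to be detected
        'C': np.random.randn(30) + 2.0}  # large shift, should be detected
print(multi_anova_bonferroni(demo))
```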
Let's suppose we have developed three versions of a blood-pressure medicine - 'A', 'B', and 'C'.
We give these medicines to four groups - three trial groups and one control group, whose participants were given a placebo (the experiment being blind, they did not know). There were 155 participants in total, and for logistical reasons the group sizes were slightly uneven.
No problem - ANOVA can handle slight differences in sample count among groups!
As a data scientist, you have been asked to analyze the clinical trial data and recommend whether to proceed with further development of any (or more than one) of these drugs.
Since they are all variations of the same fundamental molecule, their effects are not drastically different, and depending on the participants' physiology there is a lot of variation within the trial groups.
Let's put our multi_anova function to use for this imaginary study!
trial_A = 9*np.random.randn(34) + 102
trial_B = 6*np.random.randn(48) + 109
trial_C = 12*np.random.randn(38) + 103
trial_CONTROL = 10*np.random.randn(35) + 110
plt.figure(figsize=(10,5))
ax=sns.boxplot(data=[trial_A,trial_B,trial_C,trial_CONTROL])
ax=sns.swarmplot(data=[trial_A,trial_B,trial_C,trial_CONTROL],color='.2')
ax.set_xticklabels(['trial_A','trial_B','trial_C','trial_CONTROL'],fontsize=15)
ax.tick_params(axis="y", labelsize=15)
plt.grid(True)
plt.show()
plt.figure(figsize=(9,5))
sns.kdeplot(trial_A,fill=True,color='Blue')
sns.kdeplot(trial_B,fill=True,color='red')
sns.kdeplot(trial_C,fill=True,color='yellow')
sns.kdeplot(trial_CONTROL,fill=True,color='black')
plt.legend(['trial_A','trial_B','trial_C','trial_CONTROL'],fontsize=14)
plt.show()
groups = {'trial_A':trial_A,'trial_B':trial_B,'trial_C':trial_C,'trial_CONTROL':trial_CONTROL}
multi_anova(groups)
From the printed results, we see that the equal-means hypothesis (against the CONTROL group) could not be rejected for trial_B, whereas it was rejected for the pairs trial_A and trial_CONTROL, and trial_C and trial_CONTROL.
Therefore, the trials of medicine A and medicine C showed a statistically significant lowering of blood pressure, whereas medicine B did not.