Hypothesis Testing

Hypothesis testing is an important concept in science and especially in computational genomics. You are probably already familiar with it. In general, hypothesis testing is used to determine whether an effect observed in a sample from a population is statistically significant. The p-value we get from a hypothesis test is the probability, assuming the null hypothesis is true, of observing an effect at least as extreme as the one actually measured. By convention, if the p-value is less than 0.05 we say that the effect is significant; however, we may need a stricter threshold when many hypotheses are tested at once, because the chance of at least one false positive grows with the number of tests.
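
As a rough illustration of adjusting the threshold for multiple tests, the sketch below applies the Bonferroni correction, which is one common choice and not necessarily the method used later in the course. The p-values are hypothetical numbers chosen only for illustration.

```python
# Hypothetical p-values from five separate tests (illustrative only).
p_values = [0.001, 0.012, 0.03, 0.04, 0.2]
alpha = 0.05  # conventional per-test significance threshold

# Bonferroni correction: divide the threshold by the number of tests.
adjusted_alpha = alpha / len(p_values)

# Compare each p-value against the uncorrected and corrected thresholds.
uncorrected = [p < alpha for p in p_values]
corrected = [p < adjusted_alpha for p in p_values]

print("adjusted threshold:", adjusted_alpha)
print("significant without correction:", uncorrected)
print("significant with Bonferroni:", corrected)
```

With the correction applied, fewer of the hypothetical tests pass the threshold, which is the price paid for controlling false positives across the whole family of tests.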

We will cover these topics in more detail in class. Below is an example of how to run a standard unpaired t-test using scipy.stats. The unpaired t-test is typically used to determine whether the means of two samples differ significantly, when the two samples are assumed to come from different, unrelated populations. On the other hand, we would use a paired t-test if the measurements came in matched pairs, for example the same subjects measured at two different times.

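The formula for the unpaired two-sample t-test is as follows (assuming the sample sizes are equal and the populations have the same variance):

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{2/n}}, \qquad s_p^2 = \frac{s_{X_1}^2 + s_{X_2}^2}{2}$$

where $\bar{X}_i$ denotes the sample mean of $X_i$, $s_p^2$ is the pooled sample variance over both populations, and $n$ is the size of each sample.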
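
To make the formula concrete, here is a minimal sketch that computes the equal-size, equal-variance t-statistic by hand using only the Python standard library. It uses the same weight samples as the scipy example in this section; the variable names `pooled_var` and `t_manual` are illustrative choices, not part of any library.

```python
from math import sqrt
from statistics import mean, variance

female = [63.8, 56.4, 55.2, 58.5, 64.0, 51.6, 54.6, 71.0]  # weights (kg)
male = [75.5, 83.9, 75.7, 72.5, 56.2, 73.4, 67.7, 87.9]    # weights (kg)

# This form of the formula assumes both samples have the same size n.
n = len(male)
assert len(female) == n

# Pooled variance for equal sample sizes: the average of the two
# sample variances (statistics.variance divides by n - 1).
pooled_var = (variance(male) + variance(female)) / 2

# Difference in sample means divided by its estimated standard error.
t_manual = (mean(male) - mean(female)) / sqrt(pooled_var * 2 / n)

print("t-statistic (by hand):", t_manual)
```

The value printed here should agree, up to floating-point rounding, with the statistic that `ttest_ind(male, female)` reports under its default equal-variance setting.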

# example adapted from http://iaingallagher.tumblr.com/post/50980987285/t-tests-in-python
from scipy.stats import ttest_ind

female = [63.8, 56.4, 55.2, 58.5, 64.0, 51.6, 54.6, 71.0]  # weights (kg) of group of elderly women
male = [75.5, 83.9, 75.7, 72.5, 56.2, 73.4, 67.7, 87.9]  # weights (kg) of group of elderly men

# standard unpaired t-test; assumes equal population variances by default
statistic, p_value = ttest_ind(male, female)

print("t-statistic:", statistic)
print("p-value:", p_value)

# assuming unequal population variances; run Welch's t-test
statistic, p_value = ttest_ind(male, female, equal_var=False)

print("t-statistic:", statistic)
print("p-value:", p_value)

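From the results here you can see that the difference in mean weight between the two samples is significant at the 0.05 level under both tests. Because the two samples are the same size, the t-statistic is identical in both cases; Welch's test uses fewer degrees of freedom, so its p-value is slightly larger when we assume that the variances of the populations are different.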
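
For contrast with the unpaired test, the sketch below runs a paired t-test with `scipy.stats.ttest_rel`. The before/after weights are hypothetical values invented for illustration, standing in for the same six subjects measured at two different times.

```python
from scipy.stats import ttest_ind, ttest_rel

# Hypothetical weights (kg) of the same six subjects at two time points.
before = [72.1, 68.4, 80.3, 75.0, 66.2, 79.5]
after = [70.8, 67.9, 78.6, 74.1, 65.0, 78.9]

# Paired test: works on the per-subject differences before[i] - after[i].
t_paired, p_paired = ttest_rel(before, after)

# Unpaired test on the same numbers, ignoring the pairing, for comparison.
t_unpaired, p_unpaired = ttest_ind(before, after)

print("paired:   t =", t_paired, " p =", p_paired)
print("unpaired: t =", t_unpaired, " p =", p_unpaired)
```

In this made-up data every subject changes by a small, consistent amount, while the spread between subjects is large. The paired test removes the between-subject variation, so it gives a much smaller p-value than the unpaired test on the same numbers.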
