First, statistical tests examine *differences* between quantities. For instance, is the mean height of men different from the mean height of women? The *null hypothesis* is that the two things being compared are the same.

The second thing to understand is the p-value itself. The p-value, generated by mathematical procedures we will not discuss, is the probability that your data would look the way it does if you assume the null hypothesis is true. That is, assuming that the two quantities are the same, what are the chances that you would observe what you did?

For instance, let's say you randomly select 20 men and 20 women. The mean height of the 20 men is five foot ten, and the mean height of the 20 women is five foot seven. Would your data support the claim that men (on average) are taller than women, or could the observed height difference simply be due to chance? This is what the p-value tells you. It tells you the probability of making the observations that you did, under the assumption that men and women have the same average height.
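One way to make this concrete, without the usual formulas, is a permutation test: if men and women really had the same average height, the "man" and "woman" labels would be interchangeable, so we can shuffle them many times and see how often chance alone produces a gap at least as large as the one observed. The sketch below is only an illustration, not the procedure a standard test such as the t-test uses. The function name `permutation_p_value`, the simulated heights (in inches, with an assumed spread of 3 inches), and the number of shuffles are all assumptions made for the example.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Under the null hypothesis the group labels carry no information,
    so we shuffle the pooled data and count how often the shuffled
    gap is at least as large as the observed gap.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical data: 20 men and 20 women, simulated rather than measured.
# Assumed population means of 70 in (5'10") and 67 in (5'7"), spread of 3 in.
data_rng = random.Random(42)
men = [data_rng.gauss(70, 3) for _ in range(20)]
women = [data_rng.gauss(67, 3) for _ in range(20)]

p_value = permutation_p_value(men, women)
print(f"observed gap: {mean(men) - mean(women):.2f} in")
print(f"estimated p-value: {p_value:.4f}")
```

Because the estimate comes from random shuffles, it varies slightly with `seed` and `n_shuffles`; more shuffles give a steadier estimate at the cost of run time.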

Interpreting the p-value follows from the above. If the p-value is very high (e.g., 0.99), then your observations are well within the bounds of what we would expect if the null hypothesis were true. That is, your data do not support rejecting the null hypothesis. Such high p-values yield a *failure to reject* the null hypothesis (for technical reasons, we typically avoid saying we *accept* the null hypothesis).

Alternatively, if the p-value is very low (the convention is below *0.05*), this suggests that the two quantities you are comparing are truly different, i.e., that the null hypothesis is not true. That is, it would be very unlikely to observe what you did if the two quantities were indeed the same. In this case, we say we have *rejected the null hypothesis*. For our example, we would reject the null hypothesis that men and women have the same average height.
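The decision rule described above fits in a few lines. This is a minimal sketch: the function name `interpret` is made up for illustration, and the 0.05 default is simply the convention mentioned above, not a law of nature.

```python
def interpret(p_value, alpha=0.05):
    """Turn a p-value into the conventional verbal conclusion."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    # Note the wording: we fail to reject, we do not "accept".
    return "fail to reject the null hypothesis"


print(interpret(0.99))  # prints "fail to reject the null hypothesis"
print(interpret(0.01))  # prints "reject the null hypothesis"
```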

------------------------

For the mavens: technically the p-value is the probability of getting the measured test statistic, *or a more extreme value.* But that is a wrinkle that makes things too complicated for this very dirty synopsis.