Clinical Relevance and Confusion About the P-value

What do statistical significance and clinical relevance actually mean? To be clear, it is of course a good idea that the American Journal of Periodontology has decided to accept another educational paper clarifying common misconceptions. The journal is mainly read by practicing periodontists who may not be very familiar with statistics in general. What Chambrone and Armitage actually deliver (the accepted paper has just gone online in JOP) is, however, disappointing. Most scientists will probably stop reading after the first sentence of the second paragraph. Forget it.

It has been unmistakably demonstrated that statistically differences (e.g., P-value < 0.05) are more likely to be detected with large sample sizes compared to small ones.1-3,11

For instance, given a randomly selected sample from an underlying population, a variable with a Normal distribution, and a known population standard deviation, you can calculate the one-sample z test statistic as

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

i.e. the difference between the sample mean $\bar{x}$ and the hypothesized population mean $\mu_0$, divided by the population standard deviation $\sigma$ over the square root of n, the sample size. The z score is referred to respective tables or statistical software to get the p-value. The larger the absolute value of z, the lower is p. It is clear that, for any fixed nonzero difference between the sample mean and the hypothesized mean, the absolute value of z will increase with increasing n. So, with increasing n, p will become smaller. Inevitably.
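The dependence of p on n can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (Normal variable, known population standard deviation, two-sided test); the function name `one_sample_z_test` and all numbers (a 0.5 mm mean difference, a standard deviation of 2 mm) are hypothetical and not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z test with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a |z| at least this large under the null.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical example: the same observed difference, evaluated at growing n.
for n in (10, 50, 100, 500):
    z, p = one_sample_z_test(sample_mean=0.5, mu0=0.0, sigma=2.0, n=n)
    print(f"n={n:4d}  z={z:5.2f}  p={p:.4f}")
```

With the difference held fixed, only n changes between the rows, and the p-value shrinks from row to row.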

But this awkward sentence is not the only concern in the paper’s introduction. The authors claim,

A P-value < 0.05 merely means that the finding might have occurred by chance 5% of the time.

No word about the null hypothesis. It is actually well known that the p-value’s meaning is largely obscure among clinicians and a considerable source of misinterpretation. That fact has even prompted at least one scientific journal to ban it (or rather the entire concept of hypothesis testing).

But what does a p-value actually mean? A common interpretation is that the p-value is the probability of obtaining a result equal to or “more extreme” than what was actually observed, assuming that the null hypothesis is true. Conventionally, a p-value of, or less than, 0.05 may signal what is called statistical significance.

What it is not, for instance, is the probability that the null hypothesis is true, or the probability of falsely rejecting the null hypothesis. Nor does it mean that the finding observed might have occurred by chance 5% of the time.
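The conditional nature of the p-value ("assuming that the null hypothesis is true") can be made concrete with a small simulation. The sketch below draws many samples from a population in which the null hypothesis holds exactly and counts how often a two-sided one-sample z test yields p < 0.05. The function name, the sample size of 30, and the number of repetitions are arbitrary choices made for illustration only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_null_rejection_rate(n=30, sigma=1.0, trials=10000, alpha=0.05, seed=1):
    """Share of simulated trials with p < alpha when the null (mean = 0) is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null hypothesis holds.
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / sqrt(n))
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p < alpha:
            rejections += 1
    return rejections / trials


print(simulate_null_rejection_rate())
```

The resulting share should be close to 0.05. That is a statement about the data given a true null hypothesis; it says nothing about how probable the null hypothesis is given the data.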

In their educational paper on the important difference between statistical significance and clinical relevance, Chambrone and Armitage fail to address a fundamental issue in planning any randomized clinical trial, i.e. calculating a proper sample size. For that, one has to specify a treatment effect which is reasonably clinically relevant and which should then be shown to be statistically significant as well. Minimum sample size calculation also prevents over-ambitious scientists from randomizing unnecessarily large numbers of patients who volunteer in a trial with certain known and unknown risks. It is unfortunate that Chambrone and Armitage do not mention this indispensable issue of any RCT.
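How a clinically relevant effect drives the sample size can be sketched with the common normal-approximation formula for comparing two means in a parallel-group trial, n per group = 2 (z_{1-α/2} + z_{1-β})² σ² / δ². This is a simplification (equal group sizes, known common standard deviation, no dropouts), not the method of any particular trial; the function name `n_per_group` and the assumed standard deviation of 2 mm are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a mean difference delta."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)               # desired power
    return ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical example: pocket depth reduction with an assumed SD of 2 mm.
for delta in (2.0, 1.0, 0.5):
    print(f"relevant difference {delta} mm -> {n_per_group(delta, sigma=2.0)} per group")
```

Halving the difference considered relevant roughly quadruples the required number of patients, which is why the relevant effect has to be fixed before the trial rather than read off the p-value afterwards.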

They rightfully stress, though, the close connection between clinical relevance and patient-centered outcome variables, with the number of teeth returning to clinical periodontal health being more important than, for instance, any pocket depth reduction or clinical attachment gain. On the other hand, a 2 mm pocket reduction is in fact clinically relevant, as the respective pockets, still with slightly increased probing depths, may be easily maintained. A 0.5 mm reduction is of course not, in particular when a new product comes at considerable cost. In the search for evidence for claimed effects of, for instance, adjunctive topical antibiotics, low-dose tetracycline derivatives, laser therapy, but also certain regenerative treatments, clinicians have often been bedazzled with net effects of about or less than 1 mm mean pocket depth reduction and/or clinical attachment gain. The Journal of Periodontology recently dedicated a supplement to the proceedings of a workshop on all kinds of regenerative therapy. The respective systematic reviews generally emphasize statistically significant effects, beyond what can be achieved after conventional therapy, that may largely be regarded as clinically irrelevant. It might even be assumed that the AAP’s intention was to shift the standard of care toward expensive regenerative treatments despite the fact that real breakthroughs are still largely missing after decades of application. Conflict of interest declarations in the respective papers indicate that most authors had close ties to the companies whose products they had reviewed as regards their performance in clinical settings.

19 January 2016 @ 11:31 am.

Last modified January 19, 2016.




One comment

  1. Marc Charms

    I think the author of this post (Muller) should study more Periodontics, and not only statistics… it is very easy to talk about someone else's paper without accounting for the clinical implications of a study…

