In a recent commentary in the Journal of Periodontology, Merchant and Josey (2016) suggested using directed acyclic graphs to better understand the partly conflicting results of randomized controlled trials (RCTs) on diabetic control after periodontal treatment in diabetic patients. In particular, the influence of obesity caught their attention.
In fact, a remarkable number of systematic reviews (whose varying quality has recently been assessed in at least two further systematic reviews of systematic reviews) have concluded that numerous small-scale, single-center, often poorly designed RCTs showed that HbA1c, the marker of diabetic control, might be reduced by, say, 0.4% three months after what was in essence non-surgical periodontal therapy. The only large-scale, multi-center trial, the DPTT by Engebretson et al. (2013), could not confirm this, which sparked harsh criticism from a large number of our thought leaders. Dr. Anwar Merchant, a professor in the Department of Epidemiology and Biostatistics at the University of South Carolina, Columbia, had himself written a letter to the editors of JAMA, pointing first to the fact that most participants in the study by Engebretson et al. were obese. He further noted that “[i]n RCTs conducted among mostly nonobese individuals, periodontal treatment has been shown to reduce systemic inflammation²,⁴ and improve glycemic control among those with type 2 diabetes.² However, periodontal treatment has not been shown to affect glycemic control in RCTs conducted among predominantly obese individuals with type 2 diabetes.¹,³”
“Obesity is positively correlated with inflammatory markers in the blood and strongly related to insulin resistance and metabolic dysregulation mediated by chronic systemic inflammation.⁵ These findings, taken together with results from RCTs evaluating the effects of periodontal treatment, suggest that the lack of effect of periodontal treatment on glycemic control observed in the study by Engebretson et al may be attributed to the high level of obesity in the study population. Therefore, the findings may be generalizable only to predominantly obese populations with type 2 diabetes.”
Well, Americans in general, and American diabetics in particular, are to a large extent obese; see the picture above comparing the height and weight of middle-aged men in various countries. It is also true, however, that not all type 2 diabetics are obese. In their response to these comments, Engebretson et al. (2014) state that
[s]ubgroup analyses of different BMI cut points found no effect of periodontal therapy on glycemic control (P>.10) in any subgroup examined. Differences in baseline periodontitis levels, use of antibiotics, and race/ethnicity may also explain the inconsistent findings among studies. (Emphasis added.) 
Concluding “no effect” from p>0.10 is, of course, a plainly wrong interpretation of the p-value. It is in fact more than stunning that a measure of statistical evidence demanded in virtually all scientific papers is so rarely interpreted correctly. As Goodman (2008) explains,
[the p-value’s] interpretation is made extraordinarily difficult because it is not part of any formal system of statistical inference. As a result, the P value’s inferential meaning is widely and often wildly misconstrued, a fact that has been pointed out in innumerable papers and books appearing since at least the 1940s.
But what does the p-value actually mean?
As everybody involved in clinical research should know, p is (complicated enough) the probability of obtaining results at least as extreme as those actually observed, given that the null hypothesis is true. In particular, it does not mean that there is a less than 5% chance (if p<0.05) that the null hypothesis is true, or a less than 5% chance of a Type I error, or a more than 95% chance that the results would replicate if the study were repeated. Nor does p>0.05 mean that there is no difference between groups, and p<0.05 does not mean that you have proved your experimental hypothesis. So, reporting p>0.10 (sic) when comparing obese with non-obese participants in the DPTT is conspicuous, beyond question. But this is a secondary analysis of the data (forced by the letter writer), and giving p is then unsubstantiated anyway.
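To make the definition concrete, a small simulation may help. It is only a sketch under simplifying assumptions of mine: normally distributed outcomes with a known SD of 1, a z-test in place of the usual t-test, and, for the second scenario, a real but modest effect of 0.3 SD with 20 patients per arm. None of these numbers come from the DPTT. When the null hypothesis is true, about 5% of studies reach p<0.05 by chance alone; when a modest real effect exists but the (sub)group is small, most studies still return p>0.10, which is exactly why p>0.10 cannot be read as “no effect”.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_values(true_effect: float, n_per_group: int,
                      n_studies: int = 20_000, seed: int = 1) -> list[float]:
    """Simulate two-arm studies (outcome SD = 1, assumed known), one p-value each.

    Each study compares group means with a z-test, a simplification
    of the t-test used in practice.
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # standard error of the difference in means
    p_values = []
    for _ in range(n_studies):
        # The observed difference in means is normal around the true effect.
        diff = rng.gauss(true_effect, se)
        p_values.append(two_sided_p(diff / se))
    return p_values


def share_below(values: list[float], threshold: float) -> float:
    """Fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


# Null hypothesis true: about 5% of studies reach p < 0.05 by chance alone.
null_ps = simulate_p_values(true_effect=0.0, n_per_group=20)
print(f"H0 true, share with p < 0.05: {share_below(null_ps, 0.05):.3f}")

# Hypothetical real but modest effect (0.3 SD) in a small subgroup (20 per arm).
real_ps = simulate_p_values(true_effect=0.3, n_per_group=20)
print(f"Real effect, share with p >= 0.10: {1 - share_below(real_ps, 0.10):.3f}")
```

With these assumptions roughly three out of four studies of a genuinely effective treatment end up with p>0.10.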
The false discovery rate
The most common interpretation of p (if less than 0.05) is that a “significant” difference between two groups has been discovered or that “the null hypothesis is to be rejected.” As regards the latter, the celebrated pharmacologist and biostatistician David Colquhoun cautions, in a recent review article (or rather a polemic, judging by its diction throughout), that if p was 0.05 one errs at least 30% of the time; this is the false discovery rate (“you make a fool of yourself”). As regards the former, he strongly objects to using the word “significant” at all, but indicates that if one wishes to keep the false discovery rate below 5% (note the misconception of p mentioned above), one needs to use a three-sigma rule or insist on p<0.001.
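A figure of this order can be sketched with a few lines of arithmetic. The assumptions are mine and not necessarily Colquhoun's exact setup: a z-test instead of a t-test, a power of 0.8 at α=0.05, a generous 50:50 prior that a real effect exists, and “p was 0.05” taken to mean p landing between 0.045 and 0.05. The question is what share of such results come from a true null hypothesis.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_p_in_band(shift: float, p_low: float, p_high: float) -> float:
    """P(p_low < two-sided p < p_high) when the z statistic ~ N(shift, 1)."""
    z_high = STD_NORMAL.inv_cdf(1 - p_low / 2)  # smaller p means larger |z|
    z_low = STD_NORMAL.inv_cdf(1 - p_high / 2)
    z_dist = NormalDist(mu=shift, sigma=1)
    upper_tail = z_dist.cdf(z_high) - z_dist.cdf(z_low)
    lower_tail = z_dist.cdf(-z_low) - z_dist.cdf(-z_high)
    return upper_tail + lower_tail


def fdr_for_observed_p(prior_real: float, power: float, p_low: float,
                       p_high: float, alpha: float = 0.05) -> float:
    """Share of studies with p in (p_low, p_high) in which H0 is actually true."""
    # Mean of z under H1 that gives the requested power at level alpha
    # (ignoring the negligible opposite tail).
    shift = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    from_null = (1 - prior_real) * prob_p_in_band(0.0, p_low, p_high)
    from_real = prior_real * prob_p_in_band(shift, p_low, p_high)
    return from_null / (from_null + from_real)


fdr = fdr_for_observed_p(prior_real=0.5, power=0.8, p_low=0.045, p_high=0.05)
print(f"Share of H0-true results among p in (0.045, 0.05): {fdr:.0%}")
```

Under these assumptions roughly 28% of results with p just below 0.05 are false discoveries, even with a 50:50 prior; a less optimistic prior pushes the share much higher.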
The false discovery rate corresponds to the false positives of diagnostic tests, a more familiar setting. While Colquhoun elaborates on the nonsense of screening for mild cognitive impairment to forecast Alzheimer's disease (with a diagnostic sensitivity of 80% and a specificity of 95%, mirroring the commonly used power of 0.8 and type I error rate of 0.05 in hypothesis testing), let's consider a hypothetical example that I have used in my teaching for some time. Suppose,
for a life-threatening disease with a high risk of transmission (under certain circumstances), very high sensitivity (e.g., 99%) and even higher specificity (e.g., 99.9%) are required for a useful diagnostic test. Consider the following: the prevalence of individuals infected with a certain virus in a certain population is, say, 0.1%. When a test with the above performance (99% sensitivity, 99.9% specificity) is applied, only about 0.001% (roughly 1 in 100,000) of individuals who test negative are in fact infected with the virus. This false-negative rate may be ignored. On the other hand, about 50% of individuals who test positive are actually not infected, a rather high false-positive rate! Interpretation: the test is useful, since virtually all individuals infected with the virus are identified. Preventive and therapeutic measures can be implemented immediately and the population at large protected. Because of the high false-positive rate, however, persons who test positive must be retested with a better, usually more expensive test to reduce the false-positive rate.
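The arithmetic behind this example is simply Bayes' theorem applied to expected counts. A minimal sketch, using the hypothetical prevalence, sensitivity and specificity from the example and an assumed population of one million; the function name is mine, for illustration only.

```python
def screening_outcomes(prevalence: float, sensitivity: float,
                       specificity: float, population: int = 1_000_000) -> dict:
    """Expected counts and error shares for a screening test (Bayes' theorem)."""
    infected = prevalence * population
    healthy = population - infected
    true_pos = sensitivity * infected
    false_neg = infected - true_pos
    false_pos = (1 - specificity) * healthy
    true_neg = healthy - false_pos
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        # Share of negative results that are actually infected.
        "infected_among_negatives": false_neg / (false_neg + true_neg),
        # Share of positive results that are actually not infected.
        "healthy_among_positives": false_pos / (false_pos + true_pos),
    }


result = screening_outcomes(prevalence=0.001, sensitivity=0.99, specificity=0.999)
for key, value in result.items():
    print(f"{key}: {value:.6g}")
```

Per million people tested, this yields 990 true and 999 false positives, so about half of all positives are healthy, while only 10 of roughly 998,000 negatives (about 0.001%) are infected.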
The false-positive rate of a diagnostic test (the share of positive results that are false) is equivalent to the false discovery rate of a significance test. If, as in Colquhoun's example, the prevalence of real effects is 10% (0.1), the power of the test is 0.8 (as commonly assumed; the analogue of sensitivity), and the significance level is 0.05 (the analogue of 1 − specificity), then 36% of the “significant” results would be false positives. That is far more than the widespread misconception among scientists supposes, namely that there is a less than 5% chance (if p<0.05) that the null hypothesis is true.
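The same Bayes arithmetic reproduces the 36%. A minimal sketch; the function name is mine, and “prevalence” here means the fraction of tested hypotheses in which a real effect exists.

```python
def false_discovery_rate(prevalence: float, power: float, alpha: float) -> float:
    """Share of 'significant' results that are false positives.

    prevalence: fraction of tested hypotheses where a real effect exists
    power:      probability of p < alpha when the effect is real (~ sensitivity)
    alpha:      false-positive rate under H0 (~ 1 - specificity)
    """
    true_pos = prevalence * power
    false_pos = (1 - prevalence) * alpha
    return false_pos / (false_pos + true_pos)


# Colquhoun's example: 10% real effects, power 0.8, alpha 0.05.
# Per 1,000 tests: 80 true positives and 45 false positives.
fdr = false_discovery_rate(prevalence=0.1, power=0.8, alpha=0.05)
print(f"False discovery rate: {fdr:.0%}")  # prints: False discovery rate: 36%
```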
So, what to do?
How can this shocking rate be reduced? First, by lowering the “significance level”. Scientists in genome-wide association studies learned that lesson some time ago, when it turned out that most of their “discoveries” were actually false; the threshold is now set at p<5 × 10⁻⁸. Insisting on p<0.001 would reduce the false discovery rate to a more comfortable 1.8% in Colquhoun's example above.
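The effect of a stricter threshold can be sketched with the same formula as before, kept at Colquhoun's prevalence of 0.1. One caveat, which is my assumption about how such figures arise: with the sample size unchanged, a stricter α also lowers power. Holding power at 0.8, p<0.001 gives about 1.1%; a power of 0.5, chosen here purely for illustration, gives about 1.8%.

```python
def false_discovery_rate(prevalence: float, power: float, alpha: float) -> float:
    """Share of results with p < alpha that are false positives."""
    false_pos = (1 - prevalence) * alpha
    true_pos = prevalence * power
    return false_pos / (false_pos + true_pos)


PREVALENCE = 0.1  # Colquhoun's example: 10% of tested effects are real

# Tightening alpha at constant power (0.8), which in practice needs larger samples.
for alpha in (0.05, 0.001, 5e-8):
    fdr = false_discovery_rate(PREVALENCE, 0.8, alpha)
    print(f"alpha={alpha:g}, power=0.8: FDR={fdr:.2%}")

# Illustrative assumption: at fixed sample size, alpha=0.001 may cut power to ~0.5.
print(f"alpha=0.001, power=0.5: FDR={false_discovery_rate(PREVALENCE, 0.5, 0.001):.2%}")
```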
There is another undesired effect, as Colquhoun outlines: the “inflation effect”. The estimated effect size in underpowered studies turns out to be much larger than its true value, because the test is more likely to be positive in the small number of experiments that happen to show a larger than average effect size. So, the chance of making a fool of oneself increases enormously when experiments are underpowered, which was probably the case in many of the above-mentioned small, single-center, poorly conducted clinical trials that had shown a small but “significant” reduction of HbA1c, the marker of diabetic control, after nonsurgical periodontal therapy. Engebretson's large, multicenter study has set the record straight, I suppose.
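The inflation effect is easy to reproduce by simulation. A sketch under assumptions of mine, none taken from the periodontal trials: normally distributed outcomes with SD 1, a true effect of 0.5 SD, a one-sided criterion z>1.96 in place of a t-test, and 10 versus 200 patients per arm. Among the small studies that reach “significance”, the average estimate is more than twice the true effect, whereas the large studies estimate it almost exactly.

```python
import math
import random
import statistics


def mean_significant_estimate(true_effect: float, n_per_group: int,
                              n_studies: int = 20_000,
                              seed: int = 7) -> tuple[float, float]:
    """Simulate two-arm studies (outcome SD = 1, known) and return
    (power, mean estimated effect among studies with z > 1.96)."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # SE of the difference in group means
    significant = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)  # observed difference in means
        if estimate / se > 1.96:  # 'significant' in the expected direction
            significant.append(estimate)
    power = len(significant) / n_studies
    return power, statistics.fmean(significant)


TRUE_EFFECT = 0.5  # assumed true effect, in SD units
for n in (10, 200):
    power, mean_sig = mean_significant_estimate(TRUE_EFFECT, n)
    print(f"n={n:>3} per arm: power={power:.2f}, "
          f"mean 'significant' estimate={mean_sig:.2f} (true {TRUE_EFFECT})")
```

Only about one small study in five reaches significance here, and those that do overstate the effect considerably; that selection is precisely what inflates pooled estimates from many small trials.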
As readers may have noticed, I have sharply criticized the bigotry of the vast majority of our “thought leaders” and their attempt to discredit the large, multi-center trial by Engebretson et al. (2013) simply because it yielded unwelcome results; see, for example, here, here and here. When I was invited to an evening seminar organized by the Berlin Society of Periodontology earlier this year, I took the opportunity to discuss two remarkable examples of serious academic misconduct in periodontology over the last 15 years, in which scientists reporting unwelcome findings had to be silenced; see here.
When I criticized a plainly wrong interpretation of p in a recent commentary by Chambrone and Armitage (2016) in the Journal of Periodontology, a commentator advised me: “the author of this post (Muller) should study more Periodontics, and not only statistics…it is very easy to talk about someone else's paper without accounting for the clinical implications of a study…”.
Early in the introduction of his paper, Colquhoun quotes from his 1971 lectures in biostatistics: “the function of significance tests is to prevent you from making a fool of yourself, and not to make unpublishable results publishable.” Well said. (He concedes that nowadays “one appreciates better the importance of publishing all results, whether negative or positive.”) But “making a fool of yourself” becomes a major theme of his current review on the false discovery rate, a little too much for an unprepared audience that doesn't know better, particularly as the p<0.05 threshold is still commonplace in science.
This seems not to be clear to most of our disappointed thought leaders. Professor Kocher of Greifswald University has on various occasions (see, for example, here) tried to “dissect” flaws in the study by Engebretson et al. (2013). In the short clip below, he expressed satisfaction with the small studies that preceded Engebretson's. So, if all is well according to Kocher, after he had reviewed several small, single-center and rather problematic studies (Engebretson and Kocher explicitly state this in their systematic review of 2013), why should anybody bother with further research that would only challenge our current view? Professor Kocher “still believe[s] [after the lack of any effect of periodontal treatment on diabetic control had been demonstrated by Engebretson et al. (2013)] that if inflammation is reduced, regardless by what means, that this is something positive for a patient. […] When we work on metabolic control, we do something positive for the patient.” Critics may note that while the first opinion is not in question, the second is hubris (coming from a dentist).
I had contacted Dr. Merchant, who, together with Dr. Josey, wrote the above-mentioned commentary in the Journal of Periodontology and who argues that obesity and a high body mass index were the reason no effect was found in the study by Engebretson et al., so that conclusions should be restricted to obese diabetics. I wrote in my email:
Dear Professor Merchant,
Thanks for your thoughtful comments about the results of recent RCTs on the effects of periodontal tx on diabetes.
As you know, the DPTT study has been heavily criticized, in particular by Borgnakke et al. (2014), whose author list contained each and every editor of our main professional journals, the presidents of our scientific academies and associations, and more pundits. These authors even wanted to prevent other researchers from using the results of Engebretson et al. (2013) in further publications or grant applications:
We urge all interested parties to refrain from using this study results as a basis for future scientific texts, new research projects, guidelines, policies, and advice regarding the incorporation of necessary periodontal treatment in diabetes management. (Borgnakke et al. 2014.)
So, the view that a strength of DPTT is “that it was well-conducted” may not be shared by most of our opinion leaders. But what makes it exceptional is its proper sample size. We are presently plagued with more than a dozen systematic reviews of varying quality on RCTs addressing the effects of periodontal tx on diabetic control, many of them of low quality, which have now even been reviewed in at least two very recent meta-reviews (with, by the way, incongruent AMSTAR scores) by Botero et al. (2016) and Faggion et al. (2016). There is also an excellent Cochrane review by Simpson et al. (2015) on the topic. Most of the RCTs have insufficient sample sizes. So, please remember that “most published research findings are false” (Ioannidis 2005), which may actually be due to “the false discovery rate and the misinterpretation of p-values” (Colquhoun 2014).
The question of whether obesity (which is found most frequently among US diabetics, but less often among East Asian diabetics) may prevent measurable effects of periodontal treatment on systemic markers of CVD and diabetes has in fact been discussed before (e.g., by Munenaga et al. 2013, Borgnakke et al. 2014).
When Professor Merchant responded, he likewise stressed his belief that periodontal treatment may have a favorable impact on systemic outcomes that was not detected in the DPTT study:
“Even if the findings of the DPTT study are internally valid, they are not generalizable (or have external validity). The DPTT findings […] are only generalizable to populations with type 2 diabetes with high levels of obesity, moderate prevalence [sic] of periodontal disease, HbA1c between 7% and 9%, who receive only nonsurgical SRP. There may be other populations with type 2 diabetes, such as those with more uncontrolled diabetes, severe periodontal disease, with lower obesity prevalence, in whom periodontal treatment may be effective in impacting HbA1c. But the DPTT study did not evaluate the question under these conditions, which is why the broad conclusion is misplaced.”
While Engebretson et al. (2013) explicitly state in the Discussion section that they cannot rule out the possibility that individuals with values outside this range [HbA1c between 7% and 9%] might experience an HbA1c reduction following periodontal treatment, they do not mention obesity. DPTT was designed to recruit a representative sample of the US type 2 diabetes population with periodontitis. As it turned out, most participants were obese (note that obesity is associated with both type 2 diabetes and periodontitis; the latter relationship may not be causal).
17 December 2016 @ 9:27 am.
Last modified January 6, 2017.