Multilevel Modeling of Periodontal Data (I)

As promised, below is the first chapter of a manual for MLwiN that uses exclusively my own periodontal data. Much of this material has been published over the years. As the table of contents at the beginning of the PDF below shows, this is still work in progress, but those who are interested (and have the software) may now contact me by email to obtain the corresponding Excel files and work through the analyses.

The manual is closely based on, though not as exhaustive as, the User's Guide to MLwiN by the Centre for Multilevel Modelling at the University of Bristol, written by the late Professor J. Rasbash and his coworkers in 2012. Until the present work is finished, readers new to multilevel modeling who want more detail are referred to Rasbash et al. (2012). In the meantime, I hope that what I have written so far is useful. More chapters will be uploaded soon, so stay tuned!



Manual version 2014.1

Hans-Peter Müller

1 Introduction

How clinically collected data should be analyzed statistically depends very much on their structure. Periodontal and other dental data usually consist of multiple observations made within one oral cavity. For instance, to describe the overall periodontal situation in a certain cohort, (i) sites (or gingival units) around (ii) teeth within (iii) patients or subjects are assessed using metric, ordinal, or binary variables. Observations may then be (iv) repeated longitudinally. This is a typical hierarchical situation with lower levels (occasions, sites) and upper levels (teeth, subjects). In clinical trials, a further (higher) level is present when patients are assigned to different centers.
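To make the nesting concrete, the sketch below stores such data in "long" format, one row per measurement, and counts the units at each level. It is only an illustration under assumptions: the field names, the FDI tooth numbering and the site codes are hypothetical and are not taken from the author's Excel files. The key point it shows is that a tooth is identified by (subject, tooth) and a site by (subject, tooth, site), because tooth 16 of one subject is a different unit from tooth 16 of another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One measurement in long format; all field names are hypothetical."""
    subject: int   # patient or subject (upper level)
    tooth: int     # tooth within the subject, e.g. FDI notation 16
    site: str      # site (gingival unit) around the tooth, e.g. "mb" = mesiobuccal
    occasion: int  # examination occasion in longitudinal data (lowest level)
    bop: int       # binary outcome: 1 = bleeding on probing, 0 = none


def units_per_level(observations):
    """Count distinct units per level, identifying each unit by its full path."""
    subjects = {o.subject for o in observations}
    teeth = {(o.subject, o.tooth) for o in observations}
    sites = {(o.subject, o.tooth, o.site) for o in observations}
    return {
        "subjects": len(subjects),
        "teeth": len(teeth),
        "sites": len(sites),
        "observations": len(observations),
    }


# Toy data: two subjects, three teeth, four sites, one occasion.
data = [
    Observation(subject=1, tooth=16, site="mb", occasion=0, bop=1),
    Observation(subject=1, tooth=16, site="db", occasion=0, bop=0),
    Observation(subject=1, tooth=11, site="mb", occasion=0, bop=1),
    Observation(subject=2, tooth=16, site="mb", occasion=0, bop=0),
]
print(units_per_level(data))
# {'subjects': 2, 'teeth': 3, 'sites': 4, 'observations': 4}
```

Adding a second examination of an existing site increases the number of observations but not the number of sites, which is exactly the occasions-within-sites nesting described above.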

Multilevel modeling is a suitable armamentarium for studying fixed effects (estimates for covariates) and random effects (variances and covariances), and it has long been applied to dental research data (Sterne 1988, Albandar and Goldstein 1992, Gilthorpe et al. 2000). Although the methods are well known and have now been implemented in major statistical software packages such as SAS, STATA, R and even SPSS (among many others; for a comprehensive review of software programs and packages that are designed for, or can be used for, multilevel analyses see de Leeuw and Kreft (2001)), there have long been at least two major and somewhat revealing obstacles to applying them: a perceived (by clinicians) unwillingness of common biostatisticians to familiarize themselves with the more sophisticated methods of multilevel modeling, which are otherwise rarely used in medicine; and the simple fact that some biostatisticians vehemently discourage clinical scientists from applying these methods, if not most other statistical methods (Tu and Gilthorpe 2012).
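As a minimal illustration of what "fixed" and "random" effects mean, the following is a sketch of the simplest case, a two-level random-intercept model for a continuous outcome y measured at site i in subject j with one covariate x. It assumes only two levels (the tooth level is omitted for brevity) and normally distributed residuals; a binary outcome such as BOP would instead require, for example, a logit link. The notation loosely follows the β/u/e convention common in multilevel texts.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + e_{0ij}, \qquad
u_{0j} \sim N(0, \sigma^2_{u0}), \quad e_{0ij} \sim N(0, \sigma^2_{e0})

\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}}
```

Here \beta_0 and \beta_1 are the fixed effects, \sigma^2_{u0} (between-subject) and \sigma^2_{e0} (within-subject, between-site) are the random-effect variances, and \rho is the intraclass correlation, i.e. the share of total variance that lies between subjects.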

The easy-to-use specialist software MLwiN was developed more than a decade ago, and it has been applied in a considerable number of papers in dentistry; see, for instance, Gilthorpe et al. (2000), Ciantar et al. (2005), Müller (2008, 2009a), Müller et al. (2006), Müller and Stadermann (2006), Müller and Barrieshi-Nusair (2010), Tomasi et al. (2007), Fransson et al. (2010). The insights it gives into complex data structures are usually revealing. Since the corresponding manual by the Centre for Multilevel Modelling in Bristol (Rasbash et al. 2012) explicitly uses examples and data sets from the social sciences, the aim of the present tutorial is to give a rather non-technical description of the basic principles of multilevel modeling, using exclusively periodontal data sets collected over the past ten years, in order to further promote the correct statistical analysis of dental data, which are frequently hierarchically organized.


1.1 The Problem

As Rasbash et al. (2012) state at the very beginning of the introduction to the latest MLwiN manual, “In the social, medical and biological sciences multilevel or hierarchical structured data are the norm and they are also found in many other areas of application.”

Any statistical model should explicitly recognize a hierarchical structure when it is present, and data in dentistry, and in periodontology in particular, are commonly hierarchical. Traditionally, however, such data have essentially been analyzed in one of two ways.


1.2 Traditional Solutions

1.2.1 Site-specific analysis disregarding the subject

This approach, found mainly in periodontal research papers up to the mid- or late 1980s, has been vehemently condemned by biostatisticians (Imrey 1986). As a matter of fact, clustered or hierarchical observations made in the same subject are not independent, yet independence is a fundamental assumption of most statistical hypothesis tests. For instance, measures of periodontal disease within the oral cavity of a given patient are more alike than observations across the oral cavities of different patients or subjects. By ignoring the subject level, standard errors of regression coefficients will typically be underestimated, with grave consequences for hypothesis testing.
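To give a feeling for the size of the problem, the sketch below uses the Kish design effect, 1 + (m − 1)ρ, for a mean estimated from clusters of equal size m (here: sites per subject) with intraclass correlation ρ. The square root of the design effect is the factor by which a naive standard error, computed as if all sites were independent, is too small. The ICC of 0.05 is a hypothetical value chosen for illustration, not an estimate from the author's data, and for regression coefficients the inflation also depends on how strongly the covariate itself clusters within subjects.

```python
import math


def design_effect(sites_per_subject: int, icc: float) -> float:
    """Kish design effect 1 + (m - 1) * icc for equal-sized clusters."""
    if sites_per_subject < 1:
        raise ValueError("sites_per_subject must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return 1.0 + (sites_per_subject - 1) * icc


def se_underestimation_factor(sites_per_subject: int, icc: float) -> float:
    """Factor by which a naive standard error (assuming independence) is too small."""
    return math.sqrt(design_effect(sites_per_subject, icc))


# Hypothetical example: 28 teeth x 6 sites per subject, ICC assumed to be 0.05.
m = 28 * 6
print(round(design_effect(m, 0.05), 2))              # 9.35
print(round(se_underestimation_factor(m, 0.05), 2))  # 3.06
```

Even this modest within-subject correlation would make naive standard errors roughly three times too small, which is why a site-level test that ignores the subject can declare almost any association "significant".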

1.2.2 Aggregate analysis

By far the most common approach, therefore, is to aggregate observations at the subject level. As an example, consider the cohort of 127 young adults with gingivitis in whom the association between the presence or absence of supragingival dental plaque and gingival bleeding on probing (BOP) had been assessed (Müller et al. 2000a). Dental plaque is a biofilm that constantly forms on tooth surfaces and can and should be removed regularly by toothbrushing. BOP, elicited with a periodontal probe applied at a more or less defined pressure, is a sign of gingival inflammation, which is commonly held to be caused by dental plaque.

In a subject-level, aggregate analysis, one could examine the correlation between the proportion of tooth surfaces covered by plaque in each subject and the proportion of the corresponding gingival units bleeding on probing. Fig. 1.1 displays the results of such an analysis.
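The aggregation step itself can be sketched as follows, assuming that each record pairs one tooth surface (plaque scored 0/1) with its gingival unit (BOP scored 0/1). The record layout and the function name are hypothetical. Note that everything below the subject level is collapsed into two numbers per subject, so the information on which sites bleed in the presence of plaque is lost.

```python
from collections import defaultdict


def subject_proportions(records):
    """Aggregate site-level binary scores to per-subject proportions.

    records: iterable of (subject_id, plaque, bop), plaque and bop coded 0/1.
    Returns {subject_id: (plaque_proportion, bop_proportion)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [n_sites, n_plaque, n_bop]
    for subject, plaque, bop in records:
        c = counts[subject]
        c[0] += 1
        c[1] += plaque
        c[2] += bop
    return {s: (p / n, b / n) for s, (n, p, b) in counts.items()}


# Toy data (not from the study): subject 1 has four sites, subject 2 has two.
records = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 0, 0), (2, 1, 1), (2, 0, 1)]
print(subject_proportions(records))
# {1: (0.5, 0.25), 2: (0.5, 1.0)}
```

Both toy subjects have the same plaque proportion but very different bleeding proportions, a small-scale version of the scatter that such subject-level plots typically show.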

Ordinary regression was used to assess the relationship between the two variables. What may be striking is the considerable scatter of the data pairs representing the subjects. Correspondingly, the correlation between the two sets of proportions was only moderate, with a Pearson's r of 0.54.
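For readers who want to reproduce this kind of subject-level analysis, the sketch below computes Pearson's r and an ordinary least-squares line from paired proportions, using only centered sums of squares and cross-products. It is a generic textbook computation, not the software used for Fig. 1.1. As a point of interpretation, r = 0.54 corresponds to r² ≈ 0.29, i.e. a straight line in plaque proportion accounts for only about 29% of the between-subject variance in the proportion of bleeding units.

```python
import math


def _centered_sums(xs, ys):
    """Return (Sxx, Syy, Sxy, mean_x, mean_y) for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two pairs of equal length")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mx, my


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    sxx, syy, sxy, _, _ = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def ols_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    sxx, _, sxy, mx, my = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    return my - b * mx, b


# Toy subject-level pairs (plaque proportion, BOP proportion); not study data.
plaque = [0.0, 0.1, 0.2, 0.3]
bop = [0.1, 0.3, 0.5, 0.7]
print(pearson_r(plaque, bop), ols_line(plaque, bop))
```

Note that each subject contributes exactly one point here, regardless of how many sites were examined, which is the defining limitation of the aggregate approach.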


31 March 2014 @ 12:55 pm.

Last modified March 31, 2014.

