Purpose 3: To Sharpen Comparisons

With the considerable emphasis on using multivariate techniques such as analysis of covariance to control bias, it is often forgotten that these methods may also be used to eliminate unwanted variability and thereby increase the signal-to-noise ratio. Users and readers alike often have the impression that if the subjects in two groups are balanced with respect to some major explanatory variable, there is no need to account for that variable in any analysis. This misconception is especially likely to arise in a large randomized trial in which the balance is expected, and seen, to be good.

A recent example (50), dealing with a subject that may be more amusing than relevant from a public health viewpoint, illustrates the usefulness of analysis of covariance in increasing the precision of various comparisons. Figure 2a shows the responses of the 25 subjects in each of the five groups. The considerable within-group variation makes it difficult to judge whether, compared with this large source of "noise," any apparent systematic differences in longevity among the groups are real or merely random. Some guidance is given by Figure 2b, which shows that much of the noise is due to the fact that larger subjects tend to live longer and smaller ones to live shorter lives. Faced with this, it is clear that the smaller subjects should be compared with other smaller subjects and the larger ones with other larger ones. In this way, within each size category the within-group variation would be considerably less, thereby allowing systematic between-group differences to "shine through" more easily; the strong relationship between longevity and size would thus become irrelevant. Indeed, the experiment could have been planned very tightly by matching on size and analyzing the intergroup comparisons by paired t-tests or other techniques for matched subjects. However, this would pose problems if subjects were to be individually matched, since it might not be possible to obtain perfect matches. Moreover, in human studies, with fewer cooperative subjects to subdivide along a wide scale, with many variables to match on, with the difficulty of obtaining all matching data before forming study groups or (in the observational study) with groups who had formed themselves well before any study was contemplated, the difficulties become formidable.

To understand how a multivariate analysis can help to overcome these practical problems and still allow the researcher to benefit from a more tightly controlled study, imagine for the moment that the longevity study had been performed not with 25 but with 10 subjects per group. Figure 3a illustrates one such possibility. At this point, any effort at forming size categories, as in Figure 3b, would lead to a certain amount of "trading," i.e. a slight advantage for one group in the "small-size" category might be balanced off against a disadvantage for that group in the "next size up" category. However, one might not be so lucky, and in any case the within-group responses in the now broader size categories will vary more widely. Intuitively, one would like to "homogenize" the subjects within each category by making them all the same size. One way to do this would be to forcibly "slide" the points laterally until they coincide on the size scale, as in Figure 3c; to compensate for this change, one would likewise slide the responses vertically by corresponding increments, using an appropriate "exchange ratio" or slope. The slope could be estimated from the data by regression methods.
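The following is a minimal numerical sketch of this "sliding," written in Python with invented data; the lifetimes, thorax lengths, and two-group layout are assumptions for illustration only, not the published figures. The "exchange ratio" is estimated as the pooled within-group regression slope of longevity on thorax length, each subject is slid to a common reference size, and the crude and adjusted group summaries are compared.

```python
import numpy as np

# Hypothetical longevity (days) and thorax length (mm) for two groups of ten
# fruitflies; the numbers are invented for illustration, not taken from the study.
longevity = {
    "group_1": np.array([37, 42, 49, 54, 58, 61, 65, 70, 74, 80], dtype=float),
    "group_2": np.array([30, 35, 39, 44, 47, 52, 55, 60, 63, 68], dtype=float),
}
thorax = {
    "group_1": np.array([0.72, 0.74, 0.78, 0.80, 0.82, 0.84, 0.86, 0.88, 0.90, 0.92]),
    "group_2": np.array([0.70, 0.73, 0.76, 0.79, 0.81, 0.83, 0.85, 0.87, 0.89, 0.91]),
}

# "Exchange ratio": the pooled within-group slope of longevity on thorax length,
# estimated after centering each variable about its own group mean so that
# between-group differences do not contaminate the slope.
x_centered = np.concatenate([x - x.mean() for x in thorax.values()])
y_centered = np.concatenate([y - y.mean() for y in longevity.values()])
slope = np.sum(x_centered * y_centered) / np.sum(x_centered ** 2)

# Slide every subject to a common reference size (0.82 mm, as in Figure 3c) and
# adjust the lifetime by slope * (shift in thorax length).
reference = 0.82
for name in longevity:
    adjusted = longevity[name] + slope * (reference - thorax[name])
    print(name,
          "crude mean:", round(longevity[name].mean(), 1),
          "adjusted mean:", round(adjusted.mean(), 1),
          "within-group SD after adjustment:", round(adjusted.std(ddof=1), 1))
```

With these made-up numbers, the adjusted means differ only because of group membership (plus residual noise), and the adjusted within-group standard deviations are much smaller than the crude ones, which is exactly the pair of benefits noted in the Figure 3 caption below.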
This simple concept of equalization, which is the basis for analysis of covariance, is largely obscured by the all-in-one computational packages that fit the slope and calculate the between- and within-group variation in a single step. To perform an analysis of covariance for two extraneous variables X1 and X2, one might imagine responses plotted as vertical bars standing on a two-dimensional grid of (X1, X2) points. To homogenize the responses with respect to X1 and X2, one would first slide the bars diagonally along the grid to a single (X1, X2) point and then adjust each vertical height (response) by the sum of B1 × (shift in X1) and B2 × (shift in X2), where B1 and B2 are regression coefficients describing how the response changes with each variable (while holding all other variables constant).

Figure 3  Longevity of ten fruitflies in each of two groups: (a) longevity shows wide within-group variation; (b) subjects cannot be easily matched on thorax size; (c) "matching" produced by analysis of covariance; lifetimes are adjusted to what would have been expected had each subject's thorax length been 0.82 mm (adjustment process shown for six subjects). The analysis (i) corrects the imbalance of 0.40 mm in the average thorax lengths of the two groups and (ii) reduces the within-group variation.

Provided that a large fraction of the observations ("degrees of freedom") does not need to be expended in estimating what the form of the adjustment should be, this analysis of covariance technique can be extended to several extraneous variables.

Purpose 4: To Study Several Factors

In many health studies, there will be several stimulus variables of primary interest. For example, one might investigate what characteristics of schoolchildren and their environment are associated with their caries experience. Even when the stimulus variables are categorical, the classical multiway analysis of variance is rarely appropriate for such observational studies, since the cells will be of varying sizes (the "design" will be unbalanced). Instead, one usually analyzes such survey data by multiple regression methods, using indicator ("dummy") variables for factors that are categorical (e.g. gender). It is this flexibility that makes multiple regression so attractive. Indeed, if one had to choose between becoming familiar with classical analysis of variance or with regression techniques, one should probably choose the latter: it can accommodate a mixture of categorical and continuous variables and can evaluate these factors in the presence of other variables that are of a disturbing nature rather than of any direct interest. The key to understanding both its strength and, at the same time, its synthetic nature is realizing that it produces an estimate of the effect of a factor even though there may be no two individuals in the data set for whom all other relevant factors are in fact equal.
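As a small sketch of this flexibility, the least-squares fit below mixes a continuous covariate with an indicator variable; the caries counts, ages, and two-level coding are invented for illustration and are not drawn from any real survey. The coefficient on the indicator is interpreted as the adjusted difference between the two categories at any fixed age, even though no two children in the data set need share exactly the same age.

```python
import numpy as np

# Hypothetical survey data on ten schoolchildren: caries experience (e.g. a
# decayed-missing-filled count), age in years, and sex coded as an indicator
# ("dummy") variable (1 = girl, 0 = boy).
caries = np.array([2, 5, 3, 8, 6, 4, 9, 7, 1, 6], dtype=float)
age    = np.array([6, 9, 7, 12, 10, 8, 13, 11, 5, 10], dtype=float)
girl   = np.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 0], dtype=float)

# Design matrix: intercept, continuous covariate, and indicator side by side.
X = np.column_stack([np.ones_like(age), age, girl])

# Ordinary least squares; the third coefficient is the estimated girl-vs-boy
# difference in caries experience after adjustment for age.
coefficients, *_ = np.linalg.lstsq(X, caries, rcond=None)
intercept, b_age, b_girl = coefficients
print(f"per-year age effect: {b_age:.2f}, adjusted girl-boy difference: {b_girl:.2f}")
```

The same design-matrix device extends to factors with more than two categories (one indicator per category beyond the first) and to additional "nuisance" variables that are of no direct interest.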
I comment below on the opportunities for misinterpretation of multiple regression analyses; however, three points are specifically related to "risk-factor" studies.

The first concerns the situation in which the distributions of the different risk factors are not independent of each other in a fairly small data set, that is, if risk factor B was present in different proportions in those individuals who had risk factor A and in those who did not. Here, even if the two factors truly contribute independently, in an additive way, to the response being studied, it is still not possible to obtain independent estimates of these two effects from the sample. The two estimates will be correlated, and each estimated effect will have to be presented "adjusted for the other." This problem, addressed under "collinearity" in statistics textbooks, can become quite serious in health studies if one cannot obtain a good spread of one factor, such as amount of chronic exposure to loud noise, across each level of another factor, such as age. In such situations, one may have to adjust the response (hearing loss) through the use of some outside age-specific norms for hearing loss in unexposed individuals.

The second concerns how to deal with the variable "age" in the following hypothetical stepwise multiple regression analysis of caries experience:

    Factor                        Multiple R-squared   Change
    Age of child                  43%                  --
    Education of mother           50%                  7%
    Intake of fluoride            55%                  5%
    Frequency of tooth brushing   59%                  4%
    Consumption of soft drinks    62%                  3%

It is mistaken to interpret this kind of output as evidence that the last four factors account for "only 19%" of the variance, when in fact they account for 19 of the 57 percentage points (100 minus 43) that remain after age has already been accounted for, i.e. about one third of the remaining variance. Because the crude or total variation in caries in this study could have been arbitrarily widened or narrowed simply by studying a wider or narrower age range, and because the real interest is in why two individuals of the same age have had different caries experience, the variation introduced by studying children of different ages is quite irrelevant. It can be removed either by actually subtracting from each response an amount attributable to age and analyzing the residuals or, as was indicated above, by a conceptual subtraction in which age is left in the analysis-of-variance table but all further explanations of variance are measured out of 57 rather than out of 100. A formal statistical test of whether these latter variables are really explaining any variation does in fact judge their contribution relative to what is left to explain, rather than to what has already been explained. [See Reference (15) for a useful discussion of the appropriate terminology for variables such as age and sex.]

The third point deals with submitting our caries study, with its multitude of explanatory variables, some of them demographic, such as language group, race, and place of residence, and some more "basic" (including lifestyle characteristics such as diet and quality of dental care), to a multiple regression. Because either set of variables, or a combination of variables from the two sets, might do well in explaining the observed variation in caries, one needs to be careful and be guided by the purpose of the analysis. Broad demographic labels, e.g. language spoken at home, that are predictive only through their association with more causal variables, are more relevant when the results are to be used locally to identify those with greater dental care needs. However, the results of an analysis that focuses on direct or proximal variables, e.g. mother's knowledge of oral hygiene practice, are more likely to be transportable to other settings and to uncover the mechanisms governing caries.
If one does not separate these two sets of variables, but instead submits them all to a regression analysis, the resulting picture may be quite blurred: part of the variance associated with a certain factor may be correctly credited to that factor, whereas part of it may be credited to some demographic variable that is only a proxy for the factor. For the results to make sense, the variables offered to a regression must first make sense.
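A small simulation may make this blurring concrete. In the sketch below, a hypothetical proximal factor (say, a score for mother's oral-hygiene knowledge) is the only real determinant of caries, while a demographic label is merely correlated with that factor; all names and numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical proximal factor (e.g. mother's oral-hygiene knowledge score).
knowledge = rng.normal(size=n)
# Demographic label correlated with the proximal factor but with no effect of its own.
demographic = (knowledge + rng.normal(size=n) > 0).astype(float)
# Caries experience depends only on the proximal factor, plus noise.
caries = 5.0 - 1.5 * knowledge + rng.normal(size=n)

def r_squared(design, y):
    """Proportion of variance explained by a least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - residuals.var() / y.var()

ones = np.ones(n)
print("demographic label alone:",
      round(r_squared(np.column_stack([ones, demographic]), caries), 2))
print("proximal factor alone:  ",
      round(r_squared(np.column_stack([ones, knowledge]), caries), 2))
print("both together:          ",
      round(r_squared(np.column_stack([ones, demographic, knowledge]), caries), 2))
```

With these made-up numbers, the demographic label appears to explain an appreciable share of the variance when offered on its own, yet it adds essentially nothing once the proximal factor is in the model: the "credit" belonged to the factor for which the label was only a proxy.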