from Annual Review of Public Health 1983. 4:155-180

APPROPRIATE USES OF MULTIVARIATE ANALYSIS

James A. Hanley
Department of Epidemiology and Health, McGill University, Montreal, Quebec, Canada H3A 2A4 (name and address have since changed)

INTRODUCTION

Comparison of the articles in today's biomedical literature with those of twenty years ago reveals many changes. In particular, there seem to have been large increases over time in three indices: the number of authors per article, the number of data-items considered, and the use of multivariate statistical methods. While cause and effect among these three indices is unclear, there is little doubt that the growth in a fourth factor, namely, computing power and resources, has made it much easier to assemble larger and larger amounts of data. Packaged collections of computer programs, driven by simple keywords and multiple options, allow investigators to manage, edit, transform, and summarize these data and fit them to a wide array of complicated multivariate statistical "models."

In addition to making it easy for the investigator to include a larger number of variables in otherwise traditional methods of statistical analysis, the increased speed and capacity of computers have also been partly responsible for the new methods being developed by contemporary statisticians. For example, some of the survival analysis techniques discussed below can involve several million computations.

How do these trends in the availability and use of multivariate statistical methods affect the health researcher who must decide what data to collect and how to analyze and present them? How does the reader of the research report get some feeling for what the writer is attempting to do when he uses some of these complex-sounding statistical techniques? Are these methods helping or are they possibly confusing the issue?

Unfortunately one cannot look to one central source for guidance about these newer methods. Descriptions of many of them are still largely scattered in the (often highly technical) statistical literature or else presented in monographs in which the connections to other related techniques may not be very evident. Moreover, the reader is often not interested in references to the technical intricacies of maximum likelihood equations, to the methods of solving them, or to the computer program or package used to perform the calculations; rather he is worried about what the technique is attempting to do, what the parameters mean, and whether the assumptions and conclusions are appropriate.

The plan of this chapter then is not so much to review all of the recent developments in statistical methodology, but rather to use examples from the literature (a) to give an overview of what multivariate analysis is all about, (b) to describe, in general terms, what it can and cannot be expected to do, and (c) to discuss in a little more detail some newer techniques, as well as some that were developed some time ago but are only now becoming popular, namely (i) logistic regression, (ii) log-linear models for multiway contingency tables, (iii) proportional hazards models for survival data, and (iv) discriminant analysis.

MULTIVARIATE ANALYSIS: AN OVERVIEW

Scope

The term multivariate analysis has come to describe a collection of statistical techniques for dealing with several data-items in a single analysis.
Although authors differ about where to draw exact boundaries, for example whether multiple regression is a univariate or multivariate technique, it is more a matter of semantics than it is of substance. I follow here the convention of others (10, 28, 33, 43) and define any analysis that involves three or more variables simultaneously as "multivariate." As such, the term multivariate analysis encompasses everything except confidence intervals, chi-square tests for two-way contingency tables, t-tests (unpaired), one-way analysis of variance, and simple correlation and regression. It includes a huge variety of techniques, since even with just three variables, there are a large number of possibilities (Table 1). The method of analysis depends heavily on whether one is interested in interrelationships or in comparisons, and on whether variables are qualitative or quantitative. The most I can do in this short space is to give a brief roadmap, along with pointers to helpful descriptions or examples. In many situations there will not be one single best method of analysis. As Bishop et al (10) point out, multivariate analysis should be thought of as a "codification of techniques of analysis, regarded as attractive paths rather than straightjackets, which offer the scientist valuable directions to try."

Table 1  A taxonomy of parametric statistical methods

                                       Response variable(s)
                   ---------------------------------------------------------------------
                          Univariate                          Multivariate
                   Discrete          Continuous         Discrete          Continuous
Stimulus              [1]               [2]                [3]               [4]
variable(s)
-----------------------------------------------------------------------------------------
Univariate
  Discrete         Contingency       t-test;            Multidimensional  Discriminant
                   table             1-way analysis     contingency       analysis;
                                     of variance        table             logistic
                                     (Anova)                              regression
  Continuous       Logistic          Correlation;                         Multivariate
                   regression;       simple                               regression
                   discriminant      regression
                   analysis
Multivariate
  Discrete         Multidimensional  Multi-way          Multidimensional  Multivariate
                   contingency       Anova              contingency       anova
                   table                                table             (Manova)
  Continuous       Logistic          Partial                              Multivariate
                   regression;       correlation;                         regression;
                   discriminant      multiple                             canonical
                   analysis          regression                           analysis
  Mixed            Logistic          Analysis of                          Multivariate
                   regression;       covariance                           regression;
                   discriminant      (Ancova)                             canonical
                   analysis                                               analysis

Types of analyses

Multivariate statistical techniques may be conveniently divided into those in which the variables involved (a) are all of "equal status" or (b) fall naturally (or with some gentle pushing) into two sets, those which are influenced (response variables) and those which influence (stimulus variables). In the first group of techniques, which includes Principal Components Analysis, Factor Analysis, and Cluster Analysis, the emphasis is on the internal structure of the data-items in a single sample.

Principal Components Analysis (PCA) asks whether a large number of quantitative data items on each subject can be combined and reduced to a single (or at most a few) new variables (principal components) without losing much of the original information. In other words, the aim is to describe the subjects in terms of their scores (weighted sums of the original variables) on a much smaller number of new variables. These new variables (components) are built to be uncorrelated with each other, so as to avoid any redundancy. Also, they are arranged in decreasing order of "information" so that subjects are furthest apart from each other on the first component, less far apart on the second, and so on.
If the total information in the original variables is "compressible," the subjects will not vary very much on the latter components, and these can be discarded as redundant. Theoretically, since there are as many principal components as there are original variables, retaining them all permits one to reproduce the original data. An example in which the first principal component captured 67% of phenotypic variance in a population and was then used as a (univariate) index of overall body size in all subsequent analyses can be found in (11).

Factor Analysis (FA) asks whether subjects' quantitative responses on a large number of items and the patterns or correlations among these responses are "explainable" by thinking of each item or variable as measuring or reflecting a different mix of a smaller number of underlying "factors" or "traits" or "dimensions." As originally conceived, it differs from PCA in a number of ways. Whereas PCA "constructs" new variables from already observed ones, FA goes in the other direction, "reconstructing" the observed variables from latent ones. This distinction may have been too subtle and has largely evaporated; moreover, most computer packages use principal components as one way of extracting factors. Second, FA usually assumes that although factors are translated into variables by a "mixing formula" that is common to all subjects, variables will also contain some variation that is unique to each subject. Third, whereas PCA is more a data-reduction technique, FA seeks actually to understand and label the various "factors." Fourth, unlike PCA, FA does not necessarily produce unique answers. Indeed, there are many methods of factor analysis.

FA techniques are used primarily to explore relationships and to reduce the dimensionality of a data set. They serve more for instrument building and index construction than as direct analytic tools. However, although they are closely associated in psychology with establishing construct validity, at least one author (40) considers them generally inappropriate for developing health indices. These techniques have been somewhat more useful when the context is of a physical nature, such as in studying air pollution patterns (35), but even then, there are difficulties (5). The few published examples of FA in epidemiology and public health have either concluded the obvious or concluded nothing at all. The same seems to hold true for their use in the medical literature (28).

By far the majority of the applications of multivariate statistical methods in the health sciences are of the second kind, where one or more variables serve as "outcomes" or "responses" or "target variables" (28), and others serve as "predictors" or "explanatory" or "carrier" (48) variables. These two sets of terms are gradually replacing the older and quite misleading terms, "dependent" and "independent" variables. Some authors subdivide the explanatory variables further into those of primary interest ("study variables") and those of a "disturbing" or "confounding" or "nuisance" nature; I return to this subdivision below.

The main types of techniques for dealing with stimulus-response studies are presented in Table 1, in the form of a multiway grid, according to whether the stimulus and response variable(s) (rows and columns, respectively) are one or many and according to whether they are all recorded on continuous measurement scales, or are all categorical (discrete), or a mixture of both.
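To make the data-reduction idea behind principal components concrete, here is a minimal sketch (in Python, with entirely hypothetical measurements and variable names; it is an illustration added for this discussion, not part of any of the cited analyses). The proportion of variance carried by the first component plays the same role as the 67% figure quoted above.

```python
import numpy as np

# Hypothetical morphometric data: 6 subjects x 4 quantitative variables
# (say height, weight, arm span, leg length); values are illustrative only.
X = np.array([
    [170.0, 65.0, 172.0, 95.0],
    [182.0, 80.0, 185.0, 103.0],
    [160.0, 55.0, 158.0, 88.0],
    [175.0, 72.0, 176.0, 98.0],
    [168.0, 60.0, 166.0, 93.0],
    [190.0, 92.0, 193.0, 108.0],
])

# Standardize, then extract the components from the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.cov(Z, rowvar=False)            # correlation matrix of the original variables
eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Proportion of the total variance captured by each component; if the first
# proportion is large (around two thirds, as in the body-size example), the
# first-component scores can serve as a single summary index.
prop_var = eigvals / eigvals.sum()
scores = Z @ eigvecs                   # subjects' scores on the new variables
print("proportion of variance by component:", np.round(prop_var, 2))
print("first-component (overall 'size') scores:", np.round(scores[:, 0], 2))
```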
It is worth dwelling for a moment on a number of contrasts between methods for analyzing a single (univariate) response that is "measured" on a continuous scale (column 2) and those for a corresponding response that is discrete (column 1).

1. Methods for analyzing a continuous response have been in existence for considerably longer (the principle of least squares for fitting a regression line dates back at least two centuries; the newest technique, analysis of covariance, is at least 50 years old).

2. These methods tend to choose parameters and judge the amount of variation explained by various factors using easily understood "distance" criteria such as least squares; in other words, they keep the analysis in the same scale or "metric" that the actual observations were measured on; by contrast, methods for analyzing a discrete response tend to measure "distance" and "fit" using a probability or "likelihood" scale (likelihood is defined as the probability, calculated after the fact, of observing the data values one did). Although the method of fitting parameters to maximize the likelihood is in no sense inferior (if anything it is generally superior from a technical standpoint), it is easier for readers to comprehend changes in R-squared than changes in a log-likelihood!

3. Regression equations for a continuous response are usually linear, involving additive terms, and can be fitted from simple summary statistics, whereas those for a discrete response are often nonlinear, and need to be fitted iteratively with several passes through the data.

4. Estimates from these nonlinear regressions tend to have skewed sampling distributions, giving rise to confidence intervals that are not symmetric. The odds ratio used in epidemiologic studies is a case in point. Fortunately, it is often possible to work in a scale (e.g. log) in which the confidence interval will be of a simpler, symmetric, shape and to change back to the desired scale at the finish, as illustrated in the brief sketch below.

As can be seen from Table 1, multiway contingency tables, logistic regression, and discriminant analysis all play dual functions: they can be used to analyze either a single response variable and several stimuli or several responses and a single stimulus. Indeed, as discussed below, this ability to reverse a "multiple response, single stimulus" situation and cast it into a more traditional and more workable "one response, multiple stimuli" regression framework is key to handling multiple response data.

As one proceeds to treat several response variables and several stimulus variables simultaneously, the level of complexity increases considerably: all but the few with n-dimensional vision are quickly lost. As a result, even though computer programs are available for them, the two "doubly-multivariate" techniques, multivariate regression and multivariate analysis of variance (Column 4, Table 1), are seldom used. Instead, investigators try first to construct a "univariate" response and then relate this to the several stimulus variables.

MULTIVARIATE ANALYSIS: PURPOSES

In this section I discuss the Why of multivariate techniques. Although there are many different techniques, they share a number of common aims and a common underlying philosophy. Of course, they also have many of the same pitfalls; I discuss some of these below. It is difficult to discuss multivariate techniques without also discussing the concept of statistical "models."
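Before turning to models and purposes, points 2-4 above can be made concrete with a small sketch. The data are hypothetical and statsmodels is simply one convenient modern implementation, not the software of the studies cited here; the point is only the contrast between a least-squares fit judged by R-squared and an iteratively fitted, likelihood-based logistic regression whose odds-ratio confidence interval is built symmetrically on the log scale and then exponentiated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data: a continuous exposure x, a continuous response y,
# and a binary response d (disease yes/no).
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)       # continuous response
p = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x)))             # true s-shaped (logit) curve
d = rng.binomial(1, p)                                   # binary response

X = sm.add_constant(x)

# Continuous response: least squares, closed-form fit, judged by R-squared.
ols = sm.OLS(y, X).fit()
print("least-squares slope:", round(ols.params[1], 2), " R-squared:", round(ols.rsquared, 2))

# Binary response: logistic regression, fitted iteratively by maximum likelihood,
# judged on the likelihood scale rather than by a distance criterion.
logit = sm.Logit(d, X).fit(disp=0)
print("log-odds slope:", round(logit.params[1], 2), " log-likelihood:", round(logit.llf, 1))

# The sampling distribution of the odds ratio exp(slope) is skewed; the usual
# remedy is a symmetric interval on the log scale, exponentiated at the finish.
b, se = logit.params[1], logit.bse[1]
lo, hi = np.exp(b - 1.96 * se), np.exp(b + 1.96 * se)
print("odds ratio per unit of x:", round(np.exp(b), 2), " 95% CI:", (round(lo, 2), round(hi, 2)))
```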
It sometimes helps to think of these models as comprising two parts, one that is deterministic (dealing with the expected structure, almost like a "law") and one that is stochastic (dealing with random variation). This first part will be of a more global nature, describing what should happen. It might describe how two chemical agents act together on a host or how a lung grows in volume as it grows in linear dimensions; it might be based on or summarize a psychological or sociological theory; or it might be a rough straight-line or curvilinear pattern seen in the data, and which one wants to follow up.

This "structural" part of the overall statistical model can be thought of as describing the systematic variations or pattern one would expect in a body of data. Although it is usually described in explicit mathematical equations with coefficients, powers, and the like, it does not have to be so precise. For example, the model might be: "the dose response relationship has no threshold," or "the underlying curve is expected to be concave," or "the risk of cancer will vary with age and be different in exposed and nonexposed groups, but the risk of cancer among the exposed relative to that among the nonexposed will remain the same over all ages."

The other part of the model, which some would regard as the probabilistic element, deals with the deviation of the observed data from the postulated pattern. It is often difficult, however, to separate the two parts of the overall model, since it is not clear where prior knowledge (pattern) ends and ignorance (unexplained variation) begins, i.e. whether aberrations are observed because the postulated pattern is a poor one (lack of fit) or because of some other reason.

Although this separation into systematic and random components, i.e. into signal and noise, is often used for responses that are recorded on a continuous scale, it is done much less frequently for binary responses. One learns very early in linear regression to think of both the systematic (the straight line) and the random (the scatter of the individual points from the line). In a binary regression, one still thinks of a systematic line (possibly "s-shaped" such as a probit or logit curve) but seldom stops to think about the noise about this curve. Part of the reason for not doing so is that the curve is fitted using likelihood, rather than distance, as the metric and part is that the variation is binary, not continuous.

The virtue of this "systematic plus random" paradigm has been recently illustrated in the Generalised Linear Interactive Modelling (GLIM) computer program (6): the program "generalizes" to a wide variety of continuous and binary response regressions by using different probabilistic models (Gaussian, Binomial, Poisson, etc.) and different "link functions" for changing the systematic portion of the model from straight line to s-shaped and so on. In fact, as GLIM makes apparent, there is a "distance" minimization intrinsic to the method of Maximum Likelihood.

With this preamble, I now go on to discuss, via examples where possible, the main aims and uses of multivariate statistical techniques and models. We see four main purposes:

1. to summarize, to smooth out, to see patterns
2. to make comparisons fair, to compare like with like
3. to make comparisons clear, to remove noise
4. to study many factors at once, to explain variation.
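Before turning to these purposes, here is a minimal sketch of the "systematic plus random" idea just described, again with hypothetical data and with statsmodels standing in for GLIM: the same straight-line systematic part is combined with Gaussian, Binomial, or Poisson random parts, the link function doing the bending of the line where needed.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 10, size=n)
X = sm.add_constant(x)                                # the systematic part: a + b*x

# Three kinds of random (probabilistic) part around the same systematic part.
y_gauss = 1.0 + 0.3 * x + rng.normal(scale=1.0, size=n)          # continuous
y_binary = rng.binomial(1, 1 / (1 + np.exp(-(0.3 * x - 1.5))))   # yes/no
y_count = rng.poisson(np.exp(0.2 * x - 0.5))                     # counts

# Identity link + Gaussian variation reproduces ordinary least squares.
gaussian_fit = sm.GLM(y_gauss, X, family=sm.families.Gaussian()).fit()
# Logit link + Binomial variation gives the s-shaped logistic regression curve.
binomial_fit = sm.GLM(y_binary, X, family=sm.families.Binomial()).fit()
# Log link + Poisson variation is the usual model for counts and rates.
poisson_fit = sm.GLM(y_count, X, family=sm.families.Poisson()).fit()

for name, fit in [("Gaussian/identity", gaussian_fit),
                  ("Binomial/logit", binomial_fit),
                  ("Poisson/log", poisson_fit)]:
    print(name, "slope =", round(fit.params[1], 3))
```

The only point of the sketch is that swapping the random part (the family) or the bending of the systematic part (the link) changes the model without changing the basic regression machinery.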
Purpose 1: To Smooth Out, to See the Forest From the Trees

How might one investigate whether and in what way breast cancer incidence rates have changed over time, using the available incidence data from 1935 to 1980 collected by the Connecticut tumor registry? This is an example of a single target variable, binary in nature (cancer or not), and the influence of two "stimulus" variables, age and year of birth. Suppose we know the numbers of cancers in each of nine five-year periods from 1935 to 1980 for each of 12 five-year age groups, along with numbers at risk in each of these 9 X 12 = 108 "cells."

As a first step, one could plot the 108 observed age-specific incidence rates against age and use lines of different colors to connect together the data points to form age-specific incidence curves for the different birth cohorts. Some of these plots, derived from the data published in Reference (60), are given in Figure 1 (left); they show that although there seem to be cohort effects, it is difficult to measure them very precisely from these "raw" data points.

Most would believe that the jagged pattern of straight-line segments has no special meaning, and would think of it only as noise that is obscuring the "real" underlying pattern. They would prefer instead a series of "smoother" incidence plots, one for each birth cohort. These systematic "curves" could be produced by smoothing each one by eye, but doing so would ignore two considerations: first, the rates are calculated from numerators and denominators of varying stability (something the eye looking at a data point cannot see) and, second, if rates vary smoothly across age, they probably also do so across cohorts. Thus, one would need to smooth in two directions at once. This could be done by postulating a single "parent" plot, consisting of 12 points (left unsmoothed to begin with) and specifying that the plots for the separate cohorts are to be obtained by multiplying the parent plots by separate proportionality factors. Admittedly, the task is too complicated to perform manually, but that is hardly an obstacle.

This "model-fitting" serves a number of purposes.

1. It produces more realistic plots, and uses many fewer numbers or "parameters" to do so (for the entire dataset, there would be 20 cohort parameters and 12 age parameters).

2. It draws the eye away from the randomness (which should be binomial or Poisson around each fitted point) and toward the pattern, in the same way that an image becomes clearer the further away one stands from its rough grain. The raw plots generated from the earliest and latest cohorts are based on fewer data points (age groups) and are the most difficult to judge, whereas the corresponding synthetic plots are generated from parameters that were estimated from the entire data set. This concept of borrowing strength from neighboring data points is a central one in multivariate analysis.

To some, the idea that it takes 20 + 12 = 32 numbers to describe 20 plots is still unappealing. Surely, they might argue, the parent plot (12 parameters) is not in reality so complicated that it could not be described by a truly smooth, two or three parameter curve or possibly by separate curve segments for pre- and post-menopause. Likewise, they would consider it quite likely that the 20 proportionality factors by which this incidence curve changes from cohort to cohort themselves form a smoothly changing series that could be described by many fewer parameters.
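In modern terms, the multiplicative "parent curve times cohort factor" model described above is a Poisson regression with a log link: the logarithm of each cell's expected count is the sum of an age effect, a cohort effect, and the logarithm of the person-years at risk. A minimal sketch on a small, entirely hypothetical table (much smaller than, and not taken from, the Connecticut registry data) might look like this:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical incidence data: 4 age groups x 3 birth cohorts,
# with case counts and person-years at risk in each cell.
ages = ["40-44", "45-49", "50-54", "55-59"]
cohorts = ["1900", "1910", "1920"]
cells = [(a, c) for c in cohorts for a in ages]
cases = np.array([12, 20, 30, 38,   15, 26, 40, 49,   21, 34, 52, 60])
pyears = np.array([40e3, 38e3, 36e3, 33e3,
                   42e3, 40e3, 37e3, 35e3,
                   45e3, 43e3, 40e3, 38e3])

df = pd.DataFrame(cells, columns=["age", "cohort"])
# One indicator column per age group and per cohort (dropping one of each as the
# reference), so this fit has (4 - 1) age + (3 - 1) cohort parameters + 1 intercept.
X = pd.get_dummies(df, columns=["age", "cohort"], drop_first=True).astype(float)
X = sm.add_constant(X)

# log(expected cases) = log(person-years) + age effect + cohort effect
fit = sm.GLM(cases, X, family=sm.families.Poisson(), offset=np.log(pyears)).fit()

# Exponentiated cohort coefficients are the proportionality factors by which the
# "parent" age curve is multiplied for each later cohort.
print(np.round(np.exp(fit.params), 2))
print("residual deviance:", round(fit.deviance, 1), "on", int(fit.df_resid), "df")
```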
Others would argue that one should "leave well enough alone" and that any further smoothing or modeling might do more harm than good. In this example, with the relatively large amount of data, the additional reduction might indeed be unnecessary; however, had the data been scarcer, it is likely that the further smoothing would have been required.

There are two more serious objections to the approach just described. First, for any one cohort, the entire parent curve is multiplied through by the same value. This does not allow for cohort effects that are age-specific, e.g. changes in the age at which women in different cohorts completed their first full-term pregnancy might affect the risk of premenopausal breast cancer differently than they would the risk of postmenopausal cancer. This is an example of what statisticians call an interaction: an effect of one factor (age) that is not constant across different values or levels of another (year of birth). Second, the actual goodness of fit of the smoothed curves to the raw data points needs to be evaluated. Before it is, any other expected or suspected patterns can be built into the fitted curves (provided that there are not so many assumptions and exceptions that one ends up with almost as many parameters as data points) and their "fit" tested by examining whether in fact the fitted curves come closer to the raw data points than before, and whether the discrepancies (residuals) are more or less haphazard and unexplainable. See (51) for a nice account of the use of regression models in studying regional variations in cardiovascular mortality.

As already mentioned, the assumption of smoothness and of orderly patterns of change is a central one in multivariate analysis. It stems from the belief (or maybe just the hope) that nature is basically straightforward, and that if there are no good biologic or other reasons to the contrary, relationships tend to be linear rather than quadratic, quadratic rather than cubic, etc. [For a description of this principle of "Occam's Razor," see Ref. (54).] In the breast cancer example just described, however, the changes in some possible risk factors have been "man-made" and more sudden, e.g. world wars, shifts in childbearing habits, oral contraceptives, etc, and it may indeed be some sudden changes in incidence (as it was with liver cancer) that alert us to newly introduced causative (or protective) agents.

Purpose 2: To Make Comparisons Fair

The majority of analytic studies involving humans are of an observational, rather than experimental, nature. As a result, when one compares responses of one group with those of another, the fundamental scientific principle of holding all other factors constant or equal may be violated. Consequently, differences (or nondifferences) in responses may be caused by differences (imbalances) in factors that cannot be controlled experimentally, rather than by the basic variable (groups) under study. Such variables, referred to as "confounding," "disturbing," or "extraneous" by various authors, can, if ignored, have insidious effects. For example, male and female applicants had similar acceptance rates in each of the various faculties at Berkeley, yet the crude overall (schoolwide) acceptance rate for females was considerably lower (9) because females were more likely to apply to those faculties for which the acceptance rates were lower. This artifact is referred to as Simpson's Paradox, and is always a possibility in observational studies.
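The reversal is easy to reproduce with made-up numbers (illustrative only, not the actual Berkeley figures): acceptance rates are identical for men and women within each faculty, yet the crude rates differ sharply because the sexes apply to the faculties in different proportions.

```python
# Hypothetical admissions data: (applicants, accepted) by faculty and sex.
data = {
    "Faculty A": {"men": (800, 640), "women": (200, 160)},   # 80% accepted for both sexes
    "Faculty B": {"men": (200, 40),  "women": (800, 160)},   # 20% accepted for both sexes
}

# Crude (schoolwide) acceptance rates: 68% for men versus 32% for women.
for sex in ("men", "women"):
    applied = sum(data[f][sex][0] for f in data)
    accepted = sum(data[f][sex][1] for f in data)
    print(f"crude acceptance rate, {sex}: {accepted / applied:.0%}")

# Faculty-specific rates: identical for the two sexes within each faculty.
for f in data:
    for sex in ("men", "women"):
        applied, accepted = data[f][sex]
        print(f"{f}, {sex}: {accepted / applied:.0%}")
```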
Although standardization for imbalances (e.g. in age or sex), used to put comparisons of rates on a fair footing, is one of the oldest epidemiologic tools, it is sometimes ignored. A particularly distressing example is the recent controversy in the US and Britain regarding possible cancer-causing effects of water fluoridation, based on findings that cancer rates had increased more in cities that had been fluoridated than in those that had not. As subsequent articles pointed out, these effects disappear if differences in the demographic structure of the two groups of cities are taken into account. [See Refs. (19, 20) for some recent British investigations and a guide to the earlier US studies.] One of the benefits (didactically speaking) was the helpful illustration of two methods of standardization (41).

Standardization was also used recently in a slightly different context (31). It showed that, although the crude infant mortality rate is much higher in Massachusetts than in Sweden, if infant mortality rates in the two areas were standardized for birthweight, Massachusetts would actually have a slightly lower one. The point of the analysis was not to explain away or hide the differences in mortality rates, but rather to show that it is an advantage in birthweight, and not the superiority of Swedish hospital care, that gives Swedish infants a survival advantage. Although the country of birth seems as if it is the main study variable and birthweight simply a "nuisance factor," in reality it is birthweight that matters and country that does not. Luckily, as the accompanying editorial pointed out, of the two variables, birthweight (and through it, presumably the infant mortality rate) is the modifiable one.

To many, the term multivariate analysis has come to mean a statistical model that uses regression-type equations and distributional assumptions to link observed values of a response variable to values of various explanatory variables. Up to this point, the discussion in this section has centered around yes/no responses and explanatory variables that were either naturally discrete (sex, race, country, faculty) or forced to be discrete (age group, birthweight group). These types of data lend themselves to such straightforward tabulation and computation of standardized rates (a technique known as a stratified analysis) that one might rightly ask what is "multivariate" about the method other than the fact that it involves three or more variables.

The answer is that by averaging results over a number of cells (strata), analysis techniques such as that of Mantel-Haenszel (used to combine data from several 2 X 2 tables into a single summary) do, at least implicitly, assume that all tables are measuring a common odds ratio. If the underlying odds ratios are not the same in each table, then the single odds ratio produced by the Mantel-Haenszel technique measures a weighted average of these separate ratios, and since the weighting is related to the relative sizes of the separate tables, the average will be somewhat arbitrary. The same is true of rates that are computed with reference to some standard population: they depend on the assumed mix of categories in the model population. This emphasizes a central issue in all multivariate analyses: One cannot adjust or standardize a comparison without making certain assumptions.
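Here is a minimal sketch of the Mantel-Haenszel summary just described, using hypothetical 2 x 2 tables for two strata: the pooled estimate is a weighted average of the stratum odds ratios, and it is a sensible summary only under the common-odds-ratio assumption discussed above.

```python
import numpy as np

# Each stratum is a 2 x 2 table [[a, b], [c, d]]:
# rows = exposed / unexposed, columns = cases / non-cases. Hypothetical counts;
# the strata could be two age groups, the second with more exposure and more risk.
strata = [
    np.array([[50, 50], [25, 75]]),      # stratum 1: OR = (50*75)/(50*25) = 3.0
    np.array([[150, 50], [50, 50]]),     # stratum 2: OR = (150*50)/(50*50) = 3.0
]

def mantel_haenszel_or(tables):
    """Pooled odds ratio sum(a*d/n) / sum(b*c/n), assuming a common OR across strata."""
    num = sum(t[0, 0] * t[1, 1] / t.sum() for t in tables)
    den = sum(t[0, 1] * t[1, 0] / t.sum() for t in tables)
    return num / den

# Collapsing the strata ignores the stratifier, which here is associated with
# both exposure and risk, so the crude ratio is pulled away from the common value.
crude = sum(strata)
crude_or = (crude[0, 0] * crude[1, 1]) / (crude[0, 1] * crude[1, 0])

print("stratum odds ratios:",
      [round((t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0]), 2) for t in strata])
print("crude (collapsed) odds ratio:", round(crude_or, 2))
print("Mantel-Haenszel pooled odds ratio:", round(mantel_haenszel_or(strata), 2))
```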
Probably the best way to view statistical models is as "a series of approximations to the truth": one can realize that the assumptions (model) used to adjust a comparison may not be entirely correct but proceed as best one can, or one can forego any adjustment because one did not realize the need or was afraid to make assumptions. It is a choice between the results being approximately correct and being precisely wrong!

To end this section, I discuss briefly situations in which the response variable is continuous rather than discrete (I shall discuss more complicated methods for standardizing rates, below), and address issues of matching and of adjustment by regression.

In some experimental studies, it is possible to compare responses to two or more maneuvers applied to the same individual. The advantage of having each subject serve as his own control is obvious: the comparison is immediately fair with respect to an infinity of variables that could otherwise theoretically bias it. When this is not possible, the next best thing, using balancing or randomization (or both), to equalize the two groups receiving the different maneuvers, is often difficult. This is especially true if the numbers in the two groups are so small that it is impossible to balance them adequately, or if the study is an observational one and the groups have already been formed.

For example, in a recent study (42) comparing the ventilatory function, as measured by forced expiratory volume (FEV), of workers who had worked in a vanadium factory for at least four months with that of an unexposed reference group, investigators matched the subjects for two variables known to influence lung function: age (to within two years) and cigarette smoking (to within five cigarettes daily). However, since the two groups differed by an average of 3.4 cm in height, a variable with a very strong relationship to FEV, some standardization or adjustment was required. The authors achieved this using the finding of Cole (17) that past age 20, the predicted FEV for a man of a certain age and height is approximately of the form

FEV = height^2 x (a + b x age)

Both members of each matched pair were already concordant for age and smoking; thus, if one simply divided each man's recorded FEV by his squared height, the resulting paired values could be taken as FEV's that were adjusted for one member being taller or shorter than the other. Since the effect was as though the pairs had been also matched for height, the comparison was carried out using a straightforward paired t-test on the differences in the pairs of adjusted FEV's.

Although the task will often be more difficult than in this elegant example, the principle generally remains the same: one calculates what each subject's response would be expected to be if all of the variables that distort or bias the comparison were held equal, say at the mean of each covariable. The term analysis of covariance (3, 4) has generally been applied to adjustments of a simple additive nature, but as we have just seen, if some other relationship more appropriately and more accurately describes the way in which the covariate(s) affect the response, and if it is easy to derive, it is certainly preferable. Usually this relationship between response and confounders is estimated "internally" from the data at hand, unless the study is small and some outside norms (e.g. weight and height charts, dental maturity curves) are deemed better.
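A minimal sketch of the adjustment just described, with hypothetical measurements rather than the data of the vanadium study: each man's FEV is divided by the square of his height, and the matched pairs are then compared with an ordinary paired t-test.

```python
import numpy as np
from scipy import stats

# Hypothetical matched pairs (exposed worker vs unexposed referent), already
# concordant for age and smoking. FEV in litres, height in metres.
fev_exposed  = np.array([3.1, 3.6, 2.9, 3.8, 3.3, 4.0])
ht_exposed   = np.array([1.74, 1.80, 1.69, 1.83, 1.76, 1.85])
fev_referent = np.array([3.4, 3.7, 3.2, 4.1, 3.5, 4.2])
ht_referent  = np.array([1.70, 1.77, 1.66, 1.79, 1.73, 1.81])

# Following Cole's finding that FEV is roughly proportional to height^2 at a
# given age, dividing FEV by height^2 puts taller and shorter pair members on
# the same footing.
adj_exposed = fev_exposed / ht_exposed**2
adj_referent = fev_referent / ht_referent**2

res = stats.ttest_rel(adj_exposed, adj_referent)
print("mean within-pair difference (adjusted):",
      round(np.mean(adj_exposed - adj_referent), 3))
print("paired t =", round(res.statistic, 2), " p =", round(res.pvalue, 3))
```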
Researchers generally feel safer using internal standardization; by doing so, they avoid problems of different measurement techniques, inappropriate reference samples, etc. In the vanadium study just cited, one could actually test Cole's FEV formula internally in the group of nonexposed workers. If the study did not have a pure unexposed group, and relied instead on the within-group variation in the amount of exposure, one would probably treat the exposure more as a continuous variable and use a multiple regression approach.
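A minimal sketch of that multiple regression approach, once more with hypothetical data and variable names: FEV is regressed on a continuous exposure index together with the covariables, so that the exposure coefficient estimates the exposure effect with age, height, and smoking held constant.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 80

# Hypothetical worker data: a continuous exposure index plus the usual covariables.
df = pd.DataFrame({
    "age": rng.uniform(25, 60, n),
    "height": rng.normal(1.75, 0.07, n),          # metres
    "cigs_per_day": rng.integers(0, 30, n),
    "exposure_years": rng.uniform(0, 15, n),       # within-group exposure variation
})
# Simulated response, roughly consistent with FEV ~ height^2 * (a + b*age), plus
# hypothetical smoking and exposure effects; for illustration only.
df["fev"] = (df["height"]**2 * (2.2 - 0.01 * df["age"])
             - 0.01 * df["cigs_per_day"] - 0.02 * df["exposure_years"]
             + rng.normal(scale=0.25, size=n))

X = sm.add_constant(df[["exposure_years", "age", "height", "cigs_per_day"]])
fit = sm.OLS(df["fev"], X).fit()

# The exposure_years coefficient is the estimated change in FEV per year of
# exposure, holding age, height, and smoking constant.
print(fit.params.round(3))
print(fit.conf_int().round(3))
```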