CHAPTER 1
Introduction

1.1 Problems of Comparative Studies: An Overview
1.2 Plan of the Book
1.3 Notes on Terminology

This book is concerned with the design and analysis of research studies assessing the effect on human beings of a particular treatment. We shall assume that the researchers know what kinds of effects they are looking for and, more precisely, that there is a definite outcome of interest. Examples of such treatments and the corresponding outcomes include the administration of a drug (treatment) claimed to reduce blood pressure (outcome), the use of seat belts (treatment) to reduce fatalities (outcome) among those involved in automobile accidents, and a program (treatment) to improve the reading level (outcome) of first graders. As is seen from these examples, the word "treatment" is used in a very general sense.

1.1 PROBLEMS OF COMPARATIVE STUDIES: AN OVERVIEW

It is useful to begin with what might at first sight appear to be an obvious question: What do we mean by the effect of a treatment? We would like to ascertain the differences between the results of two studies. In the first study we determine what happens when the treatment is applied to some group; in the second we determine what would have happened to the same group if it had not been given the treatment of interest. Whatever differences there may be between the outcomes measured by the two studies would then be direct consequences of the treatment and would thus be measures of its effect.

This ideal experiment is, of course, impossible. Instead of doing the second study, we establish a standard of comparison to assess the effect of the treatment. To be effective, this standard of comparison should be an adequate proxy for the performance of those receiving the treatment (the treatment group) if they had not received the treatment. One of the objectives of this book is to discuss how to establish such standards of comparison to estimate the effect of a treatment.
Standards of comparison usually involve a control or comparison group of people who do not receive the treatment. For example, to measure the effect of wearing seat belts on the chance of surviving an automobile accident, we could look at drivers involved in auto accidents and compare the accident mortality of those who wore seat belts at the time of the accident with the accident mortality of those who did not. Drivers who were wearing seat belts at the time of the accident would constitute the treatment group; those who were not would constitute the control group. Ideally, the accident mortality of the control group is close to what the accident mortality of the treatment group would have been had they not worn seat belts. If so, we could use the accident mortality of the control group as a standard of comparison for the accident mortality of the treatment group.

Unfortunately, the use of a control group does not in itself ensure an adequate standard of comparison, since the groups may differ in factors other than the treatment, factors that may also affect outcomes. These factors may introduce a bias into the estimation of the treatment effect. To see how this can happen, consider the seat belt example in more detail.

Example 1.1 Effect of seat belts on auto accident fatality: Consider a hypothetical study attempting to determine whether drivers involved in auto accidents are less likely to be killed if they wear seat belts. Accident records for a particular stretch of highway are examined, and the fatality rate for drivers wearing seat belts is compared with that for drivers not wearing seat belts. Suppose that the numbers of accidents in each category were as given in Table 1.1.

Table 1.1 Hypothetical Auto Accident Data

                         Seat Belts
                      Worn   Not Worn   Total
Driver killed           10         20      30
Driver not killed       40         30      70
Total                   50         50     100
Fatality rate          0.2        0.4

From Table 1.1, the fatality rate among drivers who wore seat belts was 10/50 = 0.2 and the rate among those not wearing seat belts was 20/50 = 0.4. The difference of 0.4 - 0.2 = 0.2 between the two rates can be shown by the usual chi-square test to be statistically significant at the .05 level. At first sight the study appears to demonstrate that seat belts help to reduce auto accident fatalities.

A major problem with this study, however, is that it takes no account of differences in severity among auto accidents, as measured, for example, by the speed of the vehicle at impact. Suppose that the fatalities among accidents at low speed and at high speed were as given in Table 1.2.

Table 1.2 Auto Accident Data Classified by Speed at Impact

                      Low Impact Speed        High Impact Speed
                      Seat Belts Worn?        Seat Belts Worn?
                      Yes    No   Total       Yes    No   Total
Driver killed           4     2       6         6    18      24
Driver not killed      36    18      54         4    12      16
Total                  40    20      60        10    30      40
Fatality rate         0.1   0.1               0.6   0.6

Notice that adding across the cells of Table 1.2 gives Table 1.1. Thus 10 = 6 + 4, 20 = 2 + 18, 40 = 36 + 4, and 30 = 18 + 12. However, Table 1.2 tells a very different story from Table 1.1. At low impact speed, the fatality rate for drivers wearing seat belts is the same as that for drivers not wearing seat belts, namely 0.1. The fatality rate at high impact speed is much greater, namely 0.6, but is still the same for belted and unbelted drivers. These fatality rates suggest that seat belts have no effect in reducing auto accident fatalities.

The data of Example 1.1 are hypothetical. The point of the example is not to impugn the utility of seat belts (or of well-conducted studies of the utility of seat belts) but to illustrate how consideration of an extra variable (speed at impact) can completely change the conclusions drawn.
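The arithmetic of Example 1.1 can be checked with a short program. The following Python sketch is an illustration only and is not part of the original analysis: it stores the counts of Table 1.2, adds the two speed strata to recover Table 1.1, and computes the fatality rates and a chi-square statistic. The names used (TABLE_1_2, collapse, pearson_chi_square) are our own, and we assume the uncorrected Pearson statistic, since the text says only "the usual chi-square test."

```python
# Counts from Table 1.2: TABLE_1_2[speed][belt] = (killed, not_killed).
TABLE_1_2 = {
    "low": {"worn": (4, 36), "not_worn": (2, 18)},
    "high": {"worn": (6, 4), "not_worn": (18, 12)},
}


def fatality_rate(killed, not_killed):
    """Proportion of drivers killed in one column of a table."""
    return killed / (killed + not_killed)


def collapse(table):
    """Add the speed strata together; this reproduces Table 1.1."""
    totals = {"worn": [0, 0], "not_worn": [0, 0]}
    for stratum in table.values():
        for belt, (killed, not_killed) in stratum.items():
            totals[belt][0] += killed
            totals[belt][1] += not_killed
    return {belt: tuple(cells) for belt, cells in totals.items()}


def pearson_chi_square(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


crude = collapse(TABLE_1_2)
crude_rates = {belt: fatality_rate(*cells) for belt, cells in crude.items()}
stratum_rates = {
    speed: {belt: fatality_rate(*cells) for belt, cells in stratum.items()}
    for speed, stratum in TABLE_1_2.items()
}
# Rows are killed / not killed; columns are belt worn / not worn.
chi_square = pearson_chi_square(
    crude["worn"][0], crude["not_worn"][0],
    crude["worn"][1], crude["not_worn"][1],
)

print("Crude counts (Table 1.1):", crude)
print("Crude fatality rates:", crude_rates)
print("Fatality rates by impact speed:", stratum_rates)
print("Chi-square for Table 1.1: %.2f" % chi_square)
```

Under these assumptions the statistic for Table 1.1 is about 4.76, which exceeds 3.84, the 5% critical value of chi-square with one degree of freedom, whereas within each speed stratum the belted and unbelted fatality rates coincide.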
A skeptical reader might ask if there is a plausible explanation for the data of Table 1.2 (other than that the authors invented it). The crux of the example is that drivers involved in accidents at low speed are more likely to be wearing seat belts than those involved in accidents at high speed. The proportions, calculated from the third line of Table 1.2, are 40/60 and 10/40, respectively. Perhaps slow drivers are generally more cautious than are fast drivers, and so are also more likely to wear seat belts.

We say that speed at impact is a confounding factor because it confounds or obscures the effect, if any, of the risk factor (seat belts, or the lack of them) on outcome (death or survival). In other words, the confounding factor results in a biased estimate of the effect.

Fortunately, if (as in Example 1.1) the confounding factor or factors can be identified and measured, the bias they cause may be substantially reduced or even eliminated. Our purpose in this book is to present enough detail on the various statistical techniques that have been developed to achieve this bias reduction to allow researchers to understand when each technique is appropriate and how it may be applied.

1.2 PLAN OF THE BOOK

In Chapters 2 and 3 we discuss the concepts of bias and confounding. In Chapter 3 we also consider the choice of the summary measure used to describe the effect of the treatment. In Example 1.1 we used the difference between the fatality rates of the belted and unbelted drivers to summarize the apparent effect of the treatment, but other choices of measure are possible, for example the ratio of these rates.

The construction of standards of comparison is the subject of Chapter 4. As we have said, these usually involve a control or comparison group that does not receive the treatment.
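Two calculations mentioned in this discussion can be sketched in Python: the share of drivers wearing seat belts at each impact speed, and the two summary measures (difference and ratio) of the crude fatality rates. This is an illustrative sketch; the variable names are our own, and the direction of the ratio (belted over unbelted) is an assumption, since the text does not specify one.

```python
# Column totals from the third line of Table 1.2.
BELTED = {"low": 40, "high": 10}        # drivers wearing seat belts
ALL_DRIVERS = {"low": 60, "high": 40}   # all drivers at that impact speed

# Crude fatality rates from Table 1.1.
RATE_BELTED = 10 / 50
RATE_UNBELTED = 20 / 50

# Proportion of drivers wearing seat belts at each impact speed.
belt_use = {speed: BELTED[speed] / ALL_DRIVERS[speed] for speed in BELTED}

# Two possible summary measures of the apparent effect.
rate_difference = RATE_UNBELTED - RATE_BELTED
rate_ratio = RATE_BELTED / RATE_UNBELTED  # assumed direction: belted / unbelted

for speed, share in belt_use.items():
    print("Belt use at %s impact speed: %.2f" % (speed, share))
print("Difference of crude rates: %.1f" % rate_difference)
print("Ratio of crude rates: %.1f" % rate_ratio)
```

Belt use differs sharply between the two speeds (about 0.67 against 0.25), which is what allows impact speed to confound the crude comparison; the same crude rates give a difference of 0.2 or a ratio of 0.5, depending on the measure chosen.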
When the investigator can choose which subjects enter the treatment group and which enter the control group, randomized assignment of subjects to the two groups is the preferred method. Since randomization is often not feasible in studies of human populations, we discuss both randomized and nonrandomized studies. In nonrandomized studies statistical techniques are needed to derive valid standards of comparison from the control group, which, as we have seen in Example 1.1, may otherwise give misleading results. Although randomized studies are less likely to mislead, their precision can often be improved by the same statistical techniques.

Chapter 5 discusses the choice of variables to be used in the analysis, a choice that must be related to the context and aims of the study. We also show how the specification of a mathematical model relating the chosen variables is crucial to the choice of an appropriate method of analysis and consider the effects of inadequacies in the model specification.

Chapters 6 to 10 each consider one statistical technique for controlling bias due to confounding factors. These techniques fall into two major categories, matching and adjustment. In matching (Chapter 6), the members of the comparison group are selected to resemble members of the treatment group as closely as possible. Matching can be used either to assemble similar treatment and control groups in the planning of the study before the outcomes are determined, or to select comparable subjects from the two groups after a treatment has been given and outcomes measured. Unlike randomization, which requires control over the composition of both groups, matching can be used to construct a comparison group similar to a preselected or self-selected treatment group.

The other major category, adjustment techniques, consists of methods of analysis which attempt to estimate what would have happened if the treatment and comparison groups had been comparable when in fact they were not.
In other words, the estimate of the effect of the treatment is adjusted to compensate for the differences between the groups. These adjustment methods include standardization and stratification (Chapter 7), analysis of covariance (Chapter 8), logit analysis (Chapter 9), and log-linear analysis (Chapter 10).

A common problem with longitudinal studies is that subjects may be lost to follow-up at the end of or during the course of the study. Chapter 11, on survival analysis, discusses the analysis of such studies, including the control of confounding factors. Chapter 12 discusses repeated measures designs, where the same subjects are assessed on the outcome variable before and after the intervention of a treatment.

Two summary chapters conclude the book. Chapter 13 discusses the choice of statistical technique and shows how two techniques can sometimes be used together. Finally, Chapter 14 presents criteria to consider in drawing causal inferences from a comparative study.

The methodological chapters (Chapters 6 to 12) may be read in any order, but they all use material from Chapters 1 to 5. Chapter 13 refers in detail to Chapters 6 to 10. Chapter 14 may be read at any point.

The book presents the general rationale for each method, including the circumstances when its use is appropriate. The focus throughout is on unbiased, or nearly unbiased, estimation of the effect of the treatment. Tests of significance are given when these can be performed easily. Although we give many examples to illustrate the techniques, we do not dwell on computational details, especially when these can best be performed by computer.

We shall assume throughout the book that the researchers have chosen a single outcome factor for study. For simplicity of presentation we often also restrict attention to the estimation of the effect of a single treatment in the presence of a single confounding factor, although extensions to multiple confounding factors are indicated.
Some special issues that arise with multiple confounding factors are discussed in Chapter 5.

Throughout the book the main concern will be internal validity: attaining a true description of the effect of the treatment on the individuals in the study. The question of external validity (whether the findings apply also to a wider group or population) is not discussed in depth, as it is primarily determined by the subject matter rather than by statistical considerations.

1.3 NOTES ON TERMINOLOGY

Throughout, we shall refer to the effect of interest as the outcome factor. A common synonym is response factor. The agent whose effect on the outcome factor is being studied will be called the treatment, treatment factor, or risk factor. The word "treatment" is generally used to describe an agent applied specifically to affect the outcome factor under consideration (as was true for all the examples in the first paragraph of this chapter). The term "risk factor," borrowed from epidemiology, is used when exposure to the agent is accidental or uncontrollable, or when the agent is applied for some purpose other than to affect the specific outcome factor under consideration. An example would be the study of the effect of smoking on the incidence of lung cancer. The use of the term "risk factor" does not in itself imply that the agent is "risky" or, in fact, that risk enters the discussion at all. We use whichever term ("treatment" or "risk factor") appears more natural in context.

In later chapters we talk about quantities or labels that measure the presence, absence, level, or amount of a risk factor, treatment, outcome factor, or confounding factor. Such quantities or labels will be termed variables. In studying the effect of seat belts on accident mortality (Example 1.1) we may define a risk variable taking the value 1 or 0, depending on whether or not the driver was wearing a seat belt at the time of the accident.
The logical distinction between a factor and a variable which measures that factor is not always made in the literature, but it can be useful.

The term "comparison group" is used interchangeably with the more familiar "control group." When the important comparison is between a proposed new treatment and the present standard treatment, the standard treatment (rather than no treatment) should be given to the comparison group. In dealing with risk factors it is natural to speak of "risk groups" or of "exposed" and "nonexposed" groups. We may have several different "exposed" or "treatment" groups, corresponding to different levels of the risk factor or treatment.
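The distinction between a factor and the variable that measures it can be made concrete in a few lines of Python. The sketch below is illustrative only: the function names, the three smoking levels, and the subject records are hypothetical and do not come from any study. It codes seat belt use as a 1/0 risk variable and codes a risk factor with several levels so that each level defines its own exposed group.

```python
from collections import defaultdict


def seat_belt_variable(wore_seat_belt):
    """Risk variable for the seat belt factor: 1 if worn, 0 if not."""
    return 1 if wore_seat_belt else 0


# Hypothetical levels of a risk factor, ordered from nonexposed upward.
SMOKING_LEVELS = ("none", "light", "heavy")


def smoking_variable(level):
    """Risk variable for a multi-level factor: 0 is the nonexposed level."""
    return SMOKING_LEVELS.index(level)


def groups_by_value(subjects, variable):
    """Group subject identifiers by the value of the risk variable."""
    groups = defaultdict(list)
    for subject_id, factor_level in subjects:
        groups[variable(factor_level)].append(subject_id)
    return dict(groups)


# Hypothetical subjects: (identifier, level of the risk factor).
subjects = [("A", "none"), ("B", "heavy"), ("C", "light"), ("D", "none")]
groups = groups_by_value(subjects, smoking_variable)

print("Seat belt worn ->", seat_belt_variable(True))
print("Seat belt not worn ->", seat_belt_variable(False))
for value in sorted(groups):
    print("Risk variable = %d (%s):" % (value, SMOKING_LEVELS[value]), groups[value])
```

Here the group with value 0 plays the role of the nonexposed (comparison) group, and each nonzero value corresponds to a separate exposed group.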