The simulation exercises can be useful in program evaluation contexts
in three ways. First, they provide
a powerful teaching tool (Eamon, 1980; Lehman, 1980). Students
of program evaluation can explore the relative advantages of these
designs under a wide variety of conditions. In addition, the
simulations show the student exactly how an analysis of these
designs could be accomplished using real data. Second, the simulations
provide a way to examine the possible effects of evaluation implementation
problems on estimates of program effect (Mandeville, 1978; Raffeld
et al., 1979; Trochim, 1984). Just as NASA explores difficulties
in a space shuttle flight using an on-ground simulator, the data
analyst can examine the possible effects of attrition rates, floor
or ceiling measurement patterns, and other implementation factors.
Finally, simulations make it possible to examine the potential
of new data analysis techniques. When bias is detected in traditional
analysis and analytic solutions are forthcoming, simulations can
be a useful adjunct to statistical theory.
Simulations offer several advantages for teaching program evaluation.
First, students can construct as well as observe the simulation
program in progress and get an idea of how a real data analysis
might unfold. In addition, the simulation presents the same information
in a number of ways. The student can come to a better understanding
of the relationships between within-group pretest and posttest
means and standard deviations, bivariate plots of pre- and postmeasures
that also depict group membership, and the results of the ANCOVA
regression analyses. Second, the simulations illustrate clearly
some of the key assumptions that are made in these designs and
allow the student to examine what would happen if these assumptions
are violated. For instance, the simulations are based on the
assumption that within-group pre-post slopes are linear and that
the slopes are equal between groups. The effects of allowing
the true models to have treatment interaction terms or nonlinear
relationships can be examined directly with small modifications
to the simulation program as Trochim (1984) illustrated for the
RD design. Third, the simulations demonstrate the importance
of reliable measurement. By varying the ratio of true score and
error term variances, the student can directly manipulate reliability
and show that estimates of effect become less efficient as measures
become less reliable. Finally, simulations are an excellent way
to illustrate that apparently sensible analytic procedures can
yield biased estimates under certain conditions. This is shown
most clearly in the simulations on the NEGD. Although the apparent
similarity between the design structures of the RE and NEGD might
suggest that traditional ANCOVA regression models are appropriate,
the simulations clearly show this to be false and thereby confirm
the statistical literature in the area (Reichardt, 1979).
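The contrast between the RE and the NEGD can be sketched in a few lines
of Python. This is a minimal illustration, not the manual's own program:
the function names, the sample size, the 10-point gain, the 5-point group
difference, and the equal true-score and error standard deviations are all
assumptions chosen for the example. Each case gets a true score, a pretest
and a posttest that add independent error to it, and a constant gain for
the program group; the ANCOVA is fit by ordinary least squares.

```python
import numpy as np

def ancova_effect(pre, post, z):
    """Fit post = b0 + b1*pre + b2*z by least squares and return b2."""
    design = np.column_stack([np.ones(len(pre)), pre, z])
    coef, *_ = np.linalg.lstsq(design, post, rcond=None)
    return coef[2]

def mean_estimate(group_diff, reps=200, n=500, gain=10.0,
                  sd_true=5.0, sd_error=5.0, seed=0):
    """Average ANCOVA effect estimate over `reps` simulated data sets.

    group_diff = 0 mimics the RE (groups equivalent on the true score);
    group_diff > 0 mimics the NEGD (program group starts out higher).
    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    z = np.repeat([0.0, 1.0], n // 2)   # 0 = comparison, 1 = program
    estimates = []
    for _ in range(reps):
        true = rng.normal(50.0, sd_true, n) + group_diff * z
        pre = true + rng.normal(0.0, sd_error, n)
        post = true + gain * z + rng.normal(0.0, sd_error, n)
        estimates.append(ancova_effect(pre, post, z))
    return float(np.mean(estimates))

re_mean = mean_estimate(group_diff=0.0)
negd_mean = mean_estimate(group_diff=5.0)
print(f"true gain 10.0 | RE mean estimate {re_mean:.2f} "
      f"| NEGD mean estimate {negd_mean:.2f}")
```

With equal true-score and error variances the pretest reliability is 0.5,
so under these assumptions the RE estimates should center near the true
gain while the NEGD estimates should overstate it by roughly half of the
group difference. Changing sd_error changes the reliability and, with it,
both the spread of the estimates and the size of the NEGD bias.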
The validity of estimates from the designs examined in the simulation
exercises in this manual depends on how well those designs are executed
or implemented in the field. There are many implementation problems occurring
in typical program evaluations--attrition problems, data coding
errors, floor and ceiling effects on measures, poor program implementation,
and so on--that degrade the theoretical quality of these designs
(Trochim, 1984). Clearly, there is a need for improved evaluation
quality control (Trochim and Visco, 1985), but when implementation
problems cannot be contained, it is important for the analyst
to examine the potential effects of such problems on estimates
of program gains. This application of simulation is analogous
to simulation studies that NASA conducts to try to determine the
effects of problems in the functioning of the space shuttle or
a communications satellite. There, an exact duplicate of the
shuttle or satellite is used to try to recreate the problem and
explore potential solutions. In a similar way, the program evaluator
can attempt to recreate attrition patterns or measurement difficulties
to examine their effects on the analysis and discover analytic
corrections that may be appropriate. The analyst can directly
manipulate the models of the problems in order to approximate
their reality more accurately and to examine the performance of
a design under more varied situations. Such simulations are useful
in that they can alert the analyst to potential bias and even
indicate the direction of bias under various assumptions.
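As one hedged illustration of this use, the sketch below recreates a
ceiling effect on the posttest in a randomized design and compares the
average ANCOVA estimate with and without it. The function names, the
ceiling of 65, the 10-point gain, and the score distributions are
assumptions made for the example, not values taken from the exercises.

```python
import numpy as np

def ancova_effect(pre, post, z):
    """Fit post = b0 + b1*pre + b2*z by least squares and return b2."""
    design = np.column_stack([np.ones(len(pre)), pre, z])
    coef, *_ = np.linalg.lstsq(design, post, rcond=None)
    return coef[2]

def mean_estimate(ceiling=None, reps=200, n=500, gain=10.0, seed=1):
    """Average effect estimate when posttest scores are capped at `ceiling`.

    ceiling=None leaves the posttest untouched. The same seed produces the
    same underlying data, so the two conditions differ only in the cap.
    """
    rng = np.random.default_rng(seed)
    z = np.repeat([0.0, 1.0], n // 2)   # 0 = comparison, 1 = program
    estimates = []
    for _ in range(reps):
        true = rng.normal(50.0, 5.0, n)
        pre = true + rng.normal(0.0, 5.0, n)
        post = true + gain * z + rng.normal(0.0, 5.0, n)
        if ceiling is not None:
            post = np.minimum(post, ceiling)   # measurement ceiling
        estimates.append(ancova_effect(pre, post, z))
    return float(np.mean(estimates))

clean = mean_estimate(ceiling=None)
capped = mean_estimate(ceiling=65.0)
print(f"no ceiling: {clean:.2f} | ceiling at 65: {capped:.2f}")
```

Because the cap removes more of the program group's high scores than the
comparison group's, the capped estimate should fall below the true gain
under these assumptions, which indicates the direction of the bias. The
same skeleton could be adapted to attrition by deleting cases according
to an assumed dropout rule before fitting the model.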
One of the most exciting uses of simulation involves the examination
of the accuracy and viability of "new" statistical techniques
that are designed to address the deficiencies of previous models.
There are two reasons why simulations are particularly valuable
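here. First, because the analyst constructs the data and therefore knows
the true program effect, simulations can confirm the conditions under which
the analysis will yield unbiased estimates. Second, simulations allow the analyst to examine the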
performance of the analysis under degraded conditions or conditions
that do not perfectly match the mathematical ideal. Thus, simulations
can act as a proving ground for new analyses that supplement and
extend what is possible through mathematical argument alone.
This application of simulations can be illustrated well by reflecting
on the NEGD simulations, where the estimates of program effect
were clearly biased. This bias is well known in the methodological
literature (Reichardt, 1979) and results from unreliability (measurement
error) in the preprogram measure under conditions of nonspecifiable
group nonequivalence. One suggestion for addressing this
problem analytically is to conduct what is usually called a reliability-corrected
Analysis of Covariance to adjust for pretest unreliability in
the NEGD. The analysis involves correcting the pretest scores
separately for each group using the following formula:

    Xadj = xmean + rxx (xi - xmean)

where:
    Xadj = the adjusted or reliability-corrected pretest
    xmean = the within-group pretest mean
    xi = pretest score for case i
    rxx = an estimate of pretest reliability
The analyst must use an estimate of reliability and there is considerable
discussion in the literature (Reichardt, 1979; Campbell and Boruch,
1975) about the assumptions underlying various estimates (for
example, test-retest or internal consistency). The reader is
referred to this literature for more detailed consideration of
these issues. The choice of reliability estimate is simplified
in simulations because the analyst knows the true reliability
(as discussed earlier). This adjusted pretest is then used in
place of the unadjusted pretest for the NEGD simulations.
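The adjustment itself is a one-line computation per group. The sketch
below assumes the usual form of the correction, in which each pretest
score is pulled toward its own group mean in proportion to the
reliability: Xadj = xmean + rxx (xi - xmean). The function name
reliability_adjust and the small data set are hypothetical and serve only
to show the arithmetic.

```python
import numpy as np

def reliability_adjust(pre, group, rxx):
    """Return reliability-corrected pretest scores.

    Each score is shrunk toward its within-group mean:
    adjusted = group_mean + rxx * (score - group_mean).
    """
    pre = np.asarray(pre, dtype=float)
    group = np.asarray(group)
    adjusted = np.empty_like(pre)
    for g in np.unique(group):
        mask = group == g
        group_mean = pre[mask].mean()
        adjusted[mask] = group_mean + rxx * (pre[mask] - group_mean)
    return adjusted

# Hypothetical scores: comparison group (0) and program group (1).
pre = [40, 50, 60, 55, 65, 75]
group = [0, 0, 0, 1, 1, 1]
adjusted = reliability_adjust(pre, group, rxx=0.5)
print(adjusted)
```

With a reliability of 0.5, every score moves halfway toward its group
mean (50 for the comparison group, 65 for the program group), so the
group means are unchanged while the within-group spread is halved.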
To illustrate the correction, simulations were conducted under
the same conditions as in the NEGD exercises using the reliability-corrected
ANCOVA. It is clear that the reliability-corrected NEGD analysis
yields unbiased estimates, thus lending support to the idea that
this correction procedure is appropriate, at least for the conditions
of these simulations.
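A comparison of this kind can be sketched as follows. This is an
illustration under assumed conditions, not a reproduction of the
simulations reported here: the names, the 10-point gain, the 5-point group
difference, and the equal true-score and error variances are assumptions.
The correction is taken to be Xadj = xmean + rxx (xi - xmean), applied
within each group, and rxx is set to the true reliability, which is known
because the data are simulated.

```python
import numpy as np

def ancova_effect(pre, post, z):
    """Fit post = b0 + b1*pre + b2*z by least squares and return b2."""
    design = np.column_stack([np.ones(len(pre)), pre, z])
    coef, *_ = np.linalg.lstsq(design, post, rcond=None)
    return coef[2]

def adjust(pre, z, rxx):
    """Shrink each pretest toward its own group mean by the factor rxx."""
    adjusted = pre.copy()
    for g in (0.0, 1.0):
        mask = z == g
        m = pre[mask].mean()
        adjusted[mask] = m + rxx * (pre[mask] - m)
    return adjusted

def compare(reps=200, n=500, gain=10.0, group_diff=5.0,
            sd_true=5.0, sd_error=5.0, seed=2):
    """Mean NEGD estimates from the uncorrected and corrected ANCOVA."""
    rng = np.random.default_rng(seed)
    rxx = sd_true**2 / (sd_true**2 + sd_error**2)   # true reliability
    z = np.repeat([0.0, 1.0], n // 2)               # 1 = program group
    raw, corrected = [], []
    for _ in range(reps):
        true = rng.normal(50.0, sd_true, n) + group_diff * z
        pre = true + rng.normal(0.0, sd_error, n)
        post = true + gain * z + rng.normal(0.0, sd_error, n)
        raw.append(ancova_effect(pre, post, z))
        corrected.append(ancova_effect(adjust(pre, z, rxx), post, z))
    return float(np.mean(raw)), float(np.mean(corrected))

raw_mean, corrected_mean = compare()
print(f"true gain 10.0 | uncorrected {raw_mean:.2f} "
      f"| reliability-corrected {corrected_mean:.2f}")
```

Under these assumptions the uncorrected estimates should center above the
true gain and the corrected estimates close to it. Supplying a value of
rxx that differs from the true reliability is an easy way to see how
sensitive the correction is to a poor reliability estimate.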
Simulations have been used to explore and examine the accuracy
of a wide range of statistical analyses for program evaluation,
including models for adjusting for selection biases in the NEGD (Trochim
and Spiegelman, 1980; Muthen and Joreskog, 1984), for correcting
for misassignment with respect to the cutoff in RD designs (Campbell
et al., 1979; Trochim, 1984), and for assessing the effects of
attrition in evaluations (Trochim, 1982).