where each O indicates an observation or measure on a group of
people, the X indicates the implementation of some treatment or
program, separate lines are used to depict the two groups in the
study, the C indicates that assignment to either the treatment
or control group is done using a cutoff score on the pretest assignment
measure, and the passage of time is indicated by moving from left
to right. We will assume that we are comparing a program and
comparison group (rather than a relative comparison of two programs
or different levels of the same program).
The regression-discontinuity design is a type of nonequivalent
group design that is characterized by its method of assigning
persons to groups using a cutoff score on an assignment measure
-- all persons who score on one side of the cutoff are assigned to one
group while those scoring on the other side are assigned to the
other. Two things need to be decided when selecting a cutoff
value. First we need to decide whether the high or low pretest
scorers will receive the program. We might give the program to
the high pretest scorers if we are studying the effects of scholarships
(high achievement), awards (high performance), novel medical treatments
or therapies (high on measures of illness) and so on. We might
give the program to the low pretest scorers when studying compensatory
education (low achievement), poverty (low income), and so on.
In this exercise we will simulate a program given to the low
pretest scorers. Second, we need to decide the specific value
of the pretest cutoff. In the real world, cutoff values are selected
in a number of ways. When there are a limited number of program
openings, the cutoff score can be selected so that exactly the
desired number of persons score either above or below it (depending
on whether the program goes to high or low scorers). In other
situations, some theoretical value is appropriate for the cutoff.
For example, the pretest average might be chosen as the cutoff
because in a particular context it makes sense to give the program
to those who are "below average" or "above
average". In this exercise we arbitrarily use a cutoff
equal to the theoretical pretest average.
You will again make use of the pretest and posttest scores that
you generated in the first exercise. If you recall that the pretest
scores (as generated in the first exercise) can range from 4 to
24, it should be clear that the expected pretest average is 14
units. Thus, we will assign all cases having a pretest score
less than or equal to 14 units to the program group and all others
to the comparison group (remember that in this simulation the
program is given to the low pretest scorers). The assignment
strategy can be summarized as follows:
Z = 1 if X <= 14
Z = 0 otherwise
where Z is the 0,1 "dummy" assignment variable.
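The cutoff rule above can be sketched in a few lines of Python. This is only an illustration: the function name `assign` and the sample pretest scores are hypothetical, while the cutoff of 14 comes from the text.

```python
CUTOFF = 14  # cutoff from the text: the expected pretest average


def assign(pretest, cutoff=CUTOFF):
    """Return 1 (program group) if pretest <= cutoff, else 0 (comparison group)."""
    return 1 if pretest <= cutoff else 0


# Hypothetical pretest scores, made up for illustration only
pretests = [9, 14, 15, 20]
z = [assign(x) for x in pretests]
print(z)  # [1, 1, 0, 0]
```

Note that a score exactly at the cutoff (14) goes to the program group, because the rule uses "less than or equal to."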
You will generate the data for this exercise using Table 4-1.
First, copy the pretest
scores (Table 1-1, column 5) from the first exercise into column
2 of Table 4-1. Now, examine the pretest score for person 1.
If it is less than or equal to 14, enter a '1' in
Column 3 of Table 4-1 labeled Group Assignment (Z). If it is
15 or higher, enter a '0'. Continue doing this for
all 50 persons. When you have finished, notice that the next
column, labeled "Hypothetical Program Effect" consists
entirely of '7's, that is, the program will increase
the posttest scores of each program participant by 7 units. But
not everyone gets the program and so not everyone should get the
effect of 7 units. You only want those persons who have a Z =
1 (program persons) to get the 7.
An easy way to accomplish this is to multiply the assignment variable
(Column 3) by the effect size (Column 4) and put the result in
Column 5, labeled "Effect of Program". So, the
fifth column should have '7's for all program persons
and '0's for all comparison group persons. Next,
you should copy the posttest scores from the first exercise (Column
6 of Table 1-1) into column 6 of Table 4-1. Finally, to get the
posttest scores with the program effect included you simply add
the "Effect of Program" (Column 5) and posttest
scores (Column 6) and place the result in Column 7 of Table 4-1
labeled "Posttest (Y) for Regression-Discontinuity Design."
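The column arithmetic for Table 4-1 can be sketched as follows. The function name `build_row`, the dictionary keys, and the four pairs of scores are hypothetical and chosen only for illustration. The cutoff of 14 and the effect of 7 units are the values used in the text.

```python
CUTOFF = 14  # pretest cutoff from the text
GAIN = 7     # hypothetical program effect from the text


def build_row(pretest, posttest, cutoff=CUTOFF, gain=GAIN):
    """Build one row of a Table 4-1 style worksheet."""
    z = 1 if pretest <= cutoff else 0  # column 3: group assignment
    effect = z * gain                  # column 5 = column 3 x column 4
    y = posttest + effect              # column 7 = column 5 + column 6
    return {"pretest": pretest, "z": z, "gain": gain,
            "effect": effect, "posttest": posttest, "y": y}


# Made-up (pretest, posttest) pairs, not real exercise data
scores = [(10, 11), (14, 13), (15, 16), (21, 19)]
table = [build_row(x, p) for x, p in scores]
for row in table:
    print(row)
```

Only rows with Z = 1 have their posttest raised by 7. Comparison rows keep the posttest from the first exercise unchanged.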
It is useful at this point to stop and consider what you have
done. In the first exercise you generated the pretest according
to
X = T + eX
and in this exercise you constructed the program assignment variable
Z using a cutoff rule. Then, using a hypothetical program effect
of G = 7 units (G for Gain), you constructed the effect of the
program by multiplying GZ. You then copied the posttest from the
first exercise and you should recall that it was generated by
the model:
Y = T + eY
Finally, you added the effect of the program to this posttest
value and obtained the posttest for this exercise:
Y = T + GZ + eY
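The whole data-generating process can also be sketched as a small simulation. The dice mechanics here are an assumption: each score is taken to be a shared true score plus independent random error, each the sum of two dice, which is consistent with the stated pretest range of 4 to 24 but is not spelled out in this section. The names `roll_two_dice` and `simulate_person` are hypothetical.

```python
import random

CUTOFF = 14  # pretest cutoff from the text
GAIN = 7     # hypothetical program effect G from the text


def roll_two_dice(rng):
    """Sum of two six-sided dice (assumed mechanism, range 2 to 12)."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def simulate_person(rng):
    true_score = roll_two_dice(rng)             # shared true score (assumed)
    pretest = true_score + roll_two_dice(rng)   # true score plus pretest error
    posttest = true_score + roll_two_dice(rng)  # true score plus posttest error
    z = 1 if pretest <= CUTOFF else 0           # cutoff assignment
    y_rd = posttest + GAIN * z                  # add the effect for program cases
    return pretest, z, posttest, y_rd


rng = random.Random(0)  # fixed seed so the run is repeatable
people = [simulate_person(rng) for _ in range(50)]
print(people[:5])
```

Each tuple holds the pretest, the assignment Z, the original posttest, and the posttest with the program effect added.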
Again, it is always important to examine the data visually, so
you should graph the univariate pretest distribution in Figure
4-1 and the univariate posttest distribution in Figure 4-2. As
in previous exercises, be sure to use a different colored pen
or pencil for the program and comparison groups. Also, estimate
the central tendency for each group on both graphs using either
the counting method or by computing the averages. You should
also graph the bivariate distribution as you did before, remembering
to keep the marks for the two groups distinct both in color and
symbol. Also, estimate the line that fits through the bivariate
data.
Let's consider the univariate distributions. Clearly,
the pretest distribution in this exercise is identical to the
pretest distribution of the first exercise. The only difference
is that the program group has scores of 14 or less and the comparison
group has scores of 15 or more. Notice that because of this the
pretest averages for the two groups are very different. This
is what we mean when we say that the regression-discontinuity
design induces maximal pretest differences between the groups.
Now look at the posttest distribution. If this were all the information
you had (i.e., you did not know the pretest information) you would
probably conclude that the program and comparison groups don't
differ much -- that is, the program is not effective. It is only
when you consider how different they are on the pretest that you
can see there is a program effect, that is, the program group
did much better than would have been expected on the basis of
their pretest scores.
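A small numeric illustration may help here. It uses idealized, error-free data (an assumption made only for clarity): one case at each pretest value from 4 to 24, a posttest equal to the pretest before any program effect, and the 7-unit effect added for cases at or below the cutoff of 14.

```python
from statistics import mean

CUTOFF = 14
GAIN = 7

# Idealized data: one case per pretest value, no random error (assumption)
pretest = list(range(4, 25))
posttest = [x + (GAIN if x <= CUTOFF else 0) for x in pretest]

program = [(x, y) for x, y in zip(pretest, posttest) if x <= CUTOFF]
comparison = [(x, y) for x, y in zip(pretest, posttest) if x > CUTOFF]

# Gap between group averages on each measure (comparison minus program)
pre_gap = mean([x for x, _ in comparison]) - mean([x for x, _ in program])
post_gap = mean([y for _, y in comparison]) - mean([y for _, y in program])

print(pre_gap)   # 10.5
print(post_gap)  # 3.5
```

The groups are 10.5 units apart on the pretest but only 3.5 units apart on the posttest. The program group closed most of the gap, which is the sign of a program effect that the posttest averages alone would hide.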
Next, look at the bivariate distribution. As in the previous
exercise, you visually fit separate lines for the program and
comparison groups. Let's use these jagged lines to try to estimate
a straight line that fits through the data. You will have to
do this visually. The figure below shows the dots estimated in
each column from a hypothetical example and the lines connecting
them. It also shows the straight line that we visually estimated
to fit through the jagged one. You should estimate the line for
the program and comparison groups separately. The program group
line should be to the left of column 14 (and include it) and the
comparison group one should be to the right.
You can easily estimate the slopes of these lines. First, take
the program group line. Place a dot somewhere on this line at
a point where one of the column lines intersects the straight
line. Now move exactly two columns to the right and place a dot
where the straight line intersects the column line. At this point,
you should have something that resembles the following:
You know that the horizontal line is exactly 2 units wide. Measure
the vertical distance between the two dots in your graph. Be
sure that you measure this distance in terms of the units of the
graph. Let's say that you find that it is about 1 1/2 (1.5)
units high. To estimate the slope, you simply construct a ratio
where the vertical distance is the numerator and the horizontal
distance is the denominator. In this example, you would calculate:
slope = (1 1/2)/2
= 1.5/2
= .75 or 3/4
The slope enables us to say how much change in the vertical direction
we get for each 1-unit change in the horizontal direction. In
this example, for every increase of 1 unit in the X direction
we get an increase of .75 units or 3/4 unit in the Y direction.
The estimates of slope for the program and comparison group lines
should be very similar.
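The rise-over-run calculation can be written as a short function. The two points below are hypothetical readings from a graph, placed two columns apart with a vertical distance of 1.5 units to match the example.

```python
def slope(p1, p2):
    """Slope of the straight line through two (x, y) points: rise over run."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


# Hypothetical dots read off the program group line, two columns apart
left_dot = (8, 10.0)
right_dot = (10, 11.5)
print(slope(left_dot, right_dot))  # 0.75
```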
Now let's estimate the size of the program effect. First, draw a vertical line through the entire bivariate distribution at the cutoff point (i.e., X = 14). Place a dot where the program group straight line intersects the cutoff line. Similarly, place a dot where the comparison group line intersects the cutoff line. Now count the number of vertical units between these two dots. This is the regression-discontinuity estimate of the program effect. You should find that this estimate is about 7 units which is, of course, what you put in. This is illustrated in the figure below.
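The same estimate can be sketched numerically. In this sketch an ordinary least-squares line stands in for the visually estimated straight line, and the data are idealized and error-free (both are assumptions for illustration). The helper names `fit_line` and `value_at` are hypothetical.

```python
CUTOFF = 14
GAIN = 7


def fit_line(points):
    """Least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sxy / sxx
    return b, my - b * mx


def value_at(line, x):
    """Height of a fitted line at pretest value x."""
    b, a = line
    return a + b * x


# Idealized data (assumption): slope .5, comparison line at 14 when X = 14
data = [(x, 7 + 0.5 * x + (GAIN if x <= CUTOFF else 0)) for x in range(4, 25)]
program_line = fit_line([p for p in data if p[0] <= CUTOFF])
comparison_line = fit_line([p for p in data if p[0] > CUTOFF])

# The estimate is the vertical distance between the two lines at the cutoff
effect = value_at(program_line, CUTOFF) - value_at(comparison_line, CUTOFF)
print(round(effect, 6))
```

The comparison line is extended back to the cutoff even though no comparison case has a pretest of 14. That projection is what the vertical cutoff line does on the graph.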
After completing the previous exercise, you should be convinced
of the following:
Although in these dice rolling simulations we have avoided presenting
statistical terminology as much as possible, our discussion of
the regression-discontinuity design would not be complete without
it. After all, the first half of the name of this design is "regression."
It should be no surprise that when we statistically analyze this
design in the real world, we use regression analysis. Here, we
consider some of the major issues involved in such an analysis.
A crucial step in the analysis of data from the regression-discontinuity
design involves guessing the true shape of the regression line.
In our example this is easy to do because we created the data
and we know that the true shape is a straight line in each group.
This is because the pretest and posttest both share the same
true score. In real life, we don't often know what the
true regression shape is, and we have to guess at it. Thus, if
you were conducting a real data analysis, you might try a variety
of regression lines until you were confident that you had captured
this true shape.
Since we know that the true shape in this case is linear, we could
construct the appropriate regression model as follows:
Y = b0 + b1X* + b2Z + eY
where:
Y = the posttest
X* = the pretest minus the cutoff value (i.e., X - 14)
Z = the 0,1 group assignment variable
b0 = the intercept, that is, the Y value at which the comparison group regression line meets the cutoff line
b1 = the slope (we assume it's the same in both groups)
b2 = the program effect, that is, the amount you must add to or subtract from b0 in order to find where the program group regression line meets the cutoff line
eY = random error
In regression-discontinuity analysis, we usually subtract the
cutoff value from each pretest score before the analysis so that
the cutoff is at a value of X = 0 which is the intercept in the
model. Notice that the term b2Z is simply the program effect b2
times the assignment variable (Z), which is exactly what we put
in as GZ. You should be able to estimate all of the b's
in the formula above from the bivariate graph. First, b0
is the posttest value for the point you marked on the cutoff line
where the comparison group line intersects it. Second, b1
is the estimate of the slope. If your program and comparison
group slope estimates differ considerably, take the average of
the two. Finally, b2
is the program effect -- the posttest (Y) distance between the
two regression lines at the cutoff. Let's say that you
estimate b0 = 14, b1 = .5 and b2 = 7. You could then write out the regression formula as:
Y = 14 + .5X* + 7Z
(We drop the eY term because
it describes deviations from the regression lines.) Basically,
when you run a regression-discontinuity analysis you enter
the values for Y, X* (remember to subtract the cutoff from each
X) and Z, and the regression program gives you the estimates of
b0, b1, and b2.
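To make concrete what a regression program does with Y, X* and Z, here is a minimal sketch that solves the least-squares normal equations directly in plain Python. It is not any particular statistics package. The helper names `solve` and `fit_rd` are hypothetical, and the data are idealized and error-free, built from the example values b0 = 14, b1 = .5 and b2 = 7.

```python
def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_rd(pretest, posttest, cutoff):
    """Least-squares estimates of b0, b1, b2 for the model with X* and Z."""
    # Each row: constant term, X* = X - cutoff, Z (program goes to low scorers)
    rows = [[1.0, x - cutoff, 1.0 if x <= cutoff else 0.0] for x in pretest]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, posttest)) for i in range(3)]
    return solve(xtx, xty)


# Idealized, error-free data (assumption), one case per pretest value
pretest = list(range(4, 25))
posttest = [14 + 0.5 * (x - 14) + (7 if x <= 14 else 0) for x in pretest]

b0, b1, b2 = fit_rd(pretest, posttest, 14)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Because the data contain no random error, the estimates come out very close to 14, .5 and 7. With real or dice-generated data they would only approximate the values that were put in.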
You should be convinced that this single formula describes the
regression lines for both groups as well as the program effect.
To see this, you can construct the formula for each regression
line separately. First, construct the formula for the program
group line (substituting your own estimates instead of these)
by setting Z = 1 (remember this is the program group):
YP = 14 + .5X* + 7(1)
YP = 14 + .5X* + 7
YP = 21 + .5X*
Now you can construct the formula for the comparison group line
by substituting Z = 0.
YC = 14 + .5X* + 7(0)
YC = 14 + .5X*
Now, to convince yourself that the program effect is correctly
estimated, construct the program effect at the cutoff. Remember
that we subtracted the cutoff from each pretest value and so the
cutoff is at X* = 0. Therefore, the Y estimate for the program
group value at the cutoff in this example would be
YP = 21 + .5(0)
YP = 21
and the comparison group y value at the cutoff would be
YC = 14 + .5(0)
YC = 14
and, therefore, the program effect would be the difference between
the two groups or
YP - YC = 21 - 14
YP - YC = 7
You should get a value close to 7 units, which is,
of course, what you put in when you constructed the data. It
should also be clear that when a dichotomous dummy variable (e.g.,
Z) is used in a regression equation, you are essentially telling
the analysis that you want to fit two lines, one for each
value of Z.
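The substitution steps above can be checked with a few lines of Python. The function name `predict` is hypothetical, and the coefficients are the example estimates from the text (14, .5 and 7), which you would replace with your own.

```python
B0, B1, B2 = 14, 0.5, 7  # example estimates from the text


def predict(x_star, z):
    """Predicted posttest for X* (pretest minus cutoff) and group Z."""
    return B0 + B1 * x_star + B2 * z


yp = predict(0, 1)  # program group line at the cutoff (X* = 0)
yc = predict(0, 0)  # comparison group line at the cutoff
print(yp, yc, yp - yc)  # 21.0 14.0 7.0
```

Setting Z to 1 or 0 moves the same line up or down by b2 without changing its slope, which is why one formula describes both groups.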