Regression Artifacts


In this exercise we are going to look at the phenomenon of regression artifacts, or "regression to the mean." First, you will use the data from the original simulation and create nonequivalent groups just as you did in the Nonequivalent Group Design (NEGD) exercise. Then you will "match" persons from the program and comparison groups who have the same pretest scores, dropping all persons for whom there is no match. You do this because you are concerned that the groups have different pretest averages, and you would like to obtain "equivalent" groups. Second, you will regraph the data for all 50 persons from the Generating Data (GD) exercise to gain a deeper understanding of regression artifacts.

To begin, review what you did in the NEGD exercise. Starting with 50 pretest and posttest scores (each composed of a common true score and unique error components), you first made the groups nonequivalent on the pretest by adding 5 to each program person's pretest value. This initial difference was the same on the posttest, and so you added the same 5 points there. Finally, you included a program effect of 7 points, added to each program person's posttest score.

In this exercise, you will start with the data from the GD exercise and do the same thing you did in the NEGD exercise, except that you will not add in a program effect. That is, in this simulation we assume that the program either was never given or did not work (i.e., the null case).

The first thing you need to do is copy the pretest scores from column 5 of Table 1-1 into column 2 of Table 5-1. Next, you have to divide the 50 participants into two nonequivalent groups. You can do this in several ways, but the simplest is to consider the first 25 persons as the program group and the second 25 as the comparison group. The pretest and posttest scores of these 50 participants were formed from random rolls of pairs of dice, so on average these two subgroups should have very similar pretest and posttest means. But in this exercise we want the two groups to be nonequivalent, so you will have to make them nonequivalent.

The easiest way to make the groups nonequivalent on the pretest is to add some constant value to all the pretest scores for persons in one of the groups. To see how you will do this, look at Table 5-1. You should have already copied the pretest scores (X) for each participant into column 2. Notice that column 3 of Table 5-1 has a "5" in it for each of the first 25 participants and a "0" for each of the second 25. These numbers describe the initial pretest difference between the groups (i.e., the groups are nonequivalent on the pretest). To create the pretest scores for this exercise, add the pretest scores in column 2 to the constant values in column 3 and place the results in column 4 of Table 5-1 under the heading "Pretest (X) for Regression Artifacts". Note that the choice of a 5-point difference between the groups was arbitrary. Also note that in this simulation the program group has the pretest advantage of 5 points.

Now you need to create posttest scores. Copy the posttest scores from column 6 of Table 1-1 directly into column 5 of Table 5-1. In this simulation, we assume that the program either has no effect or was never given, so you will not add any points to the posttest score for the effect of the program. But we do assume that the initial difference between the groups persists over time, so you will add to the posttest the 5 points that describe the nonequivalence between groups. In Table 5-1, the initial group difference (i.e., 5 points) is listed again in column 6. Therefore, you get the final posttest score by adding the posttest score in column 5 and the group difference in column 6. Place the sum in column 7 of Table 5-1, labeled "Posttest (Y) for Regression Artifacts".
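
The construction of Table 5-1 described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `generate_table_1_1` is a hypothetical stand-in for the GD exercise, and it assumes that the true score and each error component are sums of a pair of dice (the text above says only that scores combine a common true score with unique errors and come from rolls of pairs of dice). The dictionary keys are made-up names for the seven columns.

```python
import random

def roll_pair(rng):
    """Sum of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def generate_table_1_1(n=50, seed=0):
    """Hypothetical stand-in for Table 1-1 (assumed model: X = T + eX, Y = T + eY)."""
    rng = random.Random(seed)
    rows = []
    for person in range(1, n + 1):
        true_score = roll_pair(rng)
        rows.append({"person": person,
                     "x_raw": true_score + roll_pair(rng),   # pretest from Table 1-1
                     "y_raw": true_score + roll_pair(rng)})  # posttest from Table 1-1
    return rows

def build_table_5_1(table_1_1, group_difference=5, n_program=25):
    """Add the group difference to both tests for the program group; no program effect."""
    table = []
    for row in table_1_1:
        in_program = row["person"] <= n_program
        diff = group_difference if in_program else 0
        table.append({
            "person": row["person"],                       # column 1
            "group": "program" if in_program else "comparison",
            "x_raw": row["x_raw"],                         # column 2
            "x_diff": diff,                                # column 3
            "x": row["x_raw"] + diff,                      # column 4
            "y_raw": row["y_raw"],                         # column 5
            "y_diff": diff,                                # column 6
            "y": row["y_raw"] + diff,                      # column 7
        })
    return table

table_5_1 = build_table_5_1(generate_table_1_1())
# Show the first three program rows and the first three comparison rows.
for row in table_5_1[:3] + table_5_1[25:28]:
    print(row)
```

If you have your own Table 1-1 values, you could pass them to `build_table_5_1` in place of the simulated rows.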

Now, just as you have done in previous exercises, plot the pretest and posttest frequency distributions in Figures 5-1 and 5-2, being sure to use different colors for the program (persons 1-25) and comparison (persons 26-50) groups. Also, estimate the central tendency for each group on both the pretest and posttest. You should notice that the average of the program group is about 5 points higher than the average of the comparison group on both measures.
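
If you want to check your hand-drawn plots, the sketch below prints rough text versions of the two frequency distributions and the group means. It is a sketch under assumptions: the scores are simulated with an assumed dice model (true score and each error are pair-of-dice sums) rather than taken from your own tables, and `text_histogram` is a hypothetical helper that marks program persons with "P" and comparison persons with "C" in place of two pen colors.

```python
import random
from collections import Counter
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def roll_pair():
    """Sum of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

# Assumed data model (not given in this exercise): score = true score + error,
# each a pair-of-dice sum. Persons 1-25 receive the 5-point group difference.
pretest = {"program": [], "comparison": []}
posttest = {"program": [], "comparison": []}
for person in range(1, 51):
    group = "program" if person <= 25 else "comparison"
    diff = 5 if group == "program" else 0
    true_score = roll_pair()
    pretest[group].append(true_score + roll_pair() + diff)
    posttest[group].append(true_score + roll_pair() + diff)

def text_histogram(program_scores, comparison_scores):
    """One line per score value: a 'P' per program person, a 'C' per comparison person."""
    prog, comp = Counter(program_scores), Counter(comparison_scores)
    all_scores = list(program_scores) + list(comparison_scores)
    return [f"{score:>3} | {'P' * prog[score]}{'C' * comp[score]}"
            for score in range(min(all_scores), max(all_scores) + 1)]

for label, scores in (("Pretest", pretest), ("Posttest", posttest)):
    print(label)
    print("\n".join(text_histogram(scores["program"], scores["comparison"])))
    for group in ("program", "comparison"):
        print(f"  {group} mean: {mean(scores[group]):.1f}")
```

With only 25 persons per group, the difference between the printed means will usually be close to 5 points but rarely exactly 5.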

If you were conducting a nonequivalent group design quasi-experiment and obtained the pretest distribution in Figure 5-1, you would rightly be concerned that the two groups differ prior to getting the program. To remedy this, you might think it is a good idea to look for persons in both groups who have similar pretest scores, and use only these matched cases as the program and comparison groups. You might conclude that by only using persons "matched" on the pretest you can obtain "equivalent" groups.

You will match persons on their pretest scores and put the matched cases in Table 5-2. To do this, first look at the pretest frequency distribution in Figure 5-1. Notice again that the comparison group tended to score lower. Beginning at the lowest pretest score and moving upwards, find the lowest pretest score at which there are both program and comparison persons. Most likely there will be more comparison persons than program ones at the first score that has both.

For instance, imagine that the pretest score of 9 is the first score that has persons from both groups and that at this value there are two cases from the comparison group and one from the program group. Obviously you will only be able to find one matched pair; you will have to throw out the data from one of the comparison group persons because there is only a single program group case available for matching. Since the dice used to generate the data yield random scores, you can simply take the first person in the comparison group (Table 5-1, persons 26-50) who scored a 9 on the pretest. Record that person's ID number in column 1 of Table 5-2, their pretest score in column 2, and their posttest score in column 3. Next, find the program person (in Table 5-1, persons 1-25) who also scored a 9 on the pretest and enter that person's ID number in column 4 of Table 5-2, their pretest score in column 5, and their posttest score in column 6.

Then move to the next highest pretest score in Figure 5-1 for which there are persons from both groups. Again, find matched pairs and enter them into Table 5-2. Continue doing this until you have obtained all possible matched pairs. Notice that you should never use the same person more than once in Table 5-2.
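
The matching procedure just described can be written as a short routine. The sketch below is one possible reading of it: `match_on_pretest` is a hypothetical helper, each row is assumed to be a tuple of (person, group, pretest, posttest) using the columns 4 and 7 scores of Table 5-1, and the demo values are made up to mirror the example of a pretest score of 9 with two comparison persons and one program person.

```python
from collections import defaultdict

def match_on_pretest(table):
    """Pair comparison and program persons who share a pretest score.

    table: list of (person, group, pretest, posttest) tuples in person order.
    Returns a list of (comparison_row, program_row) pairs, lowest score first.
    Unmatched persons are dropped, and nobody is used more than once.
    """
    by_score = defaultdict(lambda: {"comparison": [], "program": []})
    for row in table:
        by_score[row[2]][row[1]].append(row)
    pairs = []
    for score in sorted(by_score):
        comparison = by_score[score]["comparison"]
        program = by_score[score]["program"]
        # zip stops at the shorter list, so extra persons at this score are dropped
        # and the first persons listed are the ones kept.
        pairs.extend(zip(comparison, program))
    return pairs

# Made-up rows for illustration only (not data from the exercise).
demo = [
    (3, "program", 9, 14),
    (7, "program", 12, 11),
    (26, "comparison", 9, 8),
    (31, "comparison", 9, 10),
    (40, "comparison", 12, 13),
    (44, "comparison", 6, 7),
]
pairs = match_on_pretest(demo)
for comparison_row, program_row in pairs:
    print(comparison_row, program_row)
# (26, 'comparison', 9, 8) (3, 'program', 9, 14)
# (40, 'comparison', 12, 13) (7, 'program', 12, 11)
```

Person 31 is dropped because the only program person with a pretest of 9 is already paired, and person 44 is dropped because no program person scored 6.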

At this point, you have created two groups matched on the pretest. To do so, you had to eliminate persons from the original sample of 50 for whom no pretest matches were available. You may now be convinced that you have indeed created "equivalent" groups. To confirm this, you might calculate the pretest averages of the program and comparison groups. They should be identical.

Have you, in fact, created "equivalent" groups? Have you removed the selection bias (of 5 points) by matching on the pretest? Remember that you have not added in a program effect in this exercise. If you successfully removed the selection difference on the pretest by matching, you should find no difference between the two groups on the posttest (because the only thing you put into the posttest was the selection difference between the two groups). Calculate the posttest averages for the program and comparison groups in Table 5-2. What do you find?

Most of you will find that on the posttest the program group scored higher on average than the comparison group did. If you were conducting this study, you might conclude that although the matched groups start out with equal pretest averages, they differ on the posttest. In fact, you would be tempted to conclude that the program is successful because the program group scored higher than the comparison group on the posttest. But something is obviously wrong here--you never put in a program effect! Therefore, the posttest difference that you are finding must be wrong.
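
To see that this posttest difference is not a fluke of one set of dice rolls, you can repeat the whole exercise many times by computer. The sketch below does so under an assumed data model in which the true score and each error component are pair-of-dice sums; the function names and the 2,000 replications are choices made for this illustration. Under that assumed model the pretest-posttest correlation is 0.5, so the expected posttest gap after matching works out to 2.5 points even though no program effect was added.

```python
import random
from collections import defaultdict
from statistics import mean

def roll_pair(rng):
    """Sum of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def simulate_groups(rng, n=50, n_program=25, group_difference=5):
    """Null case: a group difference on both tests, but no program effect.
    Assumed model: score = true score + error, each a pair-of-dice sum."""
    groups = {"program": [], "comparison": []}
    for person in range(1, n + 1):
        group = "program" if person <= n_program else "comparison"
        diff = group_difference if group == "program" else 0
        t = roll_pair(rng)
        groups[group].append((t + roll_pair(rng) + diff, t + roll_pair(rng) + diff))
    return groups

def matched_pairs(groups):
    """Pair program and comparison persons with identical pretest scores."""
    by_score = defaultdict(lambda: {"program": [], "comparison": []})
    for group, rows in groups.items():
        for x, y in rows:
            by_score[x][group].append((x, y))
    pairs = []
    for score in sorted(by_score):
        pairs.extend(zip(by_score[score]["comparison"], by_score[score]["program"]))
    return pairs

def matched_gaps(pairs):
    """(pretest gap, posttest gap), each program mean minus comparison mean."""
    pre = mean(p[0] for _, p in pairs) - mean(c[0] for c, _ in pairs)
    post = mean(p[1] for _, p in pairs) - mean(c[1] for c, _ in pairs)
    return pre, post

rng = random.Random(2)
results = []
for _ in range(2000):
    pairs = matched_pairs(simulate_groups(rng))
    if pairs:  # skip the rare replication with no matches at all
        results.append(matched_gaps(pairs))

avg_pre_gap = mean(pre for pre, _ in results)
avg_post_gap = mean(post for _, post in results)
print(f"average pretest gap after matching:  {avg_pre_gap:.2f}")
print(f"average posttest gap after matching: {avg_post_gap:.2f}")
```

The pretest gap is zero by construction, because every pair shares a pretest score. The posttest gap is the pseudo-effect that matching leaves behind.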

To discover what is wrong, you will plot the data in Table 5-2 in a new way. Look at Figure 5-4, labeled "Pair-Link Diagram". Starting with only the comparison persons in Table 5-2, draw a straight line between the pretest and posttest scores of each person. Do the lines tend to go up, down, or stay the same from pretest to posttest? Next, using a different colored pen, draw the lines for the program group persons in Table 5-2. In which direction do these lines go? You should find that most of the program group lines go up while most of the comparison group lines go down from pretest to posttest. The matched program persons scored low relative to their own group's pretest average and the matched comparison persons scored high relative to theirs, so on the posttest each set drifts back toward its own group's average. As a result of what you have seen, you should be convinced of the following:
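
A computer tally of pair-link directions can serve as a check on your diagram. The sketch below simulates many versions of Table 5-2 and counts, separately for each group, how many lines go up, go down, or stay the same. It assumes a dice model in which the true score and each error are pair-of-dice sums. The helper names and the 500 replications are illustrative choices, not part of the exercise.

```python
import random
from collections import Counter, defaultdict

rng = random.Random(3)  # fixed seed so the sketch is repeatable

def roll_pair():
    """Sum of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def matched_cases():
    """One simulated Table 5-2: matched (pretest, posttest) rows for each group."""
    by_score = defaultdict(lambda: {"program": [], "comparison": []})
    for person in range(1, 51):
        group, diff = ("program", 5) if person <= 25 else ("comparison", 0)
        t = roll_pair()
        x, y = t + roll_pair() + diff, t + roll_pair() + diff
        by_score[x][group].append((x, y))
    matched = {"program": [], "comparison": []}
    for cases in by_score.values():
        # Keep as many persons per group as there are pairs at this pretest score.
        n_pairs = min(len(cases["program"]), len(cases["comparison"]))
        for group in matched:
            matched[group].extend(cases[group][:n_pairs])
    return matched

def link_directions(rows):
    """Count pair-link lines that go up, go down, or stay the same."""
    counts = Counter()
    for pre, post in rows:
        if post > pre:
            counts["up"] += 1
        elif post < pre:
            counts["down"] += 1
        else:
            counts["same"] += 1
    return counts

totals = {"program": Counter(), "comparison": Counter()}
for _ in range(500):
    for group, rows in matched_cases().items():
        totals[group] += link_directions(rows)

for group, counts in totals.items():
    print(group, dict(counts))
```

Compare the direction that dominates in each group with what you drew in Figure 5-4.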


Why do regression artifacts occur? We can get some idea by looking at a pair-link diagram for the entire set of 50 persons in the original Generating Data exercise. Draw the pair-links for each of the 50 persons of Table 1-1 on Figure 5-5. Recall that for this original set of data we had only one group (i.e., no program and comparison group), no selection biases and no program effects. You should be convinced of the following:
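
The same pattern can be examined numerically for a single group with no selection bias and no program effect. The sketch below assumes a dice model in which the true score and each error are pair-of-dice sums, and it uses 5,000 simulated persons instead of 50 so that chance does not obscure the pattern. The helper names are illustrative. It compares the average pretest-to-posttest change for persons who scored below the pretest mean with the change for persons who scored above it.

```python
import random
from statistics import mean

rng = random.Random(4)  # fixed seed so the sketch is repeatable

def roll_pair():
    """Sum of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def simulate_people(n):
    """One group, no selection bias, no program effect.
    Assumed model: pretest = T + eX, posttest = T + eY, each a pair-of-dice sum."""
    people = []
    for _ in range(n):
        t = roll_pair()
        people.append((t + roll_pair(), t + roll_pair()))
    return people

def mean_change_by_pretest(people):
    """Average posttest-minus-pretest change for low and high pretest scorers."""
    pretest_mean = mean(x for x, _ in people)
    low = [y - x for x, y in people if x < pretest_mean]
    high = [y - x for x, y in people if x > pretest_mean]
    return mean(low), mean(high)

people = simulate_people(5000)
low_change, high_change = mean_change_by_pretest(people)
print(f"overall pretest mean:  {mean(x for x, _ in people):.2f}")
print(f"overall posttest mean: {mean(y for _, y in people):.2f}")
print(f"mean change, pretest below the mean: {low_change:+.2f}")
print(f"mean change, pretest above the mean: {high_change:+.2f}")
```

The overall means stay about the same from pretest to posttest, while low scorers tend to move up and high scorers tend to move down. Selecting persons because of their pretest scores is what turns this into a regression artifact.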


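Table 5-1

| 1. Person | 2. Pretest X from Table 1-1 | 3. Pretest Group Difference | 4. Pretest (X) for Regression Artifacts | 5. Posttest Y from Table 1-1 | 6. Posttest Group Difference | 7. Posttest (Y) for Regression Artifacts |
|---|---|---|---|---|---|---|
| 1 | | 5 | | | 5 | |
| 2 | | 5 | | | 5 | |
| 3 | | 5 | | | 5 | |
| 4 | | 5 | | | 5 | |
| 5 | | 5 | | | 5 | |
| 6 | | 5 | | | 5 | |
| 7 | | 5 | | | 5 | |
| 8 | | 5 | | | 5 | |
| 9 | | 5 | | | 5 | |
| 10 | | 5 | | | 5 | |
| 11 | | 5 | | | 5 | |
| 12 | | 5 | | | 5 | |
| 13 | | 5 | | | 5 | |
| 14 | | 5 | | | 5 | |
| 15 | | 5 | | | 5 | |
| 16 | | 5 | | | 5 | |
| 17 | | 5 | | | 5 | |
| 18 | | 5 | | | 5 | |
| 19 | | 5 | | | 5 | |
| 20 | | 5 | | | 5 | |
| 21 | | 5 | | | 5 | |
| 22 | | 5 | | | 5 | |
| 23 | | 5 | | | 5 | |
| 24 | | 5 | | | 5 | |
| 25 | | 5 | | | 5 | |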

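Table 5-1 (cont.)

| 1. Person | 2. Pretest X from Table 1-1 | 3. Pretest Group Difference | 4. Pretest (X) for Regression Artifacts | 5. Posttest Y from Table 1-1 | 6. Posttest Group Difference | 7. Posttest (Y) for Regression Artifacts |
|---|---|---|---|---|---|---|
| 26 | | 0 | | | 0 | |
| 27 | | 0 | | | 0 | |
| 28 | | 0 | | | 0 | |
| 29 | | 0 | | | 0 | |
| 30 | | 0 | | | 0 | |
| 31 | | 0 | | | 0 | |
| 32 | | 0 | | | 0 | |
| 33 | | 0 | | | 0 | |
| 34 | | 0 | | | 0 | |
| 35 | | 0 | | | 0 | |
| 36 | | 0 | | | 0 | |
| 37 | | 0 | | | 0 | |
| 38 | | 0 | | | 0 | |
| 39 | | 0 | | | 0 | |
| 40 | | 0 | | | 0 | |
| 41 | | 0 | | | 0 | |
| 42 | | 0 | | | 0 | |
| 43 | | 0 | | | 0 | |
| 44 | | 0 | | | 0 | |
| 45 | | 0 | | | 0 | |
| 46 | | 0 | | | 0 | |
| 47 | | 0 | | | 0 | |
| 48 | | 0 | | | 0 | |
| 49 | | 0 | | | 0 | |
| 50 | | 0 | | | 0 | |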

Figure 5-1

Figure 5-2

Figure 5-3

Table 5-2
Matched Cases from Table 5-1
1
2
3
4
5
6
Comparison Group Person Number
Pretest X from Table 5-1
Posttest Y from Table 5-1
Program Group Person Number
Pretest X from Table 5-1
Posttest Y from Table 5-1
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
 
Comparison Group
Pretest Average=
Comparison Group
Posttest Average=
 
Program
Group Pretest
Average=
Program
Group Posttest
Average=

Figure 5-4: Pair-Link Diagram

Figure 5-5: Pair-Link Diagram


Copyright © 1996, William M.K. Trochim