In this exercise you will simulate a simple pretest-posttest randomized
experimental design. This design is of the form

R    O    X    O
R    O         O

where R indicates random assignment, O a measurement (observation), and X the program, and thus has a pretest, a posttest, and two groups that have been
randomly assigned. Note that in randomized designs a pretest
is technically not required although one is often included as
a covariate to increase the precision of program effect estimates.
We will assume that we are comparing a program group and a comparison group (instead of two programs or different levels of the same program).
To begin, get into MINITAB in the usual manner. You should see
the MTB prompt (which looks like this: MTB>). Now you are ready
to enter the following commands.
You will create two hypothetical tests as in previous exercises.
Here, one test will be considered the pretest, the other the
posttest. Assume that both tests measure the same true ability
and that they each have their own unreliability or error:
MTB> Random 500 C1;
SUBC> Normal 50 5.
MTB> Random 500 C2;
SUBC> Normal 0 5.
MTB> Random 500 C3;
SUBC> Normal 0 5.
Here C1 represents the true ability on the tests for 500 people.
C2 and C3 represent random error for the pretest and posttest
respectively. Notice that the mean ability score for the tests
will initially be set to 50 test score units. Next, construct
the observed test scores:
MTB> Add C1 C2 C4.
MTB> Add C1 C3 C5.
You should notice that each test has about equal amounts of true
score and error (because all three Random/Normal statements above
use a 5 unit standard deviation). Now, name the columns:
MTB> Name C1 = 'true' C2 = 'x error' C3 = 'y error' C4 = 'pretest' C5 = 'posttest'
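If you want to check the claim about equal parts true score and error, you can do so directly. Each observed score has variance 25 + 25 = 50 squared units, half of it true-score variance, so each test should have a standard deviation of about 7.07 and should correlate about .71 (the square root of .5) with true ability. A quick, optional check using the standard Describe and Correlation commands:
MTB> Describe 'true' 'pretest' 'posttest'
MTB> Correlation 'true' 'pretest' 'posttest'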
So far you have created a pretest and posttest for 500 hypothetical
persons. Next, you need to randomly assign half of the people
to the treated group and half to the control. One way to do this
is to create a new random number for each individual. You will
then use this variable to assign cases randomly. Since we want
equal size groups (250 in each) you can assign all persons less
than or equal to the median on this random number to one group,
and all above the median to the other. Here is the way to do
this:
MTB> random 500 C6;
SUBC> normal 0 5.
creates the random assignment number
MTB> let k1=min(C6)
MTB> let k2=median(C6)
MTB> let k3=max(C6)
gets the minimum, median, and maximum values of this random assignment
number. And
MTB> code (k1:k2) 0 (k2:k3) 1 c6 c7
creates the two equal-sized groups. To confirm that they are equal
in size, do
MTB> table c7
and you should see that there are 250 0s and 250 1s.
Now, to be consistent with other exercises and to get rid of the
unnecessary variable, put C7 into C6 and erase C7:
MTB> let C6=C7
MTB> erase C7
Then, name C6
MTB> name C6='group'
Try the following two statements to verify that you have two groups of 250 persons:
MTB> Sign C6
MTB> Histogram 'Group'
Each of these presents slightly different information, but both verify that you have two equal-sized groups.
Now that you have created two groups, let's say that your treatment
had an effect. To put in an effect, you have to create a posttest score that has something added to it for the people who received the treatment and nothing added for the control cases.
Remember that to create the posttest originally, you just added
together the True Score and Posttest Error for each individual.
To create the posttest with a 10-point treatment effect built in, you would use the following formula

Y = T + eY + 10Z

where Y is the new posttest, T is the true score (C1), eY is the posttest error (C3), and Z is the 0,1 group variable (C6) you just created. To do
this in MINITAB, do
MTB> let c7=c1 + c3 + (10*c6)
MTB> name c7='postgain'
Now, c5 is the posttest when there is no treatment effect and
c7 is the posttest when there is a 10-point treatment effect.
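As a quick sanity check, you can compare the two versions of the posttest. Since half of the 500 people got the 10-point boost, the overall mean of c7 should be about 5 points higher than the mean of c5 (roughly 55 versus 50):
MTB> Describe c5 c7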
At this point, it's worth stopping and thinking about what you've
done. You created a random True Score (C1) and added it to independent
error (C2) to create a pretest (C4) and to other independent error
(C3) to create a posttest (C5). Then you randomly assigned half
of the people to a treatment (C6=1) and to a control (C6=0) condition.
Finally, you created a posttest that has a 10-point treatment
effect in it (C7). If this were a real study (and not a simulation), you would observe only three variables: the pretest (X, C4), the group (Z, C6), and the posttest with a treatment effect in it (Y, C7).
Let's imagine how we might analyze the data using these three
variables, in order to see whether the treatment has an effect.
One of the first things we might do is to look at some simple
distributions for the pretest and posttest. First, look at some
histograms:
MTB> Histogram 'pretest'.
MTB> Histogram 'postgain'.
MTB> Histogram 'pretest';
SUBC> MidPoint;
SUBC> Bar 'group'.
MTB> Histogram 'postgain';
SUBC> MidPoint;
SUBC> Bar 'group'.
The first two commands show the histograms for all 500 cases while
the last two show histograms for the two groups separately. Can
you see that the two groups differ on average on the posttest?
Now, look at the bivariate distribution
MTB> Plot 'postgain' * 'pretest';
SUBC> Symbol 'group'.
You should see that the treated group has many more high posttest scores than the control group.
Now, look at some descriptive statistics tables.
MTB> Table 'Group';
SUBC> Means 'pretest' 'postgain';
SUBC> StDev 'pretest' 'postgain';
SUBC> N 'pretest' 'postgain'.
Here you should see clearly that while the two groups are very
similar in average value on the pretest, they differ by nearly
10 points on the posttest.
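Another optional way to see the effect is to compute each person's gain score and table it by group; the average gain should be near 0 for the comparison group and near 10 for the program group. A sketch, using C8 as a scratch column:
MTB> let c8 = 'postgain' - 'pretest'
MTB> name c8 = 'gain'
MTB> Table 'group';
SUBC> Means 'gain'.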
In a randomized experiment, you technically don't need to measure
a pretest. You could have the posttest-only design:

R    X    O
R         O
If you did, all you could do to look for a treatment effect would be to compare the groups on the posttest. This is most easily accomplished by conducting a simple t-test on the posttest:
MTB> TwoT 95.0 c7 c6;
SUBC> alternative 0.
(The Alternative 0 subcommand requests a two-tailed test.) You can get the same result by using regression analysis with the following formula

Y = b0 + b1Z + eY

where
Y = posttest
Z = the 0,1 assignment variable
b0 = posttest mean of the comparison group
b1 = difference between the program and comparison group posttest means
eY = random error
This model can be run in MINITAB using
MTB> Regress 'postgain' 1 'Group'.
This regresses the posttest score onto the 0,1 group variable
Z. The results for both the t-test and regression versions should
be identical, but you have to know where to look to see this.
In the t-test results, the last line will contain 'T=' and report a t-value. The way you set up the simulation, this t-value should be negative (it tests the control-minus-treatment difference, which is negative because the treatment group mean is larger by about ten points). Now look at the regression table under the heading 't-ratio'. The t-ratio for Group should be the same as the t-test result (except that the sign is reversed).
In general, the regression analysis method of testing for differences is easier to use and interpret than the t-test results. In the regression results, b0 is the coefficient for the Constant and b1 is the coefficient for Group. The b0 in this case is actually the average posttest value for the control group. The b1 is the amount you add to the control group average to get the treatment group posttest average, that is, the estimate of the difference between the two groups on the posttest. This should be somewhere around 10 points. Both coefficients are tested with a t-test. The p-value tells you the probability of obtaining a coefficient this large by chance alone if the true coefficient were zero.
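Keep in mind where these t-values come from: each one is simply the estimated coefficient divided by its standard error (for example, t = b1/SE(b1) for the Group coefficient), so anything that shrinks the standard error will raise the t-value. This fact becomes important in the next step.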
So far, all you've done is to look at the difference between groups
on the posttest. But you also have a pretest measured. How does
this pretest help in analyzing the data? In a randomized experiment,
the pretest (or any other covariate) is used to reduce variability
in the posttest that is unrelated to the treatment. If you reduce
posttest variability in this way, it should be easier to see a
treatment effect. In other words, for the very same posttest, including a good pretest should yield a higher t-value for the coefficient that estimates the difference between groups. To see this,
you have to run a regression model that includes the pretest values
in it. This model is:

Y = b0 + b1X + b2Z + eY

where
Y = the posttest
X = the pretest
Z = the assignment variable
b0 = the intercept of the comparison group line
b1 = slope of regression lines
b2 = the program effect
eY = random error
You can run this in MINITAB by doing:
MTB> Regress 'postgain' 2 'pretest' 'Group'.
Now, if you look at the t-ratio associated with the Group variable
you should see that it is higher than it was in the original regression
equation you ran. Even though you used the exact same posttest variable, you are able to see the treatment effect more clearly (i.e., get a higher t-value) because you included a good covariate
(the pretest) that reduced some of the noise in the posttest that
might obscure the treatment effect.
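You can see this noise reduction directly in the MINITAB output: compare the residual standard deviation (the 's =' value printed with each regression) across your two runs. With the parameters used in this simulation, s should drop noticeably once the pretest is included (from roughly 7 to roughly 6 test score units), and it is this reduction that produces the larger t-ratio for Group.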
At this point you should be convinced of the following. For the simple model Y = b0 + b1Z, the predicted posttest for a program case (Z = 1) is

YP = b0 + b1(1)
YP = b0 + b1

and the predicted posttest for a comparison case (Z = 0) is

YC = b0 + b1(0)
YC = b0

Subtracting the second result from the first gives YP - YC = b1, and so you should be convinced that b1, the coefficient for Group, is the difference between the posttest means for the two groups.
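If you want one final confirmation, you can compute the difference between the two posttest means directly and compare it to the b1 (Group) coefficient from the simple regression; the two numbers should match. A sketch, assuming the Copy command's Use subcommand and scratch columns C9 and C10:
MTB> copy 'postgain' c9;
SUBC> use 'group' = 0.
MTB> copy 'postgain' c10;
SUBC> use 'group' = 1.
MTB> let k4 = mean(c10) - mean(c9)
MTB> print k4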