Part II — Computer Simulations · R

The Nonequivalent Group Design — Part I


Overview

The nonequivalent group design (NEGD) is one of the most common quasi-experimental designs in applied research. Like the randomized experiment, it involves a program group and a comparison group with pretest and posttest measurements. The critical difference is that assignment to groups is not random — instead, groups differ systematically on the pretest. This pre-existing difference is called selection bias.

The standard method for handling selection bias in the NEGD is Analysis of Covariance (ANCOVA) — regressing the posttest on both the group indicator and the pretest. In this exercise you will see that ANCOVA recovers the true treatment effect when the pretest is measured perfectly. Part II will show what happens when measurement error is present.
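Written as a regression equation, the ANCOVA model described above takes roughly the following form. The coefficient symbols and the error term are notation assumed here for illustration, not taken from the exercise; Y, X, and Z stand for the posttest, the pretest, and the group indicator.

```latex
Y_i = \beta_0 + \beta_1 Z_i + \beta_2 X_i + e_i
```

In this notation, \beta_1 is the group difference on the posttest after adjusting for the pretest, which is the quantity read as the treatment effect estimate.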

Step 1 — Generate Data

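library(psych)    # describeBy()
library(ggplot2)  # plots in Steps 4 and 5

T  <- rnorm(500, mean = 50, sd = 5)  # true ability (note: this masks R's T shorthand for TRUE)
eY <- rnorm(500, mean = 0, sd = 5)   # posttest measurement error

X <- T       # pretest: equal to true ability, so the covariate has no measurement error
Y <- T + eY  # posttest base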

Step 2 — Non-Random Assignment

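Participants are first split at the median of an arbitrary normal variable, RandomAssign: those above the median go to the program group (Z = 1), the rest to the comparison group (Z = 0). Because RandomAssign is unrelated to ability, this split alone would produce equivalent groups. The nonequivalence is built in explicitly by adding 5 points to the program group's pretest and posttest, so the program group appears, on average, 5 points more able than the comparison group before any treatment.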

RandomAssign <- rnorm(500, mean = 0, sd = 5)
Z <- ifelse(RandomAssign > median(RandomAssign), 1, 0)

# Add a 5-point selection advantage to the program group on both pre and post
Xnegd <- X + (5 * Z)  # program group has higher pretest
Ynegd <- Y + (5 * Z)  # same selection carried into posttest

NEGDData <- data.frame(Xnegd, Ynegd, Z)
describeBy(NEGDData, group = NEGDData$Z, fast = TRUE)

Confirm that the program group's mean is about 5 points higher on both the pretest and the posttest, even though no treatment has been added yet. This gap is the selection bias.
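The check described above can also be done numerically. The following is a minimal, self-contained sketch in base R that rebuilds the Step 1 and Step 2 data and computes the two group gaps directly. The seed, the names true_score, pre_gap, and post_gap, and the error-free pretest are assumptions made for this illustration.

```r
# Sketch: rebuild the Step 1-2 data and measure the built-in selection gap.
set.seed(2024)  # arbitrary seed, assumed here only for reproducibility
n <- 500
true_score <- rnorm(n, mean = 50, sd = 5)     # true ability
X <- true_score                               # pretest (assumed error-free)
Y <- true_score + rnorm(n, mean = 0, sd = 5)  # posttest base

# Median split on an arbitrary variable, as in Step 2
RandomAssign <- rnorm(n, mean = 0, sd = 5)
Z <- ifelse(RandomAssign > median(RandomAssign), 1, 0)

# 5-point selection advantage on both measures
Xnegd <- X + 5 * Z
Ynegd <- Y + 5 * Z

# Program-minus-comparison gaps; both should land near 5
pre_gap  <- mean(Xnegd[Z == 1]) - mean(Xnegd[Z == 0])
post_gap <- mean(Ynegd[Z == 1]) - mean(Ynegd[Z == 0])
print(c(pre_gap = pre_gap, post_gap = post_gap))
```

Neither gap will be exactly 5 in any single run, because the two groups also differ by chance in true ability and in posttest error.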

Step 3 — Add the Treatment Effect

Now add a 10-point treatment effect to the program group's posttest score. The final posttest reflects both the selection advantage (5 points) and the treatment effect (10 points).

Ynegdgain <- Ynegd + (10 * Z)

NEGDDataGain <- data.frame(Xnegd, Ynegdgain, Z)
describeBy(NEGDDataGain, group = NEGDDataGain$Z, fast = TRUE)

The posttest difference between groups should be about 15 points (5 selection + 10 treatment). A naive comparison of posttest means would overestimate the treatment effect because it ignores the pre-existing group difference.
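To see the size of that overestimate, the naive comparison can be run as a regression of the posttest on the group indicator alone. This is a self-contained sketch under the same assumptions as the exercise; the seed and the names true_score, naive_est, and mean_diff are illustrative and not part of the original script.

```r
# Sketch: the naive posttest-only comparison absorbs the selection gap.
set.seed(99)  # arbitrary seed, assumed here only for reproducibility
n <- 500
true_score <- rnorm(n, mean = 50, sd = 5)
Y <- true_score + rnorm(n, mean = 0, sd = 5)  # posttest base

RandomAssign <- rnorm(n, mean = 0, sd = 5)
Z <- ifelse(RandomAssign > median(RandomAssign), 1, 0)

Ynegd     <- Y + 5 * Z       # selection advantage
Ynegdgain <- Ynegd + 10 * Z  # plus the true 10-point treatment effect

# Regressing on Z alone is the same as a difference in posttest means
naive_est <- unname(coef(lm(Ynegdgain ~ Z))["Z"])
mean_diff <- mean(Ynegdgain[Z == 1]) - mean(Ynegdgain[Z == 0])

# The estimate should sit near 15, about 5 points above the true effect
print(c(naive_est = naive_est, mean_diff = mean_diff, bias = naive_est - 10))
```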

Step 4 — Visualize

# Pretest distribution in each group
ggplot(NEGDDataGain, aes(x = Xnegd)) +
  geom_histogram(bins = 30, fill = "#0E7C6A", color = "white") +
  facet_wrap(~ Z, labeller = labeller(Z = c("0" = "Comparison", "1" = "Program"))) +
  theme_minimal() +
  labs(title = "Pretest Distribution by Group", x = "Pretest")

# Posttest against pretest, marked by group
ggplot(NEGDDataGain, aes(x = Xnegd, y = Ynegdgain,
                         colour = factor(Z), shape = factor(Z))) +
  geom_point(alpha = 0.4, size = 2) +
  scale_colour_manual(values = c("0" = "#888", "1" = "#0E7C6A"),
                      labels = c("Comparison", "Program")) +
  scale_shape_manual(values = c("0" = 1, "1" = 16),
                     labels = c("Comparison", "Program")) +
  theme_minimal() +
  labs(title = "NEGD: Posttest vs. Pretest by Group",
       x = "Pretest", y = "Posttest", colour = "Group", shape = "Group")

Step 5 — ANCOVA

Fit an ANCOVA model with the pretest as covariate. Because the pretest is measured perfectly in this simulation (no measurement error in the covariate), ANCOVA should give an unbiased estimate of the true treatment effect of 10 points.

ModelANCOVA <- lm(Ynegdgain ~ Z + Xnegd, data = NEGDDataGain)
summary(ModelANCOVA)

# Scatterplot with the fitted ANCOVA line for each group
ggplot(NEGDDataGain, aes(x = Xnegd, y = Ynegdgain,
                         colour = factor(Z), shape = factor(Z))) +
  geom_point(alpha = 0.4, size = 2) +
  geom_line(aes(y = predict(ModelANCOVA)), linewidth = 1) +
  scale_colour_manual(values = c("0" = "#888", "1" = "#0E7C6A"),
                      labels = c("Comparison", "Program")) +
  scale_shape_manual(values = c("0" = 1, "1" = 16),
                     labels = c("Comparison", "Program")) +
  theme_minimal() +
  labs(title = "ANCOVA Fit: NEGD",
       x = "Pretest", y = "Posttest", colour = "Group", shape = "Group")

The coefficient for Z should be close to 10. The bivariate plot should show two parallel lines separated by approximately 10 units. When the pretest covariate is measured without error, ANCOVA "adjusts away" the selection bias and recovers the treatment effect.
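One way to see what "adjusting away" means is to rebuild the Z coefficient by hand: it equals the posttest gap minus the pretest slope times the pretest gap. With an error-free pretest the slope is close to 1, so the arithmetic is roughly 15 - 1 × 5 = 10. The sketch below checks this on simulated data. It is self-contained, and the seed, the name true_score, and the other helper names are assumptions for illustration only.

```r
# Sketch: decompose the ANCOVA estimate into gaps and slope.
set.seed(11)  # arbitrary seed, assumed here only for reproducibility
n <- 500
true_score <- rnorm(n, mean = 50, sd = 5)
X <- true_score                               # pretest, assumed error-free
Y <- true_score + rnorm(n, mean = 0, sd = 5)  # posttest base

RandomAssign <- rnorm(n, mean = 0, sd = 5)
Z <- ifelse(RandomAssign > median(RandomAssign), 1, 0)

Xnegd     <- X + 5 * Z           # selection on the pretest
Ynegdgain <- Y + 5 * Z + 10 * Z  # selection plus treatment on the posttest

fit <- lm(Ynegdgain ~ Z + Xnegd)
b_z <- unname(coef(fit)["Z"])      # adjusted group difference
b_x <- unname(coef(fit)["Xnegd"])  # pooled within-group pretest slope

pre_gap  <- mean(Xnegd[Z == 1]) - mean(Xnegd[Z == 0])
post_gap <- mean(Ynegdgain[Z == 1]) - mean(Ynegdgain[Z == 0])

# Hand-built adjustment; matches b_z up to floating-point error
by_hand <- post_gap - b_x * pre_gap
print(c(b_z = b_z, by_hand = by_hand, slope = b_x))
```

If the slope were below 1, as it is when the pretest contains measurement error, the same arithmetic would leave part of the selection gap in the estimate.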

Part II of this exercise will introduce measurement error into the pretest, demonstrating why reliability of the covariate matters for ANCOVA in quasi-experimental designs.

Reflections & Variations

  1. Change selection advantage. Replace 5 * Z with 10 * Z or 1 * Z in both the Xnegd and Ynegd lines of Step 2. Does the magnitude of selection bias affect ANCOVA's ability to recover the true effect?
  2. Disadvantaged group. Make the program group the lower-scoring group by using -5 * Z as the selection term in Step 2. This simulates a compensatory program where low-scorers are treated.
  3. No treatment effect. Set the treatment effect in Step 3 to zero (0 * Z). Does ANCOVA correctly estimate zero, or does it over- or under-correct for selection bias?
  4. Negative treatment effect. Use -10 * Z as the treatment effect in Step 3. How does the bivariate plot change? What would this look like in a real evaluation scenario?
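The variations above can be run side by side by wrapping Steps 1 to 5 in a small function. The following is a sketch, not part of the original script: simulate_negd and its arguments are hypothetical names introduced here, the pretest is assumed to be error-free as in this exercise, and the seed is arbitrary.

```r
# Sketch: hypothetical helper that reruns the exercise with different settings
# and returns the ANCOVA estimate of the treatment effect (the Z coefficient).
simulate_negd <- function(selection = 5, effect = 10, n = 500) {
  true_score <- rnorm(n, mean = 50, sd = 5)
  X <- true_score                               # pretest, assumed error-free
  Y <- true_score + rnorm(n, mean = 0, sd = 5)  # posttest base

  RandomAssign <- rnorm(n, mean = 0, sd = 5)
  Z <- ifelse(RandomAssign > median(RandomAssign), 1, 0)

  Xnegd <- X + selection * Z               # selection on the pretest
  Ygain <- Y + selection * Z + effect * Z  # selection plus treatment

  unname(coef(lm(Ygain ~ Z + Xnegd))["Z"])
}

set.seed(7)  # arbitrary seed, assumed here only for reproducibility

# One row per variation in the list above
variations <- data.frame(
  selection = c(10, 1, -5, 5, 5),
  effect    = c(10, 10, 10, 0, -10)
)
variations$estimate <- mapply(simulate_negd,
                              variations$selection, variations$effect)
print(variations)
```

Comparing the estimate column with the effect column shows whether ANCOVA recovers the effect under each setting. Calling the function repeatedly with the same arguments gives a feel for the sampling variability.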