The Regression-Discontinuity Design
Overview
The regression-discontinuity design (RDD) is a quasi-experimental design in which participants are assigned to groups solely on the basis of whether their pretest score falls above or below a predetermined cutoff value. This strict assignment rule creates a sharp discontinuity in the pre-post bivariate distribution at the cutoff — and that discontinuity is the estimate of the treatment effect.
Because assignment is deterministic given the cutoff, the RDD can yield a valid causal estimate under assumptions that are weaker than those required for the nonequivalent group design. The key analytic challenge is that the functional form relating pretest to posttest must be correctly specified. This exercise demonstrates a systematic over-specification strategy: fit increasingly complex models and verify that the treatment effect estimate stays stable across them.
We use the following notation for the design:

    C  O  X  O
    C  O     O

where each row is a group, C indicates cutoff-based assignment, O an observation (pretest or posttest), and X the treatment.
Step 1 — Generate Data
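The exercise does not prescribe a particular simulation, but a minimal sketch in base R might look like the following. The sample size, means, and SDs are illustrative choices; the names `T`, `eX`, and `eY` follow the variables referred to in the Reflections section.

```r
# True score plus independent pretest/posttest measurement error.
set.seed(1)
n  <- 500
T  <- rnorm(n, mean = 50, sd = 10)  # true score (note: this masks R's T shorthand for TRUE)
eX <- rnorm(n, mean = 0,  sd = 5)   # pretest measurement error
eY <- rnorm(n, mean = 0,  sd = 5)   # posttest measurement error
pretest <- T + eX                   # observed pretest
```

Keeping `eX` and `eY` as separate objects makes the reliability variations in the Reflections section easy to try.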
Step 2 — Assign Groups by Cutoff
In a compensatory program, participants who score at or below the cutoff receive the
treatment. Here the cutoff is 50 (approximately the pretest mean). Participants with
pretest ≤ 50 are in the program group (Z = 1).
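The assignment rule is one line; `pretest` is regenerated here only so the snippet runs on its own (its distribution is a stand-in for Step 1):

```r
set.seed(1)
pretest <- rnorm(500, mean = 50, sd = 11)  # stand-in for Step 1's pretest
cutoff  <- 50
Z <- ifelse(pretest <= cutoff, 1, 0)       # 1 = program group, 0 = comparison group
table(Z)                                   # group sizes
```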
Step 3 — Build the Posttest
The posttest is constructed from true score plus posttest error plus a 10-point treatment effect for program group members.
The program group should start with a lower pretest mean (selected from the lower half of the distribution) but, because of the 10-point effect, should end up with a posttest mean that is close to or exceeds the comparison group's posttest mean.
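Continuing the sketch (quantities regenerated so the block is self-contained), the posttest adds exactly the 10-point effect for program group members:

```r
set.seed(1)
n  <- 500
T  <- rnorm(n, 50, 10)                # true score
eY <- rnorm(n, 0, 5)                  # posttest error
pretest <- T + rnorm(n, 0, 5)
Z  <- ifelse(pretest <= 50, 1, 0)
effect   <- 10                        # treatment effect for the program group
posttest <- T + eY + effect * Z
```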
Step 4 — Visualize the Discontinuity
You should see a visible jump at the cutoff of 50. The bivariate distribution "steps up" at that point for the program group — that step is the treatment effect.
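One way to draw the plot in base graphics (data simulated as in the earlier steps); fitting a separate regression line per group makes the step at the cutoff easy to see:

```r
set.seed(1)
n <- 500
T <- rnorm(n, 50, 10)
pretest  <- T + rnorm(n, 0, 5)
Z        <- ifelse(pretest <= 50, 1, 0)
posttest <- T + rnorm(n, 0, 5) + 10 * Z

plot(pretest, posttest,
     pch = ifelse(Z == 1, 16, 1),   # filled = program, open = comparison
     xlab = "Pretest", ylab = "Posttest")
abline(v = 50, lty = 2)                            # the cutoff
abline(lm(posttest ~ pretest, subset = Z == 1))    # program-group fit
abline(lm(posttest ~ pretest, subset = Z == 0))    # comparison-group fit
```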
Step 5 — Prepare Analysis Variables
Center the pretest at the cutoff (i.e., subtract the cutoff value from every pretest score). This makes the regression estimate the program effect at the cutoff point, where the centered pretest equals zero, rather than at a raw pretest score of zero, which lies far outside the observed range.
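For example (the names `I1`, `pre2`, and `I2` match the terms referred to in Step 6; `precut` is an assumed name for the centered pretest, and the stand-in data exist only so the snippet runs alone):

```r
set.seed(1)
pretest <- rnorm(500, 50, 11)      # stand-in pretest
cutoff  <- 50
Z       <- ifelse(pretest <= cutoff, 1, 0)

precut <- pretest - cutoff  # centered pretest: 0 at the cutoff
I1     <- precut * Z        # linear interaction (allows unequal slopes)
pre2   <- precut^2          # quadratic term
I2     <- pre2 * Z          # quadratic interaction
```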
Step 6 — Progressive Regression Analysis
Fit a series of regression models, adding higher-order terms at each step. In each model, the coefficient for Z is the estimate of the treatment effect. If the model is correctly specified, that coefficient should stay close to 10 at every step.
Model 1 — linear, equal slopes (standard ANCOVA):
Model 2 — linear, unequal slopes (add linear interaction I1):
Model 3 — quadratic, unequal linear slopes (add quadratic term pre2):
Model 4 — quadratic, unequal slopes in both linear and quadratic (add quadratic interaction I2):
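Put together, the four models can be sketched as follows (simulation regenerated so the block runs on its own; variable names follow the steps above). Each printed coefficient on Z is that model's estimate of the true 10-point effect:

```r
set.seed(1)
n <- 1000
T <- rnorm(n, 50, 10)
pretest  <- T + rnorm(n, 0, 5)
cutoff   <- 50
Z        <- ifelse(pretest <= cutoff, 1, 0)
posttest <- T + rnorm(n, 0, 5) + 10 * Z
precut <- pretest - cutoff
I1 <- precut * Z; pre2 <- precut^2; I2 <- pre2 * Z

m1 <- lm(posttest ~ Z + precut)                   # Model 1: linear, equal slopes (ANCOVA)
m2 <- lm(posttest ~ Z + precut + I1)              # Model 2: + linear interaction
m3 <- lm(posttest ~ Z + precut + I1 + pre2)       # Model 3: + quadratic term
m4 <- lm(posttest ~ Z + precut + I1 + pre2 + I2)  # Model 4: + quadratic interaction
sapply(list(m1, m2, m3, m4), function(m) coef(m)["Z"])
```

Comparing `summary(m1)` through `summary(m4)` also shows the standard error on Z creeping up as superfluous terms are added, as described below.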
As more terms are added, unnecessary coefficients should be near zero and the treatment effect estimate should remain stable near 10. The standard error will increase slightly with each added term (precision is lost when superfluous predictors are included), but the estimate stays unbiased.
In a real study you would fit these models to assess whether a linear fit is adequate or whether curvature is needed. The stability of the treatment estimate across models is evidence that your specification is correct.
Reflections & Variations
- Change pretest reliability. Vary the ratio of sd(T) to sd(eX). Try a perfectly reliable pretest (omit eX entirely: pretest <- T). How does pretest reliability affect the treatment effect estimate?
- Change posttest reliability. Vary sd(eY). How does posttest reliability affect the precision of the estimate?
- Elite program. Reverse the cutoff rule so that the highest pretest scorers receive the program (Z <- ifelse(pretest > cutoff, 1, 0)). In what real-world situations would high scorers receive a new program?
- Negative treatment effect. Use -10 * Z. How does the direction of the discontinuity change in the bivariate plot?