Computer Simulations for Research Design

William M.K. Trochim¹

This complete online workbook introduces the use of computer simulations in applied social research and program evaluation. Each simulation is provided in two versions: a manual version using dice and a computer version using R. The three major two-group program-comparison designs (the randomized experiment, the regression-discontinuity design, and the nonequivalent group design) are each simulated, and regression artifacts are explored.

About This Workbook

This workbook was originally developed to help students and researchers understand common quasi-experimental and experimental designs through hands-on simulation. The manual simulations in Part I use dice and hand calculations to illustrate core statistical ideas. The computer simulations in Part II use R, a free and widely used statistical programming environment, to explore the same designs more efficiently and with greater depth.

Because the data are simulated, users know exactly how they were generated. This makes it possible to check whether an analytical method recovers the true effect that was built in, which exposes both the strengths and the limitations of each design and analysis approach.
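
The idea in the paragraph above can be illustrated with a minimal sketch. The workbook's own scripts are written in R; the Python below is not taken from them. The sample size, the score distribution, and the 10-point effect are all assumptions chosen for illustration. The sketch builds a known effect into simulated posttest scores, assigns cases at random, and then checks how close a simple difference in means comes to the built-in value.

```python
import random
import statistics

EFFECT = 10.0  # assumed "true" program effect built into the data

def simulate_randomized_experiment(n=500, effect=EFFECT, seed=1):
    """Return posttest scores for randomly assigned program and comparison groups."""
    rng = random.Random(seed)
    program, comparison = [], []
    for _ in range(n):
        true_score = rng.gauss(50, 5)            # assumed ability distribution
        posttest = true_score + rng.gauss(0, 5)  # true score plus random error
        if rng.random() < 0.5:                   # coin-flip assignment
            program.append(posttest + effect)    # add the known effect
        else:
            comparison.append(posttest)
    return program, comparison

program, comparison = simulate_randomized_experiment()
estimate = statistics.mean(program) - statistics.mean(comparison)
print(f"built-in effect: {EFFECT}, estimated effect: {estimate:.2f}")
```

Because assignment is random, the difference in group means should land close to the built-in value, differing from it only by sampling error.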

Front Matter

Acknowledgments — Credits and thanks to those who contributed to the development of this workbook.

Introduction to Simulations — An overview of simulation methods in social research: what they are, why they are useful, and how the exercises in this workbook are structured.

Part I — Manual Simulations

The manual simulations use dice to generate data by hand. Working through these exercises before moving to computer-generated data builds intuition about random variation, group assignment, and how statistical analyses behave.

Introduction to Manual Simulations — Describes the dice-based data-generation process and how to use the simulation tables.

Generating Data — Roll dice to create a pretest and posttest for 50 hypothetical participants, recording observations in the standard data table.

The Randomized Experimental Design — Randomly assign participants to program and comparison groups, add a treatment effect, and analyze the results.

The Nonequivalent Group Design — Create two pretest-nonequivalent groups, add a treatment effect, and examine whether ANCOVA recovers the true effect.

The Regression-Discontinuity Design — Assign participants by a cutoff score and analyze the discontinuity at the threshold.

Regression Artifacts — Illustrates regression to the mean by selecting extreme scorers and tracking their performance on a second measure.
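
The dice-based exercises listed in this part can also be mimicked in software. The sketch below is in Python rather than done by hand, and its details are assumptions, not the workbook's documented procedure: each score is taken to be the sum of a two-dice "true score" roll and a separate two-dice "error" roll, and the pretest cutoff of 11 that defines low scorers is arbitrary. It generates a pretest and posttest for 50 hypothetical participants and then compares the two means for the low pretest scorers, which is one way to look for regression to the mean.

```python
import random
import statistics

def roll_two_dice(rng):
    """Sum of two six-sided dice (2 to 12)."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def generate_participants(n=50, seed=None):
    """Return a list of (pretest, posttest) pairs built from dice rolls."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        true_score = roll_two_dice(rng)             # shared ability component
        pretest = true_score + roll_two_dice(rng)   # plus pretest error
        posttest = true_score + roll_two_dice(rng)  # plus independent posttest error
        data.append((pretest, posttest))
    return data

def low_scorer_means(data, cutoff=11):
    """Mean pretest and posttest for participants at or below the pretest cutoff."""
    low = [(x, y) for x, y in data if x <= cutoff]
    return statistics.mean(x for x, _ in low), statistics.mean(y for _, y in low)

data = generate_participants(n=50, seed=7)
pre_mean, post_mean = low_scorer_means(data)
print(f"low scorers: pretest mean {pre_mean:.2f}, posttest mean {post_mean:.2f}")
```

With only 50 cases the result varies from run to run; raising `n` makes the expected pattern (low pretest scorers averaging closer to the overall mean on the posttest) easier to see.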

Part II — Computer Simulations (R)

The R simulations replicate and extend the manual exercises. Each exercise provides a complete, downloadable R script along with commentary explaining the statistical model and how to interpret the results.

Introduction to R — A brief orientation to R and RStudio for users new to the environment, including installation instructions and package setup for these exercises.

Generating Data — Use R to generate true scores and error terms, construct X and Y test scores, and explore their distributions and bivariate relationships.

The Randomized Experimental Design — Simulate a pretest-posttest randomized experiment with a 10-point treatment effect; analyze with t-test, ANOVA, and ANCOVA.

The Nonequivalent Group Design, Part I — Create nonequivalent groups with a selection advantage and a treatment effect; run ANCOVA and examine whether the estimate is biased.

The Nonequivalent Group Design, Part II — Apply a reliability-corrected ANCOVA to the NEGD data and compare estimates from the adjusted and unadjusted analyses.

The Regression-Discontinuity Design — Assign participants by a pretest cutoff, build in a 10-point treatment effect, and fit a series of progressively over-specified regression models.

Regression Artifacts — Demonstrate regression to the mean by selecting below-average and above-average scorers and observing their scores on a parallel measure.

Sampling Distribution Simulation — Repeatedly draw samples from a population and plot the resulting sampling distribution of the mean; illustrate the standard error formula empirically.

Scales and Scaling — Generate multi-item scale data with known true scores; compute reliability (Cronbach's alpha), explore inter-item correlations, and examine factor structure.
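
The true-score-plus-error model that runs through these exercises, and the reliability calculation named in the Scales and Scaling entry, can be sketched briefly. This is a Python illustration, not one of the workbook's R scripts. The number of items, the sample size, and the unit variances are assumptions. Each item is generated as a shared true score plus independent error, and Cronbach's alpha is computed from the item variances and the variance of the total score.

```python
import random
import statistics

def simulate_scale(n_people=2000, n_items=5, seed=42):
    """Return one list of item scores per person; each item = true score + error."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        true_score = rng.gauss(0, 1)  # assumed latent trait
        rows.append([true_score + rng.gauss(0, 1) for _ in range(n_items)])
    return rows

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    item_variances = [statistics.variance(col) for col in zip(*rows)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

rows = simulate_scale()
alpha = cronbach_alpha(rows)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Under these assumptions (five items, equal true-score and error variances) the population value of alpha is 5/6, about 0.83, so the estimate from the simulated data should fall near that figure. Because the generating model is known, any gap between the two reflects sampling error alone.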

Part III — Back Matter

Applications of Simulations in Social Research — Discusses three broad uses of simulation: as a teaching tool, as a way to study design implementation problems, and as a proving ground for new analytical methods.

Conclusion — Summarizes the assumptions underlying simulation work and suggests extensions to surveys, MIS data, and other research contexts.

References — Full citations for works referenced throughout the workbook.

¹ A previous version of this work was issued as: Trochim, W.M.K. & Davis, S. (2006). Computer Simulations for Research Design. Retrieved from billtrochim.net/simul/simul.htm on April 1, 2026.