Part II — Computer Simulations · R

Scales and Scaling

Download the complete R script for this exercise to run it in RStudio.


Overview

Most measures in social research are multi-item scales — questionnaires or tests in which several questions are combined into a single composite score. This exercise generates a simulated multi-item scale with a known latent true score, then uses standard psychometric methods to examine reliability and factor structure.

Because you construct the data yourself, you know exactly how many latent factors underlie the items and exactly how reliable the scale should be. This lets you evaluate how well each psychometric method recovers the known properties.

Step 1 — Generate a One-Factor Scale

We create a single latent true score T and six observed items, each of which measures T with independent error. Items will be correlated with each other only through their shared true score.

library(psych)
library(ggplot2)
library(GPArotation)

set.seed(42)   # for reproducibility; remove to get fresh data each run

n <- 500
T <- rnorm(n, mean = 0, sd = 3)   # one latent true score

# Generate 6 items: each = T + unique error
items <- data.frame(
  item1 = T + rnorm(n, 0, 2),
  item2 = T + rnorm(n, 0, 2),
  item3 = T + rnorm(n, 0, 2),
  item4 = T + rnorm(n, 0, 2),
  item5 = T + rnorm(n, 0, 2),
  item6 = T + rnorm(n, 0, 2)
)

describe(items, fast = TRUE)

All six items should have means near zero. Each item's standard deviation should be about sqrt(9 + 4) ≈ 3.6, larger than either component alone, because each item contains both true-score variance (9) and error variance (4).
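A quick numeric check of that claim (assuming `items` from Step 1 is still in the workspace):

```r
# Theoretical item sd: sqrt(var(T) + var(error)) = sqrt(9 + 4)
sqrt(9 + 4)        # about 3.61
# The empirical sds of all six items should be close to this value
sapply(items, sd)
```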

Step 2 — Inter-Item Correlations

Examine the correlation matrix of the six items. Because all items measure the same true score, you expect all pairwise correlations to be positive and of similar magnitude.

round(cor(items), 2)

# Visualize with a correlation heatmap
corrplot_data <- cor(items)
# as.vector() reads the matrix column by column, so the row index
# varies fastest and the column index is repeated in blocks
corrplot_long <- data.frame(
  row = rep(rownames(corrplot_data), times = ncol(corrplot_data)),
  col = rep(colnames(corrplot_data), each = nrow(corrplot_data)),
  r   = as.vector(corrplot_data)
)

ggplot(corrplot_long, aes(x = col, y = row, fill = r)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  theme_minimal() +
  labs(title = "Inter-Item Correlation Matrix", x = "", y = "", fill = "r")

The expected inter-item correlation can be predicted from the reliability formula. With sd(T) = 3 and sd(error) = 2, the true reliability of each item is var(T) / var(item) = 9 / (9 + 4) ≈ 0.69. Because every item shares the same true score and has the same error variance, the expected correlation between any two items is also 9/13 ≈ 0.69.
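We can compare this prediction directly with the data (assuming `items` from Step 1 is in the workspace):

```r
# Expected correlation between any two items:
# cov(item_i, item_j) = var(T) = 9;  var(item) = 9 + 4 = 13
9 / 13                      # about 0.69

# Average observed off-diagonal correlation
r <- cor(items)
mean(r[lower.tri(r)])
```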

Step 3 — Scale Score and Reliability

Create a scale score by summing all six items. Then compute Cronbach's alpha — the most widely used measure of internal consistency reliability.

ScaleScore <- rowSums(items)

alpha_result <- alpha(items)
print(alpha_result$total)
cat("Cronbach's alpha:", round(alpha_result$total$raw_alpha, 3), "\n")

# True reliability of the 6-item scale (Spearman-Brown prediction):
rxx_item <- var(T) / var(items$item1)
k <- 6
rxx_scale_true <- (k * rxx_item) / (1 + (k - 1) * rxx_item)
cat("Theoretical reliability (Spearman-Brown):", round(rxx_scale_true, 3), "\n")

Cronbach's alpha should be close to the theoretical reliability derived from the Spearman-Brown formula. Higher alpha means a more reliable composite scale.
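Because the data are simulated, we can also check reliability in a way that is impossible with real data: the squared correlation between the composite and the latent true score estimates the proportion of scale-score variance due to T, which should match the Spearman-Brown value. This sketch assumes `ScaleScore` and `T` from the steps above:

```r
# In real data T is unobservable; in a simulation we can use it directly.
# Squared correlation = proportion of scale-score variance due to T,
# which should be near the Spearman-Brown prediction (about .93 here)
cor(ScaleScore, T)^2
```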

Step 4 — Exploratory Factor Analysis

Factor analysis attempts to recover the latent structure underlying the items. Because we built in one true score, we expect one dominant factor.

# Scree plot — eigenvalues of the correlation matrix
fa.parallel(items, fa = "fa", fm = "ml",
            main = "Parallel Analysis Scree Plot")

# Extract 1 factor
fa1 <- fa(items, nfactors = 1, fm = "ml", rotate = "none")
print(fa1$loadings, cutoff = 0)
cat("Factor 1 variance explained:",
    round(fa1$Vaccounted[2, 1] * 100, 1), "%\n")

The factor loadings for all six items should be substantial and positive (roughly equal to the square root of the item reliability). The first factor should explain a large proportion of the total variance.
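That prediction can be checked numerically (assuming `fa1` from the code above): with equal true-score and error variances across items, every standardized loading should approximate the square root of the item reliability.

```r
# Predicted loading: sqrt(item reliability) = sqrt(9/13)
sqrt(9 / 13)                   # about 0.83
# Compare with the estimated loadings
round(fa1$loadings[, 1], 2)
```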

Step 5 — Two-Factor Scale

Now generate a two-factor scale to see how factor analysis detects a more complex structure. Items 1–3 load on Factor 1; items 4–6 load on Factor 2.

T1 <- rnorm(n, mean = 0, sd = 3)   # first factor
T2 <- rnorm(n, mean = 0, sd = 3)   # second factor (independent)

items2 <- data.frame(
  item1 = T1 + rnorm(n, 0, 2),
  item2 = T1 + rnorm(n, 0, 2),
  item3 = T1 + rnorm(n, 0, 2),
  item4 = T2 + rnorm(n, 0, 2),
  item5 = T2 + rnorm(n, 0, 2),
  item6 = T2 + rnorm(n, 0, 2)
)

round(cor(items2), 2)

fa2 <- fa(items2, nfactors = 2, fm = "ml", rotate = "oblimin")
print(fa2$loadings, cutoff = 0.3)

Items 1–3 should load strongly on one factor and near zero on the other; items 4–6 should show the opposite pattern. This confirms that factor analysis can correctly identify the two-factor structure we built in.
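One more check worth running (assuming `fa2` from the code above): because T1 and T2 were generated independently, the estimated factor correlation from the oblique rotation should be near zero.

```r
# Estimated factor correlation matrix from the oblimin rotation
round(fa2$Phi, 2)    # off-diagonal values should be close to 0
```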

Reflections & Variations

  1. Vary reliability. Change the ratio of sd(T) to the error sd to create high-reliability (alpha > .90) and low-reliability (alpha < .50) scales. How does item reliability affect factor loadings and alpha?
  2. Number of items. Change the number of items from 6 to 3 or 12. Use the Spearman-Brown formula to predict how alpha changes, then verify empirically.
  3. Correlated factors. Generate T1 and T2 with a positive correlation (e.g., T2 <- 0.5*T1 + rnorm(n, 0, sd = sqrt(1 - 0.25)*3)). How does factor correlation affect the oblique rotation results?
  4. Cross-loading. Add a small contribution of T1 to items 4–6 (e.g., item4 = T2 + 0.3*T1 + rnorm(n, 0, 2)). Can factor analysis detect the cross-loading?
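As a starting point for variation 3, here is one way to generate correlated factors and re-run the oblique factor analysis; the 0.5 weight and the error sd are chosen so that T2 keeps sd = 3 and cor(T1, T2) ≈ 0.5 (this assumes `n` and the psych library from the earlier steps):

```r
T1 <- rnorm(n, mean = 0, sd = 3)
T2 <- 0.5 * T1 + rnorm(n, 0, sqrt(1 - 0.25) * 3)   # cor(T1, T2) ~ 0.5
cor(T1, T2)

items3 <- data.frame(
  item1 = T1 + rnorm(n, 0, 2), item2 = T1 + rnorm(n, 0, 2),
  item3 = T1 + rnorm(n, 0, 2), item4 = T2 + rnorm(n, 0, 2),
  item5 = T2 + rnorm(n, 0, 2), item6 = T2 + rnorm(n, 0, 2)
)

fa3 <- fa(items3, nfactors = 2, fm = "ml", rotate = "oblimin")
round(fa3$Phi, 2)   # the factor correlation should now be near 0.5
```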