Curious Now


Synthetic Control Misconceptions: Recommendations for Practice


Key takeaway

Researchers find that synthetic control, a widely used method for estimating real-world impacts, can produce misleading results depending on how it is implemented. This matters because policymakers rely on these techniques to evaluate the effectiveness of new programs.


Quick Explainer

Synthetic control (SC) methods are a powerful approach for causal inference, allowing researchers to estimate the impact of a policy or intervention by constructing a synthetic comparison group. The key idea is to identify a weighted combination of control units that closely matches the treated unit's pre-intervention characteristics, and then use this synthetic control as the counterfactual. However, contrary to common misconceptions, the authors show that the performance of SC methods is highly sensitive to implementation choices, such as the optimization algorithm and handling of categorical covariates. They also find that including relevant covariates can improve accuracy, and that pre-treatment imbalance is a weak predictor of post-treatment performance. These insights provide practical guidance for researchers applying SC methods in applied settings.
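The weight-fitting step described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the implementation evaluated in the paper: it stacks pre-treatment characteristics into a vector per unit and solves a simplex-constrained least-squares problem with SciPy.

```python
import numpy as np
from scipy.optimize import minimize

def fit_sc_weights(X_treated, X_controls):
    """Find convex weights w (w >= 0, sum(w) = 1) so the weighted
    combination of control units matches the treated unit's
    pre-treatment characteristics as closely as possible.

    X_treated : (k,) vector of pre-treatment features, treated unit
    X_controls: (k, J) matrix, one column per control unit
    """
    J = X_controls.shape[1]

    def loss(w):
        return np.sum((X_treated - X_controls @ w) ** 2)

    w0 = np.full(J, 1.0 / J)  # start from equal weights
    res = minimize(
        loss, w0,
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        method="SLSQP",
    )
    return res.x

# Toy example: the treated unit is an exact midpoint of two controls,
# so a perfect convex-combination match exists.
rng = np.random.default_rng(0)
X_controls = rng.normal(size=(5, 3))
X_treated = 0.5 * X_controls[:, 0] + 0.5 * X_controls[:, 1]
w = fit_sc_weights(X_treated, X_controls)
```

The fitted weights then define the counterfactual: the post-intervention outcome of the synthetic unit is the same weighted average of the control units' outcomes.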

Technical Deep Dive

Overview

This technical deep dive summarizes the key findings and methods of a study exploring misconceptions and best practices around the use of synthetic control (SC) methods for causal inference. The study aims to debunk several commonly held beliefs about SC implementation and performance and to provide practical guidance for researchers.

Problem & Context

Synthetic control methods have seen a rapid rise in popularity over the past 20 years, leading to the proliferation of various SC techniques and implementation approaches. However, the theoretical literature on SC has outpaced empirical evaluations of how these methods perform in applied settings. The authors identify three main misconceptions that have arisen as a result:

  1. SC is robust to various implementation choices, such as the choice of optimization algorithm and the handling of categorical/compositional covariates.
  2. Covariates are unnecessary when using SC, especially if a close pre-treatment match can be found.
  3. Lower pre-treatment outcome imbalance, measured by root mean squared prediction error (RMSPE), implies lower absolute bias in the estimated treatment effect.
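For concreteness, the imbalance measure in the third point is just the root mean squared prediction error between the treated unit's observed outcomes and the synthetic control's fitted outcomes over the pre-treatment periods. A minimal sketch (the numbers below are illustrative, not from the paper):

```python
import numpy as np

def pre_treatment_rmspe(y_treated, y_synthetic):
    """Root mean squared prediction error between the treated unit's
    observed pre-treatment outcomes and the synthetic control's
    fitted outcomes over the same periods."""
    y_treated = np.asarray(y_treated, dtype=float)
    y_synthetic = np.asarray(y_synthetic, dtype=float)
    return float(np.sqrt(np.mean((y_treated - y_synthetic) ** 2)))

# A perfect pre-treatment fit gives RMSPE of 0; the study's point is
# that a low value here is only weakly related to post-treatment bias.
rmspe = pre_treatment_rmspe([0.10, 0.12, 0.11], [0.11, 0.12, 0.10])
```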

Methodology

The authors conduct a simulation study grounded in an empirical case study of the Alaska Permanent Fund Dividend to evaluate the validity of these misconceptions. They generate synthetic data across a range of scenarios that vary in the degree of overlap between the treated unit (Alaska) and the control units. They then compare the performance of several SC implementations, including standard Synth, Augmented Synth, Generalized Synthetic Control (GSynth), and Bayesian Structural Time Series.

Data & Experimental Setup

The simulation data is calibrated using 10 years of data from the Current Population Survey (1977-1986), which includes the implementation of the Alaska Permanent Fund Dividend in 1982. The authors model the data generating process for covariates (industry, education, race, wages) and outcomes (proportion employed part-time) using a combination of Dirichlet regressions and linear models fit to the empirical data. They then generate 100 years of synthetic data under different assumptions about the overlap between Alaska and control states.
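A greatly simplified sketch of this kind of data-generating process is shown below. The concentration parameters, coefficients, and noise scale are hypothetical placeholders; the paper instead fits Dirichlet regressions and linear models to the CPS data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical concentration parameters for a 4-category industry
# composition (the paper calibrates these from CPS data instead).
alpha = np.array([8.0, 5.0, 3.0, 2.0])

n_states, n_years = 50, 100
# industry_shares[s, t] is a length-4 composition summing to 1,
# drawn from a Dirichlet distribution.
industry_shares = rng.dirichlet(alpha, size=(n_states, n_years))

# Outcome: proportion employed part-time, modeled as a linear
# function of the composition plus noise (illustrative coefficients).
beta = np.array([0.05, 0.10, 0.20, 0.30])
noise = rng.normal(scale=0.01, size=(n_states, n_years))
part_time = industry_shares @ beta + noise
```

The Dirichlet draw keeps each simulated covariate a valid composition (non-negative shares summing to one), which is exactly the property that makes categorical/compositional covariates awkward to handle in SC matching.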

Results

The key findings from the simulation study include:

  1. Implementation Choices Matter: The choice of optimization algorithm (nested optimization vs. regression-based weights) and the handling of categorical or compositional covariates (e.g., whether reference categories are omitted) can substantially affect the performance of SC methods, particularly when overlap between treated and control units is limited.
  2. Covariates Can Improve Performance: Contrary to the misconception, including relevant covariates can improve the accuracy of SC estimates, especially when the pre-treatment time series is short. Overfitting to the pre-treatment period can be a bigger issue than the asymptotic irrelevance of covariates.
  3. Pre-Treatment Imbalance is a Weak Predictor: The relationship between pre-treatment outcome imbalance (RMSPE) and absolute bias in the estimated treatment effect is weak and inconsistent across methods and scenarios. Relying too heavily on pre-treatment imbalance can lead to overfitting and poor post-treatment performance.
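The Augmented Synth approach mentioned in the first result corrects the plain SC prediction with an outcome model fit on the control units. The sketch below is a simplified single-period illustration in that spirit, using a ridge correction; the function name and interface are ours, not the paper's implementation.

```python
import numpy as np

def augmented_sc_prediction(w, Y_controls_post, X_controls, X_treated, lam=1.0):
    """Augmented synthetic control for one post-treatment period:
    start from the plain SC prediction and add an outcome-model
    correction for remaining covariate imbalance.

    w              : (J,)   simplex weights from a plain SC fit
    Y_controls_post: (J,)   control outcomes in the post period
    X_controls     : (J, k) pre-treatment covariates, control units
    X_treated      : (k,)   pre-treatment covariates, treated unit
    """
    # Ridge fit of post-period outcomes on covariates, controls only.
    k = X_controls.shape[1]
    beta = np.linalg.solve(
        X_controls.T @ X_controls + lam * np.eye(k),
        X_controls.T @ Y_controls_post,
    )
    sc_pred = w @ Y_controls_post
    # Add back the part of the imbalance the outcome model explains.
    correction = (X_treated - w @ X_controls) @ beta
    return sc_pred + correction
```

When the covariate imbalance (X_treated minus the weighted average of X_controls) is zero, the correction vanishes and the estimate reduces to plain SC; otherwise the outcome model extrapolates to close the gap.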

Interpretation

The authors conclude that the accepted "truths" about SC methods are actually myths that are not well-supported by empirical evidence. They provide several practical recommendations for researchers:

  • Avoid using regression weights and prefer nested optimization or Augmented Synth.
  • Consider interactive fixed effects models (GSynth) as a simpler alternative to localized SC.
  • Include relevant covariates when feasible, even if they do not improve pre-treatment fit.
  • Be cautious about relying on pre-treatment imbalance as a guide for model selection or performance.

Limitations & Uncertainties

The authors acknowledge that their simulations, while grounded in an empirical case study, may not be representative of all possible real-world data generating processes. They call for further research exploring a broader range of scenarios, as well as the development of new diagnostics beyond pre-treatment RMSPE to assess SC model performance.

What Comes Next

The findings of this study underscore the need for a more cautious and nuanced approach to the use of SC methods, moving beyond the misconceptions that have proliferated in applied research. The authors hope their work will spur further empirical evaluations and the development of improved guidance for researchers applying SC techniques.
