Two-Sample Instrumental Variables under Population Mismatch: A Transportability Framework with Bias Diagnostics

Instrumental variable (IV) methods are widely used in the health and social sciences to estimate causal treatment effects among compliers. In some research settings, the instrument-treatment association (first stage) and the instrument-outcome association (reduced form) must each be estimated from a different dataset. Two-Sample Instrumental Variables (TSIV), proposed by Angrist and Krueger (1992), addresses this by combining first-stage and reduced-form estimates from separate data sources into a single causal effect estimate. However, TSIV identification requires that instrument compliance behavior be the same in the two samples, a condition that is rarely verified in practice. We show mathematically and empirically that when compliance differs between samples, the raw TSIV estimator does not converge to the true Local Average Treatment Effect (LATE); instead it attenuates toward a predictably biased limit proportional to the ratio of the first-stage compliance rates in the two samples. To address this, we formalize a framework for estimating the LATE with TSIV under two key assumptions: (1) Covariate Overlap, requiring that the two samples share sufficient common support in their covariate distributions, and (2) Compliance Transportability, requiring that compliance behavior be identical across populations after conditioning on observed covariates. We consider a setting in which a health policy instrument and outcomes are recorded in administrative claims while treatment and covariates are collected in a survey. We use a C-statistic derived from pooled covariates to detect population mismatch, and an Inverse Probability Weighting (IPW) correction that reweights the first-stage sample to approximate the administrative covariate distribution.
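The form of the bias can be sketched in one line. The notation here is an assumption introduced for illustration and does not come from the paper: \beta is a complier effect taken to be homogeneous across the two populations, \pi_{fs} is the compliance rate in the first-stage (survey) sample, and \pi_{rf} is the compliance rate in the reduced-form (administrative) sample. Under those assumptions the reduced form converges to \beta \pi_{rf}, so the raw TSIV ratio behaves roughly as follows:

```latex
\hat{\beta}_{\mathrm{TSIV}}
  = \frac{\hat{\rho}_{rf}}{\hat{\pi}_{fs}}
  \;\xrightarrow{\;p\;}\;
  \frac{\beta \, \pi_{rf}}{\pi_{fs}}
  = \beta \cdot \frac{\pi_{rf}}{\pi_{fs}}
```

The limit equals \beta only when \pi_{rf} = \pi_{fs}; when the reduced-form sample has lower compliance than the first-stage sample, the ratio is below one and the estimate is attenuated.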
In Monte Carlo simulations across eight scenarios calibrated to a survey-Medicaid setting, IPW-TSIV reduces bias in estimating the LATE, achieving an 88% reduction in the primary scenario, 82% under severe selection, and 79% when state-level expansion policy drives compliance heterogeneity. We further validate the framework using the Oregon Health Insurance Experiment, where partitioning the public-use lottery data (N = 24,646) into two non-overlapping samples with substantively meaningful compliance heterogeneity yields a verifiable benchmark against the true causal effect. Across 10 independent replications (C-statistic = 0.78), IPW-TSIV reduces mean absolute bias relative to the oracle S2-specific LATE by 71.6%, outperforms naive TSIV in all 10 splits, and reduces mean bias relative to the full-data LATE from +0.016 to +0.008. This framework provides applied researchers with actionable diagnostic thresholds to detect sample mismatch, validate transportability assumptions, and determine when structural TSIV estimation is reliable.
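The workflow described above (mismatch diagnostic, reweighting, corrected ratio) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the data-generating process, the single binary covariate `x` (assumed observed in both samples), the compliance rates, and every function name are hypothetical and chosen only so the limits can be checked by hand.

```python
import numpy as np

def simulate(rng, n, p_x, late=1.0):
    """Hypothetical DGP: compliance depends only on a binary covariate x."""
    x = rng.binomial(1, p_x, n)                 # covariate, differs by sample
    z = rng.binomial(1, 0.5, n)                 # randomized instrument
    complier = rng.binomial(1, np.where(x == 1, 0.8, 0.3))
    d = z * complier                            # one-sided noncompliance
    y = late * d + 0.5 * x + rng.normal(0.0, 1.0, n)
    return x, z, d, y

def diff_in_means(v, z, w=None):
    """(Weighted) mean of v among z == 1 minus mean among z == 0."""
    w = np.ones(len(v)) if w is None else w
    return (np.average(v[z == 1], weights=w[z == 1])
            - np.average(v[z == 0], weights=w[z == 0]))

def c_statistic(score, label):
    """AUC of score for label == 1, counting ties as one half."""
    vals = np.unique(score)                     # sorted ascending
    pos = np.array([(score[label == 1] == v).sum() for v in vals], dtype=float)
    neg = np.array([(score[label == 0] == v).sum() for v in vals], dtype=float)
    neg_below = np.cumsum(neg) - neg            # negatives with lower score
    return (pos * (neg_below + 0.5 * neg)).sum() / (pos.sum() * neg.sum())

rng = np.random.default_rng(0)
n = 100_000
# Survey sample: used for the first stage (z, d). Compliance = 0.65.
x_sv, z_sv, d_sv, _ = simulate(rng, n, p_x=0.7)
# Administrative sample: used for the reduced form (z, y). Compliance = 0.45.
x_ad, z_ad, _, y_ad = simulate(rng, n, p_x=0.3)

# Mismatch diagnostic: how well does x predict sample membership?
x_pool = np.concatenate([x_sv, x_ad])
s_pool = np.concatenate([np.zeros(n), np.ones(n)])   # 1 = administrative
p_admin = np.array([s_pool[x_pool == v].mean() for v in (0, 1)])
c_stat = c_statistic(p_admin[x_pool], s_pool)

# IPW: odds weights move the survey covariate mix toward the admin mix.
e_sv = p_admin[x_sv]
w_sv = e_sv / (1.0 - e_sv)

reduced_form = diff_in_means(y_ad, z_ad)
naive = reduced_form / diff_in_means(d_sv, z_sv)          # limit 0.45 / 0.65
ipw = reduced_form / diff_in_means(d_sv, z_sv, w=w_sv)    # limit 1.0 (true LATE)

print(f"C-statistic: {c_stat:.3f}")
print(f"naive TSIV:  {naive:.3f}")
print(f"IPW-TSIV:    {ipw:.3f}")
```

Because compliance in this sketch depends on `x` alone, Compliance Transportability holds by construction, which is why reweighting on `x` recovers the target. If compliance also depended on an unobserved factor that differs between samples, the weighted first stage would remain biased.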