Bayesian Propagation of Record Linkage Uncertainty into Population Size Estimation with Application to Human Rights Violations
Posted: 10/18/2018 (CSDE Seminar Series)
Mauricio Sadinle, Department of Biostatistics, School of Public Health, UW
Multiple-systems or capture–recapture estimation are common techniques for population size estimation, particularly in the quantitative study of human rights violations. These methods rely on multiple samples from the population, along with the information of which individuals appear in which samples. The goal of record linkage techniques is to identify unique individuals across samples based on the information collected on them. Linkage decisions are subject to uncertainty when such information contains errors and missingness, and when different individuals have very similar characteristics. Uncertainty in the linkage should be propagated into the stage of population size estimation. We propose an approach called linkage-averaging to propagate linkage uncertainty, as quantified by some Bayesian record linkage methodologies, into a subsequent stage of population size estimation. Linkage-averaging is a two-stage approach in which the results from the record linkage stage are fed into the population size estimation stage. We show that under some conditions the results of this approach correspond to those of a proper Bayesian joint model for both record linkage and population size estimation. The two-stage nature of linkage-averaging allows us to combine different record linkage models with different capture–recapture models, which facilitates model exploration. We present a case study from the Salvadoran civil war, where we are interested in estimating the total number of civilian killings using lists of witnesses’ reports collected by different organizations. These lists contain duplicates, typographical and spelling errors, missingness, and other inaccuracies that lead to uncertainty in the linkage. We show how linkage-averaging can be used for transferring the uncertainty in the linkage of these lists into different models for population size estimation.
Time: 12:30-1:30 PM
Location: 121 Raitt