Population Research Discovery Seminars
Bayesian Propagation of Record Linkage Uncertainty into Population Size Estimation with Application to Human Rights Violations
Mauricio Sadinle, Department of Biostatistics, School of Public Health, UW
12:30-1:30 PM PT
Multiple-systems, or capture–recapture, estimation methods are commonly used to estimate population size, particularly in the quantitative study of human rights violations. These methods rely on multiple samples from the population, along with information on which individuals appear in which samples. The goal of record linkage techniques is to identify unique individuals across samples based on the information collected about them. Linkage decisions are subject to uncertainty when such information contains errors and missingness, and when different individuals have very similar characteristics. Uncertainty in the linkage should be propagated into the stage of population size estimation. We propose an approach called linkage-averaging to propagate linkage uncertainty, as quantified by certain Bayesian record linkage methodologies, into a subsequent stage of population size estimation. Linkage-averaging is a two-stage approach in which the results from the record linkage stage are fed into the population size estimation stage. We show that under some conditions the results of this approach correspond to those of a proper Bayesian joint model for both record linkage and population size estimation. The two-stage nature of linkage-averaging allows us to combine different record linkage models with different capture–recapture models, which facilitates model exploration. We present a case study from the Salvadoran civil war, where we are interested in estimating the total number of civilian killings using lists of witnesses’ reports collected by different organizations. These lists contain duplicates, typographical and spelling errors, missingness, and other inaccuracies that lead to uncertainty in the linkage. We show how linkage-averaging can be used to transfer the uncertainty in the linkage of these lists into different models for population size estimation.
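The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation. It assumes only two lists, uses hard-coded hypothetical linkage draws in place of posterior samples from a Bayesian record linkage model, and swaps in Chapman's classical two-list estimator as a stand-in for the capture–recapture model. All function and variable names are illustrative.

```python
from statistics import mean


def chapman_estimate(n1, n2, m):
    """Chapman's two-list population size estimate.

    n1, n2: number of individuals on each list; m: number on both lists.
    Used here only as a simple stand-in for a capture-recapture model.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def linkage_average(n1, n2, linkage_draws):
    """Stage 2: estimate population size once per linkage draw, then summarize.

    Each draw is a set of (i, j) pairs linking record i in list 1 to
    record j in list 2 (hypothetical output of stage 1).
    """
    estimates = [chapman_estimate(n1, n2, len(draw)) for draw in linkage_draws]
    return mean(estimates), min(estimates), max(estimates)


# Hypothetical linkage draws: the number of cross-list matches varies from
# draw to draw, which is the uncertainty being propagated.
draws = [
    {(0, 1), (2, 0)},
    {(0, 1), (2, 0), (3, 4)},
    {(0, 1)},
]

avg, low, high = linkage_average(n1=5, n2=5, linkage_draws=draws)
print(avg, low, high)
```

The spread between the lowest and highest estimates reflects how much the population size estimate depends on the uncertain linkage. In this sketch the list sizes are held fixed, so within-list duplicates are ignored.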
Mauricio Sadinle is the Genentech Distinguished Assistant Professor in the Department of Biostatistics at the University of Washington. Previously, he was a Postdoctoral Associate in the Department of Statistical Science at Duke University and the National Institute of Statistical Sciences, working under the mentorship of Jerry Reiter. He completed his PhD in the Department of Statistics at Carnegie Mellon University, where his advisor was Steve Fienberg. He completed his undergraduate studies at the National University of Colombia in Bogotá, where he majored in statistics. In his research he develops methodology for a variety of applied and data-driven problems. Thus far he has worked on: (1) record linkage techniques to combine data files that contain information on overlapping sets of individuals but lack unique identifiers; (2) nonignorable missing data modeling, and the use of auxiliary information to identify nonignorable missing data mechanisms; and (3) classification techniques that output sets of plausible labels for ambiguous sample points. He also has experience working with social network models for valued ties, and capture–recapture models in the context of human rights violations.