*New* CSDE Computational Demography Working Group: Kentaro Hoffman on Inference on Predicted Data and its Implications for Demography (04/01/26)
Posted: 3/19/2026 (Local Events)

Abstract: Machine learning is increasingly used in demography to predict quantities that were once directly observed. Yet predictions are often treated as if they were observed data, a practice that can lead to biased estimates and misleading uncertainty. This talk introduces Inference on Predicted Data (IPD), a framework for conducting valid statistical inference when outcomes are generated by black-box prediction models rather than measured directly.
I illustrate IPD through an application to verbal autopsies, where causes of death are inferred from free-text narratives using modern NLP methods, including large language models. While these models can achieve high predictive accuracy, naïvely using predicted causes of death in downstream analyses produces distorted demographic patterns. IPD-based corrections leverage a small amount of labeled data to recover valid estimates and uncertainty, even under prediction error and distribution shift.
The results highlight a key lesson for computational demography: accurate predictions alone are not enough for reliable population inference.
Date: 04/01/2026
Time: 10-11 AM
Location: Raitt 223 and on Zoom