*New* CSDE Computational Demography Working Group (CDWG): Kentaro Hoffman (01/28/26)
Machine learning is increasingly used in demography to predict quantities that were once directly observed. Yet predictions are often treated as data, a practice that can lead to biased estimates and misleading uncertainty. This talk introduces Inference on Predicted Data (IPD), a framework for conducting valid statistical inference when outcomes are generated by black-box prediction models rather than measured directly.
I illustrate IPD through an application to verbal autopsies, where causes of death are inferred from free-text narratives using modern NLP methods, including large language models. While these models can achieve high predictive accuracy, naïvely using predicted causes of death in downstream analyses produces distorted demographic patterns. IPD-based corrections leverage a small amount of labeled data to recover valid estimates and uncertainty, even under prediction error and distribution shift.
The results highlight a key lesson for computational demography: accurate predictions alone are not enough for reliable population inference.
Kentaro Hoffman (University of Washington) is a statistician whose research focuses on inference with AI-generated and predicted data, uncertainty quantification, and responsible machine learning. He was previously a postdoctoral scholar at the University of Washington and Johns Hopkins University, working with Tyler McCormick, Peter Searson, and Scott Zeger. His work lies at the intersection of statistics, machine learning, and computational demography, with applications including verbal autopsies, global mortality estimation, electronic medical records, and active learning.
CSSS Seminar: Jevin West on “Epistemic Diversity Across Language Models Mitigates Knowledge Collapse” (01/28/26)
When: January 28, 2026 at 12:30 pm
Where: 409 Savery Hall and on Zoom
Title: “Epistemic Diversity Across Language Models Mitigates Knowledge Collapse” (Jevin West)
The growing use of artificial intelligence (AI) raises concerns of knowledge collapse, i.e., a reduction to the most dominant and central set of ideas. Prior work has demonstrated single-model collapse, defined as performance decay in an AI model trained on its own output. Inspired by ecology, we ask whether AI ecosystem diversity, that is, diversity among models, can mitigate such a collapse. We build on the single-model approach but focus on ecosystems of models trained on their collective output. We find that increased epistemic diversity mitigates collapse, but, interestingly, only up to an optimal level. In the context of AI monoculture, our results suggest the need to monitor diversity across AI systems and to develop policies that incentivize more domain- and community-specific models.
Jevin West is the co-founder of the new Center for an Informed Public at UW aimed at resisting strategic misinformation, promoting an informed society and strengthening democratic discourse. His research and teaching examine the impact of data and technology on science and society, with a focus on slowing the spread of misinformation. He is the co-author of the new book, Calling Bullshit: The Art of Skepticism in a Data-Driven World, which helps non-experts question numbers, data, and statistics without an advanced degree in data science.
SoDa Symposium on Privacy: Balancing Statistical and Non-Statistical Uses of Federal Data: Privacy, Governance, and Public Trust (01/28/26)
The University of Maryland’s Social Data Science (SoDa) Center invites you to join a symposium in celebration of privacy week. The symposium on January 28, 2026 at 11 am PT will feature two presentations followed by Q&A. The registration link for this free webinar can be found here.
The U.S. federal government has long maintained a clear line between the statistical and non-statistical uses of the public’s information. The former includes purposes such as producing the Consumer Price Index; the latter includes determinations such as a specific household’s eligibility for a program. This functional separation has guided federal data practice for 50 years, and this safeguard is encoded in federal laws such as U.S. Code Title 13 and the Confidential Information Protection and Statistical Efficiency Act (CIPSEA). This Privacy Day webinar examines the origins of this boundary (statistical vs. non-statistical purposes), how it is enforced today, and what it will take to preserve this crucial principle in an evolving federal data landscape.
1) The Evolution and Interpretation of “Statistical Purposes”
Sallie Ann Keller
Chief Scientist and Associate Director for Research and Methodology
U.S. Census Bureau
and
Michael B. Hawes
Senior Statistician for Scientific Communication
U.S. Census Bureau
Abstract: Data subjects are often told their information will be used for “statistical purposes,” and statistical agencies are legally required to use these data “for statistical purposes only,” but what does this actually mean? In today’s data-driven world, statistics is a far-reaching and expansive discipline, actively used across virtually all scientific fields, and extensively leveraged, with myriad daily implications, both large and small, for the average person. With statistics being so broad a discipline, one might expect that the term “statistical purposes” (intuitively, those actions taken in pursuit of the generation, use, or interpretation of statistics) would be similarly expansive. We find that in practice the definition has been both narrowed and expanded over time. In this presentation, we explore how the term “statistical purposes” has been ambiguously defined in law and regulation, and how it has been interpreted in practice by the U.S. federal statistical system. We will analyze how the legal and ethical guardrails of “statistical purposes” align with the core objectives and mission of a statistical agency.
2) The (Real and Imagined) Bounds of Statistical Purpose
Alexandra Wood
Visiting Assistant Professor of Artificial Intelligence, Policy, and Society
Department of Political Science
Purdue University
Abstract: Statistical purpose is a fundamental yet under-explored concept embedded in regulatory frameworks for privacy, data protection, and statistical confidentiality. As a special case of purpose limitation, it bounds the scope of permissible processing activities and implicates regulatory requirements distinct from those applicable to processing for other purposes. Originally intended to protect statistical integrity in the context of official statistics, the concept of statistical purpose has increasingly been applied in broader contexts. However, there is a notable lack of a consistent definition or clear guidance for determining when information is being processed for statistical purposes. In the absence of such guidance, a very wide range of interpretations of statistical purpose has emerged, often founded in differing assumptions about informational risk. This talk examines these interpretations, the assumptions they rely on, and their implications for policy and practice. Drawing on insights from statistical policy, ethics, and privacy research, it offers recommendations for clarifying and strengthening statistical purpose provisions in law and guidance.
Postdoctoral Research Associate, Dr. Nina Brooks, Environment and Sustainability – University of Michigan (01/30/26)
Post-Doctoral Associate, Division of Social Science, Dr. Stephanie Helleringer – NYU Abu Dhabi (02/01/26)
Professors of Practice (2) – Washington University in St. Louis (02/01/26)
PAA Webinar: Immigration’s Role in Workforce Sustainability (01/30/26)
Join this webinar on Friday, January 30, at 12:00pm ET, with an expert panel of scientists providing a demographic overview of the U.S. immigrant population. The panel will discuss how immigrants contribute to the U.S. workforce and the solvency of social insurance programs, including Medicare and Social Security. Participants will also learn how some recent policy changes are impacting specific industries, including the technology and agricultural sectors. Read more and register.
Panelists include:
- Matthew Hall, Cornell University
- Catalina Amuedo-Dorantes, University of California-Merced
- Phillip Connor, Princeton University
- Chloe East, University of Colorado-Boulder
- Irma Elo, PAA Past President, University of Pennsylvania
IPUMS 2026 Data Intensive Research Conference – Minneapolis, MN (Apply by 01/30/26)
Abstract submissions are now open for the 2026 Data-Intensive Research Conference. The 2026 conference theme is Novel Data Linkages and Innovative Life Course Research. Enriching population data through data linkage creates novel data sources that can shed light on life course processes. Linking across time allows for the examination of transitions and trajectories, and linking to contextual information situates the experiences of individuals and populations in their environments. Review the call for proposals and submit an abstract.