*New* CSDE Computational Demography Working Group (CDWG): Dr. Ilan Strauss and Sruly Rosenblat (11/19/25)
Posted: 11/13/2025 (Local Events)

When: November 19, 2025 from 10 – 11 am
Where: Raitt 223 and on Zoom
On November 19, CSDE’s Computational Demography Working Group will host Dr. Ilan Strauss and Sruly Rosenblat from the AI Disclosures Project, housed at the Social Science Research Council. Strauss and Rosenblat will present “Can Membership Inference Attacks Detect Paywalled Content in LLM Training Data? Lessons and limitations.” Using a legally obtained dataset of 34 copyrighted O’Reilly Media books, Strauss and Rosenblat apply the DE-COP membership inference attack method to investigate whether OpenAI’s large language models were trained on copyrighted content without consent. Results show that GPT-4o, OpenAI’s more recent and capable model, demonstrates strong recognition of paywalled O’Reilly book content (AUROC = 0.82, 95% bootstrapped CI: 0.60–0.96). GPT-4o Mini, a much smaller model, shows no knowledge of public or non-public O’Reilly Media content (AUROC ≈ 0.50). Testing multiple models with the same cutoff date helps account for potential language shifts over time that might bias the findings. These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a step toward developing formal licensing frameworks for AI content training.
_______________
Sruly Rosenblat is an LLM researcher for the AI Disclosures Project housed at the Social Science Research Council. He graduated with a degree in computer science from Hunter College.
Dr. Ilan Strauss is co-director of the AI Disclosures Project in New York City. He is an Honorary Senior Fellow at the UCL Institute for Innovation and Public Purpose (London) and a Visiting Associate Professor at the University of Johannesburg. He holds a Ph.D. in economics from the New School for Social Research.