Fragile Families Challenge
Posted: 4/11/2017 (Funding)
The Fragile Families Challenge is a mass collaboration that will combine predictive modeling, causal inference, and in-depth interviews to yield insights that can improve the lives of disadvantaged children in the United States. By working together we can discover things that none of us can discover individually.
The Fragile Families Challenge is based on the Fragile Families and Child Wellbeing Study, which has followed thousands of American families for more than 15 years. During this time, the Fragile Families study collected information about the children, their parents, their schools, and their larger environments.
These data have been used in hundreds of scientific papers and dozens of dissertations, and insights from these studies are routinely shared with policy makers around the world through the Future of Children, which is jointly published by Princeton University and the Brookings Institution. Your challenge is to use these data in a new way. Given all the background data from birth to year 9 and some training data from year 15, how well can you infer six key outcomes in the year 15 test data?
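To make the task setup concrete, here is a minimal sketch of its shape: a model is fit on families whose year-15 outcome is in the training data, and then predicts that outcome for held-out test families. The feature names, outcome values, and the mean-only baseline below are hypothetical illustrations, not actual Fragile Families variables or Challenge data.

```python
# Background features (e.g. from the birth-to-year-9 waves), keyed by a
# made-up family id. These names are placeholders for illustration only.
background = {
    "fam1": {"mother_edu": 12, "income": 25000},
    "fam2": {"mother_edu": 16, "income": 60000},
    "fam3": {"mother_edu": 10, "income": 18000},  # test family: no year-15 label
}

# Year-15 training labels for a subset of families (one of the six outcomes).
train_labels = {"fam1": 2.8, "fam2": 3.6}

# Simplest possible baseline: predict the training-set mean for every
# family whose label is held out. Real submissions would replace this
# with a statistical or machine learning model of the background data.
mean_outcome = sum(train_labels.values()) / len(train_labels)
predictions = {fid: mean_outcome for fid in background if fid not in train_labels}
print(predictions)
```

Submissions are then scored by how close these held-out predictions come to the unreleased test labels.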
The Fragile Families Challenge is our attempt to create a new way of doing social research, one that is much more open to the talents and efforts of everyone. We expect that by combining ideas from social science and data science, we can—together—help address important scientific and social problems. And we expect that, through a mass collaboration, we will accomplish things that none of us could accomplish individually.
The Fragile Families Challenge will involve two steps. In the first step, described above, participants will build statistical and machine learning models of several important outcomes in the lives of the children. Participants will then submit their code, their model outputs, and a narrative explanation of their modeling strategy. Then, we will use the unreleased test set to evaluate each model. This first step is an example of the common task method, which David Donoho (2015) has called the “secret sauce” of machine learning. At the end of the first step, we will optimally combine all the individual models into a community model. A variety of results about ensemble methods in machine learning suggest that this community model will perform better than the best individual model.
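The intuition behind the community model can be shown with a toy sketch: averaging several individual models' held-out predictions often yields a lower error than any single model, because their individual errors partially cancel. The submissions, outcome values, and unweighted-mean combination below are made-up illustrations, not Challenge data or the Challenge's actual combination method.

```python
def mse(preds, truth):
    """Mean squared error between predictions and true outcomes."""
    return sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)

# Hypothetical predictions of one year-15 outcome from three participants,
# plus the (normally unreleased) true values on a tiny test set.
truth = [3.0, 2.5, 3.5, 2.0]
submissions = {
    "model_a": [2.8, 2.9, 3.2, 2.4],
    "model_b": [3.3, 2.2, 3.9, 1.8],
    "model_c": [2.6, 2.7, 3.0, 2.5],
}

# Community model: the unweighted mean of the individual predictions.
community = [sum(col) / len(col) for col in zip(*submissions.values())]

for name, preds in submissions.items():
    print(name, round(mse(preds, truth), 4))
print("community", round(mse(community, truth), 4))
```

In this toy example the averaged predictions beat every individual model, which is the ensemble result the paragraph above appeals to; a real combination step could also weight models by their held-out performance.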
In the second step, we will use the individual models and the community model to conduct substantive and methodological research.