The Social Genome Model (SGM) is a lifecycle model that uses data from three longitudinal surveys to track a matched panel of individuals from birth to age 30, with projected estimates of earnings through age 65. The goal is to understand how private and public policy interventions could improve lifetime outcomes of children and young adults. The model also allows researchers to track patterns of development across different gender and racial or ethnic groups.

This technical document outlines the process of creating the SGM Early Childhood version of the model. This version is an alternative to SGM version 2.1 that includes additional data for the early childhood life stages. The early childhood data are estimated using the Early Childhood Longitudinal Study, Birth Cohort (ECLS-B), a restricted-use dataset that all users must be licensed to access. Analyses using these data, and therefore the SGM Early Childhood version, must be conducted in an approved secure data room, and the results must be cleared by the Institute of Education Sciences before they can be shared with anyone who does not hold the necessary license.

We first provide an overview of the conceptual framework underlying the model, then describe the three datasets used: the Early Childhood Longitudinal Study, Birth Cohort (ECLS-B); the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K); and the National Longitudinal Survey of Youth 1997 (NLSY). Next, we explain how we match observations across the ECLS-K and NLSY datasets to create the matched panel and how we validate the matching approach. We then explain how early childhood data are imputed for the matched panel dataset. Finally, we present summary statistics for the variables included in our final dataset and discuss the parameterization of the model.