Santiago Pérez

Department of Economics
University of California at Davis
One Shields Avenue
Davis, CA 95616

NBER Program Affiliations: DAE
NBER Affiliation: Faculty Research Fellow
Institutional Affiliation: University of California at Davis

NBER Working Papers and Publications

May 2019: Automated Linking of Historical Data
with Ran Abramitzky, Leah Platt Boustan, Katherine Eriksson, James J. Feigenbaum: w25825
The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5%) false positive rates. The automated methods trace out a frontier illustrating the tradeoff between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates o...
April 2018: The Long-Term Spillover Effects of Changes in the Return to Schooling
with Ran Abramitzky, Victor Lavy: w24515
We study the short and long-term spillover effects of a pay reform that substantially increased the returns to schooling in Israeli kibbutzim. This pay reform, which induced kibbutz students to improve their academic achievements during high school, spilled over to non-kibbutz members who attended schools with these kibbutz students. In the short run, peers of kibbutz students improved their high school outcomes and shifted to courses with higher financial returns. In the medium and long run, peers completed more years of postsecondary schooling and increased their earnings. We discuss three main spillover channels: diversion of teachers’ instruction time towards peers, peer effects from improved schooling performance of kibbutz students, and the transmission of information about the retur...
February 2018: Linking Individuals Across Historical Sources: a Fully Automated Approach
with Ran Abramitzky, Roy Mill: w24324
Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. In the first part of the paper, we suggest a fully automated probabilistic method for linking historical datasets that enables researchers to create samples at the frontier of minimizing type I (false positives) and type II (false negatives) errors. The first step guides researchers in the choice of which variables to use for linking. The second step uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to compute the probability that each two records correspond to the same individual. The third step suggests how to use these es...
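
To make the EM step in the abstract above more concrete, here is a minimal sketch of how match probabilities could be estimated from field-by-field agreement patterns. It is a generic two-class mixture in the Fellegi-Sunter style, not the authors' implementation. The function name `em_match_probabilities`, the binary agreement coding, the conditional-independence assumption, and the starting values are all assumptions made for illustration.

```python
from math import prod


def em_match_probabilities(patterns, n_iter=100, p=0.1, m_init=0.9, u_init=0.1):
    """Hypothetical helper: EM for a two-class (match / non-match) mixture.

    patterns: list of tuples of 0/1, one per candidate record pair, where
              1 means the two records agree on that field (e.g. name, age).
    Returns (weights, m, u, p): the posterior match probability of each pair,
    per-field agreement rates among matches (m) and non-matches (u),
    and the estimated share of true matches (p).
    """
    k = len(patterns[0])
    n = len(patterns)
    m = [m_init] * k  # P(field agrees | pair is a match)
    u = [u_init] * k  # P(field agrees | pair is not a match)
    eps = 1e-6        # keeps probabilities strictly inside (0, 1)

    def clamp(x):
        return min(max(x, eps), 1.0 - eps)

    def likelihood(g, probs):
        # Assumes fields agree independently given the latent class.
        return prod(q if a else 1.0 - q for a, q in zip(g, probs))

    weights = []
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        weights = []
        for g in patterns:
            lm = p * likelihood(g, m)
            lu = (1.0 - p) * likelihood(g, u)
            weights.append(lm / (lm + lu))

        # M-step: re-estimate parameters from the weighted agreement counts.
        total = sum(weights)
        p = clamp(total / n)
        for j in range(k):
            agree_m = sum(w for w, g in zip(weights, patterns) if g[j])
            agree_u = sum(1.0 - w for w, g in zip(weights, patterns) if g[j])
            m[j] = clamp(agree_m / total)
            u[j] = clamp(agree_u / (n - total))

    return weights, m, u, p


# Made-up agreement patterns for three fields, for illustration only.
pairs = (
    [(1, 1, 1)] * 10 + [(1, 1, 0)] * 3 + [(1, 0, 0)] * 20
    + [(0, 1, 0)] * 20 + [(0, 0, 1)] * 15 + [(0, 0, 0)] * 60
)
weights, m, u, p = em_match_probabilities(pairs)
```

Under these assumptions, pairs that agree on every field receive a higher estimated match probability than pairs that agree on none. A researcher could then keep only pairs above a chosen probability threshold, trading a lower false positive rate against a lower match rate.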