Rajeev H. Dehejia and Sadek Wahba
"Causal Effects in Non-Experimental Studies: Reevaluating the Evaluation of Training Programs,"
Journal of the American Statistical Association, Vol. 94, No. 448 (December 1999), pp. 1053-1062.
"Propensity Score Matching Methods for Non-Experimental Causal Studies,"
Review of Economics and Statistics, Vol. 84 (February 2002), pp. 151-161.
The data is drawn from a paper by Robert Lalonde,
"Evaluating the Econometric Evaluations of Training Programs," American
Economic Review, Vol. 76 (1986), pp. 604-620. We are grateful to him for allowing
us to use this data, for assistance in reading his original data tapes, and
for permission to publish it here.
NSW_TREATED.TXT (297 observations)
NSW_CONTROL.TXT (425 observations)
nsw.dta NSW treated and control observations in Stata format
NSW Data Files (Dehejia-Wahba Sample)
Based on pre-intervention variables, we extract a further subset of Lalonde's NSW experimental data, containing the observations with information on RE74 (earnings in 1974):
NSWRE74_CONTROL.TXT (260 observations)
NSWRE74_TREATED.TXT (185 observations)
The variables from left to right are: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE74 (earnings in 1974), RE75 (earnings in 1975), and RE78 (earnings in 1978).
nsw_dw.dta NSW treated and control observations (Dehejia-Wahba Sample) in Stata format
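The column layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the original distribution: it assumes the text files are whitespace-delimited with one observation per line and no header row, and the short column names (treat, age, and so on) and the function names parse_rows and load_file are labels chosen here for illustration.

```python
from typing import Dict, Iterable, List

# Column order as listed in the text above. The short names are
# hypothetical labels chosen for this sketch, not names from the files.
COLUMNS = ["treat", "age", "education", "black", "hispanic",
           "married", "nodegree", "re74", "re75", "re78"]


def parse_rows(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Parse whitespace-delimited records into dicts keyed by COLUMNS.

    Assumes every non-blank line holds exactly ten numeric fields.
    """
    rows = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {number}: expected {len(COLUMNS)} fields, "
                f"got {len(fields)}"
            )
        rows.append(dict(zip(COLUMNS, map(float, fields))))
    return rows


def load_file(path: str) -> List[Dict[str, float]]:
    """Read one of the text files from disk and parse it."""
    with open(path) as handle:
        return parse_rows(handle)
```

For example, load_file("NSWRE74_TREATED.TXT") would return one dictionary per observation if the file follows the assumed layout. A file with a different number of columns raises a ValueError instead of being misread silently.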
PSID and CPS Data Files
These six files contain the non-experimental comparison groups constructed by Lalonde from the Panel Study of Income Dynamics and the Current Population Survey, and the further subsets he created from the two basic comparison groups. CPS2 and CPS3 are very similar to, but not exactly the same as, Lalonde's subsets; for CPS, we were unable to re-create his subsets exactly. The variables from left to right are: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE74 (earnings in 1974), RE75 (earnings in 1975), and RE78 (earnings in 1978).
PSID controls (2,490 observations): psid_controls.txt (text format), psid_controls.dta (Stata format)
PSID2 controls (253 observations): psid2_controls.txt (text format), psid_controls2.dta (Stata format)
PSID3 controls (128 observations): psid3_controls.txt (text format), psid_controls3.dta (Stata format)
CPS controls (15,992 observations): cps_controls.txt (text format), cps_controls.dta (Stata format)
CPS2 controls (2,369 observations): cps2_controls.txt (text format), cps_controls2.dta (Stata format)
CPS3 controls (429 observations): cps3_controls.txt (text format), cps_controls3.dta (Stata format)
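After downloading, the observation counts listed on this page can serve as a quick integrity check. The sketch below is illustrative only: it assumes one observation per non-blank line with no header row, and the dataset labels and the names EXPECTED_COUNTS, count_records, and check_count are chosen here, not taken from the files.

```python
from typing import Dict, Iterable

# Observation counts as stated in the file listings on this page.
# The keys are hypothetical labels for this sketch, not file names.
EXPECTED_COUNTS: Dict[str, int] = {
    "NSW treated": 297,
    "NSW control": 425,
    "NSWRE74 treated": 185,
    "NSWRE74 control": 260,
    "PSID": 2490,
    "PSID2": 253,
    "PSID3": 128,
    "CPS": 15992,
    "CPS2": 2369,
    "CPS3": 429,
}


def count_records(lines: Iterable[str]) -> int:
    """Count non-blank lines, assuming one observation per line."""
    return sum(1 for line in lines if line.strip())


def check_count(label: str, lines: Iterable[str]) -> bool:
    """Return True if the number of records matches the listed count."""
    return count_records(lines) == EXPECTED_COUNTS[label]
```

A typical use would be to open a downloaded text file and pass the file object as lines, for example check_count("PSID", handle). A result of False suggests a truncated download or a layout that differs from the assumption above.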
Correction
Finally, note that in Table 1 of Dehejia and Wahba (1999) the mean
of Hispanic for PSID3 is misstated. It should be 0.12, not 0.18.