Rajeev H. Dehejia and Sadek Wahba
"Causal Effects in Non-Experimental Studies: Reevaluating the Evaluation of Training Programs,"
Journal of the American Statistical Association, Vol. 94, No. 448 (December 1999), pp. 1053-1062.

The data is drawn from a paper by Robert Lalonde, "Evaluating the Econometric Evaluations of Training Programs," American Economic Review, Vol. 76, No. 4 (September 1986), pp. 604-620. We are grateful to him for allowing us to use this data, for his assistance in reading his original data tapes, and for permission to publish it here.
 
 

Data Files

NSW_TREATED.TXT (297 observations)

NSW_CONTROL.TXT (425 observations)

These files contain the treated and control units from the male sub-sample from the National Supported Work Demonstration as used by Lalonde in his paper. These are text files. The order of the variables from left to right is: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE75 (earnings in 1975), and RE78 (earnings in 1978). The last variable is the outcome; other variables are pre-treatment.
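
As a rough illustration of the column order just described, the following Python sketch reads one of these files into a list of records. It assumes the values are whitespace-separated, one observation per line, with no header row; the README only says these are text files, so check the actual layout first. The lowercase column names and the helper names `parse_rows` and `load_nsw` are illustrative labels, not part of the data.

```python
# Column order as described above. The lowercase names are illustrative
# labels chosen for this sketch; the files themselves carry no header.
NSW_COLUMNS = ["treat", "age", "education", "black", "hispanic",
               "married", "nodegree", "re75", "re78"]


def parse_rows(text, columns=NSW_COLUMNS):
    """Parse whitespace-separated numeric rows into a list of dicts.

    Assumes one observation per line and no header row (an assumption
    of this sketch, not something the file description guarantees).
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, "
                f"got {len(fields)}"
            )
        rows.append(dict(zip(columns, map(float, fields))))
    return rows


def load_nsw(path, columns=NSW_COLUMNS):
    """Read a file from disk and parse it with parse_rows."""
    with open(path) as fh:
        return parse_rows(fh.read(), columns)
```

If the layout assumption holds, `load_nsw("NSW_TREATED.TXT")` should return 297 records and `load_nsw("NSW_CONTROL.TXT")` should return 425, matching the counts listed above. A field-count mismatch raises an error rather than silently misaligning columns.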

PSID_CONTROLS.TXT (2,490 observations)
PSID2_CONTROLS.TXT (253 observations)
PSID3_CONTROLS.TXT (128 observations)

CPS_CONTROLS.TXT (15,992 observations)
CPS2_CONTROLS.TXT (2,369 observations)
CPS3_CONTROLS.TXT (429 observations)

These six files contain the non-experimental comparison groups constructed by Lalonde from the Panel Study of Income Dynamics and the Current Population Survey, and the further subsets he created from the two basic comparison groups. CPS2 and CPS3 are very similar to, but not exactly the same as, Lalonde's subsets; for the CPS, we were unable to re-create his subsets exactly.

Based on pre-intervention variables, we also extract a further subset of Lalonde's NSW experimental data, containing information on RE74 (earnings in 1974):

NSWRE74_CONTROL.TXT (260 observations)

NSWRE74_TREATED.TXT (185 observations)

The variables from left to right are: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE74 (earnings in 1974), RE75 (earnings in 1975), and RE78 (earnings in 1978).
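
As a sketch of how the ten-column layout above might be used, the Python below reads one of the RE74 files and computes the unadjusted difference in mean RE78 between two groups. It assumes whitespace-separated values with no header row. The column names and the helpers `read_table` and `mean_difference` are hypothetical, introduced only for this illustration. A raw difference in means is shown purely as a quick check on the data and is not presented as the estimator used in the paper.

```python
from statistics import mean

# Column order of the RE74 subset as listed above; the lowercase names
# are illustrative labels, not headers found in the files.
RE74_COLUMNS = ["treat", "age", "education", "black", "hispanic",
                "married", "nodegree", "re74", "re75", "re78"]


def read_table(path, columns=RE74_COLUMNS):
    """Read a whitespace-separated, headerless numeric file (assumed layout)."""
    rows = []
    with open(path) as fh:
        for line_no, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            if len(fields) != len(columns):
                raise ValueError(
                    f"{path}, line {line_no}: expected {len(columns)} "
                    f"fields, got {len(fields)}"
                )
            rows.append(dict(zip(columns, map(float, fields))))
    return rows


def mean_difference(treated, control, outcome="re78"):
    """Unadjusted difference in the mean outcome, treated minus control."""
    return (mean(r[outcome] for r in treated)
            - mean(r[outcome] for r in control))
```

Under these assumptions, `read_table("NSWRE74_TREATED.TXT")` and `read_table("NSWRE74_CONTROL.TXT")` should return 185 and 260 records respectively, and passing the two lists to `mean_difference` gives the raw experimental contrast in 1978 earnings.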

Finally, note that in Table 1 of the published paper the mean of Hispanic for PSID3 is misstated. It should be 0.12, not 0.18.