You may be interested in the CPS statistical imputation of capital gains and itemized deductions, but there are also more mundane issues when using the CPS. Here is the word from Census on the calculation of taxes in the March CPS. The numbered questions are mine; the response is below each question.

Daniel Feenberg

From amy.b.ohara@census.gov Thu Feb 14 14:33:45 2008
Date: Thu, 14 Feb 2008 15:33:05 -0500
From: amy.b.ohara
To: feenberg, david.s.johnson
Cc: charles.t.nelson
Subject: Tax model

1) What is FILESTAT for the spouse of a taxpayer?

Married taxpayers will have FILESTAT <= 3, but only one of the spouses will have tax values.

2) What is DEP-STAT for the spouse of a taxpayer?

In the March 2007 file (tax year 2006), DEP_STAT was blank for the spouse, but in March 2006 (tax year 2005), it pointed to the spouse with the tax values. I would prefer to give you the count of exemptions from the unit formation logic rather than use DEP_STAT.

3) Are the dollar amounts present on all person records, or only the taxpayer record?

Tax model amounts (e.g., AGI, taxable income) are only on the taxpayer record.

4) Is the taxpayer record always before the spouse and dependent records?

Not necessarily. Person identifiers (A_LINENO or PPPOS) are already on the file when the units are created.

5) Where does CAP-GAIN come from?

CAP_GAIN and CAP_LOSS are imputed from IRS SOI public use data.

6) Are the tax amounts calculated from the top-coded incomes?

The internal file is used for the tax calculator, and some of the resulting tax values are topcoded.

7) Are there imputations for deductions?

Yes, itemized deductions are imputed from IRS SOI public use data.

8) It may be possible to create a tax-unit-within-family value equal to DEP-STAT for dependents, PPPOS for the taxpayer and A-SPOUSE for the spouse. For a given tax unit, all three of those values should be the same. But can I tell which is the taxpayer and which is the spouse from the person record itself (without needing to refer to other nearby records)? The information in the data dictionary doesn't seem to say. The key is to be able to create the tax-unit variable without reference to other records, since referring to them would make the work much more complicated. Once one has the tax-unit id, packages such as SAS or Stata provide easy procedures for summation over the members of the tax unit.

FILESTAT is a recode of an internal variable called FILEST, which can take values of 1=single, 2=married, 4=head for taxpayers. For records with FILEST, total exemptions and income are summed. I just looked over the list of fields that the internet interface of Taxsim uses, and I believe the tax unit setup program generates all the required variables. An internal extract of all cases where FILEST ne 0 should run. Is it necessary to have wages, interest, dividends, etc. entered separately? I have a rollup called TOTINC that covers up to line 22 of the 1040. If we substituted that for wages and zeroed out the other income sources, would that work?
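
The tax-unit id proposed in question 8, and the summation over unit members that follows from it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Census procedure: the `role` field is hypothetical (whether the role can be read off a single public-use record is exactly what question 8 asks), the field names are simplified stand-ins for the file's variables, and all values are made up.

```python
from collections import defaultdict

# Made-up person records for one family. "role" is a HYPOTHETICAL field:
# the sketch assumes we already know who is taxpayer, spouse or dependent.
# PPPOS is the person's own position; A_SPOUSE and DEP_STAT are assumed to
# point at the spouse's and the claiming taxpayer's position respectively.
persons = [
    {"PPPOS": 1, "A_SPOUSE": 2, "DEP_STAT": 0, "role": "taxpayer", "income": 40000},
    {"PPPOS": 2, "A_SPOUSE": 1, "DEP_STAT": 0, "role": "spouse", "income": 15000},
    {"PPPOS": 3, "A_SPOUSE": 0, "DEP_STAT": 1, "role": "dependent", "income": 0},
    {"PPPOS": 4, "A_SPOUSE": 0, "DEP_STAT": 0, "role": "taxpayer", "income": 22000},
]


def tax_unit_id(person):
    """Return the taxpayer's position, computed from this record alone."""
    if person["role"] == "dependent":
        return person["DEP_STAT"]   # pointer to the claiming taxpayer
    if person["role"] == "spouse":
        return person["A_SPOUSE"]   # pointer to the taxpayer spouse
    return person["PPPOS"]          # the taxpayer's own position


def sum_by_tax_unit(records, field):
    """Sum one numeric field over the members of each tax unit."""
    totals = defaultdict(int)
    for person in records:
        totals[tax_unit_id(person)] += person[field]
    return dict(totals)


def count_by_tax_unit(records):
    """Count the members (taxpayer, spouse, dependents) of each tax unit."""
    counts = defaultdict(int)
    for person in records:
        counts[tax_unit_id(person)] += 1
    return dict(counts)


unit_income = sum_by_tax_unit(persons, "income")
unit_size = count_by_tax_unit(persons)
```

Because the id is computed from each record in isolation, record order within the family does not matter, which is consistent with the answer to question 4. In practice the id would also need to be combined with a household or family identifier to be unique across the file.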