Preliminary Results: Please Do Not Cite or Quote

The Impact of Health Plan Report Cards on Managed Care Enrollment

July 12, 2000

Dennis P. Scanlon, Assistant Professor, Department of Health Policy & Administration, The Pennsylvania State University

Michael Chernew, Associate Professor, Department of Health Management & Policy, Department of Economics, and Department of Internal Medicine, The University of Michigan

Catherine McLaughlin, Associate Professor, Department of Health Management & Policy, The University of Michigan

Gary Solon, Professor, Department of Economics, The University of Michigan

Funding: This work was supported by a grant from the Agency for Healthcare Research and Quality, grant #1-R01-HS10050.

Acknowledgments: We are grateful to Tom Cragg and Bruce Bradley for providing the data for this study. We also acknowledge comments received from Will Manning and seminar participants at the University of Chicago, Penn State University, and the 11th Annual Health Economics Meeting. Finally, we appreciate the capable programming assistance of Joe Vasey.

Abstract

Belief in the importance of information in today’s health insurance marketplace has led to the development and dissemination of health plan report cards by employers, the media, and consumer advocacy organizations. These report cards attempt to measure plan performance along various dimensions and are often based on the Health Plan Employer Data and Information Set (HEDIS). We examine how the release of health plan performance ratings influences health plan enrollment. Our analysis is based on the health plan choices of employees of General Motors (GM) Corporation for two open enrollment periods. Specifically, for 1997 enrollment, GM produced a HEDIS-based report card for all HMOs it contracted with nationally. The report card rated plans on a three-point scale in six domains of performance (e.g., preventive care, surgical care).
The report card was disseminated for the first time during the 1997 open enrollment period to non-union employees. Consistent with earlier research, our results confirm that employees are less likely to enroll in plans requiring relatively high out-of-pocket contributions. The point estimates suggest that a 10% increase in the relative price of a plan would generate about a 3% drop in its relative market share. The results with respect to the report card ratings are sensitive to the exact specification and sample, perhaps due to multicollinearity among the report card rating variables. In most models the results are equivocal: most of the estimated coefficients on the ratings are not statistically significant, and the estimates provide roughly as much evidence against the hypothesis that employees shifted toward higher-rated plans as for it. However, a specification that aggregates the ratings provides some evidence that employees avoided plans with many below-average ratings.

I. Background

For nearly two decades, many scholars and policymakers have advocated a competitive health insurance market in which plans compete for enrollees on the basis of price and quality. The managed competition model requires that consumers choose from among several competing plans and pay the incremental cost associated with the plan they choose. Because managed care plans integrate the financing and delivery of health care, the quality of health plans should be broadly interpreted to include aspects of provider networks and clinical care as well as traditional measures of plan quality such as customer service. When choosing among competing plans, consumers must actively consider the trade-off between price and quality. Advocates of managed competition argue that the competitive model will promote efficient market outcomes. A cornerstone of the managed competition model is the requirement that consumers be sufficiently informed about quality.
Towards this end, employers and other organizations are increasingly compiling and releasing information about dimensions of plan performance thought to be related to plan quality. The information on plan performance is based largely on standardized data systems, such as the Health Plan Employer Data and Information Set (HEDIS) and the Consumer Assessment of Health Plan Survey (CAHPS). Together these systems measure over 100 aspects of plan performance, which are typically aggregated into a more manageable number of ‘domains’ of performance, such as ‘preventive care’ and ‘satisfaction.’ This paper examines the impact of the release of a health plan report card on the health plan choices of employees at General Motors (GM) Corporation, whose benefit program resembles the managed competition model. The firm provided a fixed contribution to employees to subsidize all benefit choices, including health insurance. Health insurance benefit packages were standardized across plans. The firm created a HEDIS-based health plan ‘report card’ that was disseminated to non-union employees for the first time during the 1997 open enrollment period. By using longitudinal data to examine the impact of the report card on plan enrollments while controlling for out-of-pocket price, this paper empirically assesses the impact of report card information on plan enrollment.

II. Existing Literature

Although a substantial existing literature examines factors that influence health insurance enrollment, much of this literature pertains to indemnity plan enrollment and considers out-of-pocket price, deductibles, co-insurance rates, and benefit structure as the key determinants of plan choice (Scanlon, Chernew, and Lave, 1997). This literature consistently finds an inverse relationship between plan price and plan enrollment, congruent with economic theory. Feldman et al.
(1989) include a measure of the ability of enrollees to choose their physician in their model of plan choice, capturing an important aspect of how managed care plans differentiate themselves from one another. Other studies, such as Buchmueller and Feldstein (1996), also recognize the importance of the type of health plan in influencing plan choice. Chernew and Scanlon (1998) examine the cross-sectional relationship between report card ratings of plan performance and health plan choice in a setting where employees were given report cards during open enrollment. The cross-sectional design prevents identification of the role of report card information per se, because individuals may have been aware of plan performance in the absence of the report cards. The authors are unable to detect the hypothesized relationship between the report card ratings and enrollment. In several instances Chernew and Scanlon find counterintuitive relationships between ratings and enrollment that may be the result of correlation between the ratings and unmeasured plan attributes. A follow-up study, using the detailed performance measures, yields results consistent with the hypotheses regarding correlation between the ratings and unobserved plan traits (Scanlon and Chernew, 1999). The authors conclude that unobserved plan traits are probably important determinants of plan choice. The omission of important plan traits from the analysis could bias coefficient estimates due to the correlation between the unobserved attributes and the ratings. Moreover, there is likely a correlation among the health plan choices of individuals in the same market. Thus analyses conducted at the individual level will underestimate the standard errors if independence of the individual observations is falsely assumed (Moulton, 1986). Farley et al. (1999) study the impact of report card information on plan choice among Medicaid enrollees. 
After randomly distributing a CAHPS-based Medicaid managed care report card, these authors examine the impact of the report card on enrollment for new Medicaid beneficiaries in New Jersey. The authors find that the report card had little effect on enrollment patterns in aggregate. When survey results are examined to identify report card ‘users,’ they find a stronger link between enrollment and the ratings for this sub-sample. The commercially insured population may differ from the Medicaid population in their response to report cards for a variety of reasons. In general, commercially insured individuals are better educated, which may increase their responsiveness to information, but they may also be more informed from other sources and thus less likely to respond to health plan report card data.

III. Experimental Setting and Data

Experimental Setting

Our analysis is based on the health plan choices of employees at GM for two open enrollment periods, 1996 and 1997. GM employed a flexible benefits system in which employees and retirees received a fixed amount of ‘flex dollars’ that could be allocated across several benefit categories (e.g., health insurance, life insurance, disability insurance, and dental insurance). Within each benefit category there were various options, each with firm-specified prices. If the cost of one’s total benefit elections exceeded one’s allotted flex benefit dollars, the difference was paid out of pocket. If the total cost was less than the allotted amount, the difference was received as taxable income. The firm determined the set of health insurance plans from which enrollees could choose as well as the prices they were charged for each plan. We define the price as the employee out-of-pocket premium. During the 1997 open enrollment period, which occurred in the fall of 1996, the firm, for the first time, provided health plan performance ratings for all available HMOs to non-union employees as part of the open enrollment materials.
The performance ratings were based on aggregated HEDIS data. No performance ratings were released for traditional fee-for-service (FFS) or preferred provider organization (PPO) plans. The release of ‘report card’ information provides the fundamental natural experiment that is the foundation of the analyses reported in this paper. Our analysis identifies the impact of the report card on plan choice using the observed health plan choices both before and after the firm released plan ratings to non-union employees. Fortunately, the set of health plans offered was very stable during the study period. Because GM has employees in many markets, this is analogous to analyzing many different natural experiments. Because the union employees were not provided the report card for 1997 open enrollment, we also use changes in union enrollment patterns to control for unobserved, time-varying plan factors that may have affected health plan choice (e.g., changes in provider panels).

HMO Ratings Developed by GM

HMOs were rated ‘below expected performance,’ ‘average performance,’ or ‘superior performance’ along six domains, labeled by GM as:

• preventive health care services
• medical and surgical care
• women’s health issues
• access to care
• patient satisfaction
• operational performance

The ratings for one domain, operational performance, were based on plan site visits by GM staff. The other five ratings were based on HEDIS results. The HEDIS measures comprising the five domains are listed in Table 1. The firm used individual HEDIS measures to create ratings via a Z-score methodology described in detail in Appendix A. In addition to these HEDIS-based ratings, GM also reported whether the National Committee for Quality Assurance (NCQA) accredited each HMO, and whether GM considered the HMO a ‘benchmark’ plan.[1] During the open enrollment period in 1996 (for 1997 enrollment), non-union employees were given an information sheet on each of the HMOs from which they could choose.
The information sheet designated each plan as 1, 2, or 3 diamonds for each domain to represent the plan ratings. An example is provided in Figure 1, though no employee received the exact sheet represented in Figure 1 because no employees were offered all of the plans displayed.

Analytic Sample

The firm covered over 1.6 million active employees, retirees, and dependents. The analysis is based on the health plan choices of the approximately 96,000 active employees based in the U.S. that chose HMO coverage (Table 2). About 29,000 of these employees were salaried (non-union) and thus were given the report card information. Dependents were not analyzed separately because they almost always made the same choice as the employee with GM coverage eligibility. Retirees were excluded because they are frequently Medicare-eligible, making the nature of plan choice different than for the non-Medicare population. Employees who enrolled in FFS or preferred provider organizations (PPOs) were excluded because ratings were not provided for these plans.[2] Hence, the analysis pertains to the impact of report card information on HMO choice, conditional on HMO enrollment. We also excluded plans with zero enrollments in either year. In most cases, these were offered plans that were not realistic choices; in a few cases, plans that were not offered in one of the years were dropped from the analysis. Approximately 27,000 employees were in HMOs included in our study. About 25% of the 1997 employees enrolled in HMOs were either in a different HMO, enrolled in another plan type, or not receiving benefits from GM in 1996.

[1] GM considered a combination of factors in choosing which plans were labeled benchmark plans. These factors included premiums, quality, and geographic location of the HMO. Although the designation of benchmark was based on these factors, the final decision was determined by a ‘qualitative’ judgment rather than a score resulting from a numerical algorithm.

[2] Plan ratings were not provided for FFS or PPO plans because the HEDIS data that were used to construct the ratings were collected only for HMOs.

For our analysis, employees are assigned to markets based on the set of health plans from which they could choose. All employees that share a common set of plan choices are grouped into the same geographic market. The firm determined the set of plans from which each employee could choose based on the employee’s zip code of residence. Markets are mutually exclusive, but plans may serve multiple markets. For example, plan A could be offered in San Francisco and south to Santa Cruz, and plan B could be offered in San Francisco and north through Marin County. This would result in three markets. Market 1 would represent Santa Cruz, with only plan A offered. Market 2 would represent Marin County, with only plan B offered. Market 3 would be San Francisco, with both plans offered. Other plans may serve only one market. In addition to the market distinction, employees within a market could choose from four different coverage categories: single, employee and spouse, employee and children, and employee and family. Coverage category could affect plan preferences in a variety of ways. For example, employees with children may be more interested in the set of available pediatricians and plan performance in the area of pediatric care.

For our analysis we define a ‘cell’ as a particular market/coverage category combination. After excluding cells with fewer than 5 employees and markets with only 1 plan, we have observations on 69 plans spread across 183 market/coverage category cells. On average, cells have 3.19 plans (minimum=2, maximum=6). In 1996, the mean number of choosers per cell was 178.78 (minimum=5, maximum=3,299). In 1997, the cells were slightly larger (mean=188.29, minimum=5, maximum=3,462). Descriptive statistics for the 69 plans are in Table 3.
The descriptive statistics on annual price reported in this table reflect the difference between the out-of-pocket price and the allotted flex dollars, to standardize across coverage categories.

IV. Econometric Methods

Let U_ijct represent person i’s utility from health plan j in market/coverage cell c in year t. This utility depends partly on observable characteristics of the health plan, such as its out-of-pocket premium and its report card ratings. It also depends on other characteristics that researchers do not observe, such as the popularity of physicians in the health plan’s provider network, the convenience associated with using the plan’s providers, confidence that the plan will approve requests for care in unusual circumstances, and amenities of the hospitals affiliated with the plan. These unmeasured variables may differ by market (as in the case of provider location or quality) or coverage category (as in the case of quality of pediatric services). Individuals may observe these attributes through interactions with family, friends, co-workers, and health care professionals, regardless of whether the firm reports plan performance measures. Finally, different individuals have different valuations of the same plan because of idiosyncratic differences in individual preferences or circumstances. We formalize these considerations with the random-utility model

(1)  U_ijct = β′X_jct + γ_jc + ε_ijct.

The vector X_jct contains the measured health plan characteristics, and β is the associated coefficient vector. The γ_jc term represents the average individual’s valuation of the unmeasured characteristics of the plan. The error term ε_ijct reflects the individual’s idiosyncratic deviation from that average valuation. Each individual chooses the health plan that maximizes his or her utility.
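The random-utility setup in equation (1) can be illustrated with a small simulation; this is a minimal sketch under made-up plan attributes and taste parameters (none of these numbers come from the paper), showing that argmax choices with Gumbel errors reproduce logit choice probabilities:

```python
# Illustrative sketch (not the paper's code): plan choice under the model
# U_ijct = beta'X_jct + gamma_jc + eps_ijct with Type I extreme value errors.
import numpy as np

rng = np.random.default_rng(0)

# Three plans in one cell: columns = [relative price, superior-rating indicator]
X = np.array([[0.10, 1.0],
              [0.00, 0.0],
              [-.05, 0.0]])
beta = np.array([-3.0, 0.5])          # hypothetical taste parameters
gamma = np.array([0.2, 0.0, -0.1])    # valuations of unmeasured plan traits

n_employees = 10_000
eps = rng.gumbel(size=(n_employees, 3))            # idiosyncratic tastes
choices = np.argmax(X @ beta + gamma + eps, axis=1)
shares = np.bincount(choices, minlength=3) / n_employees

# Simulated shares approximate the logit probabilities of equation (3)
v = np.exp(X @ beta + gamma)
print(shares, v / v.sum())
```

With many simulated employees, the realized shares converge to the logit form derived below, which is exactly the relationship the market-share estimator exploits.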
Therefore, given the X’s and the γ’s, the probability π_jct that an individual in cell c chooses plan j in year t is

(2)  π_jct = Prob(U_ijct > U_ikct for all k ≠ j)
           = Prob[ε_ijct − ε_ikct > β′(X_kct − X_jct) + (γ_kc − γ_jc) for all k ≠ j].

Specifying a functional form for the choice probability π_jct requires a distributional assumption for the ε’s. Following McFadden (1973, 1974), if the ε’s in year t follow independent Type I extreme value distributions,[3] then π_jct takes the logit form

(3)  π_jct = exp(β′X_jct + γ_jc) / D_ct,

where D_ct = Σ_{k=1}^{N_c} exp(β′X_kct + γ_kc) and N_c is the number of plans offered in cell c.

[3] The assumption that the ε’s follow independent Type I extreme value distributions imposes independence of irrelevant alternatives (IIA). IIA implies that the probability of choosing one plan relative to a second plan is unaffected by changes in attributes of other plans in the individual's choice set. If some plans are closer substitutes to a given plan than others, IIA will be violated.

If all the γ’s were zero – that is, if unmeasured plan traits were of absolutely no consequence – then the model in equation (3) would simplify to the standard conditional logit model, which could be estimated by applying the conventional maximum likelihood estimator to individual-level data. This approach is common in the existing literature (Short and Taylor, 1989; Feldman et al., 1989; Garnick et al., 1989). But the assumption that consumers are indifferent to unmeasured factors, such as the popularity of physicians and locational convenience, is quite implausible, and false imposition of this assumption creates two serious econometric problems. First, assuming the γ’s are zero when they really are not overlooks correlation across individual observations in the same cell. This oversight can lead to gross underestimation of standard errors (Moulton, 1986). Second, and more importantly, if some of the unobservables underlying γ_jc are correlated with
some of the observables in X_jct, failure to control for those unobservables generates an omitted-variables inconsistency in the estimation of β. To avoid these problems, we employ a methodology that does account for the γ’s.[4] Let S_jct denote the market share of plan j in market/coverage cell c in year t, which is simply the fraction of sample individuals in the cell that choose that plan. The expected value of S_jct is the choice probability π_jct, but with finite samples the realized value of S_jct deviates from π_jct because of random sampling error. Thus,

(4)  S_jct = π_jct + v_jct = exp(β′X_jct + γ_jc)/D_ct + v_jct,

where E(v_jct) = 0 and Var(v_jct) varies inversely with the number of sample individuals in the cell. Taking natural logarithms yields

(5)  ln(S_jct) = ln[exp(β′X_jct + γ_jc) + D_ct v_jct] − ln(D_ct),

and a first-order Taylor series expansion of equation (5) around v_jct = 0 yields

(6)  ln(S_jct) ≅ β′X_jct + γ_jc − ln(D_ct) + v_jct/π_jct.

It follows that the difference between the log market share of plan j and that of an arbitrarily selected reference plan r in the same cell is

(7)  ln(S_jct) − ln(S_rct) ≅ β′(X_jct − X_rct) + (γ_jc − γ_rc) + (v_jct/π_jct − v_rct/π_rct).

Inspection of equation (7) makes clear that, if one were to perform least squares estimation of the cross-sectional regression of the difference in log market shares on the differences in measured plan traits, the failure to control for the difference in the unobserved γ’s would generate an omitted-variables bias in the estimation of β. Fortunately, our access to longitudinal data from both 1996 and 1997 enables us to “difference out” the γ’s. Differencing equation (7) between the two years yields

(8)  Δln(S_jc) − Δln(S_rc) ≅ β′(ΔX_jc − ΔX_rc) + [Δ(v_jc/π_jc) − Δ(v_rc/π_rc)],

where the Δ notation denotes a change from 1996 to 1997.
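The dependent variable in equation (8) can be built directly from enrollment counts; the following is a stylized sketch for a single cell with a hypothetical reference plan ‘r’ (the plan labels and counts are invented for illustration):

```python
# Sketch of constructing the differenced relative log market share of eq. (8).
# Counts below are fabricated for one market/coverage cell.
import numpy as np

counts = {
    1996: {"A": 300, "B": 150, "r": 550},
    1997: {"A": 360, "B": 120, "r": 520},
}

def log_share(year, plan):
    """Log of the plan's within-cell market share in the given year."""
    total = sum(counts[year].values())
    return np.log(counts[year][plan] / total)

def dlog_share_diff(plan, ref="r"):
    """Change in log share, 1996 to 1997, relative to the reference plan."""
    d_plan = log_share(1997, plan) - log_share(1996, plan)
    d_ref = log_share(1997, ref) - log_share(1996, ref)
    return d_plan - d_ref

y_A = dlog_share_diff("A")   # positive: plan A gained share relative to r
y_B = dlog_share_diff("B")   # negative: plan B lost share relative to r
print(y_A, y_B)
```

Note that the cell totals cancel in the within-cell share differences, so the dependent variable depends only on relative enrollment counts; regressing these values on the corresponding differenced plan traits then yields the estimate of β.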
Least squares estimation of equation (8) escapes the omitted-variables bias because the longitudinal differencing causes the γ’s to drop out of the equation. While the effects of observable but time-invariant plan traits cannot be identified in this approach, we can identify the effects of two key types of traits – price and report card ratings. The out-of-pocket premia for different plans did change differentially between the two years, so we identify the price sensitivity of consumer choices from the association between year-to-year changes in market shares and year-to-year changes in relative premia. GM did not provide report card ratings to its employees in 1996, but it introduced that information in 1997, so we identify the impact of report card ratings by relating the 1996-to-1997 changes in market shares to differences between plans in their newly introduced report card ratings. With N_c plans offered in cell c and one plan used as the reference plan, cell c contributes N_c − 1 observations to the regression sample. Ordinary least squares estimation of equation (8) would be inefficient because the error term is nonspherical. The error term is heteroskedastic because the variance of v is inversely related to the number of individuals within the cell, so we weight each observation by the square root of the number of employees in that cell in 1997.
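The weighting step can be sketched as a small simulation; this is an assumption-laden illustration, not the paper's code (the −0.3 slope echoes the paper's price estimate but the data are fabricated), implementing the weighting by scaling each observation by the square root of its cell size:

```python
# Sketch of the weighted least squares step: differenced observations from
# larger (less noisy) cells receive more weight. All data are simulated.
import numpy as np

rng = np.random.default_rng(1)
n_obs = 200
# Differenced regressors: a constant and the change in relative log price
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
beta_true = np.array([0.0, -0.3])              # assumed, for illustration
cell_size = rng.integers(5, 500, size=n_obs)   # 1997 employees per cell
# Sampling error in log shares shrinks as cells get larger
y = X @ beta_true + rng.normal(size=n_obs) / np.sqrt(cell_size)

# WLS: scale each row by sqrt(cell size) before ordinary least squares
s = np.sqrt(cell_size)
beta_hat, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
print(beta_hat)  # should be close to [0.0, -0.3]
```

Scaling rows by the square root of the weight is the standard way to turn a heteroskedastic problem into a homoskedastic one before applying ordinary least squares.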
We also perform a generalized-least-squares correction that accounts for five varieties of correlation across observations: (1) between observations for the same plan but different coverage categories in the same cell; (2) between observations for different plans in the same cell, which share the same reference plan; (3) between observations for the same plan in different cells; (4) between observations for different plans in different cells that share the same reference plan; and (5) between observations in different cells where the same plan is the reference plan in one cell and a plan observation in the other cell.[5]

[4] This approach is similar to that of Berry (1994).

[5] We estimated the model using the REML variation of the PROC MIXED procedure in SAS.

Our claim that our longitudinal estimation approach avoids omitted-variables bias depends on our assumption that the plan/cell unobservables represented by the γ’s are time-invariant. What if these “fixed effects” are not really fixed, i.e., what if there are important changes between 1996 and 1997 in factors like a plan’s locational convenience and the popularity of its physicians? Then equation (8) changes to

(9)  Δln(S_jc) − Δln(S_rc) ≅ β′(ΔX_jc − ΔX_rc) + (Δγ_jc − Δγ_rc) + [Δ(v_jc/π_jc) − Δ(v_rc/π_rc)].

If the changes in the γ’s are correlated with the changes in the X’s, then our estimation may be subject to omitted-variables bias after all. To treat this possibility, we will perform a supplementary analysis that exploits additional information on the plan choices of union employees. Report card ratings were not provided to union employees, and the out-of-pocket premia for union employees were zero in both 1996 and 1997.
Therefore, if the choices of union employees obey the same model we have specified for non-union employees, the union version of equation (9) is

(10)  Δln(S^u_jc) − Δln(S^u_rc) ≅ (Δγ_jc − Δγ_rc) + [Δ(v^u_jc/π^u_jc) − Δ(v^u_rc/π^u_rc)],

where the u superscript signifies a union variable. The union counterpart to the dependent variable in equation (9) therefore can be viewed as a proxy for the (Δγ_jc − Δγ_rc) term we wish to control for in equation (9). It differs from that term only because of the sampling error involving the v’s. Unfortunately, the union information we have is not as finely detailed as our information on non-union employees. Our market-share information on union employees consists of state-level data aggregated over all coverage categories and markets. Nevertheless, as a step towards attempting to control for time-varying unobservables, we will try using that aggregated version of the left side of equation (10) as an additional control variable in our regression model for the changes in relative market shares among non-union employees.

V. Results

The first column of Table 4 reports the results for the base model. Consistent with economic theory and the vast majority of existing literature, out-of-pocket price is inversely related to enrollment. The magnitude of the estimated coefficient suggests that a 10% increase in relative price would generate approximately a 3% decrease in relative market share. Accreditation status is estimated to be positively associated with enrollment changes: plans that became NCQA-accredited gained relative market share. The results for the rating variables are equivocal. Of the twelve estimated coefficients on the superior or below-average ratings, only seven are of the hypothesized sign. Of the six domains of performance, only two, women’s health and access to care, have positive estimated coefficients on the superior rating and negative estimated coefficients on the below-average rating.
None of those estimated coefficients is statistically significant. Two of the twelve relevant coefficient estimates are statistically significant, but neither has the hypothesized sign. Despite the lack of significance of most rating coefficient estimates, the hypothesis that all rating coefficients equal zero can be rejected at p<0.01. The conclusions are robust to several specifications of the random effects. To test whether time-varying factors might be influencing the results, we included the change-in-union-share variable to capture unmeasured, time-varying plan effects (Table 4, column 2). The estimated coefficient of this variable had the hypothesized sign, but was not statistically significant. Perhaps this was because unobserved plan traits were relatively stable over this two-year study period, or perhaps the estimate reflects downward bias due to measurement error in the union market share. Regardless of the reason why the estimated coefficient on the change in union share is not statistically significant, its inclusion does not alter any of the conclusions regarding the ratings; they remain equivocal, with frequently counterintuitive signs. Two specifications explored the sensitivity of the results to outliers. First, we raised the minimum cell size required for inclusion in the model from 5 to 10 employees (Table 4, column 3). Market shares in small markets may be relatively unstable, so this restriction should eliminate some noise in the data. The results from this exercise reveal no substantive difference from the base results. The estimated coefficients on price and accreditation are virtually unchanged, and the estimated coefficients on the rating variables remain equivocal. Second, we omitted outliers from the base model (Table 4, column 4). Outliers were defined as observations with studentized residuals greater than 2 (Belsley, Kuh & Welsch, 1980).[6] The estimated coefficient on price remains stable.
However, the estimated coefficient on the accreditation variable, though still positive, drops about 70 percent and is no longer statistically significant. Although several estimated rating coefficients switch signs in this specification, and several lose statistical significance, the qualitative conclusions regarding the ratings remain unchanged. The estimated coefficients frequently have counterintuitive signs and only one domain, operational performance, has both coefficients estimated with the hypothesized sign. Taken as a group, the results reported in Table 4 do not provide support for the notion that employees responded to the report card ratings. Most of the estimated coefficients on ratings were not statistically significant. Those that were significant often exhibited counterintuitive signs. Table 5 reports results from specifications that aggregate the rating variables into summary measures. There are several reasons why aggregation might be useful. First, individuals may not be able to process information from all the domains. They may adopt simplifying decision rules such as focusing only on selected domains, or selecting plans with the most superior ratings or fewest below-average ratings (Hibbard, 1997). Second, the number of parameters in the base model is relatively large compared to the sample size, and there is some collinearity among the explanatory variables in the base specification. The counterintuitive signs and general lack of statistical significance could reflect these issues. The first two columns of Table 5 report specifications that include only the superior or below-average ratings. Both specifications are easily rejected relative to the base model (p<0.01). Nevertheless, the estimated price coefficient remains negative and statistically significant, though in the specification with only below-average ratings its magnitude increases by about a third. 
The estimated accreditation coefficient remains positive, but is much smaller than in the base model and not statistically significant. The estimated coefficients on the rating variables when only superior ratings are included are of the hypothesized sign in five of six cases, though none is statistically significant. When only below-average ratings are included, the estimated coefficients have counterintuitive signs in three of the six cases, though the one statistically significant coefficient estimate (women’s health) is of the hypothesized negative sign.

[6] Outliers were identified using the studentized residual in OLS estimation of the model.

The third column of Table 5 imposes the restriction that employees respond to the sum of superior ratings and the sum of below-average ratings. This set of restrictions is rejected, relative to the base model, at the 0.10 level but not at the 0.05 level. The estimated price coefficient remains statistically significant and about -0.3, and the estimated accreditation coefficient remains positive, though not statistically significant. In this specification, both of the estimated rating coefficients have the hypothesized sign. The estimated coefficient on the sum of below-average ratings is negative and statistically significant, and the estimated coefficient on the sum of superior ratings is positive, though not statistically significant. Though this model is rejected at p<0.10 relative to the base model, it provides the strongest support for the impact of ratings on enrollment. An alternative approach for limiting the parameters in the model is to selectively exclude various domains. Excluded domains were selected in a stepwise fashion. Exclusion of each domain was tested relative to the base model, and ‘Prevention’ was the domain whose exclusion had the least effect on the likelihood function. The hypothesis that all of the prevention coefficients were zero could not be rejected (p=0.39).
With ‘Prevention’ excluded, we sequentially tested exclusion of each of the remaining five domains against the base model and against the model with the prevention ratings excluded. The exclusion of ‘Access to Care’ had the smallest impact on the likelihood function, and the hypothesis that all the ‘Prevention’ and ‘Access’ coefficients equal zero could not be rejected (p=0.197 vs. base). An analogous process led to the further, sequential exclusion of the ‘Satisfaction’ domain (p=0.182 vs. base), ‘Operations’ (p=0.106 vs. base), and ‘Women’s Health’ (p=0.124 vs. base). This left only ‘Medical/Surgical Care’ in the model.

The final column of Table 5 reports results excluding all domains except ‘Medical/Surgical Care.’ Given the multicollinearity, these estimated coefficients capture any unmeasured effects from the omitted domains. As in all other models, the estimated price coefficient remains negative and statistically significant, with a magnitude of approximately -0.3. The accreditation coefficient estimate remains positive, but not statistically significant. The model provides some support for the hypothesis that ratings matter because the estimated coefficient on the below-average rating is negative and statistically significant. However, the coefficient on the superior rating is also negative, though smaller in absolute value, and statistically significant, at least at the p<0.10 level. Similar qualitative conclusions would be drawn from any of the estimates based on selectively omitted domains: some coefficient estimates support the hypothesis that ratings matter, but others have statistically significant counterintuitive signs.

VI. Discussion

Considerable resources have been devoted to collecting health plan performance measures. Many organizations, including GM, have spent substantial additional resources to construct and disseminate health plan report cards based on that information.
Evidence from focus groups and surveys regarding whether employees would use report card data in making their plan selections is mixed. Several studies report interest in measures of access, satisfaction, technical quality, and use of preventive services (Robinson and Brodie, 1997; Tumlinson et al., 1997; Hibbard and Jewett, 1996). Despite interest in such information, some focus-group evidence suggests that most employees do not use the information when it is provided (Robinson and Brodie, 1997; Meyer et al., 1998). The work by Meyer et al. is particularly pertinent because the focus groups in that study, though small relative to our sample size, were conducted in 1997 using salaried employees at GM. Evidence from focus groups should be supplemented with evidence based on actual plan choices because researchers have found a discrepancy between what individuals say and what they do (Hibbard and Jewett, 1996).

The findings from this work confirm that price is inversely related to plan choice. The estimated price coefficient is very stable across specifications. All specifications also indicate that NCQA accreditation is positively related to enrollment. Other research suggests that employees are distrustful of employer-sponsored information; the positive relationship between enrollment and accreditation could reflect greater trust in information generated by organizations other than the firm, such as NCQA. However, the magnitude of the estimated accreditation effect varies dramatically by specification and is not statistically significant in models with fewer than the full set of rating variables.

The results regarding the impact of ratings on plan enrollment are equivocal. Estimates that relate plan enrollment to ratings for each domain (Table 4) or to subsets of domains (Table 5, column 4) reject the hypothesis that all of the ratings coefficients equal zero.
Yet most of the estimated coefficients are statistically insignificant, and they often have counterintuitive signs. Given the multicollinearity in the rating data and our sample size, we are unable to achieve precise identification of the effects of ratings for specific domains.

The specification most favorable to rating effects includes only aggregated ratings variables: the count of superior ratings and the count of below-average ratings. This specification suggests that individuals avoided plans with a large number of below-average ratings. This result is consistent with recent findings from focus groups that individuals are more likely to avoid bad plans than to select good ones (Hibbard et al., 2000). Given the instability of estimated coefficients across specifications, one should be wary of over-interpreting the findings. The restrictions on parameters that generate the most favorable specification for the ratings are rejected at p<0.10, suggesting that the ratings contain more information than we have identified.

Several limitations are worth noting. First, in our base specification, the ratio of observations to parameters is relatively small. This is because we aggregate the data to market/coverage-category cells. The aggregation does not discard information, because our model includes only plan attributes, and it is important because of salient unobserved plan traits. Yet this process illustrates that the ability to investigate the questions we pose requires more than simply a large number of employees (we had 27,000 non-union employees in our sample); it also requires many plan/market combinations. The large number of employees helps, of course, by increasing the extent to which the observed market shares approximate the true market shares that would obtain with infinite cell sizes. In short, the sample size within each cell and the number of cells relative to parameters are both important.
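The cell-level aggregation described above — individual plan choices collapsed to shares within each market/coverage-category cell, which serve as the dependent variable in a Berry-style share model such as Equation (8) — can be sketched with illustrative choices (the exact functional form of Equation (8) is not reproduced in this excerpt, so treat this only as the aggregation step):

```python
from collections import Counter

# Collapse individual plan choices into market/coverage-category cell
# shares.  The choice records below are illustrative, not GM data.
choices = [
    ("Detroit", "single", "HMO-A"), ("Detroit", "single", "HMO-A"),
    ("Detroit", "single", "HMO-B"), ("Detroit", "family", "HMO-A"),
    ("Flint",   "single", "HMO-B"), ("Flint",   "single", "HMO-B"),
]

def cell_shares(choices):
    """Plan market shares within each (market, coverage) cell."""
    cells = {}
    for market, coverage, plan in choices:
        cells.setdefault((market, coverage), Counter())[plan] += 1
    return {cell: {plan: n / sum(counts.values())
                   for plan, n in counts.items()}
            for cell, counts in cells.items()}

shares = cell_shares(choices)
```

With more employees per cell, these observed shares converge to the true underlying market shares, which is why within-cell sample size matters alongside the number of cells.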
As the number of cells decreases, time-varying unobserved plan-specific factors could increasingly influence the findings. To some extent our inclusion of changes in the union shares will capture some of these unobserved, time-varying factors. Yet this control is imperfect, both because our union data are at the state, not the market, level and because union workers could react differently than salaried workers. Hence, some of the counterintuitive signs likely reflect the low ratio of observations to parameters.

Second, it may be the case that certain subsets of employees are more influenced by the ratings than others. Our estimated coefficients could be dampened by substantial inertia among employees. However, most report card efforts, including competitive model proposals, are aimed at similar audiences that are likely to have quite a bit of inertia. Moreover, this inertia does not prevent identification of hypothesized effects for the price variable. In addition, earlier work that examined the behavior of new hires in a cross-sectional setting found little difference in the conclusions relative to the full sample of workers (Chernew and Scanlon, 1998). Many other studies also focus on existing workers (Feldman et al., 1989; Buchmueller and Feldstein, 1997). Nevertheless, further work examining subgroups would be valuable.

Finally, the efforts to construct and disseminate report cards may have merit beyond influencing employee behavior. The plan performance information underlying report cards may be important for employers in selecting plans with which to contract, and report cards may push plans to improve their performance. Moreover, as employees become more familiar with report cards, the impact of such information may grow. Given our findings, continued study of how health plan ratings influence markets is important for assessing the responsiveness of consumers to commonly used plan performance measures.
Figure 1
Example Information Sheet

Table 1
Health Plan Performance Domains and Their Measures

Domain                   Measures
Prevention               Childhood Immunization Rate
                         Cholesterol Screening Rate
                         Prenatal Visit Rate in the First Trimester*
                         Cervical Cancer Screening Rate*
                         Diabetic Retinal Examination Rate
                         Mammography Rate*
Medical/Surgical Care    C-Section Rate
                         Cardiac Catheterization Rate
                         Coronary Artery Bypass Graft Rate
                         Coronary Angioplasty (PTCA) Rate
                         Laminectomy Rate
                         Prostatectomy Rate
                         Inpatient Admission Rate for Asthma
Women’s Care             Prenatal Visit Rate in the First Trimester
                         Cervical Cancer Screening Rate
                         Hysterectomy Rate
                         Mammography Rate
Access                   Follow-up after a Major Mental Health Disorder
                         Inpatient Readmission Rate
                         Open Panel
                         Primary Care Physician Turnover Rate
                         Percent of Enrollees with a Primary Care Visit
Satisfaction             Enrollee Satisfaction Survey**

* Less weight was placed on the Prenatal Visit Rate, Cervical Cancer Screening Rate, and Mammography Rate in the prevention domain because these measures were also used in the women’s care domain.
** The satisfaction rating was modified to reflect GM’s preference for large sample sizes and phone vs. mail administration of the survey.

Table 2
Active Employees by Plan Type (1997) (rounded to nearest 1,000)

                        HMO       PPO       FFS       Total
Salary (non-union)    29,000    11,000    32,000     72,000
Union (hourly)        67,000    72,000    90,000    229,000
Total Active          96,000    83,000   122,000    301,000

Table 3
Descriptive Statistics: Means and Frequencies

                                         N     Mean      Std. Dev.   Min    Max
Annual Price* (1996), Single Coverage   66+    434.36     163.90      84    708
Annual Price* (1996), Family Coverage   69+   1213.91     463.19     240   1956
Annual Price* (1997), Single Coverage   66+    446.91     184.40     108    732
Annual Price* (1997), Family Coverage   69+   1238.55     499.52     300   2004
NCQA Accredited                         69       .70
Benchmark Status                        69       .15

                               N    Superior   Average   Below Average   No Data
Operational Performance       69       29         22          18            0
Preventive Care               69       27         19          15            8
Medical and Surgical Care     69       18         26          18            7
Access                        69       27         18          17            7
Women’s Health                69       24         23          20            2
Patient Satisfaction          69       27         17          15           10

+ The sample sizes vary for price because markets with fewer than five employees were excluded from the regression analysis.
* The annual prices reported in Table 3 reflect the difference between the out-of-pocket price and the allotted flex dollars, in order to standardize price across coverage categories.

Table 4
Estimates of Model in Equation (8) (t-stats in parentheses)

                               Base         Base with    Excluding    Omitting
                               Model        Union        Cells<10     Outliers
Ln(Price)                     -0.290 **    -0.275 *     -0.291 **    -0.309 ***
                              (-2.15)      (-1.90)      (-2.10)      (-2.89)
Union Share                                 0.04519
                                           (0.57)
Accreditation                  0.761 ***    0.7656 ***   0.770 ***    0.209
                              (3.22)       (2.70)       (3.21)       (1.50)
Operational Performance
  Superior                     0.307        0.2372       0.318        0.117
                              (1.51)       (1.00)       (1.52)       (0.87)
  Below Average                0.707 ***    0.6171 ***   0.734 ***   -0.065
                              (3.29)       (2.34)       (3.28)      (-0.48)
Preventive Care
  Superior                     0.051        0.07794      0.030       -0.013
                              (0.22)       (0.28)       (0.13)      (-0.08)
  Below Average                0.164        0.08059      0.150       -0.036
                              (0.65)       (0.28)       (0.58)      (-0.25)
Medical/Surgical Care
  Superior                    -0.472 **    -0.4844 ***  -0.443 **    -0.150
                              (-2.51)      (-2.39)      (-2.27)     (-1.18)
  Below Average               -0.282       -0.3405      -0.249       -0.183
                              (-1.19)      (-1.24)      (-1.02)     (-1.33)
Women’s Health
  Superior                     0.389        0.3912       0.405        0.293
                              (1.31)       (1.12)       (1.34)       (1.42)
  Below Average               -0.257       -0.1802      -0.253        0.074
                              (-1.21)      (-0.69)      (-1.17)      (0.51)
Access to Care
  Superior                     0.042       -0.00562      0.038       -0.123
                              (0.24)       (-0.03)      (0.21)      (-1.38)
  Below Average               -0.279       -0.2591      -0.289       -0.205
                              (-1.14)      (-0.96)      (-1.16)     (-1.50)
Patient Satisfaction
  Superior                    -0.342       -0.3530      -0.324       -0.216 *
                              (-1.60)      (-1.37)      (-1.50)     (-1.77)
  Below Average                0.069        0.02748      0.108       -0.178
                              (0.29)       (0.10)       (0.44)      (-1.10)
“Missing data” dummy
  variables included           yes          yes          yes          yes
N                              274          274          242          264

* significant at p=0.10; ** significant at p=0.05; *** significant at p=0.01

Table 5
Restricted Models (t-stats in parentheses)

                               Only         Only Below   Summing      Excluding
                               Superior     Average      Ratings      Domains
Ln(Price)                     -0.282 **    -0.396 ***   -0.299 **    -0.338 ***
                              (-2.14)      (-3.37)      (-2.44)      (-3.13)
Accreditation                  0.2144       0.241        0.062        0.236
                              (1.16)       (1.13)       (0.38)       (1.51)
Operational Performance
  Superior                     0.032
                              (0.19)
  Below Average                             0.161
                                           (0.85)
Preventive Care
  Superior                     0.153
                              (0.75)
  Below Average                            -0.006
                                           (-0.03)
Medical/Surgical Care
  Superior                    -0.030                                 -0.280 *
                              (-0.20)                                (-1.93)
  Below Average                             0.012                    -0.4695 ***
                                           (0.07)                    (-2.86)
Women’s Health
  Superior                     0.412
                              (1.62)
  Below Average                            -0.357 **
                                           (-2.25)
Access to Care
  Superior                     0.081
                              (0.58)
  Below Average                            -0.216
                                           (-1.29)
Patient Satisfaction
  Superior                     0.037
                              (0.25)
  Below Average                             0.012
                                           (0.06)
Number of Superior Ratings                               0.013
                                                        (0.21)
Number of Below Average Ratings                         -0.126 **
                                                        (-2.29)
Missing data dummies           No           No           Summed       For Med/Surg only
N                              274          274          274          274
# of Restrictions              11           11           14           14
p-value (vs. base)             0.002        0.001        0.071        0.124

* significant at p=0.10; ** significant at p=0.05; *** significant at p=0.01

Appendix A: Construction of Plan Ratings

Ratings for all of the domains except operational performance were based on a subset of HEDIS, version 2.5, measures (Table 1).
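The rating construction described in this appendix — per-measure Z-scores against a benchmark sample of plans, averaged within each domain, then split into terciles — can be sketched as follows. The helper names, the missing-data coding, and all plan values are illustrative assumptions, not GM’s actual implementation:

```python
import statistics

def z_score(x, benchmark):
    """Z-score a plan's value on one HEDIS measure against the benchmark
    sample of plans (mean and standard deviation of that measure)."""
    return (x - statistics.mean(benchmark)) / statistics.stdev(benchmark)

def domain_score(measure_zs):
    """Average the available Z-scores in a domain; None ('No Data') if
    more than half of the measures are missing."""
    present = [z for z in measure_zs if z is not None]
    if len(present) < len(measure_zs) / 2:
        return None
    return statistics.mean(present)

def tercile_ratings(domain_scores):
    """Top third 'superior', middle third 'average', bottom third
    'below expected'; plans without a domain score get 'No Data'."""
    rated = {p: s for p, s in domain_scores.items() if s is not None}
    ordered = sorted(rated, key=rated.get, reverse=True)
    n = len(ordered)
    out = {p: "No Data" for p in domain_scores}
    for rank, plan in enumerate(ordered):
        out[plan] = ("superior" if rank < n / 3
                     else "average" if rank < 2 * n / 3
                     else "below expected")
    return out

# Illustrative: three rated plans and one with too much missing data.
ratings = tercile_ratings({"Plan A": 0.8, "Plan B": 0.1,
                           "Plan C": -0.6, "Plan D": None})
```

In the actual report card, the terciles were taken over the full benchmark sample (including plans not offered by the firm), not just the firm’s own plans.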
For each measure, each HMO was assigned a Z-score, computed as

    Z_ij = (X_ij − X̄_i) / σ_i,

where X_ij is plan j’s score on HEDIS measure i, and X̄_i and σ_i are the mean and standard deviation of HEDIS measure i in a sample of managed care health plans consisting of all of those offered by the firm and about 80 other plans offered by other firms that used the same methodology (about 200 plans in total). Within each domain, the Z-scores for each measure were averaged for each plan to compute a domain-level score. If fewer than half of the data elements were missing in a given domain for a given plan, the average was computed over the measures provided; if more than half of the data elements were missing for a specific domain, the plan was given a rating of ‘No Data’ for that domain. For each domain of performance, the top third of the HMOs (including those not offered by this firm) were rated ‘superior performance,’ the middle third ‘average performance,’ and the bottom third ‘below expected performance.’ The ‘operational performance’ rating, which captured non-clinical aspects of plan management such as claims payment and customer service, was based on evaluations from site visits to the plans by GM staff.

References

Belsley, D.A., Kuh, E., and R.E. Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley; 1980.

Berry, S.T. Estimating discrete-choice models of product differentiation. RAND Journal of Economics. 1994;25:242-262.

Buchmueller, T.C., and P.J. Feldstein. The effect of price on switching among health plans. Journal of Health Economics. 1997;16:231-247.

Chernew, M.E., and D.P. Scanlon. Health plan report cards and insurance choice. Inquiry. 1998;35(Spring):9-22.
Farley, D.O., Short, P.F., Elliot, M., Kanouse, D., Brown, J., and R.D. Hays. Use of CAHPS Information in Health Plan Choices by New Jersey Medicaid Beneficiaries. Unpublished manuscript (draft). Santa Monica, CA: RAND Corporation; 1999.

Feldman, R., Finch, M., Dowd, B., and S. Cassou. The demand for employment-based health insurance plans. The Journal of Human Resources. 1989;XXIV:115-142.

Garnick, D.W., Lichtenberg, E., Phibbs, C.S., Luft, H.S., Peltzman, D.J., and S.J. McPhee. The sensitivity of conditional choice models for hospital care to estimation technique. Journal of Health Economics. 1989;8:377-397.

Hibbard, J.H. Presentation at the Quality and the Consumer Perspective Research Meeting, March 10, 2000, Columbia, MD.

Hibbard, J.H., and J.J. Jewett. What type of quality information do consumers want in a health care report card? Medical Care Research and Review. 1996;53:28-47.

Hibbard, J.H., Slovic, P., and J.J. Jewett. Informing consumer decisions in health care: implications from decision-making research. Milbank Quarterly. 1997;75(3):395-414.

McFadden, D.L. Conditional logit analysis of qualitative choice behavior. In: Frontiers in Econometrics. New York: Academic Press; 1973.

McFadden, D.L. The measurement of urban travel demand. Journal of Public Economics. 1974;3:303-328.

Meyer, J.A., Wicks, E.K., Rybowski, L.S., and M.J. Perry. Report on Report Cards. Washington, DC: Economic and Social Research Institute; March 1998.

Moulton, B.R. Random group effects and the precision of regression estimates. Journal of Econometrics. 1986;32(3):385-397.

Robinson, S., and M. Brodie. Understanding the quality challenge for health consumers: the Kaiser/AHCPR survey. Journal on Quality Improvement. 1997;23:239-244.

Scanlon, D.P., and M.E. Chernew. HEDIS measures and managed care enrollment. Medical Care Research and Review. 1999.

Scanlon, D.P., Chernew, M.E., and J. Lave. Consumer health plan choice: current knowledge and future directions. Annual Review of Public Health. 1997;18:507-528.

Short, P.F. Early lessons from the CAHPS demonstrations and evaluations. CAHPS User’s Meeting, sponsored by the Agency for Health Care Policy and Research, Baltimore, MD, October 15-16, 1998.

Short, P.F., and A.K. Taylor. Premiums, benefits, and employee choice of health insurance options. Journal of Health Economics. 1989;8:293-311.

Tumlinson, A., Bottigheimer, H., Mahoney, P., Stone, E.M., and A. Hendricks. Choosing a health plan: what information will consumers use? Health Affairs (Millwood). 1997;16:229-238.