Tests of Equal Forecast Accuracy and Encompassing for Nested Models

Todd E. Clark and Michael W. McCracken*
Federal Reserve Bank of Kansas City and Louisiana State University
June 2000

Abstract

We examine the asymptotic and finite-sample properties of tests for equal forecast accuracy and encompassing applied to 1-step ahead forecasts from nested linear models. We first derive the asymptotic distributions of two standard tests and one new test of encompassing and provide tables of asymptotically valid critical values. Monte Carlo methods are then used to evaluate the size and power of tests of equal forecast accuracy and encompassing. The simulations indicate that post-sample tests can be reasonably well sized. Of the post-sample tests considered, the encompassing test proposed in this paper is the most powerful. We conclude with an empirical application regarding the predictive content of unemployment for inflation.

Keywords: causality, forecast accuracy, forecast encompassing
JEL Nos.: C53, C12, C52

* Clark (corresponding author): Economic Research Dept., Federal Reserve Bank of Kansas City, 925 Grand Blvd., Kansas City, MO 64198, todd.e.clark@kc.frb.org. McCracken: Dept. of Economics, Louisiana State University, 2107 CEBA Building, Baton Rouge, LA 70803, mmccrac@unix1.sncc.lsu.edu. The helpful comments of Charles Engel, Lutz Kilian, Norm Swanson, Dek Terrell, Ken West, and seminar participants at UCSD, the University of Kansas, the University of Michigan, and Penn State University are gratefully acknowledged. The views expressed herein are solely those of the authors and do not necessarily reflect the views of the Federal Reserve Bank of Kansas City or the Federal Reserve System.

1. Introduction

Since the influential work of Meese and Rogoff (1983, 1988), it has become common to use comparisons of out-of-sample forecasts to determine whether one variable has predictive power for another.1 Typically, this out-of-sample comparison is made in two stages.
First, forecasts of the variable of interest are constructed once using a model that includes a variable with putative predictive content and then a second time excluding that variable. Second, given the two sequences of forecast errors, tests of equal forecast accuracy or forecast encompassing are conducted. This out-of-sample approach is explicitly advocated by Ashley, Granger, and Schmalensee (1980), who argue that it is more in the spirit of the definition of Granger causality to employ post-sample forecast tests than to employ the standard full-sample causality test.

Although post-sample tests of this type are increasingly used, little is known about their effectiveness. Most evidence on the asymptotic and finite-sample behavior of tests of equal forecast accuracy and encompassing pertains to forecasts from non-nested models. Diebold and Mariano (1995), West (1996, 2000a, b), Harvey, Leybourne, and Newbold (1997, 1998), West and McCracken (1998), Clark (1999), Corradi, Swanson, and Olivetti (1999), and McCracken (2000) each present results for non-nested forecasts. With nested models, however, test properties are likely to differ because, under the null, the forecast errors are asymptotically the same and therefore perfectly correlated. Only two extant studies focus on results for nested models. McCracken (1999) derives the asymptotic distributions of several tests of equal forecast accuracy between two nested models. Chao, Corradi and Swanson (2000) develop an out-of-sample test of causality that resembles an encompassing test applied to forecasts from nested models.

In this paper we first derive the limiting distributions of tests for encompassing applied to 1-step ahead forecasts from nested linear models. The encompassing tests are those proposed by Ericsson (1992) and Harvey, Leybourne and Newbold (1998) and a new statistic developed in this paper.
As in West (1996, 2000a, b), West and McCracken (1998), Chao, Corradi, and Swanson (2000), Corradi, Swanson, and Olivetti (1999), and McCracken (2000), the limiting distributions explicitly account for the uncertainty introduced by parameter estimation. In our results, when the number of observations used to generate initial estimates of the models and the number of forecast observations increase at the same rate, the limiting distributions of the tests are non-standard. We provide numerically-generated critical values for these distributions. However, when the number of forecasts increases at a slower rate than the number of observations used in the initial model estimates, the Ericsson (1992) and Harvey, Leybourne, and Newbold (1998) statistics are limiting standard normal.

We then use Monte Carlo simulations to examine the finite-sample size and size-adjusted power of the encompassing tests, as well as a set of equal mean square error (MSE) tests. These Monte Carlo experiments show that, in most settings, each of the post-sample tests is reasonably well sized when the statistics are compared against the asymptotic critical values provided in this paper and in McCracken (1999). However, comparing the post-sample forecast statistics against standard normal critical values usually makes the tests undersized. The Monte Carlo simulations also show that the powers of the tests permit some simple rankings, in which the new encompassing statistic proposed in this paper is most powerful in small samples.

Finally, to illustrate how the tests perform in practical settings, each test is used to determine whether the unemployment rate has predictive content for inflation in quarterly U.S. data. We find the evidence mixed, but suggestive of a relationship. While each of the equal MSE tests fails to reject the null that unemployment has no predictive content for inflation, each of the encompassing tests indicates that unemployment does have predictive power.

Section 2 introduces the notation, the forecasting and testing setup, and the assumptions. Section 3 defines the forecast encompassing tests considered and provides the null asymptotic results. In the interest of brevity, proofs are provided in Clark and McCracken (2000). In Section 4 we present a Monte Carlo evaluation of the finite-sample size and power of tests of forecast encompassing and equal MSE. Section 5 uses the tests to determine whether the unemployment rate has predictive power for inflation.

1 Examples of studies using this methodology include Diebold and Rudebusch (1991), Amano and van Norden (1995), Chinn and Meese (1995), Mark (1995), Krueger and Kuttner (1996), Bram and Ludvigson (1998), Berkowitz and Giorgianni (1999), and Kilian (1999).

2. Setup

The sample of observations $\{y_t, x_{2,t}'\}_{t=1}^{T+1}$ includes a scalar random variable $y_t$ to be predicted and a $(k \times 1)$ vector of predictors $x_{2,t} = (x_{1,t}', x_{22,t}')'$, with $k = k_1 + k_2$. The sample is divided into in-sample and out-of-sample portions. The in-sample observations span 1 to $R$. Letting $P$ denote the number of 1-step ahead predictions, the out-of-sample observations span $R + 1$ through $R + P$. The total number of observations in the sample is $R + P = T + 1$.

Forecasts of $y_{t+1}$, $t = R,\ldots,T$, are generated using two linear models of the form $x_{i,t+1}'\beta_i^*$, $i = 1,2$, each of which is estimated. Under the null, model 2 nests the restricted model 1 and hence model 2 includes $k_2$ excess parameters. Without loss of generality, let $\beta_2^* = (\beta_1^{*\prime}, 0_{1 \times k_2})'$, where $\beta_1^{*\prime}$ is $1 \times k_1$. Under the alternative hypothesis, the $k_2$ restrictions are not true, and model 2 is correct.

The forecasts are recursive, 1-step ahead predictions. Under the recursive scheme, each model's parameters, $\beta_i^*$, $i = 1,2$, are estimated with added data as forecasting moves forward through time: for $t = R,\ldots,T$, model $i$'s prediction of $y_{t+1}$, $x_{i,t+1}'\hat{\beta}_{i,t}$, is created using the parameter estimate $\hat{\beta}_{i,t}$ based on data from 1 to $t$.
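The recursive scheme can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the helper names `ols` and `recursive_errors` are hypothetical, the data are held in plain Python lists, and row `t` of `X` is assumed to contain the predictors already aligned with the target `y[t]` (that is, row `t` holds information known one period earlier). With 0-based indexing, the first `R` rows form the initial estimation sample, and each later row is forecast using all rows before it.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - tail) / A[i][i]
    return beta


def recursive_errors(X, y, R):
    """1-step ahead forecast errors under the recursive scheme.

    For each forecast origin the model is re-estimated on every
    observation before the target row, so the estimation sample
    grows from R up to len(y) - 1 observations.
    """
    errors = []
    for t in range(R, len(y)):
        beta = ols(X[:t], y[:t])
        prediction = sum(b * x for b, x in zip(beta, X[t]))
        errors.append(y[t] - prediction)
    return errors
```

Calling `recursive_errors` once with the restricted regressors (model 1) and once with the full set (model 2) yields the two sequences of P forecast errors on which all of the tests are built.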
The largest number of observations used to estimate the model parameters is then $T = R + P - 1$. Asymptotic results for forecasts based on rolling and fixed schemes are provided in Clark and McCracken (2000).2

We focus on 1-step ahead forecasts because, for multi-step forecasts, the asymptotic distributions of the tests generally appear to depend on the parameters of the data-generating process.3 For practical purposes, such dependence eliminates the possibility of using asymptotically pivotal approximations to test for equal accuracy or encompassing. Given that most forecast comparisons include 1-step ahead results, our asymptotic results should be useful in many settings. For those researchers interested in multi-step horizons, bootstrap procedures, such as those developed in Ashley (1998) and Kilian (1999), may yield accurate inferences.

2 West and McCracken (1998) discuss the rolling and fixed forecasting schemes. Clark and McCracken (2000) show that while the ENC-T and ENC-REG tests (considered below) have non-standard distributions for $\pi > 0$ under the rolling scheme, the statistics are asymptotically standard normal for $\pi > 0$ under the fixed scheme.

3 Lutkepohl and Burda (1997) note similar difficulties associated with in-sample causality tests involving multi-step horizons.

We denote the 1-step ahead forecast errors as $\hat{u}_{1,t+1} = y_{t+1} - x_{1,t+1}'\hat{\beta}_{1,t}$ and $\hat{u}_{2,t+1} = y_{t+1} - x_{2,t+1}'\hat{\beta}_{2,t}$ for models 1 and 2, respectively. The forecast encompassing tests are formed using these two sequences of $P$ forecast errors. In all cases the out-of-sample statistics rely on sums of functions of these forecast errors. To simplify notation, for any variable $z_{t+1}$ we let $\sum_t z_{t+1}$ denote the summation $\sum_{t=R}^{T} z_{t+1}$.

Before moving to the assumptions some final notation is needed. For $i = 1,2$ let $h_{i,t+1}(\beta_i) = (y_{t+1} - x_{i,t+1}'\beta_i)x_{i,t+1}$, $h_{i,t+1} = h_{i,t+1}(\beta_i^*)$, $q_{i,t+1} = x_{i,t+1}x_{i,t+1}'$ and $B_i = (Eq_{i,t+1})^{-1}$. For any $(m \times n)$ matrix $A$ with elements $a_{i,j}$ and column vectors $a_j$, let $vec(A)$ denote the $(mn \times 1)$ vector $[a_1', a_2', \ldots, a_n']'$. Let $W(s)$ denote a $(k_2 \times 1)$ vector standard Brownian motion. Finally, under the null, $u_{1,t} = u_{2,t} \equiv u_t$.

Given the definitions and forecasting schemes described above, the following assumptions are used to derive the limiting distributions in Theorems 3.1-3.3. The assumptions are also sufficient for the results of McCracken (1999) when MSE is the measure of predictive ability. The assumptions are intended to be only sufficient, not necessary and sufficient.

Assumption 1: The parameter estimates $\hat{\beta}_{i,t}$, $i = 1,2$, $t = R,\ldots,T$, satisfy $\hat{\beta}_{i,t} - \beta_i^* = B_i(t)H_i(t)$, where $B_i(t)H_i(t)$ equals $(t^{-1}\sum_{j=1}^{t} q_{i,j})^{-1}(t^{-1}\sum_{j=1}^{t} h_{i,j})$.

Our first assumption is that the parameters must be estimated by OLS. This restriction is imposed to ensure that the statistics in Theorems 3.1-3.3 are asymptotically pivotal. As in McCracken (1999), achieving a limiting distribution that does not depend upon the data-generating process requires that the loss function used to estimate the parameters be the same as the loss function used to measure predictive ability. Each of the statistics in Theorems 3.1-3.3 is a function of squared forecast errors. To achieve an asymptotically pivotal statistic the parameters must then be estimated using a squared error loss function.

Assumption 2: Let $U_t = [u_t, x_{2,t}' - Ex_{2,t}', h_{2,t}', vec(h_{2,t}h_{2,t}' - Eh_{2,t}h_{2,t}')', vec(q_{2,t} - Eq_{2,t})']'$. (a) $EU_t = 0$, (b) $Eq_{2,t} < \infty$ is p.d., (c) For some $r > 4$, $U_t$ is uniformly $L^r$ bounded, (d) For all $t$, $Eu_t^2 = \sigma^2 < \infty$, (e) For some $r > d > 2$, $U_t$ is strong mixing with coefficients of size $-rd/(r - d)$, (f) Let $\tilde{U}_t$ denote the vector of nonredundant elements of $U_t$. Then $\lim_{T \to \infty} T^{-1}E(\sum_{j=1}^{T}\tilde{U}_j)(\sum_{j=1}^{T}\tilde{U}_j)' = \Omega < \infty$ is p.d.

Assumption 3: (a) $Eh_{2,t}h_{2,t}' = \sigma^2 Eq_{2,t}$, (b) $E(h_{2,t} \mid h_{2,t-j}, q_{2,t-j}, j = 1,2,\ldots) = 0$.
Assumptions 2 and 3 allow the application of an invariance principle and are sufficient for joint weak convergence of partial sums and averages of these partial sums to Brownian motion and integrals of Brownian motion. Assumption 2 is directly comparable to the assumptions in Hansen (1992) and hence we are able to apply his Theorems (2.1) and (3.1). Assumption 3 is also used to ensure that the limiting distribution does not depend upon the underlying data-generating process.

Assumption 4: $\lim_{P,R \to \infty} P/R = \pi$, $0 < \pi < \infty$, $\lambda \equiv (1 + \pi)^{-1}$.

Assumption 4′: $\lim_{P,R \to \infty} P/R = \pi = 0$, $\lambda = 1$.

Assumptions 4 and 4′ introduce the alternative means by which the asymptotics are achieved. As in Ghysels and Hall (1990), West (1996), and White (1999) the limiting distribution results are derived by imposing a slightly stronger condition than simply that the sample size, $T + 1$, becomes arbitrarily large. Here we impose the additional condition that either the numbers of in-sample ($R$) and out-of-sample ($P$) observations become arbitrarily large at the same rate (i.e. $P/R \to \pi > 0$) or the number of in-sample observations becomes arbitrarily large relative to the number of out-of-sample observations (i.e. $P/R \to 0$). As shown below, the asymptotics depend critically upon the value of $\pi$, that is, on whether Assumption 4 or 4′ is made.

3. Tests and Asymptotic Distributions

We consider two standard forecast encompassing tests, those proposed by Ericsson (1992) and Harvey, Leybourne, and Newbold (1998), as well as one new test. West and McCracken (1998) show that another standard test, proposed by Chong and Hendry (1986), can be asymptotically normal when applied to either nested or non-nested forecasts. In our Monte Carlo simulations, however, the power of the Chong and Hendry test was dominated by that of the tests described below. A related test, the out-of-sample causality statistic developed by Chao, Corradi, and Swanson (2000), is also asymptotically normal.
3.1 The ENC-T Test

Drawing on the methodology of Diebold and Mariano (1995), Harvey, Leybourne, and Newbold (1998) propose a test of encompassing that uses a t-statistic for the covariance between $\hat{u}_{1,t+1}$ and $\hat{u}_{1,t+1} - \hat{u}_{2,t+1}$. Let $c_{t+1} = \hat{u}_{1,t+1}(\hat{u}_{1,t+1} - \hat{u}_{2,t+1}) = \hat{u}_{1,t+1}^2 - \hat{u}_{1,t+1}\hat{u}_{2,t+1}$ and $\bar{c} = P^{-1}\sum_t c_{t+1}$. Their encompassing test, denoted ENC-T, is formed as

$$\text{ENC-T} = (P-1)^{1/2}\,\frac{\bar{c}}{\sqrt{P^{-1}\sum_t (c_{t+1} - \bar{c})^2}} = (P-1)^{1/2}\,\frac{P^{-1}\sum_t (\hat{u}_{1,t+1}^2 - \hat{u}_{1,t+1}\hat{u}_{2,t+1})}{\sqrt{P^{-1}\sum_t (\hat{u}_{1,t+1}^2 - \hat{u}_{1,t+1}\hat{u}_{2,t+1})^2 - \bar{c}^2}}. \tag{1}$$

The term in front is $(P-1)^{1/2}$ rather than $P^{1/2}$ because we calculate the test using standard regression methods (we regress $c_{t+1}$ on a constant). Under the null that model 1 forecast encompasses model 2, the covariance between $u_{1,t}$ and $u_{1,t} - u_{2,t}$ will be less than or equal to 0. Under the alternative that model 2 contains added information, the covariance should be positive. Hence the ENC-T test, and the other encompassing tests described below, are one-sided.

Theorem 3.1: (a) Let Assumptions 1-4 hold. For ENC-T defined in (1), ENC-T $\to_d \chi_1/(\chi_2)^{1/2}$ where $\chi_1 = \int_\lambda^1 s^{-1}W'(s)\,dW(s)$ and $\chi_2 = \int_\lambda^1 s^{-2}W'(s)W(s)\,ds$. (b) Let Assumptions 1-3 and 4′ hold. ENC-T $\to_d N(0, 1)$.

While West (1996) shows that the ENC-T statistic can be asymptotically normal for any value of $\pi \geq 0$ when applied to non-nested forecasts, this is not the case when the models are nested. In Theorem 3.1 (a), we show that if $\pi > 0$, the ENC-T statistic has a nonstandard limiting distribution. Although this null limiting distribution does not depend upon the data-generating process, it does depend on two parameters. The first is the number of excess parameters $k_2$, which arises because the vector Brownian motion, $W(s)$, is $(k_2 \times 1)$. The second parameter is $\pi$, which affects the range of integration on each of the stochastic integrals through $\lambda$.
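As a rough illustration of equation (1), the sketch below computes ENC-T from two sequences of forecast errors. The function name `enc_t` and the list-based inputs are assumptions made for this example; the formula itself follows (1), with the sample variance of $c_{t+1}$ divided by P and the scale factor $(P-1)^{1/2}$ in front.

```python
import math


def enc_t(u1, u2):
    """ENC-T statistic of equation (1).

    u1, u2: equal-length sequences of 1-step ahead forecast errors
    from the restricted model (1) and the nesting model (2).
    """
    P = len(u1)
    # c_{t+1} = u1 * (u1 - u2)
    c = [a * (a - b) for a, b in zip(u1, u2)]
    c_bar = sum(c) / P
    # P^{-1} * sum (c - c_bar)^2, the denominator term in (1)
    var_c = sum((ci - c_bar) ** 2 for ci in c) / P
    return math.sqrt(P - 1) * c_bar / math.sqrt(var_c)
```

Because the test is one-sided, the null of encompassing would be rejected when the statistic exceeds the relevant upper-tail critical value.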
In Theorem 3.1 (b), however, we show that if $\pi = 0$, the ENC-T statistic is limiting standard normal.

We provide a selected set of numerically-generated asymptotic critical values for the ENC-T statistic, when $\pi > 0$, in the upper panel of Table 1.4 The reported critical values are percentiles of 5000 independent draws from the distribution of $\chi_1/(\chi_2)^{1/2}$ for a given value of $k_2$ and $\pi$. In generating these draws, the necessary $k_2$ Brownian motions are simulated as random walks each using an independent sequence of 10,000 i.i.d. $N(0, T^{-1/2})$ increments, and the integrals are emulated by summing the relevant weighted quadratics of the random walks.

The asymptotic critical values for $\pi > 0$ clearly differ from the standard normal critical values that are appropriate when $\pi = 0$. For example, with $\pi = 1$ and $k_2 = 1$, the 90th percentile of the asymptotic distribution is 0.955, compared to 1.282 for the standard normal distribution. As $\pi$ declines, the asymptotic critical values rise gradually, but remain somewhat different from standard normal values. With $\pi = 0.2$ and $k_2 = 1$, for instance, the 90th percentile of the asymptotic distribution is 1.002.

4 Clark and McCracken (2000) provides more detailed tables, covering additional values of $k_2$ and $\pi$ as well as the rolling and fixed forecasting schemes.

3.2 The ENC-REG Test

The forecast encompassing test proposed by Ericsson (1992) is a regression-based variant of the ENC-T test. The test statistic, denoted ENC-REG, is the t-statistic associated with the coefficient $\alpha_1$ from the OLS regression $\hat{u}_{1,t+1} = \alpha_1(\hat{u}_{1,t+1} - \hat{u}_{2,t+1})$ + error term, which can be expressed as

$$\text{ENC-REG} = (P-1)^{1/2}\,\frac{P^{-1}\sum_t \hat{u}_{1,t+1}(\hat{u}_{1,t+1} - \hat{u}_{2,t+1})}{\sqrt{P^{-1}\sum_t (\hat{u}_{1,t+1} - \hat{u}_{2,t+1})^2\,(P^{-1}\sum_t \hat{u}_{1,t+1}^2) - \bar{c}^2}}. \tag{2}$$

Theorem 3.2: Let Assumptions 1-3 and either 4 or 4′ hold. For ENC-REG defined in (2) and ENC-T defined in (1), ENC-REG − ENC-T = $o_p(1)$.
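Equation (2) can be computed directly from sample moments of the two error sequences. The sketch below is one way to do so; the function name `enc_reg` is an assumption for this example. Algebraically, the expression equals the usual OLS t-statistic on $\alpha_1$ in the no-constant regression of $\hat{u}_{1,t+1}$ on $\hat{u}_{1,t+1} - \hat{u}_{2,t+1}$ when the residual variance is computed with $P - 1$ degrees of freedom.

```python
import math


def enc_reg(u1, u2):
    """ENC-REG statistic of equation (2).

    u1, u2: equal-length sequences of 1-step ahead forecast errors
    from models 1 and 2.
    """
    P = len(u1)
    d = [a - b for a, b in zip(u1, u2)]            # u1 - u2
    c_bar = sum(a * di for a, di in zip(u1, d)) / P
    m_dd = sum(di * di for di in d) / P            # P^{-1} sum (u1 - u2)^2
    m_uu = sum(a * a for a in u1) / P              # P^{-1} sum u1^2
    return math.sqrt(P - 1) * c_bar / math.sqrt(m_dd * m_uu - c_bar ** 2)
```

The numerator is the same $\bar{c}$ as in ENC-T; only the variance term in the denominator differs, using a product of second moments rather than a sample fourth moment.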
While the ENC-REG statistic, like the ENC-T statistic, can be asymptotically normal for any value of $\pi \geq 0$ when applied to non-nested forecasts, this is not the case when the models are nested. Theorem 3.2 states that, with nested models, regardless of whether $\pi > 0$ or $\pi = 0$, ENC-REG and ENC-T are asymptotically equivalent under the null.5 Therefore, the asymptotic distribution of ENC-REG is non-standard when $\pi > 0$ and standard normal when $\pi = 0$.

3.3 A New Encompassing Test

Because the population prediction errors from models 1 and 2 are exactly the same under the null (making $c_{t+1}$, in population, identically 0) the sample variances in the denominators of the ENC-T and ENC-REG statistics (1) and (2) are, heuristically, 0. This feature of the ENC-T and ENC-REG statistics may adversely affect the small-sample properties of the tests. Therefore, we propose a variant of the ENC-T and ENC-REG statistics in which $\bar{c}$ (the covariance between $\hat{u}_{1,t+1}$ and $\hat{u}_{1,t+1} - \hat{u}_{2,t+1}$) is scaled by the variance of one of the forecast errors rather than an estimate of the variance of $c$. This statistic, which we refer to as the ENC-NEW test, takes the form

$$\text{ENC-NEW} = P \cdot \frac{\bar{c}}{MSE_2} = P \cdot \frac{P^{-1}\sum_t (\hat{u}_{1,t+1}^2 - \hat{u}_{1,t+1}\hat{u}_{2,t+1})}{P^{-1}\sum_t \hat{u}_{2,t+1}^2}. \tag{3}$$

Theorem 3.3: (a) Let Assumptions 1-4 hold. For ENC-NEW defined in (3) and $\chi_1$ defined in Theorem 3.1, ENC-NEW $\to_d \chi_1$. (b) Let Assumptions 1-3 and 4′ hold. ENC-NEW $\to_p 0$.

As with the ENC-T and ENC-REG statistics, if $\pi > 0$ the limiting distribution of ENC-NEW is non-normal when the forecasts are nested under the null. The limiting distribution of ENC-NEW is also asymptotically pivotal and dependent on the parameters $k_2$ and $\pi$. We provide a selected set of asymptotic critical values for the ENC-NEW statistic in the lower panel of Table 1. These values were generated numerically using the limiting distribution in Theorem 3.3 (a).
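The sketch below illustrates both the ENC-NEW statistic in (3) and one possible way of simulating critical values from the limiting distributions in Theorems 3.1 (a) and 3.3 (a). It is an assumption-laden illustration rather than the procedure used to build Table 1: the function names are hypothetical, the Brownian motion is approximated by a random walk whose increments have variance 1/`n_steps` (an assumed scaling), the stochastic integral uses left endpoints, and the default numbers of draws and steps are far smaller than the 5000 draws and 10,000 increments described in the text, purely to keep the run time short.

```python
import math
import random


def enc_new(u1, u2):
    """ENC-NEW statistic of equation (3): P * c_bar / MSE_2."""
    P = len(u1)
    c_bar = sum(a * (a - b) for a, b in zip(u1, u2)) / P
    mse2 = sum(b * b for b in u2) / P
    return P * c_bar / mse2


def draw_chi(k2, pi, n_steps, rng):
    """One approximate draw of (chi_1, chi_2) for given k2 and pi."""
    lam = 1.0 / (1.0 + pi)
    first = max(1, int(lam * n_steps))   # first grid point with s near lambda
    sd = n_steps ** -0.5                 # assumed increment std. deviation
    w = [0.0] * k2                       # W at the left endpoint s = j / n_steps
    chi1 = chi2 = 0.0
    for j in range(n_steps):
        dw = [rng.gauss(0.0, sd) for _ in range(k2)]
        if j >= first:
            s = j / n_steps
            chi1 += sum(wi * di for wi, di in zip(w, dw)) / s
            chi2 += sum(wi * wi for wi in w) / (s * s * n_steps)
        w = [wi + di for wi, di in zip(w, dw)]
    return chi1, chi2


def critical_values(k2, pi, n_draws=500, n_steps=1000,
                    levels=(0.90, 0.95), seed=0):
    """Simulated percentiles: {level: (ENC-T value, ENC-NEW value)}."""
    rng = random.Random(seed)
    draws = [draw_chi(k2, pi, n_steps, rng) for _ in range(n_draws)]
    t_draws = sorted(c1 / math.sqrt(c2) for c1, c2 in draws)
    new_draws = sorted(c1 for c1, _ in draws)

    def pick(values, q):
        return values[min(len(values) - 1, int(q * len(values)))]

    return {q: (pick(t_draws, q), pick(new_draws, q)) for q in levels}
```

With small simulation sizes the resulting percentiles are noisy, so any comparison with published tables would call for many more draws and a finer grid.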
Theorem 3.3 (b) shows that, if $\pi = 0$, the limiting distribution of the ENC-NEW statistic is degenerate, not standard normal as in the case of the ENC-T and ENC-REG tests. As noted by Chong and Hendry (1986), when $\pi = 0$ the parameter estimates are essentially 'known' before the out-of-sample period begins. We would then expect the numerator of the ENC-NEW statistic to behave like its population counterpart, which is 0. The same logic does not apply to the ENC-T and ENC-REG statistics because both the numerator and the denominator of these statistics are converging to zero at the same rate.6

5 Similarly, McCracken (1999) shows that the MSE-REG and MSE-T tests included in our Monte Carlo simulations are asymptotically equivalent.

6 Clark and McCracken (2000) show that, if $\pi = 0$, the numerator and denominator terms of the ENC-T and ENC-REG statistics are each $o_p((P/R)^{1/2})$.

4. Monte Carlo Results

The small-sample properties of the encompassing tests described in Section 3, as well as some tests of equal MSE, are evaluated using simulations of bivariate data-generating processes. The equal MSE tests are those for which McCracken (1999) derives the asymptotic distributions: an F-type test proposed by McCracken (MSE-F), a t-test proposed by Diebold and Mariano (1995) (MSE-T), and the Granger and Newbold (1977) t-test (MSE-REG).7 While the analysis is focused on testing ex-ante forecasts for equal accuracy and encompassing, for the sake of comparison we also provide results for the standard full-sample F-test of Granger causality (GC).

In these simulations, we compare the predictive ability of an AR model (model 1) with that from a VAR model (model 2). The presented results are based on data generated using standard normal disturbances. The results are essentially unchanged when the disturbances are drawn from the heavier-tailed t(6) distribution considered by Diebold and Mariano (1995), Harvey, Leybourne, and Newbold (1997, 1998), and Clark (1999).
The forecasts in our presented results are recursive; using rolling and fixed forecasts generally produces the same results.8

7 Because the models are nested, the null hypothesis is $Eu_{1,t+1}^2 \leq Eu_{2,t+1}^2$ and the alternative is $Eu_{1,t+1}^2 > Eu_{2,t+1}^2$. The alternative is one-sided because, if the restrictions imposed on model 1 are not true, there is no reason to expect forecasts from model 1 to be superior to those from model 2.

8 The key exception is that, with fixed forecasts, comparing the ENC-T and ENC-REG tests against standard normal critical values does not produce undersized tests. With fixed forecasts, the ENC-T and ENC-REG statistics are asymptotically standard normal for both $\pi > 0$ and $\pi = 0$.

4.1 Experiment Design

In the presented results, data are generated using two different models. The first, denoted DGP-I, takes the form

$$\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} 0.3 & b \\ 0 & 0.5 \end{pmatrix}\begin{pmatrix} y_{t-1} \\ x_{t-1} \end{pmatrix} + \begin{pmatrix} u_{y,t} \\ u_{x,t} \end{pmatrix}. \tag{4}$$

The second, denoted DGP-II, takes the form

$$\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} 0.3 & b \\ 0.7 & -0.5 \end{pmatrix}\begin{pmatrix} y_{t-1} \\ x_{t-1} \end{pmatrix} + \begin{pmatrix} 0.3 & 0 \\ 0.3 & 0 \end{pmatrix}\begin{pmatrix} y_{t-2} \\ x_{t-2} \end{pmatrix} + \begin{pmatrix} u_{y,t} \\ u_{x,t} \end{pmatrix}. \tag{5}$$

In both cases, $y_t$ is the predictand, $x_t$ is an auxiliary variable, and the disturbances are i.i.d. standard normal random variates. To evaluate size, the coefficient $b$ is set at 0. In this case, the AR and VAR models have equal MSE and forecasts from the AR model encompass those from the VAR. To evaluate power, $b$ is set at 0.1 and 0.2. In these power experiments, the VAR forecasts of $y_{t+1}$ have lower MSE than the AR forecasts, and the AR forecast does not encompass the VAR forecast. Simulations based on several other DGPs, including the empirical inflation and unemployment model considered in Section 5, produced similar results.

In each Monte Carlo simulation we generate $R + P + 4$ observations. The additional four observations allow for data-determined lag lengths in the forecasting models.
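A minimal simulation of the two data-generating processes might look as follows. This is a sketch under stated assumptions: the function name `simulate_dgp` is hypothetical, the coefficient matrices are taken as read from equations (4) and (5), and a burn-in period started from zeros is used as a simple stand-in for drawing initial observations from the unconditional distribution.

```python
import random


def simulate_dgp(b, n_obs, rng, dgp="I", burn_in=200):
    """Simulate (y, x) from DGP-I or DGP-II with i.i.d. N(0, 1) shocks.

    b:       coefficient on x_{t-1} in the y equation (0 under the null)
    n_obs:   number of observations to keep, e.g. R + P + 4
    burn_in: discarded start-up draws (an assumption of this sketch)
    """
    if dgp == "I":
        a1 = ((0.3, b), (0.0, 0.5))
        a2 = ((0.0, 0.0), (0.0, 0.0))
    else:
        a1 = ((0.3, b), (0.7, -0.5))
        a2 = ((0.3, 0.0), (0.3, 0.0))
    y1 = x1 = y2 = x2 = 0.0          # first and second lags of y and x
    ys, xs = [], []
    for t in range(burn_in + n_obs):
        y = (a1[0][0] * y1 + a1[0][1] * x1
             + a2[0][0] * y2 + a2[0][1] * x2 + rng.gauss(0.0, 1.0))
        x = (a1[1][0] * y1 + a1[1][1] * x1
             + a2[1][0] * y2 + a2[1][1] * x2 + rng.gauss(0.0, 1.0))
        y2, x2, y1, x1 = y1, x1, y, x  # shift the lags forward
        if t >= burn_in:
            ys.append(y)
            xs.append(x)
    return ys, xs
```

For a size experiment with R = 100 and P = 40, for example, one would call `simulate_dgp(0.0, 100 + 40 + 4, random.Random(seed))` once per replication.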
After drawing initial observations from the unconditional normal distribution implied by the DGP, the remaining observations are constructed iteratively using the autoregressive model structure and draws of the error terms from the standard normal distribution. After reserving observations 1 through 4 to allow for a maximum of four data-determined lags, the in-sample period spans observations 5 through $R + 4$. The estimated forecasting models are used to form $P$ 1-step ahead, recursive predictions, spanning observations $R + 5$ through $R + P + 4$.

We have generated results based on a variety of methods for determining the lag lengths of the estimated models. Specifically, we consider simply fixing the lag length at the true order of the DGP as well as using AIC, SIC, and a last significant lag criterion to determine the optimal lag.9 In employing the data-based methods, we use a particular criterion to determine the optimal lag length for the estimated VAR model and then impose the same order on the estimated AR model.10 The lag lengths of the models used for forecasting were determined using only the in-sample portion of the data. However, the estimated model underlying each GC test uses a lag length determined from the full sample of $R + P$ observations.

9 We have also examined results based on simply fixing the lag length at 4, which yields similar results, except that all tests have lower power. Our last significant lag criterion is the general-to-specific Wald test described in Hall (1994) and Ng and Perron (1995).

10 Setting the lag length based on just the equation for $y_t$ yields similar results.

In most, although not all, instances, our basic results are not sensitive to the lag selection method. Accordingly, we focus the discussion on results based on setting the lag lengths at the true order. We then include a brief discussion comparing results across lag selection methods.
Because the AIC and last significant lag criteria yield similar results, this brief discussion is focused on the performance of AIC and SIC.

In our Monte Carlo experiments, the ENC-NEW and MSE-F test results are based on comparing the statistics against the asymptotic critical values provided in Table 1 and McCracken (1999), respectively. For the ENC-T, ENC-REG, MSE-T and MSE-REG tests, we report two sets of results: one based on the asymptotic critical values reported in Table 1 or McCracken (1999) and another based on standard normal critical values. For these four tests, the true asymptotic critical values are standard normal only when $\pi = 0$. In our experiments, $\hat{\pi} \equiv P/R$ is non-zero, but sometimes small. Our experiments address whether using a standard normal approximation is accurate when $\hat{\pi}$ is small.

Results are reported for empirically relevant combinations of $P$ and $R$ such that $\hat{\pi}$ takes the values 0.1, 0.2, 0.4, 1.0, 2.0, 3.0, or 5.0. Specifically, we use R = 50 with P = 100, 150, and 200; R = 100 with P = 10, 20, 40, 100, and 200; and R = 200 with P = 20, 40, 80, and 200.

4.2 Size Results

Table 2 presents the empirical sizes of Granger causality, equal forecast accuracy, and forecast encompassing tests for data from DGP-I and DGP-II, using a nominal size of 10%. Table 3 presents a selected set of comparable results based on data-determined lag lengths. The results are generally the same at a nominal size of 5%. Three general results are evident.

Size result 1. In most settings the post-sample tests have reasonable finite-sample size properties when compared against asymptotic critical values for $\pi = \hat{\pi} \equiv P/R$. Specifically, the MSE-F, MSE-REG, ENC-NEW, and ENC-REG tests perform well, suffering only slight size distortions in finite samples. For example, with DGP-I, R = 100, and P = 20, these four tests have empirical sizes of 10.7%, 11.7%, 11.0%, and 11.8% respectively.
While the MSE-T and ENC-T statistics also perform reasonably well, when P is small the tests suffer slightly greater distortions than do the MSE-REG and ENC-REG tests. For instance, using DGP-I, R = 100, and P = 10, the MSE-T test has an actual size of 15.4% while MSE-REG has an actual size of 13.0%. The better performance of MSE-REG and ENC-REG likely stems from the regression forms of the tests using more precise variance estimates.11 For example, the variance term in the denominator of the ENC-REG test (2) uses a product of second moments, whereas the ENC-T test (1) uses a sample fourth moment. In general, given R, the size distortions of the post-sample tests fall as P rises. For instance, when data are generated using DGP-I with R = 100 and P = 10, actual size ranges from 11.3% to 15.4%. When P increases to 100, actual size ranges from 10.4% to 11.0%. Note that the absence of any size distortions in the results for R = 50 reflects the fact that P is large.

11 We also find that MSE-REG and ENC-REG have better size than MSE-T and ENC-T in simulations with t(6)-distributed innovations.

Size result 2. Comparing the MSE-T, MSE-REG, ENC-T, and ENC-REG tests against standard normal critical values generally leads to too-infrequent rejections. The problem is most severe for the MSE-T and MSE-REG tests. For instance, under DGP-II with R = 100, P = 20 and when standard normal critical values are used, the MSE-T and MSE-REG tests yield sizes of 5.6% and 4.7%, respectively. In accordance with the theory, for a given R, the tests become more undersized as P rises. In the same example, but with P = 100, the sizes of the MSE-T and MSE-REG tests fall to 1.4% and 1.3%, respectively.

For small values of $\hat{\pi} \equiv P/R$, whether the asymptotic critical values for $\pi = \hat{\pi}$ or standard normal critical values associated with $\pi = 0$ provide better finite-sample results is largely a matter of individual judgment.12 As the results in Table 2 show, when P is relatively small, using the critical values in Table 1 and McCracken (1999) yields slightly over-sized tests, while using standard normal critical values yields slightly under-sized tests. Again, though, standard normal critical values are a better approximation for the empirical distributions of the ENC-T and ENC-REG statistics than of the MSE-T and MSE-REG statistics.

This second size result, as well as the first, continues to hold when data-based methods are used to determine the lag length, as long as lag selection is reasonably accurate. For example, with DGP-I and R = 100, AIC and SIC select the true lag about 86% and 99.8% of the time, respectively. Accordingly, as evident from a comparison between the data-determined lag length results in Table 3 and the fixed-lag results in Table 2, with DGP-I there are few differences in the sizes of the forecast tests across lag selection methods. Similarly, with DGP-II and R = 200, AIC is sufficiently accurate that the sizes of the forecast tests are essentially the same when the lag is data-determined as when it is fixed at the true order. Our final size result addresses some differences that arise when lag selection is less accurate.

Size result 3. When data-based lag selection is sufficiently imprecise, size performance deteriorates. In the case of DGP-II, the true model for $y_t$ is an AR(2). However, because the population correlation between $x_{t-1}$ and $y_{t-2}$ is large (0.57), data-based procedures often select a lag of 1 (for an estimated VAR in $x_t$ and $y_t$). For example, when R = 100, AIC selects a lag of 1 in about 13% of the DGP-II simulations, while SIC selects a lag of 1 with a frequency of 67%.
When R = 200, AIC selects a lag of 1 with a frequency of just 0.6%, while SIC selects a lag of 1 with a frequency of 24%.

The difficulties in selecting the lag length in DGP-II simulations create modest-to-substantial size distortions in the forecast tests, with SIC producing the largest distortions.13 Table 3 shows that, when R = 100 and P = 40, using AIC to determine lag length makes the size of the ENC-NEW test 16.2%, while using SIC makes the size 34.4%. While increasing R to 200 eliminates the distortions in AIC-based tests (compared to tests based on the true lag length), modest distortions remain in SIC-based tests. For instance, the ENC-NEW test has a size of 22.5% when R = 200, P = 40, and the lag is selected using SIC.

Our analysis of different lag selection procedures also shows that data-based methods can create size distortions in the GC test that rival and sometimes exceed those of the post-sample tests. For example, as reported in Table 3, in the experiment using DGP-I, R = 100, P = 40, and AIC, the GC test has empirical size of 13.3%, compared to a range of 10.2% to 12.0% for the post-sample tests.14 Similarly, in the experiment with DGP-II, R = 200, and P = 40, the AIC-based GC test has size of 13.0%, compared to a range of 10.6% to 12.0% for the post-sample tests. While using the SIC to select lag length makes the GC test correctly sized in experiments with DGP-I, in experiments with DGP-II the SIC-based GC test often suffers size distortions that rival or exceed those of the post-sample tests. For example, with R = 100 and P = 40, the GC test has size of 33.1%, compared to a range of 25.2% to 34.4% for the post-sample tests.

Some other evidence suggests that the size advantage of post-sample tests may be even larger when more data mining is involved in choosing the lag length of the VAR in $x_t$ and $y_t$. Yet another, more data-intensive approach to model selection is to allow the lags on $y_t$ and $x_t$ in the nesting equation for $y_t$ (i.e., model 2) to differ, and then choose the lag combination that minimizes the AIC for that equation. Using this approach to lag selection, when the DGP is DGP-I, R = 100, and P = 20, the GC test has actual size of 20.2%, while the MSE-F and ENC-NEW tests have size of 11.6% and 13.5%, respectively.

12 Some unreported results show that while the 90th and 95th percentiles of the empirical distribution very roughly approximate the corresponding percentiles of the standard normal distribution, the null of normality of the empirical distribution of each test is strongly rejected for P/R = 0.1 or 0.2.

13 The distortions do not decline as P rises. For instance, when R = 100 and P = 100, the AIC- and SIC-based versions of the ENC-NEW test have sizes of 18.1% and 47.0%, respectively.

14 For both R = 100 and R = 200 the size of the AIC-based GC test remains at about 13% as P is increased. The persistent size distortions in the GC test reflect the pre-test bias created by using the same sample of data to first (over-) fit the lag length and then test causality. If the first half of the sample is used just for lag selection, and the GC test is computed using just the second half of the sample with a lag length determined with the first half of the sample, the test is correctly sized. The test is also correctly sized if the full sample is used in the GC test with the lag simply fixed at an order greater than the true order of the DGP.

4.3 Power Results

Tables 4 and 5 present results on the power of forecast encompassing, equal forecast accuracy, and Granger causality tests. Because the tests are, to varying degrees, subject to size distortions, the reported power figures are based on empirical critical values and therefore size-adjusted.15 The actual size of the tests is 10%; using 5% produces essentially the same results. Two general results are evident in Tables 4 and 5.

Power result 1.
The small-sample powers of the tests generally permit some simple rankings: ENC-NEW > MSE-F, ENC-T, ENC-REG > MSE-T, MSE-REG. In our experiments, the ENC-NEW test is clearly the most powerful out-of-sample test of predictive ability. In some settings, the power of the ENC-NEW statistic rivals the power of the GC test, even though the GC test is based on many more observations (R + P rather than P). For example, as shown in the lower panel of Table 4, in simulations with DGP-II, b = 0.1, R = 100, and P = 40, the ENC-NEW test has power of 26.4%, comparable to the GC test’s power of 31.0%. The MSE-F, ENC-T, and ENC-REG tests are less powerful than the ENC-NEW test. Using the experiment of the previous example, the MSE-F, ENC-T, and ENC-REG tests have power of 22.8%, 22.3%, and 22.8%, respectively. The MSE-T and MSE-REG tests are less powerful than the other tests.

Power result 2. Increasing the number of observations affects the powers of the tests in two basic ways. First, holding P fixed, the powers of the post-sample tests tend to rise with R, although more for some tests than others.16 For instance, as shown in the upper panel of Table 4, with DGP-I and P = 40, the power of the ENC-NEW test rises from 33.2% when R = 100 to 41.4% when R = 200. Second, when R is held fixed, power rises with P. For example, Table 5 shows that, in the DGP-I experiment with R = 100 and b = 0.2, the power of the MSE-F test rises from 41.1% when P = 10 to 77.7% when P = 100. The powers of the three tests for equal MSE converge as P becomes large, and the same happens for the three encompassing tests.

Our two key power results still hold when data-based methods are used to determine the lag length. With DGP-I, SIC-based power is virtually the same as when the lag is set at the true order of the DGP; AIC-based power is the same to slightly lower.
For example, in the DGP-I experiment with R = 100, P = 40, and b = 0.1, using the SIC yields power of 33.2% for the ENC-NEW test, while using the AIC yields power of 31.2%. With DGP-II, our two key power results hold for all lag selection methods, but the lag selection problems discussed above give the SIC a power advantage. For instance, in the DGP-II experiment with R = 100, P = 40, and b = 0.1, the SIC-based ENC-NEW test has power of 42.0%, while the AIC-based test has power of 27.8%.

15 In results allowing data-determined lags in a given experiment, the test statistic in simulation i, for which the selected lag is j, is compared against the distribution of test statistics from the set of corresponding size simulations in which the lag was selected to be j.

5. Empirical Example

In this section we use tests of forecast encompassing, equal forecast accuracy, and Granger causality to determine whether the prime-age male unemployment rate is useful in predicting core CPI inflation. Cecchetti (1995), Staiger, Stock, and Watson (1997), and Stock and Watson (1999) are recent examples of studies in the long literature on this basic question.

Our quarterly data, which begin in 1957:Q1, are divided into in-sample and out-of-sample portions so as to produce a π̂ ≡ P/R value for which this paper and McCracken (1999) report corresponding asymptotic critical values. After we allow for data differencing and a maximum of four data-determined lags, the in-sample period spans 1958:Q3-1987:Q1. This leaves a total of R = 115 observations. The out-of-sample period spans 1987:Q2-1998:Q3, yielding a total of P = 46 1-step ahead predictions. For this split, π̂ = 0.4.

Consistent with the results of augmented Dickey-Fuller tests for unit roots, our model variables are the change in inflation and the change in the unemployment rate. Over the in-sample period, AIC is minimized at two lags for both the AR and the VAR.
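The sample split and recursive 1-step ahead forecasts described above can be illustrated with a minimal sketch. It is not the code behind the reported results. It assumes simulated data in place of the inflation and unemployment series, a single lag rather than the two lags selected by AIC, and the usual statements of the MSE-F statistic, P(MSE1 - MSE2)/MSE2, and the ENC-NEW statistic, P times the mean of (u1² - u1·u2) divided by MSE2, where u1 and u2 are the forecast errors of the restricted and unrestricted models. The function names `recursive_errors`, `mse_f`, and `enc_new` are hypothetical.

```python
import numpy as np

def recursive_errors(y, X, R):
    """1-step-ahead forecast errors from OLS re-estimated on an expanding sample.

    Row t of X holds regressors dated t-1 or earlier, so X[t] is known when
    y[t] is forecast. The first forecast is estimated on rows 0..R-1, and
    len(y) - R forecasts (P of them) are produced in total.
    """
    errors = []
    for t in range(R, len(y)):
        beta, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        errors.append(y[t] - X[t] @ beta)
    return np.array(errors)

def mse_f(u1, u2):
    """Equal-accuracy statistic (assumed form): P * (MSE1 - MSE2) / MSE2."""
    u1, u2 = np.asarray(u1, float), np.asarray(u2, float)
    mse2 = np.mean(u2 ** 2)
    return len(u1) * (np.mean(u1 ** 2) - mse2) / mse2

def enc_new(u1, u2):
    """Encompassing statistic (assumed form): P * mean(u1^2 - u1*u2) / MSE2."""
    u1, u2 = np.asarray(u1, float), np.asarray(u2, float)
    return len(u1) * np.mean(u1 ** 2 - u1 * u2) / np.mean(u2 ** 2)

# Hypothetical data: x helps predict y through a small coefficient b.
R, P, b = 115, 46, 0.2
rng = np.random.default_rng(0)
n = R + P + 1                      # one observation is lost to the lag
y, x = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + b * x[t - 1] + rng.standard_normal()

target = y[1:]
X2 = np.column_stack([np.ones(n - 1), y[:-1], x[:-1]])  # unrestricted model 2
X1 = X2[:, :2]                                          # nested model 1 drops x

u1 = recursive_errors(target, X1, R)
u2 = recursive_errors(target, X2, R)
print("pi_hat  =", P / R)
print("MSE-F   =", mse_f(u1, u2))
print("ENC-NEW =", enc_new(u1, u2))
```

The statistics produced this way would be compared with critical values for the matching π̂ from Table 1 and McCracken (1999); the sketch does not embed any critical values.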
The test statistics are compared against asymptotic critical values for π = 0.4 from Table 1 and McCracken (1999) and empirical critical values generated from Monte Carlo simulations of the estimated inflation-unemployment model in which the null of no causality from unemployment to inflation is imposed. As can be seen from the critical values reported in the lower panel of Table 6, the asymptotic critical values for π = 0.4 provide a good approximation to the empirical critical values, and a better approximation than that provided by the standard normal critical values that are appropriate for π = 0.

The upper panel of Table 6 reports in-sample estimates of an AR(2) fit to changes in core CPI inflation and a VAR(2) fit to changes in core CPI inflation and prime-age male unemployment. In the in-sample model estimates, unemployment clearly has predictive power for inflation. Moreover, the full-sample GC test reported in the lower panel of the table strongly rejects the null of no causality from unemployment to inflation.

16 However, in a few cases, the powers of the MSE-T and MSE-REG tests decline as R rises given P.

Although weaker, the out-of-sample evidence also indicates unemployment has predictive power for inflation. As reported in the lower panel of Table 6, all of the encompassing tests indicate that the change in unemployment has predictive content for the change in inflation. The ENC-NEW test strongly rejects the null that the AR forecast encompasses the VAR forecast. The ENC-REG test clearly rejects, while the ENC-T test marginally rejects. None of the tests for equal MSE rejects the null of equal accuracy.

Two factors may account for the difference in the strength of the in-sample and post-sample evidence. One is simply power differences: some of the post-sample tests may not be powerful enough to pick up unemployment’s predictive content.
The Monte Carlo results in Section 4 indicate that the power of equal forecast accuracy tests, such as MSE-F, lags behind the power of encompassing counterparts like the ENC-NEW test, which has power rivaling that of the GC test. The second factor is model instabilities. Neither the AR model for inflation nor the VAR passes the supremum Wald or exponential Wald tests for stability developed in Andrews (1993) and Andrews and Ploberger (1994), respectively.

6. Conclusions

In this paper we first derive the limiting distributions of two standard tests and one new test of forecast encompassing applied to 1-step ahead predictions from nested linear models. We show that the tests have non-standard distributions when the number of observations used to generate initial estimates of the models and the number of forecast observations increase at the same rate. We then provide numerically-generated critical values for these distributions. We also show that the two standard tests are limiting standard normal when the number of forecasts increases at a slower rate than the number of observations used in the initial model estimates.

We then use Monte Carlo experiments to examine the finite-sample size and size-adjusted power of equal accuracy and encompassing tests. These experiments yield three essential results. First, the post-sample tests are, in general, reasonably well sized when one uses the critical values provided in this paper. Second, when standard normal critical values are used the post-sample tests are undersized. Third, the encompassing test proposed in this paper (the ENC-NEW statistic defined in equation (3)) is most powerful.

In the final part of our analysis, we find that the post-sample tests provide mixed, but suggestive, evidence on the predictive content of unemployment for inflation.
While all of the equal forecast accuracy tests fail to reject the null that unemployment has no predictive content for inflation, each of the encompassing tests indicates that unemployment does have predictive power. Since encompassing tests appear to have a power advantage in finite samples, unemployment probably does have some predictive value.

References

Amano, R.A., and S. van Norden, 1995, Terms of trade and real exchange rates: The Canadian evidence, Journal of International Money and Finance 14, 83-104.

Andrews, D.W.K., 1993, Tests for parameter instability and structural change with unknown change point, Econometrica 61, 821-56.

Andrews, D.W.K. and W. Ploberger, 1994, Optimal tests when a nuisance parameter is present only under the alternative, Econometrica 62, 1383-1414.

Ashley, R., 1998, A new technique for postsample model selection and validation, Journal of Economic Dynamics and Control 22, 647-665.

Ashley, R., C.W.J. Granger and R. Schmalensee, 1980, Advertising and aggregate consumption: An analysis of causality, Econometrica 48, 1149-67.

Berkowitz, J. and L. Giorgianni, 1999, Long-horizon exchange rate predictability, Review of Economics and Statistics, forthcoming.

Bram, J. and S. Ludvigson, 1998, Does consumer confidence forecast household expenditure? A sentiment horse race, Economic Policy Review, Federal Reserve Bank of New York, June, 59-78.

Cecchetti, S.G., 1995, Inflation indicators and inflation policy, NBER Macroeconomics Annual, 189-219.

Chao, J., V. Corradi and N. Swanson, 2000, An out of sample test for Granger causality, Macroeconomic Dynamics, forthcoming.

Chinn, M.D. and R.A. Meese, 1995, Banking on currency forecasts: How predictable is change in money?, Journal of International Economics 38, 161-178.

Chong, Y.Y. and D.F. Hendry, 1986, Econometric evaluation of linear macroeconomic models, Review of Economic Studies 53, 671-90.
Clark, T.E., 1999, Finite-sample properties of tests for equal forecast accuracy, Journal of Forecasting 18, 489-504.

Clark, T.E. and M.W. McCracken, 2000, Not-for-publication appendix to ‘Tests of equal forecast accuracy and encompassing for nested models’, manuscript, Federal Reserve Bank of Kansas City (available at http://www.kc.frb.org/econres/staff/tec.htm).

Corradi, V., N.R. Swanson and C. Olivetti, 1999, Predictive ability with cointegrated variables, manuscript, Texas A&M University.

Diebold, F.X. and R.S. Mariano, 1995, Comparing predictive accuracy, Journal of Business and Economic Statistics 13, 253-63.

Diebold, F.X. and G.D. Rudebusch, 1991, Forecasting output with the composite leading index: A real time analysis, Journal of the American Statistical Association 86, 603-610.

Ericsson, N.R., 1992, Parameter constancy, mean square forecast errors, and measuring forecast performance: An exposition, extensions, and illustration, Journal of Policy Modeling 14, 465-95.

Ghysels, E. and A. Hall, 1990, A test for structural stability of Euler conditions parameters estimated via the generalized method of moments estimator, International Economic Review 31, 355-64.

Granger, C.W.J. and P. Newbold, 1977, Forecasting Economic Time Series (Academic Press, Orlando, FL).

Hall, A., 1994, Testing for a unit root in time series with pretest data based model selection, Journal of Business and Economic Statistics 12, 461-70.

Hansen, B.E., 1992, Convergence to stochastic integrals for dependent heterogeneous processes, Econometric Theory 8, 489-500.

Harvey, D.I., S.J. Leybourne and P. Newbold, 1997, Testing the equality of prediction mean squared errors, International Journal of Forecasting 13, 281-91.

Harvey, D.I., S.J. Leybourne and P. Newbold, 1998, Tests for forecast encompassing, Journal of Business and Economic Statistics 16, 254-59.
Kilian, L., 1999, Exchange rates and monetary fundamentals: What do we learn from long-horizon regressions?, Journal of Applied Econometrics, forthcoming.

Krueger, J.T. and K.N. Kuttner, 1996, The Fed funds futures rate as a predictor of Federal Reserve policy, Journal of Futures Markets 16, 865-79.

Lutkepohl, H. and M.M. Burda, 1997, Modified Wald tests under nonregular conditions, Journal of Econometrics 78, 315-332.

Mark, N.C., 1995, Exchange rates and fundamentals: Evidence on long-horizon predictability, American Economic Review 85, 201-18.

McCracken, M.W., 2000, Robust out of sample inference, Journal of Econometrics, forthcoming.

McCracken, M.W., 1999, Asymptotics for out-of-sample tests of causality, manuscript, Louisiana State University.

Meese, R.A. and K. Rogoff, 1983, Empirical exchange rate models of the seventies: Do they fit out of sample?, Journal of International Economics 14, 3-24.

Meese, R.A. and K. Rogoff, 1988, Was it real? The exchange rate-interest differential relation over the modern floating-rate period, Journal of Finance 43, 933-948.

Ng, S. and P. Perron, 1995, Unit root tests in ARMA models with data-dependent methods for the selection of the truncation lag, Journal of the American Statistical Association 90, 268-81.

Staiger, D., J.H. Stock and M.W. Watson, 1997, The NAIRU, unemployment and monetary policy, Journal of Economic Perspectives 11, 33-49.

Stock, J.H. and M.W. Watson, 1999, Forecasting inflation, NBER Working Paper #7023.

West, K.D., 1996, Asymptotic inference about predictive ability, Econometrica 64, 1067-84.

West, K.D., 2000a, Tests for forecast encompassing when forecasts depend on estimated regression parameters, Journal of Business and Economic Statistics, forthcoming.

West, K.D., 2000b, Encompassing tests when no model is encompassing, manuscript, University of Wisconsin.

West, K.D. and M.W. McCracken, 1998, Regression-based tests of predictive ability, International Economic Review 39, 817-40.
White, H., 1999, A reality check for data snooping, Econometrica, forthcoming.

Table 1
Percentiles of the ENC-T, ENC-REG, and ENC-NEW Statistics, π > 0: Recursive Scheme
[Table body not recoverable from the source. Layout: 0.95 and 0.90 percentiles for each value of k2, in columns indexed by π; upper panel ENC-T and ENC-REG, lower panel ENC-NEW.]
Notes:
1. The test statistics ENC-T, ENC-REG, and ENC-NEW are defined in Section 3.
2. The upper panel of Table 1 reports estimates of the 90th and 95th percentiles of the asymptotic distribution of both the ENC-T and ENC-REG statistics when the recursive scheme is used and π > 0. The lower panel reports the corresponding percentiles of the asymptotic distribution of the ENC-NEW statistic.
3. The estimates were constructed based upon 5000 simulated draws from the relevant distribution for a given value of both k2 and π. See Section 3.1 of the text for further detail on how the simulations were conducted.

Table 2
Empirical Size
Recursive Forecasts, Nominal Size = 10%
[Table body not recoverable from the source. Layout: rows MSE-F, MSE-T, MSE-REG, ENC-NEW, ENC-T, ENC-REG, and GC, in columns indexed by R and P, for DGP-I and DGP-II. Each DGP has two panels: tests compared against asymptotic critical values for π = π̂ ≡ P/R, and tests compared against standard normal critical values (MSE-T, MSE-REG, ENC-T, ENC-REG).]
Notes:
1. The data generating processes DGP-I and DGP-II are defined in equations (4) and (5). In these size experiments, the coefficient b in each DGP is set to 0. In each simulation, 1-step ahead forecasts of y are formed from an estimated AR model for y and an estimated VAR in y and x.
2. In each simulation, the lag lengths of the estimated models are set at the true lag order of the DGP.
3. R and P refer to the number of in-sample observations and post-sample predictions, respectively.
4. Sections 3 and 4 in the text describe the test statistics. In the results based on asymptotic critical values for π = π̂ ≡ P/R, the statistics are compared against critical values taken from Table 1 and McCracken (1999). In the results based on standard normal critical values, the statistics are compared against the asymptotic distribution of the tests for π = 0.
5. The number of simulations is 50000.

Table 3
Selected Results on Empirical Size When Estimated Model Lags Are Data-Determined
Recursive Forecasts, Nominal Size = 10%
[Table body not recoverable from the source. Layout: same rows and panels as Table 2, in columns for (R, P) = (100, 20), (100, 40), (100, 100), (200, 20), (200, 40), and (200, 200), each reported for AIC and SIC.]
Notes:
1. The data generating processes DGP-I and DGP-II are defined in equations (4) and (5). In these size experiments, the coefficient b in each DGP is set to 0. In each simulation, 1-step ahead forecasts of y are formed from an estimated AR model for y and an estimated VAR in y and x.
2. The table reports size results based on two different approaches to setting the lag lengths of the models estimated in each simulation: (i) setting lag length at the order minimizing AIC for the estimated VAR; and (ii) setting lag length at the order minimizing SIC for the estimated VAR.
3. R and P refer to the number of in-sample observations and post-sample predictions, respectively.
4. Sections 3 and 4 in the text describe the test statistics. In the results based on asymptotic critical values for π = π̂ ≡ P/R, the statistics are compared against critical values taken from Table 1 and McCracken (1999). In the results based on standard normal critical values, the statistics are compared against the asymptotic distribution of the tests for π = 0.
5. The number of simulations is 50000.

Table 4
Size-Adjusted Power, b = 0.1
Recursive Forecasts (Empirical Size = 10%)
[Table body not recoverable from the source. Layout: rows MSE-F, MSE-T, MSE-REG, ENC-NEW, ENC-T, ENC-REG, and GC, in columns indexed by R and P, with an upper panel for DGP-I and a lower panel for DGP-II.]
Notes:
1. The data generating processes DGP-I and DGP-II are defined in equations (4) and (5). In these power experiments, the coefficient b in each DGP is set to 0.1. In each simulation, 1-step ahead forecasts of y are formed from an estimated AR model for y and an estimated VAR in y and x.
2. In each simulation, the lag lengths of the estimated models are set at the true lag order of the DGP.
3. R and P refer to the number of in-sample observations and post-sample predictions, respectively.
4. Sections 3 and 4 in the text describe the test statistics. In each experiment, power is calculated by comparing the test statistics against empirical critical values, calculated as the 90th percentile of the distributions of the statistics in the corresponding size experiment (in which the DGP, R, and P are the same as in the power experiment, except b = 0).
5. The number of simulations is 10000.

Table 5
Size-Adjusted Power, b = 0.2
Recursive Forecasts (Empirical Size = 10%)
[Table body not recoverable from the source. Layout as in Table 4.]
Notes: As for Table 4, except that the coefficient b in each DGP is set to 0.2.

Table 6
Testing the Predictive Content of Unemployment for Inflation
Recursive Forecasts, R = 115, P = 46
[Table body not recoverable from the source. Upper panel: in-sample model estimates, 1958:Q3 to 1987:Q1, listing coefficient estimates with standard errors by explanatory and dependent variable. Lower panel: tests of the predictive power of unemployment for inflation, listing the AR and VAR MSEs and the MSE-F, MSE-T, MSE-REG, ENC-NEW, ENC-T, ENC-REG, and GC statistics with asymptotic critical values for π = 0.4 and empirical critical values.]
Notes:
1. The figures in parentheses in the upper panel of the table are standard errors for the reported coefficient estimates.
2. 1-step ahead forecasts of the change in inflation are formed from an estimated AR model for the change in inflation and an estimated VAR in the changes in inflation and unemployment.
3. R and P refer to the number of in-sample observations and post-sample predictions, respectively.
4. The significance level of the tests is 10%.
5. Sections 3 and 4 in the text describe the test statistics. The asymptotic critical values are taken from Table 1 and McCracken (1999).
6. The empirical critical values are generated from a Monte Carlo experiment (using 50000 simulations) in which the DGP is a VAR in the changes in inflation and unemployment, imposing the null that unemployment not enter the inflation equation. The equations of the simulated model, estimated with just in-sample data, are given in columns 2 and 4 of the top panel. The covariance matrix of the residuals in the DGP is [matrix entries not recoverable from the source].