Bootstrapping GMM Estimators for Time Series*

Atsushi Inoue†                                  Mototsugu Shintani‡
North Carolina State University                 Vanderbilt University

First Draft: October 2000
This Version: February 2001

Abstract

This paper establishes that the bootstrap provides asymptotic refinements for the generalized method of moments estimator of overidentified linear models when autocovariance structures of moment functions are unknown. Because the heteroskedasticity and autocorrelation consistent covariance matrix estimator cannot be written as a function of sample moments and converges at a rate slower than T^{-1/2}, the asymptotic refinement cannot be proved in the conventional way. As a result, we find that the bootstrap approximation error for the distribution of the t test and the test of overidentifying restrictions is of larger order than typically found in the literature. We also find that the choice of kernels plays a more important role in our second-order asymptotic theory than in the conventional first-order asymptotic theory. Nevertheless, the bootstrap approximation improves upon the first-order asymptotic approximation. A Monte Carlo experiment shows that the bootstrap improves the accuracy of inference on regression parameters in small samples. We apply our bootstrap method to inference about the parameters in the monetary policy reaction function.

KEYWORDS: asymptotic refinements, block bootstrap, HAC covariance matrix estimator, dependent data, Edgeworth expansions, instrumental variables, J test.

* We thank Jordi Galí for providing us with the data and program used in Clarida, Galí and Gertler (2000). We also thank Alastair Hall, Lutz Kilian and seminar participants at Brown University, University of Michigan and the 2000 Triangle Econometrics Conference for helpful comments.
† Department of Agricultural and Resource Economics, North Carolina State University, Box 8109, Raleigh, NC 27695-8109. E-mail: atsushi@unity.ncsu.edu.
‡ Department of Economics, Vanderbilt University, Box 1819 Station B, Nashville, TN 37235. E-mail: mototsugu.shintani@vanderbilt.edu.

1. Introduction

In this paper we establish that the bootstrap provides asymptotic refinements for the generalized method of moments (GMM) estimator of possibly overidentified linear models. Our analysis differs from earlier work in that we allow for general autocovariance structures of moment functions. In typical empirical situations, the autocovariance structure of moment functions is unknown and the inverse of the heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimator is used as a weighting matrix in GMM estimation. It is well known, however, that coverage probabilities based on the HAC covariance estimator are often too low, and that the t test tends to reject too frequently (see Andrews, 1991). In this paper, we propose a bootstrap method for the GMM estimator to improve the finite-sample performance of the t test and the test of overidentifying restrictions (J test). We use the block bootstrap originally proposed by Künsch (1989) for weakly dependent data (see also Carlstein, 1986). When the block length increases at a suitable rate with the sample size, such block bootstrap procedures eventually capture the unknown structure of dependence.

Our linear framework is of particular interest in applied time series analysis. GMM estimation of linear models has been applied to the expectations hypothesis of the term structure (Campbell and Shiller, 1991), the monetary policy reaction function (Clarida, Galí and Gertler, 2000), the permanent-income hypothesis (Runkle, 1991), and the present value model of stock prices (West, 1988). Since GMM estimates often have policy implications in structural econometric models, it is important for researchers to obtain accurate confidence intervals.
For example, the interpretation of the policy rule crucially depends on the value of the estimated parameters (see Clarida, Galí and Gertler, 2000).

Not surprisingly, given the poor performance of the conventional asymptotic approximation, the econometric literature on the bootstrap for GMM is growing rapidly. Hahn (1996) shows the first-order validity of the bootstrap for GMM with iid observations.¹ For dependent data, Hall and Horowitz (1996) show that the block bootstrap provides asymptotic refinements for GMM. However, Hall and Horowitz (1996) assume that the autocovariances of the moment function are zero after finite lags, and thus their framework does not cover the use of the HAC covariance matrix estimator for the general dependence structure. Economic theory often provides information about the specification of moment conditions, but not necessarily about the dependence structure of the moment conditions. Therefore, it is important for applied work to be able to allow for more general forms of autocorrelation. This extension is not straightforward because the HAC covariance matrix estimator cannot be written as a function of sample moments and converges at a rate slower than T^{-1/2}. Thus, the conventional arguments cannot be applied directly to prove the existence of Edgeworth expansions and to establish asymptotic refinements of the bootstrap.

Recently, Götze and Künsch (1996) and Lahiri (1996) show that the block bootstrap can provide asymptotic refinements for a smooth function of sample means and for parameters in a linear regression model, respectively, even when the HAC covariance estimator is used. They show that the bootstrap provides asymptotic refinements for approximating the distribution of the estimator and for the coverage probability of one-sided confidence intervals.
However, they do not show asymptotic refinements for the two-sided symmetric t test, nor do they provide any result for the overidentified case, which is of great interest in empirical work. The purpose of this paper is to prove that the bootstrap provides asymptotic refinements for these statistics in overidentified linear models estimated by GMM. To our knowledge, the higher-order properties of the block bootstrap for GMM with unknown autocovariance structures have not been formally investigated.

Our results are nonstandard for two reasons. First, we show that the order of the bootstrap approximation error is larger than typically found in the literature on the bootstrap for parametric estimators. The intuition behind this result is as follows. The HAC covariance matrix estimator is (proportional to) a nonparametric estimator of the spectral density at frequency zero, and its convergence rate is slower than T^{-1/2}. For the first-order asymptotic theory, all that matters is the consistency of the HAC covariance matrix estimator. However, the nonparametric nature of the HAC covariance matrix estimator becomes important in the higher-order asymptotic theory and complicates the analysis of the two-sided symmetric t test and the J test statistic. Nevertheless, we are able to establish that the bootstrap approximation error is smaller than the conventional normal approximation error.

Second, we note that the choice of kernels plays a more important role in our second-order asymptotic theory than in the conventional first-order asymptotic theory because the order of the bootstrap approximation error depends on the bias of the HAC covariance estimator. For the bootstrap to provide asymptotic refinements, the bias must vanish sufficiently fast. For the one-sided t test, most of the commonly used kernels satisfy this condition.

¹ Brown and Newey (1995) propose an alternative efficient bootstrap method based on the empirical likelihood.
For the two-sided symmetric t test and for the J test statistic, however, one must use kernels, such as the truncated kernel (White, 1984) and the trapezoidal kernel (Politis and Romano, 1995), whose bias vanishes even faster. The resulting HAC covariance matrix estimator based on these kernels, however, is not necessarily positive semidefinite. In this paper, we propose a modified HAC covariance matrix estimator that is always positive semidefinite.

In a Monte Carlo experiment, we find that our bootstrap method improves the accuracy of inference in small samples, especially for the two-sided symmetric t test. To illustrate the usefulness of the bootstrap approach, we apply our bootstrap procedure to the monetary policy reaction function of Clarida, Galí and Gertler (2000). We find that the data do not necessarily support some of their conclusions.

The rest of the paper is organized as follows. Section 2 introduces the model and describes the proposed bootstrap procedure. Section 3 presents the assumptions and theoretical results. Section 4 provides some Monte Carlo results. Section 5 presents an empirical illustration. Section 6 concludes the paper. All proofs are relegated to an appendix.

2. Model and Bootstrap Procedure

Consider a stationary time series (x_t′, y_t, z_t′)′ which satisfies

    E[z_t u_t] = 0,                                                          (2.1)

where u_t = y_t − β_0′x_t, β_0 is a p-dimensional parameter, x_t is a p-dimensional vector, z_t is a k-dimensional vector and p < k. Given a realization {(x_t′, y_t, z_t′)′}_{t=1}^{T_0}, we are interested in two-step GMM estimation of β_0 based on the moment condition (2.1). Let ℓ denote the lag truncation parameter used in HAC covariance matrix estimation and let T = T_0 − ℓ + 1.² We first obtain the first-step GMM estimator β̃_T by minimizing

    [ (1/T_0) Σ_{t=1}^{T_0} z_t (y_t − β′x_t) ]′ V_T [ (1/T_0) Σ_{t=1}^{T_0} z_t (y_t − β′x_t) ]

with respect to β, where V_T is some k × k positive semidefinite matrix.
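As a concrete illustration, the two-step estimator for this linear model can be sketched as follows. This is a hypothetical Python implementation, not the authors' code: the first-step weight V_T = (Z′Z/T)^{-1} is one common, arbitrary choice, and the Bartlett weight appears only as a placeholder (the refinement results of this paper require other kernels).

```python
import numpy as np

def bartlett_weight(x):
    # Placeholder kernel weight; the paper's refinement results call for
    # kernels with a higher characteristic exponent (see Section 3).
    return max(0.0, 1.0 - abs(x))

def hac(V, ell, weight=bartlett_weight):
    # HAC estimator for the (T x k) moment series V:
    # S = Gamma_0 + sum_{j=1..ell} w(j/ell) (Gamma_j + Gamma_j')
    T = V.shape[0]
    S = V.T @ V / T
    for j in range(1, ell + 1):
        G = V[j:].T @ V[:-j] / T
        S += weight(j / ell) * (G + G.T)
    return S

def two_step_gmm(y, X, Z, ell):
    # First step: arbitrary positive definite weight V_T = (Z'Z/T)^{-1}
    T = len(y)
    W1 = np.linalg.inv(Z.T @ Z / T)
    A = Z.T @ X / T
    b = Z.T @ y / T
    beta1 = np.linalg.solve(A.T @ W1 @ A, A.T @ W1 @ b)
    # Second step: reweight by the inverse HAC estimate at first-step residuals
    u = y - X @ beta1
    S = hac(Z * u[:, None], ell)
    W2 = np.linalg.inv(S)
    beta2 = np.linalg.solve(A.T @ W2 @ A, A.T @ W2 @ b)
    return beta2, S
```

The second step simply replaces the arbitrary first-step weight with the inverse of the HAC estimate evaluated at the first-step residuals.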
Then we obtain the second-step GMM estimator β̂_T by minimizing

    [ (1/T) Σ_{t=1}^{T} z_t (y_t − β′x_t) ]′ Ŝ_T^{-1} [ (1/T) Σ_{t=1}^{T} z_t (y_t − β′x_t) ],

where

    Ŝ_T = (1/T) Σ_{t=1}^{T} [ z_t ũ_t² z_t′ + Σ_{j=1}^{ℓ} ω(j/ℓ) ( z_{t+j} ũ_{t+j} ũ_t z_t′ + z_t ũ_t ũ_{t+j} z_{t+j}′ ) ],
    ũ_t = y_t − β̃_T′ x_t,

is the HAC covariance matrix estimator for the moment function (2.1) and ω(·) is a kernel. We are interested in the distribution of the studentized statistic Σ̂_T^{-1/2}(β̂_T − β_0), where Σ̂_T = ( Σ_{t=1}^{T} x_t z_t′ Ŝ_T^{-1} Σ_{t=1}^{T} z_t x_t′ )^{-1}, and in the distribution of the J test statistic

    J_T = [ (1/√T) Σ_{t=1}^{T} z_t (y_t − β̂_T′x_t) ]′ Ŝ_T^{-1} [ (1/√T) Σ_{t=1}^{T} z_t (y_t − β̂_T′x_t) ].

We propose the following block bootstrap procedure. Suppose that T = bℓ for some integer b.

Step 1. Let N_1, N_2, ..., N_b be iid uniform random variables on {0, 1, ..., T − ℓ} and let

    (x*′_{(j−1)ℓ+i}, y*_{(j−1)ℓ+i}, z*′_{(j−1)ℓ+i})′ = (x′_{N_j+i}, y_{N_j+i}, z′_{N_j+i})′

for 1 ≤ i ≤ ℓ and 1 ≤ j ≤ b.

² We use T observations and the modified HAC covariance matrix estimator Ŝ_T to obtain asymptotic refinements for the two-sided symmetric t test and the J test statistic. This modification is not necessary for obtaining asymptotic refinements for one-sided confidence intervals. See also Hall and Horowitz (1996, p.895).

Step 2. Calculate the first-step bootstrap GMM estimator β̃*_T by minimizing

    [ (1/T) Σ_{t=1}^{T} z*_t (y*_t − β′x*_t) − μ*_T ]′ V_T [ (1/T) Σ_{t=1}^{T} z*_t (y*_t − β′x*_t) − μ*_T ],

where

    μ*_T = (1/(T − ℓ + 1)) Σ_{t=0}^{T−ℓ} (1/ℓ) Σ_{i=1}^{ℓ} z_{t+i} (y_{t+i} − β̂_T′ x_{t+i}).

Step 3. Compute the second-step bootstrap GMM estimator β̂*_T by minimizing

    [ (1/T) Σ_{t=1}^{T} z*_t (y*_t − β′x*_t) − μ*_T ]′ Ŝ*_T^{-1} [ (1/T) Σ_{t=1}^{T} z*_t (y*_t − β′x*_t) − μ*_T ],

where

    Ŝ*_T = (1/T) Σ_{k=1}^{b} Σ_{i=1}^{ℓ} Σ_{j=1}^{ℓ} ( z*_{(k−1)ℓ+i} ũ*_{(k−1)ℓ+i} − μ*_T )( z*_{(k−1)ℓ+j} ũ*_{(k−1)ℓ+j} − μ*_T )′,
    ũ*_t = y*_t − β̃*_T′ x*_t.

Step 4.
Obtain the bootstrap version of the studentized statistic Σ̂*_T^{-1/2}(β̂*_T − β̂_T), where Σ̂*_T = ( Σ_{t=1}^{T} x*_t z*_t′ Ŝ*_T^{-1} Σ_{t=1}^{T} z*_t x*_t′ )^{-1}, and the J test statistic

    J*_T = { (1/√T) Σ_{t=1}^{T} [ z*_t (y*_t − β̂*_T′x*_t) − μ*_T ] }′ Ŝ*_T^{-1} { (1/√T) Σ_{t=1}^{T} [ z*_t (y*_t − β̂*_T′x*_t) − μ*_T ] }.

By repeating Steps 1–4 sufficiently many times, one can approximate the finite-sample distributions of the studentized statistic and the J test statistic by the empirical distributions of their bootstrap versions.

Remarks:

1. As in Hall and Horowitz (1996), we recenter the bootstrap version of the moment functions. Unlike the just-identified case, the bootstrap version of the moment condition does not hold without recentering in the case of overidentifying restrictions. The expression μ*_T is the mean of the bootstrapped moment function with respect to the probability measure induced by the bootstrap algorithm.

2. Davison and Hall (1993) show that naïve applications of the block bootstrap do not provide asymptotic refinements for studentized statistics involving the long-run variance estimator. Specifically, they show that the error of the naïve bootstrap is of order O(b^{-1}) + O(ℓ^{-1}) and thus is greater than or equal to the error of the first-order asymptotic approximation. We therefore modify the bootstrap version of the HAC covariance matrix estimator (see Götze and Hipp, 1996, for the just-identified case). The expression Ŝ*_T given in Step 3 is a consistent estimator for the variance of the bootstrapped moment function under the bootstrap probability measure.

3. Asymptotic Theory

In this section, we present our main theoretical results. Unless noted otherwise, we shall denote the Euclidean norm of a vector x by ‖x‖. First, we provide the following set of assumptions.

Assumption 1:

(a) {(x_t′, y_t, z_t′)′} is strictly stationary and strong mixing with mixing coefficients satisfying α_m ≤ (1/d) exp(−dm) for some d > 0.

(b) There is a unique β_0 ∈
0.

(d) Let F_a^b denote the sigma-algebra generated by R_a, R_{a+1}, ..., R_b. For all m, s, t = 1, 2, ... and A ∈ F_{t−s}^{t+s},

    E| P(A | F_{−∞}^{t−1} ∪ F_{t+1}^{∞}) − P(A | F_{t−s−m}^{t−1} ∪ F_{t+1}^{t+s+m}) | ≤ (1/d) exp(−dm).

(e) For all m, t = 1, 2, ... and θ ∈
*) = α + o(ℓT^{−1}) + O(ℓ^{−q}).                                              (3.8)
Remarks: Theorems 1 and 2 show that the distributions of the studentized statistic
and the J test statistic and their bootstrap versions can be approximated by their
Edgeworth expansions. Theorem 3 shows the order of the bootstrap approximation
error. For the one-sided t test, the two-sided symmetric t test and the J test statistic,
the approximation errors made by the first-order asymptotic theory are of order

    O(T^{−1/2}) + O(ℓ^{−q}),   O(ℓT^{−1}) + O(ℓ^{−q})   and   O(ℓT^{−1}) + O(ℓ^{−q}),       (3.9)

respectively, whereas the bootstrap approximation errors are of order

    O(ℓT^{−1}) + O(ℓ^{−q}),   o(ℓT^{−1}) + O(ℓ^{−q})   and   o(ℓT^{−1}) + O(ℓ^{−q}).        (3.10)
Thus the bootstrap provides asymptotic refinements if the bias of the HAC covariance
matrix estimator vanishes fast enough, i.e.,

    O(ℓ^{−q}) = o(T^{−1/2}),   O(ℓ^{−q}) = o(ℓT^{−1})   and   O(ℓ^{−q}) = o(ℓT^{−1}),        (3.11)

for the three statistics, respectively.
For the one-sided t test, the bootstrap provides asymptotic refinements for a wide
class of kernels that satisfy O(ℓ^{−q}) = o(T^{−1/2}), such as the Parzen kernel. However, the
bootstrap does not provide asymptotic refinements for the Bartlett kernel, as it does not
satisfy (3.11) because its characteristic exponent is one. For the two-sided symmetric t
test and the J test statistic, the bootstrap can provide asymptotic refinements only for
kernels whose characteristic exponent is greater than 2, such as the truncated kernel,
    ω(x) = 1   for |x| < 1,      ω(x) = 0   otherwise;

the trapezoidal kernel (Politis and Romano, 1995)

    ω(x) = 1                           for |x| ≤ α,
    ω(x) = 1 − (|x| − α)/(1 − α)       for α < |x| ≤ 1,
    ω(x) = 0                           otherwise,

where 0 < α < 1; and the Parzen (b) kernel (Parzen, 1957)

    ω(x) = 1 − |x|^q   for |x| ≤ 1,    ω(x) = 0   otherwise,
where q > 2. Under the assumption of exponentially decaying mixing coefficients, the
truncated and trapezoidal kernels have no asymptotic bias and thus satisfy (3.11). If q > 2
and ℓ ≠ O(T^{1/(q+1)}), the Parzen (b) kernel also satisfies (3.11). A potential problem
with these kernels is that the resulting weighting matrix is not necessarily positive
semidefinite. To eliminate this problem, the weighting matrix can be modified as follows.
By Schur's decomposition theorem (e.g., Theorem 13 of Magnus and Neudecker, 1999,
p.16), there exist an orthogonal k × k matrix E whose columns are eigenvectors of
W_T = Ŝ_T^{-1} and a diagonal matrix Λ = diag(λ_1, ..., λ_k), whose elements are the
eigenvalues of W_T, such that

    W_T = E′^{−1} Λ E^{−1}.                                                   (3.12)

Define a modified HAC covariance matrix estimator by

    W_T^{+} = E′^{−1} Λ^{+} E^{−1},                                           (3.13)

where Λ^{+} = diag(max(λ_1, 0), ..., max(λ_k, 0)). Then W_T^{+} is positive semidefinite,
asymptotically equivalent to (3.12) and thus consistent. Politis and Romano (1995,
equation 12) use a similar modification in the context of univariate spectral density
estimation. For the trapezoidal kernel, the frequency of positive semidefinite corrections
can be reduced by choosing a small α. However, Politis and Romano (1995) recommend α = 1/2.
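These kernels and the eigenvalue modification in (3.13) can be sketched in a few lines of Python. This is hypothetical illustration code: numpy's symmetric eigendecomposition stands in for the Schur decomposition (for an orthogonal E, E′^{-1} = E), and the input is symmetrized for numerical safety.

```python
import numpy as np

def truncated(x):
    # Truncated kernel (White, 1984)
    return 1.0 if abs(x) < 1 else 0.0

def trapezoidal(x, a=0.5):
    # Trapezoidal kernel (Politis and Romano, 1995); a = 1/2 is the
    # recommended choice for the flat-top width.
    ax = abs(x)
    if ax <= a:
        return 1.0
    if ax <= 1.0:
        return 1.0 - (ax - a) / (1.0 - a)
    return 0.0

def psd_correct(W):
    # Eigenvalue modification of (3.13): replace negative eigenvalues by
    # zero so the weighting matrix is always positive semidefinite.
    lam, E = np.linalg.eigh((W + W.T) / 2.0)
    return (E * np.maximum(lam, 0.0)) @ E.T
```

Because max(λ, 0) changes nothing when all eigenvalues are already nonnegative, the correction is only active in the samples where the kernel produces an indefinite estimate.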
4. Monte Carlo Results
In this section, we conduct a small simulation study to examine the accuracy of the
proposed bootstrap procedure. We consider the following stylized linear regression
model with an intercept and a regressor, xt :
    y_t = β_1 + β_2 x_t + u_t,   for t = 1, ..., T.                           (4.14)

The disturbance and the regressor are generated from the following AR(1) processes
with common ρ,

    u_t = ρ u_{t−1} + ε_{1t},                                                (4.15)
    x_t = ρ x_{t−1} + ε_{2t},                                                (4.16)

where ε_t = (ε_{1t}, ε_{2t})′ ~ N(0, I_2). In the simulation, we use β = (β_1, β_2)′ = (0, 0)′ for the
regression parameters and ρ ∈ {0.5, 0.9, 0.95} for the AR parameter. For instruments,
we use x_t, x_{t−1} and x_{t−2} in addition to an intercept. This choice of instruments implies
an overidentified model with 2 degrees of freedom for the J test. Two values of the
sample size T, 64 and 128, are considered. The kernel functions employed are the
trapezoidal, Parzen (b) and truncated kernels. In all experiments, the number of Monte
Carlo trials is 1000.
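The design above can be sketched as follows (hypothetical code; the burn-in length and the seed are illustrative choices not taken from the paper):

```python
import numpy as np

def simulate(T, rho, burn=100, seed=0):
    # u_t and x_t are independent AR(1) processes with common rho, as in
    # (4.15)-(4.16); y_t = beta1 + beta2*x_t + u_t with (beta1, beta2) = (0, 0).
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((burn + T, 2))
    u = np.zeros(burn + T)
    x = np.zeros(burn + T)
    for t in range(1, burn + T):
        u[t] = rho * u[t - 1] + e[t, 0]
        x[t] = rho * x[t - 1] + e[t, 1]
    u, x = u[burn:], x[burn:]
    y = 0.0 + 0.0 * x + u
    # Instruments: intercept, x_t, x_{t-1}, x_{t-2} (first two obs dropped)
    Z = np.column_stack([np.ones(T - 2), x[2:], x[1:-1], x[:-2]])
    X = np.column_stack([np.ones(T - 2), x[2:]])
    return y[2:], X, Z
```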
The choice of the block length is important in practice. Ideally, one would choose
a longer block length for more persistent processes and a shorter block length for less
persistent processes. In the literature, this is typically accomplished by selecting the
lag truncation parameter that minimizes the mean squared error of the HAC covariance
matrix estimator (see Andrews, 1991; and Newey and West, 1994). Because the
trapezoidal and truncated kernels have no asymptotic bias, however, one cannot take
advantage of the usual bias-variance trade-off, and thus no optimal block length can be
defined for these kernels. We therefore propose the following procedure, which is similar to
the general-to-specific modeling strategy for selecting the lag order of autoregressions
in the literature on unit root testing (see Hall, 1994; Ng and Perron, 1995). By the
Wold representation theorem, the moment function has a moving average (MA)
representation of possibly infinite order. The idea is to approximate this MA representation
by a sequence of finite-order MA processes. Because the block bootstrap is originally
designed to capture the dependence of m-dependent-type processes when ℓ is fixed, it
makes sense to approximate the process by an MA process that is m-dependent.
The proposed procedure takes the following steps.
Step 1. Let ℓ_1 < ℓ_2 < ··· < ℓ_max be candidate block lengths that satisfy Assumption 1(g),
and set k = max − 1.

Step 2. Test the null that every element of the moment function is MA(ℓ_k) against the
alternative that at least one of the elements is MA(ℓ_{k+1}).

Step 3. If the null is accepted and if k > 1, then let k = k − 1 and go to Step 2. If the null
is accepted and if k = 1, then let ℓ = ℓ_1. If the null is rejected, then set ℓ = ℓ_{k+1}.
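A sketch of this general-to-specific selection follows. It is a hypothetical simplification: a significance check on sample autocorrelations stands in for the paper's MA-order test, and the candidate lengths and critical value are illustrative assumptions.

```python
import numpy as np

def select_block_length(v, candidates, crit=2.5758):
    # v: (T x q) moment series; candidates: increasing block lengths.
    # The null "every element is MA(l_k)" is treated as rejected when any
    # sample autocorrelation at lags l_k+1,...,l_{k+1} is significant;
    # crit = 2.5758 mimics the conservative 99% level used in the paper.
    T, q = v.shape
    v = v - v.mean(axis=0)

    def sig(lag):
        r = np.array([(v[lag:, i] @ v[:-lag, i]) / (v[:, i] @ v[:, i])
                      for i in range(q)])
        return np.any(np.abs(r) * np.sqrt(T) > crit)

    k = len(candidates) - 2              # Step 1: start at k = max - 1
    while k >= 0:
        lo, hi = candidates[k], candidates[k + 1]
        if any(sig(lag) for lag in range(lo + 1, hi + 1)):
            return candidates[k + 1]     # null rejected: longer block length
        k -= 1                           # null accepted: try a shorter one
    return candidates[0]
```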
Because there is parameter uncertainty due to first-step estimation and because
we apply a univariate testing procedure to each element of the moment function, it is
difficult to control the size of this procedure. In this Monte Carlo experiment, therefore,
we use the 99% level critical value to be conservative.
Our primary interest is to compare the size properties of tests based on asymptotic
and bootstrap critical values. For each experiment, the empirical size for the t test
for the regression slope parameter β_2 as well as for the J test is obtained using the
10% nominal significance level. Each bootstrap critical value is constructed from 999
replications of the bootstrap sampling process. In addition to the results based on
the asymptotic and bootstrap critical values using our proposed procedure, we report
the asymptotic results based on the Bartlett and QS kernels, with Andrews' (1991)
data-dependent bandwidth estimator and Andrews and Monahan's (1992) prewhitening
procedure.
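A one-dimensional sketch of how such bootstrap critical values are produced is given below. This is a hypothetical simplification of Steps 1–4: it studentizes the mean of a single moment series rather than the full GMM statistic, but it shows the three essential ingredients — block resampling, recentering (the analogue of μ*_T), and the block-based variance estimate (the analogue of Ŝ*_T).

```python
import numpy as np

def block_bootstrap_crit(v, ell, B=999, alpha=0.10, seed=0):
    # Symmetric bootstrap critical value for a studentized mean using b = T/ell
    # blocks of length ell drawn with replacement (Step 1 of the procedure).
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    T = len(v) - len(v) % ell
    v = v[:T]
    b = T // ell
    starts = rng.integers(0, T - ell + 1, size=(B, b, 1))
    blocks = v[starts + np.arange(ell)]                  # (B, b, ell)
    # Recenter by the mean over all possible blocks, so E*[v*] = 0
    mu = v[np.arange(T - ell + 1)[:, None] + np.arange(ell)].mean()
    samples = blocks - mu
    mean = samples.mean(axis=(1, 2))
    # Block-based variance: average of squared within-block sums over ell
    s2 = (samples.sum(axis=2) ** 2).mean(axis=1) / ell
    tstats = np.abs(np.sqrt(T) * mean / np.sqrt(s2))
    return np.quantile(tstats, 1 - alpha)
```

The empirical 1 − α quantile of the bootstrap statistics then replaces the asymptotic critical value in the t test.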
Table 1 summarizes the result of the simulation study. In all cases, the size proper-
ties of the bootstrap t test are better than those of the asymptotic t test. The choice of
kernel function does not make much of a difference for the performance. Indeed, the
empirical sizes of the bootstrap test are very close to the nominal size when T is 128. The
degree of the reduction in the size distortion depends on the value of the AR parameters
as well as the sample size. The bootstrap works quite well with persistent processes.
Because the moment functions have an AR(1) autocovariance structure, the
prewhitening procedure has a considerable advantage in our simulation design. However, the
bootstrap outperforms the conventional prewhitened HAC procedure with asymptotic
critical values. In contrast, the advantage of the bootstrap for the J test is not clear
4
because the J test performs quite well even with asymptotic critical values. Based on
this experiment, we recommend our bootstrap procedure especially for the t test for
regression parameters.
5. Empirical Illustration
To illustrate the usefulness of the proposed bootstrap approach, we conduct bootstrap
inference about the parameters in the monetary policy reaction function of Clarida,
Galí and Gertler (2000, hereafter CGG). CGG model the target for the federal funds
rate r_t* by

    r_t* = r* + β(E[π_{t+1} | Ω_t] − π*) + γ E[x_t | Ω_t],                    (5.17)

where π_t is the inflation rate, π* is the target for inflation, Ω_t is the information set at
time t, x_t is the output gap, and r* is the target with zero inflation and output gap.
Policy rules (5.17) with β > 1 and γ > 0 are stabilizing, and those with β ≤ 1 and
γ ≤ 0 are destabilizing. CGG obtain the GMM estimates of β and γ based on the set
of unconditional moment conditions

    E{ [r_t − (1 − ρ_1 − ρ_2)[rr* − (β − 1)π* + βπ_{t+1} + γx_t] − ρ_1 r_{t−1} − ρ_2 r_{t−2}] z_t } = 0,   (5.18)

where r_t is the actual federal funds rate, rr* is the equilibrium real rate and z_t is a vector
of instruments. They find that the GMM estimate of β is significantly less than unity
during the pre-Volcker era, while the estimate is significantly greater than unity
during the Volcker-Greenspan era.

⁴ See Tauchen (1986) and Hall and Horowitz (1996) for similar findings.
We reexamine these findings by applying our bootstrap procedure as well as the
bootstrap procedure of Hall and Horowitz (1996) and the standard HAC asymptotics.
We obtain GMM estimates of β and γ based on the linear moment conditions

    E{ [r_t − c − θ_1 π_{t+1} − θ_2 x_t − ρ_1 r_{t−1} − ρ_2 r_{t−2}] z_t } = 0,              (5.19)

where c = (1 − ρ_1 − ρ_2)[rr* − (β − 1)π*]. Then β̂_T = θ̂_{1T}/(1 − ρ̂_{1T} − ρ̂_{2T}) and
γ̂_T = θ̂_{2T}/(1 − ρ̂_{1T} − ρ̂_{2T}), where θ̂_{1T}, θ̂_{2T}, ρ̂_{1T} and ρ̂_{2T} are the GMM
estimates of θ_1, θ_2, ρ_1 and ρ_2, respectively. We use CGG's baseline dataset and two
sample periods, the pre-Volcker
period (1960:1-1979:2) and the Volcker-Greenspan period (1979:3-1996:3) (see CGG for
the description of the data source). In addition to their baseline specification, we
construct the optimal weighting matrix using the inverse of the HAC covariance matrix
estimator to allow for more general dynamic specifications in the determination of the
actual funds rate. For the asymptotic confidence intervals, we use the conventional
prewhitened and recolored estimates based on the Bartlett and QS kernels with the
automatic bandwidth selection method (Andrews 1991, Andrews and Monahan 1992).
For the confidence intervals constructed from our bootstrap, we use the trapezoidal,
Parzen (b) and truncated kernels. We use the data-dependent procedure described
in the previous section to select the block length for the bootstrap. The number of
bootstrap replications is set to 999.
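The mapping from the linear-moment estimates in (5.19) back to the policy parameters, which is applied to each bootstrap replication when forming the bootstrap confidence intervals, is just the transformation above (a sketch; the function name is ours):

```python
def policy_params(theta1, theta2, rho1, rho2):
    # beta = theta1 / (1 - rho1 - rho2), gamma = theta2 / (1 - rho1 - rho2),
    # from c = (1 - rho1 - rho2)[rr* - (beta - 1) pi*] and the linearization.
    d = 1.0 - rho1 - rho2
    return theta1 / d, theta2 / d
```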
Table 2 presents GMM estimates of these parameters. Asymptotic standard errors
are reported in parentheses. The first two rows of each of Tables 2(a) and (b) replicate
CGG's results. These findings are robust to whether or not the HAC covariance matrix
estimator is used.

Table 3 shows 90% two-sided confidence intervals of these parameters. Consistent
with CGG's findings, the upper bound of the asymptotic confidence interval for β is less
than unity during the pre-Volcker period, and the lower bound is far greater than unity
during the Volcker-Greenspan period. Based on these estimates, CGG suggest that
the Fed was accommodating inflation before 1979, but not after 1979. The bootstrap
confidence interval, however, indicates that β may be greater than unity even during the
pre-Volcker period, consistent with the view that the Fed has always been combating
inflation. Moreover, unlike the asymptotic confidence interval, the bootstrap confidence
interval does not rule out that γ is negative during the Volcker-Greenspan period.
6. Concluding Remarks
In this paper we establish that the bootstrap provides asymptotic refinements for the
GMM estimator of possibly overidentified linear models when the autocovariance
structure of the moment function is unknown. Because the HAC covariance matrix estimator
cannot be written as a function of sample moments and converges at a rate slower than
T^{-1/2}, the conventional techniques cannot be used directly to prove the existence of
the Edgeworth expansions. Because of the nonparametric nature of the HAC covariance
matrix estimator, the order of the bootstrap approximation error is larger than
the typical order of the bootstrap approximation error for parametric estimators.
Nevertheless, the bootstrap provides improved approximations relative to the first-order
approximation. We also find that the choice of kernels plays a more important role
in our second-order asymptotic theory than in the conventional first-order asymptotic
theory because the order of the bootstrap approximation error depends on the bias of
the HAC covariance estimator. We note that an extension of the present results to
nonlinear dynamic models, as well as further investigation of data-dependent methods
for selecting the optimal block length, would be useful.
Appendix

Notation

To simplify the notation, we will assume p = 1 throughout the appendix. In the proof for the
case p > 1, the scalar β in the current proof is replaced by an arbitrary linear combination of β.
⊗ denotes the Kronecker product operator. If α is an n-dimensional nonnegative integral vector,
|α| denotes its length, i.e., |α| = Σ_{i=1}^{n} |α_i|. ‖·‖ denotes the Euclidean norm, i.e.,
‖x‖ = (Σ_{i=1}^{n} x_i²)^{1/2}, where x is an n-dimensional vector. We will write ω(j/ℓ) as ω_j
for notational simplicity. κ_j(x) denotes the jth cumulant of a random variable x. vec(·) is the
column-by-column vectorization function. vech(·) denotes the column stacking operator that
stacks the elements on and below the leading diagonal. For a nonnegative integral vector
α = (α_1, α_2, ..., α_n), let

    D^α = ∂^{α_1}/∂x_1^{α_1} ··· ∂^{α_n}/∂x_n^{α_n}.

ℓ and l are treated differently: ℓ denotes the lag truncation parameter and l denotes an integer.
Let u_t = y_t − β_0′x_t, û_t = y_t − β̂_T′x_t, ũ_t = y_t − β̃_T′x_t, v_t = z_t u_t, v̂_t = z_t û_t,
ṽ_t = z_t ũ_t, w_t = z_t x_t′,

    Γ̂_j   = (1/T) Σ_{t=1}^{T} ṽ_{t+j} ṽ_t′ for j ≥ 0,   (1/T) Σ_{t=1}^{T} ṽ_t ṽ_{t−j}′ for j < 0;
    ∇Γ̃_j  = (1/T) Σ_{t=1}^{T} (ṽ_{t+j} w_t′ + w_{t+j} ṽ_t′) for j ≥ 0,
            (1/T) Σ_{t=1}^{T} (ṽ_t w_{t−j}′ + w_t ṽ_{t−j}′) for j < 0;
    Γ̃_j   = (1/T) Σ_{t=1}^{T} v_{t+j} v_t′ for j ≥ 0,    (1/T) Σ_{t=1}^{T} v_t v_{t−j}′ for j < 0;
    ∇Γ̄_j  = E(v_{t+j} w_t′ + w_{t+j} v_t′) for j ≥ 0,    E(v_t w_{t−j}′ + w_t v_{t−j}′) for j < 0;
    Γ_j   = E(v_{t+j} v_t′) for j ≥ 0,                   E(v_t v_{t−j}′) for j < 0;
    ∇²Γ̃_j = (1/T) Σ_{t=1}^{T} w_{t+j} w_t′ for j ≥ 0,    (1/T) Σ_{t=1}^{T} w_t w_{t−j}′ for j < 0;

    Ŝ_T = Σ_{j=−ℓ}^{ℓ} ω_j Γ̂_j;    S̃_T = Σ_{j=−ℓ}^{ℓ} ω_j Γ̃_j;    S̄_T = Σ_{j=−ℓ}^{ℓ} ω_j Γ_j;
    S_T = Σ_{j=−T+1}^{T−1} (1 − |j|/T) Γ_j;    ∇S̃_T = Σ_{j=−ℓ}^{ℓ} ω_j ∇Γ̃_j;    ∇S̄_T = Σ_{j=−ℓ}^{ℓ} ω_j ∇Γ̄_j;
    ∇S = Σ_{j=−∞}^{∞} ∇Γ̄_j;    ∇²S̃_T = Σ_{j=−ℓ}^{ℓ} ω_j ∇²Γ̃_j.
Let G_T = (1/T) Σ_{t=1}^{T} w_t and m_T = T^{−1/2} Σ_{t=1}^{T} v_t. Then the studentized
statistic can be written as

    f_T = √T Σ̂_T^{−1/2} (β̂_T − β_0) = (G_T′ Ŝ_T^{−1} G_T)^{−1/2} G_T′ Ŝ_T^{−1} m_T.

We use the following notation for the bootstrap. Let

    m*_T = (1/√T) Σ_{t=1}^{T} (z*_t u*_t − μ*_T) = (1/√b) Σ_{k=1}^{b} B_{N_k},
    B_{N_k} = (1/√ℓ) Σ_{i=1}^{ℓ} (z_{N_k+i} û_{N_k+i} − μ*_T) = (1/√ℓ) Σ_{i=1}^{ℓ} (v̂_{N_k+i} − μ*_T),
    B̂_{N_k} = (1/√ℓ) Σ_{i=1}^{ℓ} (z*_{N_k+i} û*_{N_k+i} − μ*_T),    û*_i = y*_i − β̃*_T′ x*_i,
    G*_T = (1/T) Σ_{t=1}^{T} z*_t x*_t′ = (1/b) Σ_{k=1}^{b} F_{N_k},
    F_{N_k} = (1/ℓ) Σ_{i=1}^{ℓ} z_{N_k+i} x_{N_k+i}′ = (1/ℓ) Σ_{i=1}^{ℓ} w_{N_k+i},
    Ŝ*_T = (1/b) Σ_{k=1}^{b} B̂_{N_k} B̂_{N_k}′,    S̃*_T = (1/b) Σ_{k=1}^{b} B_{N_k} B_{N_k}′,    S*_T = Var*(m*_T).

Then the bootstrap version of the first-step and the second-step GMM estimators can be
written as

    β̃*_T = β̂_T + [ (1/b) Σ_{k=1}^{b} F_{N_k}′ V_T (1/b) Σ_{k=1}^{b} F_{N_k} ]^{−1} (1/b) Σ_{k=1}^{b} F_{N_k}′ V_T (1/√T)(1/√b) Σ_{k=1}^{b} B_{N_k}
          = β̂_T + [G*_T′ V_T G*_T]^{−1} G*_T′ V_T (1/√T) m*_T,

    β̂*_T = β̂_T + [ (1/b) Σ_{k=1}^{b} F_{N_k}′ Ŝ*_T^{−1} (1/b) Σ_{k=1}^{b} F_{N_k} ]^{−1} (1/b) Σ_{k=1}^{b} F_{N_k}′ Ŝ*_T^{−1} (1/√T)(1/√b) Σ_{k=1}^{b} B_{N_k}
          = β̂_T + [G*_T′ Ŝ*_T^{−1} G*_T]^{−1} G*_T′ Ŝ*_T^{−1} (1/√T) m*_T,

respectively.
Proofs of Lemmas

Next, we present the lemmas used in the proofs of the theorems. Lemma A.1 produces
a Taylor series expansion of the studentized statistic f_T. Lemma A.2 provides bounds on the
moments and will be used in the proofs of Lemmas A.3–A.6. Lemma A.3 shows the limits and
the convergence rates of the first three cumulants of g_T in (A.1), which will be used to derive
the formal Edgeworth expansion. Lemmas A.5 and A.6 provide bounds on the approximation
error. For convenience, we present Lemma B.1, which will be used in the proofs of Lemmas B.2
and B.3. Lemma B.2 shows the consistency and convergence rate of the bootstrap version of the
moments. Lemma B.3 shows the limits and the convergence rates of the first three cumulants
of the bootstrap version.

Lemma A.1:

    f_T = a′m_T + b′[(G_T − G_0) ⊗ m_T] + c′[vech(Ŝ_T − S_0) ⊗ m_T]
          + d′[(G_T − G_0) ⊗ vech(Ŝ_T − S_0) ⊗ m_T] + e′[vech(Ŝ_T − S_0) ⊗ vech(Ŝ_T − S_0) ⊗ m_T]
          + O_p((ℓ/T)^{3/2})
        = a′m_T + b′[(G_T − G_0) ⊗ m_T] + c′[vech(Ŝ_T − S̄_T) ⊗ m_T] + c′[vech(S̄_T − S_0) ⊗ m_T]
          + d′[(G_T − G_0) ⊗ vech(Ŝ_T − S̄_T) ⊗ m_T] + e′[vech(Ŝ_T − S̄_T) ⊗ vech(Ŝ_T − S̄_T) ⊗ m_T]
          + d′[(G_T − G_0) ⊗ vech(S̄_T − S_0) ⊗ m_T] + e′[vech(Ŝ_T − S̄_T) ⊗ vech(S̄_T − S_0) ⊗ m_T]
          + e′[vech(S̄_T − S_0) ⊗ vech(Ŝ_T − S̄_T) ⊗ m_T] + e′[vech(S̄_T − S_0) ⊗ vech(S̄_T − S_0) ⊗ m_T]
          + O_p((ℓ/T)^{3/2})
        ≡ g_T + c′[vech(S̄_T − S_0) ⊗ m_T] + d′[(G_T − G_0) ⊗ vech(S̄_T − S_0) ⊗ m_T]
          + e′[vech(Ŝ_T − S̄_T) ⊗ vech(S̄_T − S_0) ⊗ m_T] + e′[vech(S̄_T − S_0) ⊗ vech(Ŝ_T − S̄_T) ⊗ m_T]
          + e′[vech(S̄_T − S_0) ⊗ vech(S̄_T − S_0) ⊗ m_T] + O_p((ℓ/T)^{3/2}),                (A.1)

where a, b, c, d and e are q-, q²-, q(q² + q)/2-, q²(q² + q)/2- and q((q² + q)/2)²-dimensional
vectors of smooth functions of G_0 and S_0, respectively.

Proof of Lemma A.1: (A.1) immediately follows from a Taylor series expansion of f_T around

    (m_T′, G_T′, vech(Ŝ_T)′)′ = (0_{1×q}, G_0′, vech(S_0)′)′

and from Theorem 1 of Andrews (1991). Q.E.D.
Lemma A.2:

    E‖m_T‖^{r+η} = O(1),                                                     (A.2)
    E‖T^{1/2}(G_T − G_0)‖^{r+η} = O(1),                                      (A.3)
    E‖(T/ℓ)^{1/2} vech(S̃_T − S̄_T)‖^{r/2} = O(1),                            (A.4)
    E‖(T/ℓ)^{1/2} vech(∇S̃_T − ∇S̄_T)‖^{r/2} = O(1),                          (A.5)
    E‖T^{1/2} vech(Ŝ_T − S̃_T)‖^{r/2} = O(1).                                (A.6)
Proof of Lemma A.2: First, (A.2) and (A.3) immediately follow from the moment inequality of
Yokoyama (1980). Second, we will show (A.4). Note that
[T =`]
X
` X
(T =`)1=2 (ST ¡ ST ) = (T=`)1=2
~ ¹ !j (¡j ¡ ¡j ) = (`=T )1=2
~ Wi
j=¡` i=1
X X X
= (`=T )1=2 ( Wi + Wi + Wi ); (A.7)
i=0mod3 i=1mod3 i=2mod3
where
1 X
i` X
`
0 0 0 0 0 0
Wi = fvt vt ¡ E(vt vt ) + !j [vt+j vt ¡ E(vt+j vt ) + vt vt+j ¡ E(vt vt+j )]g:
` j=1
t=(i¡1)`+1
Note that the summands in each sum on the RHS of (A.7) are asymptotically independent by construction. Thus,

E\|(T/\ell)^{1/2} vech(\tilde S_T - \bar S_T)\|^{r/2} = O(E\|vech(W_2)\|^{r/2}) = \sum_{i=1}^{3} O(E\|vech(W_2(i))\|^{r/2}),   (A.8)

where

W_2(1) = \ell^{-1} \sum_{t=\ell+1}^{2\ell} \sum_{j=0}^{\ell-1} \omega_j v_{t+j} v_t',  W_2(2) = \ell^{-1} \sum_{t=\ell+1}^{2\ell} \sum_{j=-\ell+1}^{-1} \omega_j v_t v_{t-j}',  W_2(3) = \sum_{j=-\ell+1}^{\ell-1} E(v_0 v_{-j}').
Thus it suffices to show that, for i, j = 1, 2, ..., q,

E|W_2(1)^{(i,j)}|^{r/2} = O(1),   (A.9)
E|W_2(2)^{(i,j)}|^{r/2} = O(1),   (A.10)
E|W_2(3)^{(i,j)}|^{r/2} = O(1),   (A.11)

where W_2(\cdot)^{(i,j)} denotes the (i,j)th element of W_2(\cdot). By Assumptions 1(a) and 1(f), it follows that

E|W_2(1)^{(i,j)}|^{r/2} = O( \ell^{-r/2} \sum_{t_1 \le t_2 \le \cdots \le t_r} E|v_{t_1}^{(k_1)} v_{t_2}^{(k_2)} \cdots v_{t_r}^{(k_r)}| ),   (A.12)

where 0 \le t_l \le 2\ell and k_l = i, j for l = 1, 2, ..., r. Then the standard arguments used in proofs of the moment inequality complete the proof of (A.9). The proof of (A.10) is analogous to that of (A.9) and thus is omitted. By the mixing inequality of Hall and Heyde (1980, Corollary A.2), it follows that for some d' > 0

E|W_2(3)^{(i,j)}|^{r/2} = | \sum_{j=-\ell+1}^{\ell-1} E(v_0 v_{-j}')^{(i,j)} |^{r/2} = O( ( \sum_{j=-\ell+1}^{\ell-1} \alpha_j^{d'} )^{r/2} ) = O(1),   (A.13)

and thus (A.11) holds. Therefore, (A.4) immediately follows from (A.7)-(A.11). The proof of (A.5) is analogous to that of (A.4) and thus is omitted.
Lastly, we will prove (A.6). Note that

T^{1/2}(\hat S_T - \tilde S_T) = \nabla\tilde S_T T^{1/2}(\tilde\beta_T - \beta_0) + \nabla^2\tilde S_T T^{1/2}(\tilde\beta_T - \beta_0)^2.   (A.14)

Thus it follows from (A.5) and Minkowski's inequality that

[E\|\nabla\tilde S_T\|^r]^{1/r} \le [E\|\nabla\tilde S_T - \nabla\bar S_T\|^r]^{1/r} + [E\|\nabla\bar S_T\|^r]^{1/r} = O(\ell^{1/2} T^{-1/2}) + O(1),   (A.15)

[E\|\nabla^2\tilde S_T\|^r]^{1/r} \le [E\| \sum_{j=-\ell}^{\ell} \omega_j (\nabla^2\tilde\Gamma_j - E(\nabla^2\tilde\Gamma_j)) \|^r]^{1/r} + [E\| \sum_{j=-\ell}^{\ell} \omega_j E(\nabla^2\tilde\Gamma_j) \|^r]^{1/r}
= O(\ell T^{-1/2}) + O(\ell).   (A.16)

Therefore (A.6) follows from (A.14), (A.15), (A.16), Assumption 1(i) and Hölder's inequality. Q.E.D.
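The kernel covariance estimators manipulated throughout Lemma A.2 are straightforward to compute. The sketch below forms a weighted sum of sample autocovariances of a moment series; it uses Bartlett weights ω_j = 1 − |j|/(ℓ+1) purely as one common concrete choice, and the MA(1) input series is illustrative (the lemmas restrict the kernel and the data only through the paper's assumptions):

```python
import numpy as np

def hac_estimate(v, ell):
    """Kernel HAC estimate: sum_{j=-ell}^{ell} w_j * Gamma_j, where Gamma_j is
    the j-th sample autocovariance of the (demeaned) moment series v (T x q).
    Bartlett weights are used here for illustration only."""
    v = np.asarray(v, dtype=float)
    T, q = v.shape
    vc = v - v.mean(axis=0)                  # center the moment series
    S = np.zeros((q, q))
    for j in range(-ell, ell + 1):
        w = 1.0 - abs(j) / (ell + 1.0)       # Bartlett kernel weight w_j
        if j >= 0:
            G = vc[j:].T @ vc[:T - j] / T    # Gamma_j = T^-1 sum v_{t+j} v_t'
        else:
            G = vc[:T + j].T @ vc[-j:] / T
        S += w * G
    return S

rng = np.random.default_rng(0)
e = rng.standard_normal(500)
v = np.column_stack([e[1:] + 0.5 * e[:-1]])  # illustrative MA(1) moment series
S = hac_estimate(v, ell=4)
```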
Lemma A.3:

T^{1/2} \kappa_1(g_T) = \alpha_1 + O(\ell^{-q}) + o(\ell T^{-1/2}),   (A.17)
(T/\ell)(\kappa_2(g_T) - 1) = \gamma_1 + O(\ell^{-1/2}),   (A.18)
T^{1/2} \kappa_3(g_T) = \kappa_1 - 3\alpha_1 + O(\ell^{-q}) + o(\ell T^{-1/2}),   (A.19)
(T/\ell)(\kappa_4(g_T) - 3) = \zeta_1 + O(\ell^{-1/2}),   (A.20)

where

\alpha_1 = b' \sum_{i=-\infty}^{\infty} E[w_0 \otimes v_i] + c' \sum_{i,j=-\infty}^{\infty} E[vech(v_0 v_i') \otimes v_j]
  + c' \sum_{i=-\infty}^{\infty} E\{vech[\nabla\bar S (E(w_0)' V E(w_0))^{-1} E(w_0)' V v_0] \otimes v_i\},

\gamma_1 = 2 \lim_{T\to\infty} \frac{1}{\ell} \sum_{j=-\ell}^{\ell} \sum_{i,k=-T}^{T} E\{a'v_0 c'[vech(v_i v_{i-j}' - \Gamma_j) \otimes v_k]\}
  + 2 \lim_{T\to\infty} \frac{1}{\ell T} \sum_{j,l=-\ell}^{\ell} \sum_{i,k,m=-T}^{T} E\{a'v_0 e'[vech(v_i v_{i-j}' - \Gamma_j) \otimes vech(v_k v_{k-l}' - \Gamma_l) \otimes v_m]\}
  + \lim_{T\to\infty} \frac{1}{\ell T} \sum_{j,k,m=-T}^{T} \sum_{i,l=-\ell}^{\ell} E\{c'[vech(v_0 v_{-i}' - \Gamma_i) \otimes v_j] c'[vech(v_k v_{k-l}' - \Gamma_l) \otimes v_m]\},

\kappa_1 = \sum_{i,j=-\infty}^{\infty} E(a'v_0 a'v_i a'v_j) + 3 \lim_{T\to\infty} \frac{1}{T} \sum_{i,j,k=-T+1}^{T-1} E\{a'v_0 a'v_i b'[(w_j - E(w_j)) \otimes v_k]\}
  + 3 \lim_{T\to\infty} \frac{1}{T} \sum_{i,j,k,l=-T}^{T} E\{a'v_0 a'v_i c'[vech(v_j v_{j-k}' - \Gamma_k) \otimes v_l]\}
  + 3 \lim_{T\to\infty} \frac{1}{T^2} \sum_{i,j,k=-T}^{T} E\{a'v_0 a'v_i c'[vech[\nabla\bar S (E(w_0)' V E(w_0))^{-1} E(w_0)' V v_j] \otimes v_k]\},

\zeta_1 = 4 \lim_{T\to\infty} \frac{1}{\ell T} \sum_{i,j,k,m=-T}^{T} \sum_{l=-\ell}^{\ell} E\{a'v_0 a'v_i a'v_j c'[vech(v_k v_{k-l}' - \Gamma_l) \otimes v_m]\}
  + 4 \lim_{T\to\infty} \frac{1}{\ell T^2} \sum_{i,j,k,m,o=-T}^{T} \sum_{l,n=-\ell}^{\ell} E\{a'v_0 a'v_i a'v_j e'[vech(v_k v_{k-l}' - \Gamma_l) \otimes vech(v_m v_{m-n}' - \Gamma_n) \otimes v_o]\}
  + 6 \lim_{T\to\infty} \frac{1}{\ell T^2} \sum_{i,j,l,m,o=-T}^{T} \sum_{k,n=-\ell}^{\ell} E\{a'v_0 a'v_i c'[vech(v_j v_{j-k}' - \Gamma_k) \otimes v_l] c'[vech(v_m v_{m-n}' - \Gamma_n) \otimes v_o]\}
  - 12 \lim_{T\to\infty} \frac{1}{\ell} \sum_{j,l=-T}^{T} \sum_{k=-\ell}^{\ell} E\{a'v_0 c'[vech(v_j v_{j-k}' - \Gamma_k) \otimes v_l]\}
  - 12 \lim_{T\to\infty} \frac{1}{\ell T} \sum_{j,l,n=-T}^{T} \sum_{k,m=-\ell}^{\ell} E\{a'v_0 e'[vech(v_j v_{j-k}' - \Gamma_k) \otimes vech(v_l v_{l-m}' - \Gamma_m) \otimes v_n]\}
  - 6 \lim_{T\to\infty} \frac{1}{\ell T} \sum_{j,k,m=-T}^{T} \sum_{i,l=-\ell}^{\ell} E\{c'[vech(v_0 v_{-i}' - \Gamma_i) \otimes v_j] c'[vech(v_k v_{k-l}' - \Gamma_l) \otimes v_m]\}.
Proof of Lemma A.3: First, we will prove (A.17). By Hölder's inequality and Lemma A.2, it suffices to show that

T^{1/2} E[(G_T - G_0) \otimes m_T] = \sum_{i=-\infty}^{\infty} E[w_0 \otimes v_i] + O(T^{-1}),   (A.21)
T^{1/2} E[vech(\tilde S_T - \bar S_T) \otimes m_T] = \sum_{i,j=-\infty}^{\infty} E[vech(v_0 v_i') \otimes v_j] + O(\ell^{-q}) + O(\ell T^{-1}),   (A.22)
T^{1/2} E[vech(\hat S_T - \tilde S_T) \otimes m_T] = \sum_{i=-\infty}^{\infty} E\{vech[\nabla\bar S (E(w_0)' V E(w_0))^{-1} E(w_0)' V v_0] \otimes v_i\} + O(\ell^{1/2} T^{-1/2}),   (A.23)
(T/\ell) E[vech(\hat S_T - \bar S_T) \otimes vech(\hat S_T - \bar S_T) \otimes m_T] = o(1).   (A.24)
First, (A.21) follows from several applications of the mixing inequality. Second, we will show (A.22). We have

T^{1/2} E[ \sum_{j=0}^{\ell} \omega_j vech(\tilde\Gamma_j - \Gamma_j) \otimes m_T ]
= \sum_{i=0}^{\ell} \omega_i \sum_{j=-\ell-T+1}^{T-1} \frac{T - i 1(j > i) - |j| 1(j > 0 \text{ or } j \le -i)}{T} E[vech(v_0 v_{-i}') \otimes v_j]
= \sum_{i=0}^{\ell} \omega_i \sum_{j=-\ell-T+1}^{T-1} E[vech(v_0 v_{-i}') \otimes v_j] + O(\ell T^{-1})
= \sum_{i=0}^{\ell} \sum_{j=-\ell-T+1}^{T-1} E[vech(v_0 v_{-i}') \otimes v_j] + O(\ell^{-q}) + O(\ell T^{-1})
= \sum_{i=0}^{\infty} \sum_{j=-\infty}^{\infty} E[vech(v_0 v_{-i}') \otimes v_j] + O(\ell^{-q}) + O(\ell T^{-1}).   (A.25)

The first equality follows from strict stationarity. Repeated applications of the moment inequality of Yokoyama (1980) produce

\sum_{i=0}^{\ell} \omega_i \sum_{j=-\ell-T+1}^{T-1} \frac{i 1(j > i) + |j| 1(j > 0 \text{ or } j \le -i)}{T} E[vech(v_0 v_{-i}') \otimes v_j]
= O( T^{-1} \sum_{i=0}^{\ell} \omega_i [ \sum_{j=-\ell-T}^{-2i-1} |j| \alpha_{-i-j}^{r'} + \sum_{j=-2i}^{-i} |j| \alpha_i^{r'} + \sum_{j=-i}^{-(1/2)i} i \alpha_{-j}^{r'} + \sum_{j=-(1/2)i+1}^{-1} i \alpha_{i+j}^{r'} + \sum_{j=0}^{i} (i+j) \alpha_i^{r'} + \sum_{j=i+1}^{T-1} (i+j) \alpha_j^{r'} ] )
= O(\ell T^{-1}),   (A.26)

for some r' \in (0, 1), from which the second equality follows. Arguments analogous to the proof of Theorem 10 of Hannan (1970, pp. 283-284) yield the last two equalities. By symmetric arguments, it follows that

T^{1/2} E[ \sum_{j=-\ell}^{-1} \omega_j vech(\tilde\Gamma_j - \Gamma_j) \otimes m_T ]
= \sum_{i=-\infty}^{-1} \sum_{j=-\infty}^{\infty} E[vech(v_0 v_{-i}') \otimes v_j] + O(\ell^{-q}) + O(\ell T^{-1}).   (A.27)
Hence, (A.22) follows from (A.25) and (A.27). Third, we will show (A.23). It follows from (A.14), Assumption 1(i) and Lemma A.2 that

T^{1/2} E[vech(\hat S_T - \tilde S_T) \otimes m_T]
= T^{1/2} E[vech(\nabla\tilde S_T (\tilde\beta_T - \beta_0) + \nabla^2\tilde S_T (\tilde\beta_T - \beta_0)^2) \otimes m_T]
= T^{1/2} E[vech((\nabla\tilde S_T - \nabla\bar S_T)(\tilde\beta_T - \beta_0)) \otimes m_T] + T^{1/2} E[vech(\nabla\bar S_T (\tilde\beta_T - \beta_0)) \otimes m_T]
  + T^{1/2} E[vech((\nabla^2\tilde S_T - \nabla^2\bar S_T)(\tilde\beta_T - \beta_0)^2) \otimes m_T] + T^{1/2} E[vech(\nabla^2\bar S_T (\tilde\beta_T - \beta_0)^2) \otimes m_T]
= \sum_{i=-\infty}^{\infty} E\{vech[\nabla\bar S (E(w_0)' V E(w_0))^{-1} E(w_0)' V v_0] \otimes v_i\} + O(\ell^{1/2} T^{-1/2}),   (A.28)

which completes the proof of (A.23). Lastly, we will show (A.24).

(T/\ell) E[vech(\hat S_T - \bar S_T) \otimes vech(\hat S_T - \bar S_T) \otimes m_T]
= (T/\ell) E[vech(\tilde S_T - \bar S_T) \otimes vech(\tilde S_T - \bar S_T) \otimes m_T] + o(1)
= \ell^{-1} T^{-3/2} \sum_{i,j=-\ell}^{\ell} \sum_{t,s,u=1}^{T} E[vech(v_{t+i} v_t' - \Gamma_i) \otimes vech(v_{s+j} v_s' - \Gamma_j) \otimes v_u] + o(1)
= O(\ell^2 T^{-1/2}) = o(1).   (A.29)

Therefore, (A.17) follows from (A.21)-(A.24).
Next, we will prove (A.18). It follows from (A.17), Hölder's inequality and Lemma A.2 that

\kappa_2(g_T) - 1 = E(g_T^2) - [E(g_T)]^2 - 1
= 2E\{a'm_T b'[(G_T - G_0) \otimes m_T]\} + 2E\{a'm_T c'[vech(\tilde S_T - \bar S_T) \otimes m_T]\}
  + 2E\{a'm_T e'[vech(\tilde S_T - \bar S_T) \otimes vech(\tilde S_T - \bar S_T) \otimes m_T]\}
  + E\{c'[vech(\hat S_T - \bar S_T) \otimes m_T]\}^2 + O(\ell^{1/2} T^{-1}).   (A.30)

Thus, we only need to analyze the first four terms on the RHS of (A.30). First, by repeated applications of the mixing inequality as in the proof of moment inequalities (e.g., the proof of Lemma 4 of Billingsley, 1968, pp. 172-174), one can show that

T E\{a'm_T b'[(G_T - G_0) \otimes m_T]\} = O(1).   (A.31)

Second, it follows from arguments similar to the one used in the proof of (A.17) that

(T/\ell) E\{a'm_T c'[vech(\tilde S_T - \bar S_T) \otimes m_T]\}
= (\ell T)^{-1} \sum_{j=-\ell}^{\ell} \omega_j \sum_{t=1}^{T} \sum_{s=1}^{T} \sum_{u=1}^{T} E\{a'v_t c'[vech(v_s v_{s-j}' - \Gamma_j) \otimes v_u]\}
= \ell^{-1} \sum_{j=-\ell}^{\ell} \omega_j \sum_{i,k=-T+1}^{T-1} (1 - \tau_{i,k}) E\{a'v_0 c'[vech(v_i v_{i-j}' - \Gamma_j) \otimes v_k]\}
= \ell^{-1} \sum_{j=-\ell}^{\ell} \omega_j \sum_{i,k=-T+1}^{T-1} E\{a'v_0 c'[vech(v_i v_{i-j}' - \Gamma_j) \otimes v_k]\} + O(\ell T^{-1})
= \ell^{-1} \sum_{j=-\ell}^{\ell} \sum_{i,k=-T+1}^{T-1} E\{a'v_0 c'[vech(v_i v_{i-j}' - \Gamma_j) \otimes v_k]\} + O(\ell^{-q}) + O(\ell T^{-1})
= \lim_{T\to\infty} \ell^{-1} \sum_{j=-\ell}^{\ell} \sum_{t,s=-T+1}^{T-1} E\{a'v_0 c'[vech(v_t v_{t-j}' - \Gamma_j) \otimes v_s]\} + O(\ell^{-1}),   (A.32)

(T/\ell) E\{a'm_T e'[vech(\tilde S_T - \bar S_T) \otimes vech(\tilde S_T - \bar S_T) \otimes m_T]\}
= \frac{1}{\ell T^2} \sum_{i,j=-\ell}^{\ell} \omega_i \omega_j \sum_{r,s,t,u=1}^{T} E\{a'v_r e'[vech(v_s v_{s-i}' - \Gamma_i) \otimes vech(v_t v_{t-j}' - \Gamma_j) \otimes v_u]\}
= \frac{1}{\ell T} \sum_{i,j=-\ell}^{\ell} \omega_i \omega_j \sum_{s,t,u=-T}^{T} (1 - \tau_{s,t,u}) E\{a'v_0 e'[vech(v_s v_{s-i}' - \Gamma_i) \otimes vech(v_t v_{t-j}' - \Gamma_j) \otimes v_u]\}
= \frac{1}{\ell T} \sum_{i,j=-\ell}^{\ell} \omega_i \omega_j \sum_{s,t,u=-T}^{T} E\{a'v_0 e'[vech(v_s v_{s-i}' - \Gamma_i) \otimes vech(v_t v_{t-j}' - \Gamma_j) \otimes v_u]\} + O(\ell^2 T^{-1})
= \frac{1}{\ell T} \sum_{i,j=-\ell}^{\ell} \sum_{s,t,u=-T}^{T} E\{a'v_0 e'[vech(v_s v_{s-i}' - \Gamma_i) \otimes vech(v_t v_{t-j}' - \Gamma_j) \otimes v_u]\} + O(\ell^{-q}) + O(\ell^2 T^{-1})
= \lim_{T\to\infty} \frac{1}{\ell T} \sum_{i,j=-\ell}^{\ell} \sum_{s,t,u=-T}^{T} E\{a'v_0 e'[vech(v_s v_{s-i}' - \Gamma_i) \otimes vech(v_t v_{t-j}' - \Gamma_j) \otimes v_u]\} + O(\ell^{-1}),   (A.33)

and

(T/\ell) E\{c'[vech(\hat S_T - \bar S_T) \otimes m_T]\}^2
= \ell^{-1} T^{-2} \sum_{t,s,u,v=1}^{T} \sum_{i,j=-\ell}^{\ell} \omega_i \omega_j E\{c'[vech(v_s v_{s-i}' - \Gamma_i) \otimes v_t] c'[vech(v_u v_{u-j}' - \Gamma_j) \otimes v_v]\}
= (\ell T)^{-1} \sum_{j,k,m=-T}^{T} \sum_{i,l=-\ell}^{\ell} \omega_i \omega_l (1 - \tau_{j,k,m}) E\{c'[vech(v_0 v_{-i}' - \Gamma_i) \otimes v_j] c'[vech(v_k v_{k-l}' - \Gamma_l) \otimes v_m]\}
= (\ell T)^{-1} \sum_{j,k,m=-T}^{T} \sum_{i,l=-\ell}^{\ell} \omega_i \omega_l E\{c'[vech(v_0 v_{-i}' - \Gamma_i) \otimes v_j] c'[vech(v_k v_{k-l}' - \Gamma_l) \otimes v_m]\} + O(\ell T^{-1})
= (\ell T)^{-1} \sum_{j,k,m=-T}^{T} \sum_{i,l=-\ell}^{\ell} E\{c'[vech(v_0 v_{-i}' - \Gamma_i) \otimes v_j] c'[vech(v_k v_{k-l}' - \Gamma_l) \otimes v_m]\} + O(\ell^{-q}) + O(\ell T^{-1})
= \lim_{T\to\infty} \ell^{-1} T^{-1} \sum_{j,k,m=-T}^{T} \sum_{i,l=-\ell}^{\ell} E\{c'[vech(v_0 v_{-i}' - \Gamma_i) \otimes v_j] c'[vech(v_k v_{k-l}' - \Gamma_l) \otimes v_m]\} + O(\ell^{-1}),   (A.34)

where \tau_{i,k} = (1/T) \min(\max(|i|, |k|, |i-k|), T) and \tau_{s,t,u} = (1/T) \min(\max(|s|, |t|, |u|, |s-t|, |t-u|, |u-s|), T). The proofs of (A.32), (A.33) and (A.34) are similar to that of (A.17) and thus details are omitted. Therefore, (A.18) follows from (A.30)-(A.34).
Third, we will prove (A.19). By (A.17), (A.18) and

\kappa_3(g_T) = E(g_T^3) - 3E(g_T^2)E(g_T) + 2(E(g_T))^3,   (A.35)

it suffices to show that

T^{1/2} E(g_T^3) = \kappa_1 + O(\ell^{-q}) + o(\ell T^{-1/2}).   (A.36)

It follows from Assumption 1(i), Hölder's inequality and Lemma A.2 that

E(g_T^3) = E[(a'm_T)^3] + 3E\{(a'm_T)^2 b'[(G_T - G_0) \otimes m_T]\}
  + 3E\{(a'm_T)^2 c'[vech(\tilde S_T - \bar S_T) \otimes m_T]\}
  + 3E\{(a'm_T)^2 c'[vech(\hat S_T - \tilde S_T) \otimes m_T]\} + o(\ell T^{-1}).   (A.37)

The rest of the proof is similar to that of (A.17), and thus we will only show that

T^{1/2} E\{(a'm_T)^2 c'[ \sum_{j=-\ell}^{\ell} \omega_j vech(\tilde\Gamma_j - \Gamma_j) \otimes m_T ]\}
= \lim_{T\to\infty} \frac{1}{T} \sum_{\tau,t,s,k=-T+1}^{T-1} E\{a'v_0 a'v_\tau c'[vech(v_t v_{t-k}' - \Gamma_k) \otimes v_s]\}.   (A.38)

It follows from arguments similar to the proof of (A.21) that

T^{1/2} E\{(a'm_T)^2 c'[vech(\tilde S_T - \bar S_T) \otimes m_T]\}
= \frac{1}{T} \sum_{s,t,u=-T+1}^{T-1} \sum_{j=-\ell}^{\ell} \omega_j (1 - \tau_{s,t,u}) E\{a'v_0 a'v_s c'[vech(v_t v_{t-j}' - \Gamma_j) \otimes v_u]\}
= \frac{1}{T} \sum_{s,t,u=-T+1}^{T-1} \sum_{j=-\ell}^{\ell} \omega_j E\{a'v_0 a'v_s c'[vech(v_t v_{t-j}' - \Gamma_j) \otimes v_u]\} + O(T^{-1})
= \frac{1}{T} \sum_{s,t,u=-T+1}^{T-1} \sum_{j=-\ell}^{\ell} E\{a'v_0 a'v_s c'[vech(v_t v_{t-j}' - \Gamma_j) \otimes v_u]\} + O(\ell^{-q})
= \lim_{T\to\infty} T^{-1} \sum_{\tau,t,s=-T+1}^{T-1} \sum_{j=-\ell}^{\ell} E\{a'v_0 a'v_\tau c'[vech(v_t v_{t-j}' - \Gamma_j) \otimes v_s]\} + O(\ell^{-q}).   (A.39)

By arguments similar to the proof of Lemma 1 of Andrews (1991, pp. 850-851), one can show that the RHS of (A.39) equals the infinite sum of the product of two expectations plus some finite number. By the mixing inequality, it follows that the infinite sum of the product of two expectations is finite. Therefore, the RHS of (A.39) is well defined.
Lastly, we will show (A.20).

\kappa_4(g_T) - 3 = 4E\{(a'm_T)^3 c'[vech(\hat S_T - \bar S_T) \otimes m_T]\}
  + 4E\{(a'm_T)^3 e'[vech(\hat S_T - \bar S_T) \otimes vech(\hat S_T - \bar S_T) \otimes m_T]\}
  + 6E( (a'm_T)^2 \{c'[vech(\hat S_T - \bar S_T) \otimes m_T]\}^2 )
  - 12E\{a'm_T c'[vech(\hat S_T - \bar S_T) \otimes m_T]\}
  - 12E\{a'm_T e'[vech(\hat S_T - \bar S_T) \otimes vech(\hat S_T - \bar S_T) \otimes m_T]\}
  - 6E\{c'[vech(\hat S_T - \bar S_T) \otimes m_T]\}^2 + O(\ell^{1/2} T^{-1}),   (A.40)

from which the desired result follows by similar arguments. Q.E.D.
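The cumulant bookkeeping in Lemma A.3 rests on moment identities such as (A.35) and its fourth-order analogue. As a quick sanity check (illustrative only, not part of the proof), these identities can be evaluated on simulated draws; for a standard normal statistic the third and fourth cumulants should be near zero:

```python
import numpy as np

def cumulants(g):
    """Sample analogues of the first four cumulants of a statistic g,
    written out via raw moments; k3 follows equation (A.35) exactly."""
    m1 = g.mean()
    m2 = (g ** 2).mean()
    m3 = (g ** 3).mean()
    m4 = (g ** 4).mean()
    k1 = m1
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3              # equation (A.35)
    k4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
    return k1, k2, k3, k4

rng = np.random.default_rng(1)
g = rng.standard_normal(200_000)
k1, k2, k3, k4 = cumulants(g)   # close to (0, 1, 0, 0) for N(0,1) draws
```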
Lemma A.4:

\psi_{g,T}(\theta) = \exp[ -\tfrac{1}{2}\theta^2 + T^{-1/2}( \alpha_1 (i\theta) - \tfrac{(i\theta)^3}{6} (\kappa_1 - 3\alpha_1) ) - \tfrac{\ell}{T}( \tfrac{\theta^2}{2}\gamma_1 + \tfrac{\theta^4}{24}\zeta_1 ) + o(\tfrac{\ell}{T}) ],   (A.41)

P(g_T \le x) = \Phi(x) + T^{-1/2} p_1(x) + (\ell/T) p_2(x) + o(\ell/T).   (A.42)

Proof of Lemma A.4: The proof of (A.41) follows from standard arguments. (A.42) can be obtained by inverting (A.41). Q.E.D.
Lemma A.5: Following Götze and Künsch (1996), define a truncation function by

\tau(x) = T^\gamma x f(T^{-\gamma} \|x\|) / \|x\|,

where \gamma \in (2/r, 1/2) and f \in C^\infty(0, \infty) satisfies (i) f(x) = x for x \le 1; (ii) f is increasing; and (iii) f(x) = 2 for x \ge 2. Let f_T^\dagger denote f_T with R_t \equiv (v_t', \bar v_t', vec(w_t)')' replaced by

R_t^\dagger = (v_t^{\dagger\prime}, \bar v_t^{\dagger\prime}, vec(w_t^\dagger)')' = \tau((v_t', \bar v_t', vec(w_t)')').

Let \Psi_T^\dagger and \Psi_{g,T}^\dagger denote the Edgeworth expansions of f_T^\dagger and g_T^\dagger, respectively. Let \psi_{g,T}^\dagger(\theta) and \tilde\psi_{g,T}^\dagger(\theta) denote the characteristic functions of g_T^\dagger and \Psi_{g,T}^\dagger, respectively. Then

\sup_x |P(f_T^\dagger \le x) - \Psi_T^\dagger(x)| \le C \int_{|\theta| \le \cdot} |\psi_{g,T}^\dagger(\theta) - \tilde\psi_{g,T}^\dagger(\theta)| |\theta|^{-1} d\theta + O(\ell^{-q}) + o(\ell T^{-1}).   (A.43)
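The truncation map of Lemma A.5 is easy to implement once a concrete f is fixed. The lemma only requires a smooth increasing f with f(x) = x for x ≤ 1 and f(x) = 2 for x ≥ 2; the cubic interpolant below is our own illustrative stand-in (C^1 rather than C^∞), which already exhibits the two properties the argument exploits: τ(x) = x whenever ‖x‖ ≤ T^γ, and ‖τ(x)‖ ≤ 2T^γ always:

```python
import numpy as np

def f(u):
    """Concrete stand-in for the smooth function in Lemma A.5:
    f(u) = u for u <= 1, increasing, f(u) = 2 for u >= 2.
    The cubic on [1, 2] is a Hermite fit with f(1)=1, f'(1)=1, f(2)=2, f'(2)=0;
    it is only C^1, whereas the lemma assumes a C-infinity choice."""
    if u <= 1.0:
        return u
    if u >= 2.0:
        return 2.0
    t = u - 1.0
    return 1.0 + t + t ** 2 - t ** 3

def tau(x, T, gamma):
    """Truncation map tau(x) = T^gamma * x * f(T^{-gamma} ||x||) / ||x||.
    Leaves x unchanged when ||x|| <= T^gamma; caps ||tau(x)|| at 2 T^gamma."""
    x = np.asarray(x, dtype=float)
    n = np.linalg.norm(x)
    if n == 0.0:
        return x
    scale = T ** gamma
    return scale * x * f(n / scale) / n
```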