Abstract
This paper shows that out-of-sample forecast comparisons can help prevent data-mining-induced
overfitting. The basic results are drawn from simulations of a simple Monte Carlo design and a
real-data-based design similar to those in Lovell (1983) and Hoover and Perez (1999). In each
simulation, a general-to-specific procedure is used to arrive at a model. If the selected
specification includes any of the candidate explanatory variables, forecasts from the model are
compared to forecasts from a benchmark model that is nested within the selected model. In
particular, the competing forecasts are tested for equal MSE and for encompassing. The
simulations indicate that most of the post-sample tests are roughly correctly sized, as long as
only the in-sample portion of the data is used in model selection. Moreover, the tests have
relatively good power, although some are consistently more powerful than others. The paper
concludes with an application to modeling quarterly U.S. inflation.
JEL Nos.: C52, C53, E37
Keywords: forecasts, overfitting, model selection, causality
Todd E. Clark is an assistant vice president and economist at the Federal Reserve Bank of
Kansas City. He gratefully acknowledges the helpful comments of Mike McCracken and seminar
participants at the Federal Reserve Bank of Kansas City. The views expressed are those of the
author and not necessarily those of the Federal Reserve Bank of Kansas City or the Federal
Reserve System.