STA 290 Seminar Series
DATE: Tuesday, January 17th, 2017, 4:10pm
LOCATION: MSB 1147, Colloquium Room
SPEAKER: Xiaoying Tian Harris, Stanford University
TITLE: “Prediction error after model search”
ABSTRACT: It is difficult to estimate the prediction error of adaptively chosen linear estimators. In this work, we propose an asymptotically unbiased estimator for the prediction error after adaptive model selection. Under some additional mild assumptions, we show that our estimator converges to the true prediction error in $L^2$ at the rate of $O(n^{-1/2})$, where $n$ is the number of data points. Our estimator applies to general selection procedures and does not require an analytical form for the selection rule. The number of variables to select from can grow exponentially with $n$, which allows applications to high-dimensional data. The estimator also tolerates model misspecification, since it does not require the underlying model to be linear. One application of our method is an estimator for the degrees of freedom of many discontinuous estimation rules, such as best subset selection and the relaxed Lasso. We consider in-sample prediction errors in this work, with some extensions to out-of-sample errors in low-dimensional linear models. Examples such as best subset selection and the relaxed Lasso are considered in simulations, where our estimator outperforms both $C_p$ and cross-validation in various settings.