Applied Statistics 2016 /  van de Geer

Adaptation using sparsity inducing norms


Sara van de Geer
ETH Zürich, Switzerland

To set the framework, we first consider exact recovery using a sparsity inducing norm. Let $f^0 \in \mathbb{R}^n$ be a given vector and $X \in \mathbb{R}^{n \times p}$ be a given ``design" (or ``coding") matrix. The aim is to reconstruct the vector $\beta^0 \in \mathbb{R}^p$ which solves $X \beta^0 = f^0$ and has certain sparsity properties. One then chooses an
``appropriate" norm $\Omega$ on $\mathbb{R}^p$ and solves $$ \min_{\beta \in \mathbb{R}^p} \Omega (\beta) \ \mbox{subject to} \ X \beta = f^0 . $$ Assuming the so-called ``$\Omega$-null space property", one has exact recovery: the minimizer $\beta^*$ is equal to the sparse vector $\beta^0$.

With the $\Omega$-null space property, one can also derive adaptation results for statistical estimation procedures. We will examine estimators based on an empirical risk function, such as the sum of squares or minus the log-likelihood. Denoting the empirical risk at a parameter value $\beta \in \mathbb{R}^p$ by $\hat R_n (\beta)$, we consider the estimator $$ \hat \beta_n := \arg \min_{\beta} \biggl \{ \hat R_n (\beta) + \Omega (\beta) \biggr \} . $$ Let $R(\beta):= {\rm I\hskip-0.48em E} \hat R_n (\beta)$ be the theoretical risk and let $\beta^0 := \arg \min_{\beta} R(\beta)$ be the target. We show that with high probability $R (\hat \beta_n)$ is close to $R(\beta^0)$, up to a remainder term representing the estimation error. Adaptation means that this remainder is about as small as one could achieve if the sparsity of $\beta^0$ were known. We then discuss several examples, such as the Lasso, matrix completion, and sparse PCA.
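
For concreteness, the following sketch (again not part of the abstract) takes $\hat R_n(\beta) = \|y - X\beta\|_2^2/(2n)$ and $\Omega(\beta) = \lambda \|\beta\|_1$, so that $\hat\beta_n$ is the Lasso, and minimizes the penalized risk by proximal gradient descent (ISTA). The data-generating choices, the step size, and the tuning parameter $\lambda$ of order $\sqrt{\log p / n}$ are assumptions made for the illustration, not prescriptions from the talk.

```python
# Hypothetical sketch: penalized empirical risk with R_n(beta) = ||y - X beta||_2^2 / (2n)
# and Omega(beta) = lam * ||beta||_1 (the Lasso), minimized by proximal gradient (ISTA).
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    n, p = X.shape
    beta = np.zeros(p)
    step = n / (np.linalg.norm(X, 2) ** 2)       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n          # gradient of the empirical risk
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Illustrative data: sparse beta^0 plus Gaussian noise (all sizes are assumed).
rng = np.random.default_rng(1)
n, p, s, sigma = 100, 300, 5, 0.5
X = rng.standard_normal((n, p))
beta0 = np.zeros(p); beta0[:s] = 1.0
y = X @ beta0 + sigma * rng.standard_normal(n)
lam = 2 * sigma * np.sqrt(2 * np.log(p) / n)     # tuning of the usual order sqrt(log p / n)
beta_hat = lasso_ista(X, y, lam)
print("number of nonzeros in beta_hat:", int((np.abs(beta_hat) > 1e-6).sum()))
```

In this setting the adaptation statement says that the excess risk of $\hat\beta_n$ behaves, up to logarithmic factors, as if the support of $\beta^0$ (here of size $s$) had been known in advance.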