Adaptation using sparsity inducing norms
Sara van de Geer
ETH Zürich, Switzerland
To set the framework, we first consider exact recovery
using a sparsity-inducing norm. Let $f^0 \in \mathbb{R}^n$ be a given vector
and $X \in \mathbb{R}^{n \times p}$ be a given ``design'' (or ``coding'')
matrix. The aim is to reconstruct the vector $\beta^0 \in \mathbb{R}^p$ which
solves $X \beta^0 = f^0$ and has certain sparsity properties. One then
chooses an ``appropriate'' norm $\Omega$ on $\mathbb{R}^p$ and solves $$
\min_{\beta \in \mathbb{R}^p} \Omega (\beta) \ \mbox{subject to} \ X \beta =
f^0 . $$ Assuming the so-called ``$\Omega$-null space property'', one has
exact recovery: the minimizer $\beta^*$ equals the sparse vector $\beta^0$.
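As a concrete illustration (not part of the abstract itself): for $\Omega = \|\cdot\|_1$ the program above is basis pursuit, which can be solved as a linear program via the standard split $\beta = u - v$ with $u, v \ge 0$. The sketch below uses NumPy and SciPy; the data and all variable names are hypothetical, chosen only to exercise the method.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, f0):
    """Solve min ||beta||_1 subject to X beta = f0 (the case Omega = l1).

    LP reformulation: beta = u - v with u, v >= 0, so that
    ||beta||_1 = sum(u) + sum(v) at the optimum.
    """
    n, p = X.shape
    c = np.ones(2 * p)            # objective: sum(u) + sum(v)
    A_eq = np.hstack([X, -X])     # equality constraint: X(u - v) = f0
    res = linprog(c, A_eq=A_eq, b_eq=f0, bounds=(0, None))
    return res.x[:p] - res.x[p:]

# Illustrative data: a 2-sparse beta0 and a Gaussian design (hypothetical).
rng = np.random.default_rng(0)
n, p = 20, 40
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[[3, 17]] = [1.5, -2.0]
f0 = X @ beta0
beta_star = basis_pursuit(X, f0)
```

Under the $\ell_1$-null space property the minimizer coincides with $\beta^0$; for a generic Gaussian design with this level of sparsity, exact recovery typically holds.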
With the $\Omega$-null space property, one can also derive
adaptation results for statistical estimation procedures. We will examine
estimators based on an empirical risk function, such as the sum of squares or
the minus log-likelihood. Denoting the empirical risk at a parameter value
$\beta \in \mathbb{R}^p$ by $\hat R_n (\beta)$, we consider the estimator $$
\hat \beta_n := \arg \min_{\beta} \biggl \{ \hat R_n (\beta) + \Omega
(\beta) \biggr \} . $$ Let $R(\beta):= \mathbb{E}\, \hat R_n
(\beta)$ be the theoretical risk and let $\beta^0 := \arg \min_{\beta}
R(\beta)$ be the target. We show that with high probability $R (\hat
\beta_n)$ is close to $R(\beta^0)$ with a remainder term representing the
estimation error. Adaptation means that the remainder is about as small as
one could achieve if the sparsity of $\beta^0$ were known. We then provide
several examples, such as the Lasso, matrix completion, and sparse PCA.
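To make the Lasso example concrete: with $\hat R_n(\beta) = \|f - X\beta\|_2^2/(2n)$ and $\Omega = \lambda \|\cdot\|_1$, the penalized estimator can be computed by iterative soft-thresholding (ISTA, one standard algorithm among several). The following is a minimal sketch, not the abstract's own construction; the function name, step size, and simulated data are illustrative assumptions.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize (1/(2n))||y - X b||_2^2 + lam * ||b||_1 by ISTA
    (proximal gradient descent with soft-thresholding)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n    # gradient of the least-squares risk
        z = b - grad / L                # gradient step
        # proximal step: soft-threshold at lam / L, giving exact zeros
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return b

# Illustrative data: a 3-sparse truth plus Gaussian noise (hypothetical).
rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:3] = [2.0, -1.0, 1.5]
y = X @ beta0 + 0.1 * rng.standard_normal(n)
beta_hat = lasso_ista(X, y, lam=0.1)
```

The soft-thresholding step sets small coordinates exactly to zero, which is how the $\ell_1$ penalty produces sparse estimates; the adaptation results quantify how close $R(\hat\beta_n)$ then is to $R(\beta^0)$.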
