Adaptation using sparsity inducing norms
Sara van de Geer
ETH Zürich, Switzerland
To set the framework, we first consider exact recovery
using a sparsity inducing norm. Let $f^0 \in \mathbb{R}^n$ be a given vector
and $X \in \mathbb{R}^{n \times p}$ be a given ``design" (or ``coding")
matrix. The aim is to reconstruct the vector $\beta^0 \in \mathbb{R}^p$ which
solves $X \beta^0 = f^0$ and has certain sparsity properties. One then
chooses an ``appropriate" norm $\Omega$ on $\mathbb{R}^p$ and solves $$
\min_{\beta \in \mathbb{R}^p} \Omega (\beta) \ \mbox{subject to} \ X \beta =
f^0 . $$ Assuming the so-called ``$\Omega$-null space property", one has
exact recovery: the minimizer $\beta^*$ is equal to the sparse vector $\beta^0$.
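As a concrete illustration, the following is a minimal sketch of this recovery program for the common choice $\Omega(\beta) = \|\beta\|_1$ (basis pursuit). The dimensions, the sparse vector and the use of the cvxpy solver are illustrative assumptions and not part of the abstract.
\begin{verbatim}
# Minimal sketch (illustrative assumptions: dimensions, sparsity level,
# cvxpy solver) of exact recovery with the l1-norm as Omega.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, s = 50, 200, 5                       # underdetermined: n < p
X = rng.standard_normal((n, p))            # "design" / "coding" matrix
beta0 = np.zeros(p)
beta0[:s] = 1.0                            # s-sparse target vector beta^0
f0 = X @ beta0                             # given vector f^0 = X beta^0

beta = cp.Variable(p)
problem = cp.Problem(cp.Minimize(cp.norm1(beta)), [X @ beta == f0])
problem.solve()

# Under the l1-null space property of X, the minimizer equals beta^0.
print(np.max(np.abs(beta.value - beta0)))  # close to 0 up to solver tolerance
\end{verbatim}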
With the $\Omega$-null space property, one can also derive
adaptation results for statistical estimation procedures. We will examine
estimators based on an empirical risk function, such as sum-of-squares,
minus the log-likelihood, etc. Denoting the empirical risk at a parameter value
$\beta \in \mathbb{R}^p$ by $\hat R_n (\beta)$, we consider the estimator $$
\hat \beta_n := \arg \min_{\beta} \biggl \{ \hat R_n (\beta) + \Omega
(\beta) \biggr \} . $$ Let $R(\beta):= \mathbb{E} \hat R_n
(\beta)$ be the theoretical risk and let $\beta^0 := \arg \min_{\beta}
R(\beta)$ be the target. We show that with high probability $R (\hat
\beta_n)$ is close to $R(\beta^0)$ with a remainder term representing the
estimation error. Adaptation means that the remainder is about as small as
one could achieve if the sparsity of $\beta^0$ were known. We then provide
several examples, such as the Lasso, matrix completion and sparse PCA.
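As an illustration of the Lasso example, the following minimal sketch takes $\hat R_n (\beta) = \|y - X\beta\|_2^2/(2n)$ and $\Omega(\beta) = \lambda \|\beta\|_1$. The noise level, the choice of the tuning parameter $\lambda$ and the use of scikit-learn's Lasso are illustrative assumptions.
\begin{verbatim}
# Minimal sketch (illustrative assumptions: Gaussian noise, tuning
# parameter lambda, scikit-learn) of the penalized estimator with
# sum-of-squares empirical risk and Omega = lambda * l1-norm (the Lasso).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s = 100, 500, 5
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s] = 2.0                               # sparse target beta^0
y = X @ beta0 + 0.5 * rng.standard_normal(n)  # noisy observations

lam = 0.5 * np.sqrt(2 * np.log(p) / n)        # illustrative choice of lambda
# scikit-learn's Lasso minimizes ||y - X beta||^2 / (2n) + alpha * ||beta||_1
hat_beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

print(np.count_nonzero(hat_beta))             # roughly s nonzero coefficients
\end{verbatim}
With a tuning parameter of order $\sqrt{\log p / n}$, the estimator typically retains on the order of $s$ nonzero coefficients, in the spirit of the adaptation statement above.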