Competing With Gaussian Linear Experts
This paper describes an approach to proving guarantees on the loss of online Ridge Regression. The results provide a probabilistic perspective on the identity proved in An Identity for Kernel Ridge Regression (Zhdanov and Kalnishkan, 2010).
Theorem. The Ridge Regression algorithm for the learner with regularization parameter $a > 0$ satisfies, at any step $T$,
$\displaystyle{ \sum_{t=1}^T \frac{(y_t - \gamma_t)^2}{1 + x_t'A_{t-1}^{-1}x_t} = \min_{\theta \in \mathbb{R}^n} \left(\sum_{t=1}^T (y_t - \theta'x_t)^2 + a\|\theta\|^2 \right),}$
where $\gamma_t$ is the Ridge Regression prediction at step $t$ and $A_{t-1} = aI + \sum_{s=1}^{t-1} x_s x_s'$.
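The identity is easy to check numerically; here is a minimal sketch in Python (it assumes the standard notation above for $A_{t-1}$, $\gamma_t$ and $b_{t-1} = \sum_{s<t} y_s x_s$; the variable names are illustrative, not the paper's):

```python
import numpy as np

# Sketch: numerically check the Ridge Regression identity on random data.
# Assumed notation (standard, not copied from the paper):
#   A_{t-1} = a*I + sum_{s<t} x_s x_s',   b_{t-1} = sum_{s<t} y_s x_s,
#   gamma_t = b_{t-1}' A_{t-1}^{-1} x_t   (the Ridge Regression prediction).
rng = np.random.default_rng(0)
n, T, a = 3, 25, 1.0
X = rng.normal(size=(T, n))   # signals x_1, ..., x_T
y = rng.normal(size=T)        # outcomes y_1, ..., y_T

A = a * np.eye(n)             # A_0 = a I
b = np.zeros(n)
lhs = 0.0
for t in range(T):
    x_t, y_t = X[t], y[t]
    A_inv_x = np.linalg.solve(A, x_t)
    gamma_t = b @ A_inv_x                      # online Ridge Regression prediction
    lhs += (y_t - gamma_t) ** 2 / (1.0 + x_t @ A_inv_x)
    A += np.outer(x_t, x_t)                    # A_t = A_{t-1} + x_t x_t'
    b += y_t * x_t

theta = np.linalg.solve(a * np.eye(n) + X.T @ X, X.T @ y)   # batch ridge minimiser
rhs = np.sum((y - X @ theta) ** 2) + a * theta @ theta
print(lhs, rhs)   # the two sides agree up to floating-point error
```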
The following upper bound on the loss of clipped Ridge Regression is a corollary of the theorem.
Corollary. Assume $|y_t| \le Y$ for all $t$, clip the predictions of Ridge Regression to $[-Y,Y]$, and denote the clipped predictions by $\gamma_t^Y$. Then
$\displaystyle{ \sum_{t=1}^T (y_t - \gamma_t^Y)^2 \le \min_\theta \left(\sum_{t=1}^T (y_t - \theta' x_t)^2 + a\|\theta\|^2 \right) + 4Y^2 \ln\det \left(I + \frac{1}{a} \sum_{t=1}^T x_t x_t'\right).}$
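A sketch of how the corollary follows from the identity (a reconstruction of the standard argument, not quoted from the paper, with $d_t = x_t'A_{t-1}^{-1}x_t$): since $|y_t| \le Y$ and $\gamma_t^Y \in [-Y,Y]$, each term satisfies
$\displaystyle{ (y_t - \gamma_t^Y)^2 \le \min\left\{(y_t - \gamma_t)^2,\, 4Y^2\right\} \le \frac{(y_t - \gamma_t)^2}{1 + d_t} + 4Y^2\,\frac{d_t}{1 + d_t} \le \frac{(y_t - \gamma_t)^2}{1 + d_t} + 4Y^2\ln(1 + d_t).}$
Summing over $t$ and applying the theorem gives the bound, because $\sum_{t=1}^T \ln(1 + d_t) = \ln\det\left(I + \frac{1}{a}\sum_{t=1}^T x_t x_t'\right)$ by the matrix determinant lemma.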
The approach to proving the main theorem is elegant. The Ridge Regression predictions are interpreted as the means of the Bayesian Ridge Regression predictive distributions, and the square-loss identity is derived from an identity for the logarithmic loss of Bayesian Ridge Regression. That logarithmic-loss identity is in turn proved using the Bayesian algorithm, a special case of the Aggregating Algorithm, which competes with Gaussian linear experts. The following bound is proven.
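To make the reduction concrete (a reconstruction of the standard Bayesian Ridge Regression computation, not quoted from the paper): with the Gaussian prior $\theta \sim N(0, \frac{\sigma^2}{a}I)$ and noise variance $\sigma^2$, the predictive distribution for $y_t$ given $x_t$ and the past examples is
$\displaystyle{ N\left(\gamma_t,\ \sigma^2\left(1 + x_t'A_{t-1}^{-1}x_t\right)\right),}$
so its mean is exactly the Ridge Regression prediction. Summing the logarithmic losses of these Gaussian predictions over $t$, the quadratic terms reproduce the left-hand side of the square-loss identity (up to the factor $\frac{1}{2\sigma^2}$), and the variance terms add up to the $\frac{1}{2}\ln\det$ term in the theorem below.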
Theorem. For any sequence $x_1, y_1, x_2, y_2, \ldots,$ the cumulative logarithmic loss $L_T$ of the Bayesian Ridge Regression algorithm at any step $T$ can be expressed as
$\displaystyle{ L_T = \min_\theta \left(L_T(\theta) + \frac{a}{2\sigma^2}\|\theta\|^2\right) + \frac{1}{2}\ln \det \left( I + \frac{1}{a}\sum_{t=1}^T x_t x_t' \right),}$
where $L_T(\theta)$ is the cumulative logarithmic loss of the Gaussian linear expert predicting $N(\theta'x_t, \sigma^2)$ at step $t$. If $\|x_t\|_\infty \le X$ for all $t = 1,2,\ldots,$ then
$\displaystyle{ L_T \le \min_\theta \left(L_T(\theta) + \frac{a}{2\sigma^2}\|\theta\|^2\right) + \frac{n}{2}\ln \left( 1 + \frac{TX^2}{a}\right).}$
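The second inequality follows from the first by a standard bound on the determinant (again a reconstruction, not quoted from the paper): if $\lambda_1,\ldots,\lambda_n$ are the eigenvalues of $I + \frac{1}{a}\sum_{t=1}^T x_t x_t'$, then by concavity of the logarithm
$\displaystyle{ \ln\det\left(I + \frac{1}{a}\sum_{t=1}^T x_t x_t'\right) = \sum_{i=1}^n \ln\lambda_i \le n\ln\left(\frac{1}{n}\sum_{i=1}^n \lambda_i\right) = n\ln\left(1 + \frac{1}{na}\sum_{t=1}^T \|x_t\|_2^2\right) \le n\ln\left(1 + \frac{TX^2}{a}\right),}$
since $\|x_t\|_2^2 \le nX^2$ whenever $\|x_t\|_\infty \le X$.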
Analogous guarantees are also proven for the kernelized version of the algorithm.
- Fedor Zhdanov and Vladimir Vovk. Competing with Gaussian linear experts. Technical report, arXiv:0910.4683 [cs.LG], arXiv.org e-Print archive, 2009.
- Fedor Zhdanov and Yuri Kalnishkan. An identity for kernel ridge regression. In Proceedings of the 21st International Conference on Algorithmic Learning Theory, 2010.