Exponentiated Gradient
Exponentiated Gradient was introduced by Kivinen and Warmuth (1997) for the online linear regression problem. It can also be applied to prediction with expert advice, and in this sense it belongs to the family of exponential weights algorithms. In the simplest case, the algorithm predicts a sequence of numbers $y_t \in [0,1]$ using advice from a finite number $K$ of experts. The prediction space is $[0,1]$. Each expert is initially assigned weight $1/K$. The master's prediction is a weighted average of the experts' predictions. At each step the weights are updated by the exponential rule, which depends on the gradient of a loss function: $w^k_{t+1} := w^k_t e^{-\eta\nabla_\gamma\lambda(\omega_t,\gamma_t)\,\gamma^k_t}$, and then normalized; here $\omega_t$ is the observed outcome ($y_t$ above), $\gamma_t$ is the master's prediction, $\gamma^k_t$ is the prediction of the $k$-th expert, and $\lambda$ is the loss function. The intuition behind this is that the weight decreases for an expert whose advice points in the direction of the largest increase of the loss function.
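To make the update concrete, here is a minimal sketch in Python. It is not taken from the sources above: the square loss $\lambda(\omega,\gamma)=(\gamma-\omega)^2$ is used purely for illustration, and the function name and array layout are hypothetical.

```python
import numpy as np

def eg_expert_advice(expert_preds, outcomes, eta):
    """Sketch of Exponentiated Gradient over K experts.

    expert_preds: array of shape (T, K), expert_preds[t, k] in [0, 1]
    outcomes:     array of shape (T,) of outcomes in [0, 1]
    eta:          learning rate > 0

    Assumes the square loss lambda(omega, gamma) = (gamma - omega)**2
    for illustration; its derivative in the prediction is 2 * (gamma - omega).
    Returns the master's predictions.
    """
    T, K = expert_preds.shape
    w = np.full(K, 1.0 / K)              # initial weights 1/K
    master = np.empty(T)
    for t in range(T):
        gamma = w @ expert_preds[t]      # weighted average of the experts' predictions
        master[t] = gamma
        grad = 2.0 * (gamma - outcomes[t])   # d lambda / d gamma for the square loss
        # exponential update driven by the gradient times each expert's advice
        w = w * np.exp(-eta * grad * expert_preds[t])
        w = w / w.sum()                  # renormalize
    return master
```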
It is shown in Cesa-Bianchi and Lugosi (2006) that, for convex loss functions with bounded gradient $||\nabla_\gamma \lambda|| < 1$ and a suitably chosen $\eta > 0$, the algorithm guarantees the inequality
$L_T \le L_T(k) + \sqrt{2T\ln K}$
for all $T$ and $k$, where $L_T$ is the loss suffered by the master over the first $T$ trials, and $L_T(k)$ is the loss suffered by the $k$-th expert over the first $T$ trials. The best choice of the learning rate is $\eta = \sqrt{\frac{2\ln K}{T}}$.
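This value of $\eta$ can be recovered by balancing the two terms of a bound of the form $\frac{\ln K}{\eta} + \frac{\eta T}{2}$; this intermediate form is not stated above and is assumed here only because it is consistent with the quoted result:
$\frac{d}{d\eta}\left(\frac{\ln K}{\eta} + \frac{\eta T}{2}\right) = -\frac{\ln K}{\eta^2} + \frac{T}{2} = 0 \quad\Rightarrow\quad \eta = \sqrt{\frac{2\ln K}{T}}$,
and substituting this $\eta$ back gives $\frac{\ln K}{\eta} + \frac{\eta T}{2} = \sqrt{2T\ln K}$, which matches the regret term above.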
For the online linear regression problem with a bounded signal, $x_t \in \mathbb{R}^n$ with $||x_t||_\infty \le B$, it is shown in Cesa-Bianchi (1999) that for a loss function which is convex and twice differentiable in its second argument the Exponentiated Gradient algorithm achieves
$L_T \le L_T(\theta) + BL\sqrt{\frac{T\ln n}{2}}$,
where $L_T(\theta)$ is the loss of any linear function $\theta$ of $x$, and $L$ is a bound on the derivative of the loss function: $|\nabla_\gamma\lambda(\omega_t,\gamma_t)| \le L$. The best choice of the learning rate is $\eta = \sqrt{8\ln n}/(BL\sqrt{T})$.
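Below is a minimal sketch of the regression variant, again under assumptions not made in the text above: the weight vector is kept on the probability simplex by normalization (Kivinen and Warmuth also give a signed-weight variant, EG$^\pm$), the square loss is used for illustration, and the identifiers are hypothetical.

```python
import numpy as np

def eg_linear_regression(X, y, eta):
    """Sketch of Exponentiated Gradient for online linear regression.

    X:   array of shape (T, n) of signals with ||x_t||_inf <= B
    y:   array of shape (T,) of outcomes
    eta: learning rate > 0

    Assumes simplex-constrained weights and the square loss
    lambda(omega, gamma) = (gamma - omega)**2 for illustration.
    Returns the algorithm's predictions.
    """
    T, n = X.shape
    w = np.full(n, 1.0 / n)          # uniform initial weights over the n coordinates
    preds = np.empty(T)
    for t in range(T):
        gamma = w @ X[t]             # linear prediction with the current weights
        preds[t] = gamma
        grad = 2.0 * (gamma - y[t])  # derivative of the square loss in gamma
        # multiplicative update: each coordinate is scaled by exp(-eta * grad * x_{t,i})
        w = w * np.exp(-eta * grad * X[t])
        w = w / w.sum()              # renormalize to stay on the simplex
    return preds
```

If the horizon $T$ is known in advance, $\eta$ would be set to $\sqrt{8\ln n}/(BL\sqrt{T})$ as stated above.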
Bibliography
- Jyrki Kivinen and Manfred K. Warmuth: Exponentiated gradient versus gradient descent for linear predictors. Information and Computation 132(1):1-63, 1997.
- N. Cesa-Bianchi: Analysis of two gradient-based algorithms for on-line regression. Journal of Computer and System Sciences, 59(3):392-411, 1999.
- N. Cesa-Bianchi and G. Lugosi: Prediction, Learning, and Games. Cambridge University Press, 2006.