Gradient Descent
Gradient Descent is an algorithm for online linear regression. In the simplest case, the algorithm predicts a sequence of outcomes $\omega_t \in [0,1]$; the prediction space is $[0,1]$. The master's prediction is the weighted sum of the components of the input vector $x_t$: $\gamma_t := w_t \cdot x_t$. At each step the weights are updated by a rule that depends on the gradient of the loss function: $w^k_{t+1} := w^k_t - \eta\,\lambda'_\gamma(\omega_t,\gamma_t)\,x^k_t$.
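A minimal sketch of this protocol is given below (the function and parameter names are illustrative, not taken from the source): at each step the learner predicts with the current weights and then applies the gradient update.

```python
import numpy as np

def gradient_descent_regression(xs, outcomes, eta, loss_deriv):
    """Online linear regression by Gradient Descent (illustrative sketch).

    xs         -- sequence of input vectors x_t
    outcomes   -- sequence of outcomes omega_t
    eta        -- learning rate
    loss_deriv -- derivative of the loss in its second argument,
                  lambda'_gamma(omega, gamma)
    """
    w = np.zeros(len(xs[0]))                 # start from w_1 = 0
    predictions = []
    for x_t, omega_t in zip(xs, outcomes):
        x_t = np.asarray(x_t, dtype=float)
        gamma_t = float(w @ x_t)             # master's prediction gamma_t = w_t . x_t
        predictions.append(gamma_t)
        # update: w_{t+1} = w_t - eta * lambda'_gamma(omega_t, gamma_t) * x_t
        w = w - eta * loss_deriv(omega_t, gamma_t) * x_t
    return predictions, w

# Example with the square loss lambda(omega, gamma) = (omega - gamma)^2,
# whose derivative in gamma is 2 * (gamma - omega):
# preds, w = gradient_descent_regression(xs, outcomes, eta=0.1,
#                                         loss_deriv=lambda o, g: 2 * (g - o))
```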
For a bounded signal, $\|x_t\|_2 \le B$, and a loss function that is convex and twice differentiable in its second argument, the Gradient Descent algorithm achieves
$L_T \le L_T(\theta) + BLU\sqrt{T},$
where $L_T$ is the cumulative loss of the algorithm, $L_T(\theta)$ is the cumulative loss of any linear function $x \mapsto \theta \cdot x$ of the signal, and $L$ is a bound on the derivative of the loss function: $|\lambda'_\gamma(\omega_t,\gamma_t)| \le L$. The best learning rate is $\eta = U/(BL\sqrt{T})$. Here $U$ is a bound on the norm of the experts (comparison vectors): $\|\theta\|_2 \le U$.
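One way to see where this choice of $\eta$ comes from (a sketch of the usual online gradient descent analysis, not spelled out in the source): with $w_1 = 0$, the regret satisfies $L_T - L_T(\theta) \le \frac{\|\theta\|_2^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T}\|\lambda'_\gamma(\omega_t,\gamma_t)\,x_t\|_2^2 \le \frac{U^2}{2\eta} + \frac{\eta}{2}B^2L^2T$; the right-hand side is minimized at $\eta = U/(BL\sqrt{T})$, where it equals exactly $BLU\sqrt{T}$.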
The Gradient Descent algorithm was investigated for this problem by Cesa-Bianchi et al. (1996); for the square-loss function it is also known as the Least Mean Squares algorithm.
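Indeed, for the square loss $\lambda(\omega,\gamma) = (\omega - \gamma)^2$ the derivative is $\lambda'_\gamma(\omega,\gamma) = 2(\gamma - \omega)$, so the update rule specializes to $w_{t+1} := w_t - 2\eta(\gamma_t - \omega_t)x_t$, the Widrow-Hoff (Least Mean Squares) rule.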
Bibliography
- N. Cesa-Bianchi, P.M. Long, and M.K. Warmuth: Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7(3):604-619, 1996.