There were some standard,
old techniques I didn't see.
For the set of real numbers
R, a positive integer n, and
a function f: R^n --> R, we
seek x in R^n to minimize
f(x). Let z in R^n denote
the value of x that
minimizes f(x).
Borrowing from D. Knuth's
mathematical typesetting
system TeX, we let x_i
denote component i of x.
More generally we use the
underscore to denote
subscript.
Assume that function f is
differentiable and let
D_x f(x) be the derivative,
that is, the gradient,
that is, the vector
where component i is the
partial derivative if f
with respect to component
i of x.
(1) Line search. Suppose
we have iterations
j = 1, 2, ... where x^j is
our jth estimate of z.
Then, for positive real number
s, our step size, we can
the usual gradient descent
is
x^{j + 1} = x^j - s D_x f(x^j)
Well, with this approach, it
need not be that
f(x^{j + 1}) < f(x^j)
So, an improvement is,
for each iteration j, to
do a line search to find
the set size s. If f is
convex, then there is a
simple, standard approach
to how to adjust s on this
like search.
(2) Conjugate Gradients.
Gradient descent is
vulnerable to a lot of
movement in directions
that are nearly orthogonal
to the best direction.
The classic technique
conjugate gradients
after n iterations
improves this situation.
(3) Newton Iteration.
The simple Newton iteration
for, say, square root
generalizes to
a function of n variables but, of course,
requires finding second
derivatives of f.
(4) Quasi-Newton.
Look up quasi-Newton that
estimates the second
derivatives of Newton
iteration from the
gradients from the
iterations.
For the set of real numbers R, a positive integer n, and a function f: R^n --> R, we seek x in R^n to minimize f(x). Let z in R^n denote the value of x that minimizes f(x).
Borrowing from D. Knuth's mathematical typesetting system TeX, we let x_i denote component i of x. More generally we use the underscore to denote subscript.
Assume that function f is differentiable and let D_x f(x) be the derivative, that is, the gradient, that is, the vector where component i is the partial derivative if f with respect to component i of x.
(1) Line search. Suppose we have iterations j = 1, 2, ... where x^j is our jth estimate of z.
Then, for positive real number s, our step size, we can the usual gradient descent is
x^{j + 1} = x^j - s D_x f(x^j)
Well, with this approach, it need not be that
f(x^{j + 1}) < f(x^j)
So, an improvement is, for each iteration j, to do a line search to find the set size s. If f is convex, then there is a simple, standard approach to how to adjust s on this like search.
(2) Conjugate Gradients. Gradient descent is vulnerable to a lot of movement in directions that are nearly orthogonal to the best direction. The classic technique conjugate gradients after n iterations improves this situation.
(3) Newton Iteration. The simple Newton iteration for, say, square root generalizes to a function of n variables but, of course, requires finding second derivatives of f.
(4) Quasi-Newton. Look up quasi-Newton that estimates the second derivatives of Newton iteration from the gradients from the iterations.