There were some standard, old techniques I didn't see. For the set of real numbe...

There were some standard, old techniques I didn't see.

For the set of real numbers R, a positive integer n, and a function f: R^n --> R, we seek x in R^n to minimize f(x). Let z in R^n denote the value of x that minimizes f(x).

Borrowing from D. Knuth's mathematical typesetting system TeX, we let x_i denote component i of x. More generally we use the underscore to denote subscript.

Assume that function f is differentiable and let D_x f(x) be the derivative, that is, the gradient, that is, the vector where component i is the partial derivative if f with respect to component i of x.

(1) Line search. Suppose we have iterations j = 1, 2, ... where x^j is our jth estimate of z.

Then, for positive real number s, our step size, we can the usual gradient descent is

x^{j + 1} = x^j - s D_x f(x^j)

Well, with this approach, it need not be that

f(x^{j + 1}) < f(x^j)

So, an improvement is, for each iteration j, to do a line search to find the set size s. If f is convex, then there is a simple, standard approach to how to adjust s on this like search.

(2) Conjugate Gradients. Gradient descent is vulnerable to a lot of movement in directions that are nearly orthogonal to the best direction. The classic technique conjugate gradients after n iterations improves this situation.

(3) Newton Iteration. The simple Newton iteration for, say, square root generalizes to a function of n variables but, of course, requires finding second derivatives of f.

(4) Quasi-Newton. Look up quasi-Newton that estimates the second derivatives of Newton iteration from the gradients from the iterations.