
I'm trying to implement the so-called golden-section optimization in C++. This is a simple algorithm to find an extremum of a univariate function \$f\$ within an interval \$[a,b]\$.

[Figure: illustration of the golden-section search algorithm]

The crux is to write the code such that \$f\$ appears only once in the method body.

The code below works, but it is far from elegant or optimal. Any ideas on how to improve it (reduce the number of lines and cases)?

#include <functional>
#include <math.h>

double Minimization(std::function<double(double)> f, double a, double b, double minstepwidth, int maxiterations)
{
    double R = (sqrt(5.) - 1.) / 2.; // 1/golden ratio
    double x1 = b + (a - b) * R;
    double x2 = a + (b - a) * R;
    double x,y, fx1 = 1E300, fx2 = 1E300;
    int i = 0;
    while (fabs(a - b) > minstepwidth && i < maxiterations)
    {
        if (fx1 == 1E300) // start detected
        {
            x = x1;
        }
        else if (fx2 == 1E300) // start detected
        {
            x = x2;
        }
        else if (fx1 < fx2)
        {
            b = x2;
            x2 = x1;
            x1 = b + (a - b) * R;
            x = x1;
        }
        else 
        {
            a = x1;
            x1 = x2;
            x2 = a + (b - a) * R;
            x = x2;
        }
        y = f(x); // unique appearance
        i++;
        if (fx1 == 1E300)
        {
            fx1 = y;
        }
        else if (fx2 == 1E300)
        {
            fx2 = y;
        }
        else if (fx1 < fx2)
        {
            fx2 = fx1;
            fx1 = y;
        }
        else
        {
            fx1 = fx2;
            fx2 = y;
        }
    }
    return (a + b) / 2.;
}

Addendum

Since everyone seems to be curious: simplified, this C++ method (with float instead of double) is part of a larger codebase (running on the CPU) which needs an identical twin written in HLSL as a pixel shader (running on the GPU). The way \$f\$ is implemented (which cannot be changed) adds a large number of shader instructions every time \$f\$ appears in the HLSL code. Since both codes should be identical (for maintainability and to return exactly the same results), the constraint also applies to the C++ code.

Comments:

  • Are you saying that the function must only be evaluated at one place in your code? That seems very restrictive to me, and makes the code unnecessarily complicated. (Commented Jul 29 at 16:17)
  • Yes, exactly. Only once in the method body. (Commented Jul 29 at 16:34)
  • Can you just do auto g = f; and use g? (Commented Jul 30 at 6:28)
  • I've regularly seen performance requirements demanding that a function only be evaluated once per iteration, but I've never seen or heard of a requirement that a function appear only once in the body. Are you sure you understood the requirements correctly, and if you did, could you explain the reasoning behind it so we can be sure this isn't some X/Y problem? (Commented Jul 30 at 7:32)
  • @infinitezero For god's sake, I added some context... ;) (Commented Jul 30 at 18:55)

3 Answers

Answer 1 (score 11)

The crux is to write the code such that f appears only once in the method body.

This restriction appears pointless to me, but anyway. You can still write clean code if you incorporate three ideas:

  1. Start with an interval that is larger than [a,b] by a factor of 1.618034 so that the interval [a,b] naturally appears on the second iteration of the loop. The first iteration then performs the initialization without needing special case handling. Initializing x2 = b instead of x2 = a + (b - a) * R gives you the larger interval on the first iteration and the correct [a,b] interval on the second iteration.
  2. Initialize the function value fx2 = NaN (not-a-number), a special floating-point value for which every comparison with <, >, <=, >=, or == is false, even against itself (see the short check after this list). We can use this behavior to essentially disable the if statement on the first loop iteration, making the initialization unconditional (again without special-case handling).
  3. Conditionally flip the x-axis, then unconditionally recurse into the left subinterval. This allows us to make an unconditional call to f(x1) outside the if-else construct. Without your restriction that f() appears only once, we would normally call f(x1) in the if branch and f(x2) in the else branch.
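To illustrate the second idea, here is a tiny standalone check (not part of the answer's code) of the comparison behavior it relies on:

#include <cassert>
#include <limits>

int main()
{
    // Every ordered or equality comparison involving NaN is false, so an
    // `fx1 > fx2` test cannot trigger while fx2 still holds its initial NaN.
    double nan = std::numeric_limits<double>::quiet_NaN();
    assert(!(1.0 > nan));
    assert(!(nan > 1.0));
    assert(!(nan == nan));
}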

Here is the resulting code. I frequently used std::tie(...) = std::make_tuple(...), which is my favorite way to swap and shuffle a large number of variables, but other constructs are also possible.

#include <cmath>       // std::sqrt, std::abs, std::lerp
#include <functional>  // std::function
#include <limits>      // std::numeric_limits
#include <numeric>     // std::midpoint
#include <tuple>       // std::tie, std::make_tuple
#include <utility>     // std::swap

double Minimization(std::function<double(double)> f, double a, double b, double minstepwidth, int maxiterations)
{
    // R = 0.618034 (reciprocal of the golden ratio)
    double R = (std::sqrt(5.) - 1.) / 2.;
    // Start with an interval that is larger than [a,b] by a factor of 1.618034 (golden ratio). This allows
    // the algorithm to unconditionally recurse once, giving us the interval [a,b] on the second iteration.
    // This somewhat weird initialization scheme avoids calling f() outside the loop.
    double x0 = a;
    double x2 = b;
    // The initial value of x3 is never used (except when the loop is skipped by maxiterations == 0) 
    // since we unconditionally recurse into the left subinterval and thus replace x3 with x2. The initial
    // value of fx2 is NaN, which always compares false (even with itself) and ensures that we never flip
    // the x-axis on the first iteration.
    double x3 = b;
    double fx2 = std::numeric_limits<double>::quiet_NaN();

    for (size_t i = 0; std::abs(x0 - x3) > minstepwidth && i < maxiterations; i++)
    {
        // Invoke the function in just one place. I don't know why, but rules are rules.
        double x1 = std::lerp(x0, x2, R);
        double fx1 = f(x1);
        // We want our function to increase monotonically. If not, we flip the x-axis.
        if (fx1 > fx2)
        {
            std::tie(x0, x1, x2, x3) = std::make_tuple(x3, x2, x1, x0);
            std::swap(fx1, fx2);
        }
        // Unconditionally recurse into the left subinterval, i.e. [x0,x2]
        std::tie(fx2, x2, x3) = std::make_tuple(fx1, x1, x2);
    }
    return std::midpoint(x0, x3);
}
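For reference, a minimal usage sketch of the function above; the test function and the tolerances are made up purely for illustration:

#include <cstdio>

int main()
{
    // Convex test function with its minimum at x = 1.5.
    auto f = [](double x) { return (x - 1.5) * (x - 1.5); };
    double xmin = Minimization(f, 0.0, 4.0, 1e-9, 200);
    std::printf("minimum near x = %.9f\n", xmin);  // expect a value close to 1.5
}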
Comments:

  • That's a well worked-out piece of code, exactly targeting the question. Good job. (Commented Jul 30 at 8:27)
Answer 2 (score 10)

Avoid using sentinel values

J_H already mentioned magic numbers and suggested replacing them with a constant. However, in this case using a sentinel value is really a bad idea. You have no idea what values f() produces, so 1e300 might actually be a valid function value. If that value were returned by f(), your algorithm would return incorrect results.

Why restrict the use of f()?

The crux is to write the code such that \$f\$ appears only once in the method body.

Why? This is a self-imposed restriction that causes the issues with the sentinel values. So now the code is wrong in some cases, it is more complicated than necessary, and it doesn't have any performance benefit.

You don't need minstepwidth and maxiterations

I see why you have a minimum step width and a maximum number of iterations. You want to have a certain precision of the result, but due to floating point numbers being imprecise, you have a maximum number of iterations to ensure the loop always terminates. However, it is a bit ugly, and it can actually be avoided.

Assuming you start with x1 < x2, then if f1 > f2 you will set x1 to the value of x2 (which is guaranteed to be a different value), and then calculate a new value for x2. If that new value is equal to or lower than x1, you have reached the limit of floating-point precision. However, if it is bigger, then the condition x1 < x2 still holds, and you can safely do another iteration of the loop. The same argument applies in the case of f1 <= f2.

Since x1 and x2 never end up with the same value after one iteration, and floating-point numbers have limited precision, it follows that only a finite number of iterations is needed (on the order of the 53 mantissa bits of a double, somewhat more in practice since the bracket shrinks by a factor of about 0.618 per iteration rather than being halved).

So just writing while (x1 < x2) would be fine. You can swap a and b at the start if necessary to ensure that a < b.
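As a quick sanity check of that argument, here is a small standalone sketch (not part of the refactored code below) that counts how many golden-ratio shrinks of a bracket double precision actually allows before the two interior points collide:

#include <cmath>
#include <cstdio>

int main()
{
    const double R = (std::sqrt(5.0) - 1.0) / 2.0;
    double a = 1.0, b = 2.0;
    double x1 = a + (1.0 - R) * (b - a);
    double x2 = a + R * (b - a);
    int n = 0;
    while (x1 < x2)
    {
        // Always shrink from the right, as if f1 <= f2 on every iteration.
        b = x2;
        x2 = x1;
        x1 = a + (1.0 - R) * (b - a);
        ++n;
    }
    // On a typical IEEE-754 implementation this prints a count well under 100.
    std::printf("interior points collided after %d iterations\n", n);
}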

Use std::sqrt() from <cmath>

Instead of the C header <math.h>, include the C++ version <cmath>. This will then guarantee that you can use std::sqrt(), which is overloaded so it automatically works on the right floating point type.

Use std::lerp() and std::midpoint()

Instead of doing your own interpolations of values, I recommend you use std::lerp() and std::midpoint(). These avoid some pitfalls when interpolating both integers and floating point numbers, and arguably are a bit more readable.
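For example (the values here are arbitrary, just to show the two helpers; both require C++20):

#include <cmath>    // std::lerp
#include <numeric>  // std::midpoint

int main()
{
    double a = 1.0, b = 2.0;
    double x = std::lerp(a, b, 0.382);  // a + 0.382 * (b - a)
    double m = std::midpoint(a, b);     // (a + b) / 2, computed without intermediate overflow
    (void)x;
    (void)m;
}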

Consider using std::exchange()

You can reduce the number of lines of code a bit by using std::exchange(). In particular, instead of:

a = b;
b = c;

You can write:

a = std::exchange(b, c);

You could also write the above line as:

std::exchange(a, std::exchange(b, c));

And there you can see you could chain even more exchanges. It's too bad std::exchange() only takes two parameters, because in your code it would have been nice to be able to write:

std::exchange(a, x1, x2, std::lerp(a, b, R));

But, this is a bit of a lesser known function, and it doesn't make any substantial improvements, so I would be fine with not using this.

Naming

Be consistent with your code style. If you start variables with lower case letters, always do that. It's also customary to start function names with lower case, and only use names that start with a capital for types.

I strongly recommend that you avoid using nouns for function names. Instead, use verbs. So for example, instead of Minimization(), I would write minimize() or find_minimum().

Example of refactored code

With the above suggestions implemented, the code would look like:

#include <cmath>
#include <functional>
#include <numeric>
#include <utility>

double find_minimum(std::function<double(double)> f, double a, double b) {
    // Note: std::sqrt() is not constexpr before C++26, so use a plain constant here.
    static const double tau = (std::sqrt(5.0) - 1) / 2;

    if (a > b) {
        std::swap(a, b);
    }

    double x1 = std::lerp(a, b, 1 - tau);
    double x2 = std::lerp(a, b, tau);

    double f1 = f(x1);
    double f2 = f(x2);

    while (x1 < x2) {
        if (f1 > f2) {
            a = std::exchange(x1, x2);
            x2 = std::lerp(a, b, tau);
            f1 = std::exchange(f2, f(x2));
        } else {
            b = std::exchange(x2, x1);
            x1 = std::lerp(a, b, 1 - tau);
            f2 = std::exchange(f1, f(x1));
        }
    }

    return std::midpoint(x1, x2);
}
Comments:

  • In general these are good tips, but in my case the 'crux' is mandatory. The rest (parameters, libraries, limits, naming) is restricted, as it's part of a huge project with millions of lines of code and dependencies. In any case, \$f\$ is small to avoid precision errors, and working near 1E308 is a bad idea anyway; that's why the value. (Commented Jul 29 at 21:31)
  • So this is for a real-world project? It's not some strange learning exercise? I'm really curious what the actual reasoning for the 'crux' is, then. There's no benefit of any kind that I can see. (Commented Jul 30 at 6:58)
Answer 3 (score 3)

invariants

I wouldn't mind seeing an assert, or at least an English sentence, spelling out how x1, x2, & x3 are ordered.

And do we really need to introduce a and b? If so, consider at least renaming them, as their meanings in the code and in the cited Wikipedia reference are different.
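For instance, a minimal sketch of such a check, using the variable names from the question's code and assuming a < b on entry:

#include <cassert>

// Sketch only: documents the bracketing invariant the loop is meant to preserve.
void check_bracket(double a, double x1, double x2, double b)
{
    assert(a <= x1 && x1 <= x2 && x2 <= b);
}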

repeated magic numbers

... fx1 = 1E300, fx2 = 1E300;

Please define a constant SENTINEL value, and then use that symbol in the if tests.

Consider using NaN for this purpose, along with std::isnan().
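A minimal sketch of that idea, reusing the question's variable name (the stored value 0.0 is just a placeholder):

#include <cmath>   // std::isnan
#include <limits>

int main()
{
    // NaN as the "not yet evaluated" marker instead of the magic value 1E300.
    double fx1 = std::numeric_limits<double>::quiet_NaN();

    if (std::isnan(fx1))  // replaces `if (fx1 == 1E300)` from the question
    {
        fx1 = 0.0;        // the first function value would be stored here
    }
}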

unsigned

Consider using an unsigned type for maxiterations and i, given that they can't be negative.

motivation

https://en.wikipedia.org/wiki/Fibonacci_search_technique

Fibonacci search can also have an advantage in searching data stored in certain structures, such as on a magnetic tape, where seek times are heavily dependent on the distance from the current head position.

The source code should explain why this approach is in some sense "better" than simple binary search. For example, do we anticipate that f() can be evaluated more cheaply or quickly if we already know nearby value(s)? Mention a URL that describes your observed benchmark results.

The source code should also explain the "f() appears only once" constraint. It does significant harm to the readability of the code, so anyone reading the source will need to understand the engineering tradeoffs and the justification for adopting the constraint.

This will aid a future maintainer, who may wish to re-evaluate whether either aspect is still applicable to an evolving Use Case.

Comments:

  • The 'wiki' is a bit over-complicated, so I used an easier approach. I thought about adding the motivation, but that requires a longer explanation well beyond the scope of the question (99.9% do not state that in their questions either). Yes, 1E300 is not ideal, but NaN is usually reserved for when something goes wrong. (Commented Jul 29 at 18:03)
  • Binary search is useful to find the zero of a function. Golden-section search is useful to find the extremum of a function. Those are two different problems. (Commented Jul 30 at 9:14)
  • @Stef, yeah. I was viewing it as: at any point we can know f(x) and f(x + epsilon), and hence we know f'(x). By hypothesis we start with the derivative, the slope, of f'(a) having a different sign from f'(b). And then it just looks like binary search for wherever in the middle the derivative crosses zero. (Commented Jul 30 at 15:12)
  • Regarding "at any point we can know f(x) and f(x + epsilon)": that's basically how golden-section optimization works, but with a twist. In the current iteration, we re-use f(x) from the previous iteration as f(x + epsilon). This way we only need one evaluation of f(x) per iteration, whereas the naive proposal needs two. If f(x) is expensive to evaluate, that's a huge difference. (Commented Jul 30 at 20:47)
