Gradient-based methods

The same principles that govern local search methods for discrete spaces also apply to continuous spaces, i.e., start at some random place and take little steps toward the maximum. Taking into account that most real-world problems are continuous, it is no surprise that the literature on this topic is vast. For conventional, well-understood problems, a gradient-based technique may be the best approach you can take. It’s really worth having a look at how they work!

Search objectives in continuous spaces are broadly described by real-valued smooth functions $f(\mathbf{x})$, $\mathbf{x} = (x_1, \ldots, x_n)$, i.e., twice differentiable functions. Then, the gradient/derivative of this objective function is a vector that gives the magnitude and direction of the steepest slope:

$$\nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)$$

With this you can perform steepest-ascent hill climbing by updating the current state (initially random) according to the formula:

$$\mathbf{x} \leftarrow \mathbf{x} + \alpha \, \nabla f(\mathbf{x})$$

where $\alpha$ is a small constant often called the “step size”, which determines how much the gradient contributes to the update. Little by little, one step at a time, the algorithm will get to the maximum. However, having to calculate the gradient expression analytically by hand is a pain in the arse. Instead, you can compute an empirical gradient (i.e., numerically) by evaluating the response to small increments in each dimension/variable.
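
As a minimal sketch of this idea (the toy objective and the constants below are illustrative assumptions of mine), here is one empirical-gradient update step in Lua on a one-dimensional function:

-- Toy objective (assumed for illustration): a single maximum at x = 3.
local function f(x) return -(x - 3) ^ 2 end

local x = 0.5          -- random starting point
local step = 0.1       -- step size
local epsilon = 1e-4   -- small increment for the numerical derivative

-- empirical gradient: central-difference approximation of df/dx at x
local grad = (f(x + epsilon) - f(x - epsilon)) / (2 * epsilon)

-- steepest-ascent update
x = x + step * grad
print(x, f(x))  -- x has moved from 0.5 towards the maximum at x = 3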

In order to see the gradient ascent algorithm in action, I will use a parallel LC resonant band-pass filter, see Figure 1.

Figure 1. Parallel LC resonant band-pass filter. Source: All About Circuits.

The goal here is to find the band-pass frequency (a real number) of this filter. Recall that this is to be done numerically (the resonance of this circuit is well known to be $f_0 = \frac{1}{2\pi\sqrt{LC}}$, so at 159.15 Hz for these component values: L = 100 mH, C = 10 uF). Thus, the magnitude of the transfer function of this filter is given as follows:

$$|H(f)| = \frac{|Z_{LC}(f)|}{\sqrt{R^2 + |Z_{LC}(f)|^2}}$$

Note that $|Z_{LC}(f)| = \frac{2\pi f L}{\left|1 - (2\pi f)^2 LC\right|}$ is the magnitude of the impedance of the parallel LC circuit, and R is the series resistor of the filter. The landscape plot of this function (that is, the frequency domain) is shown in Figure 2. Note that this particular problem is somewhat tricky for gradient ascent, because the derivative increases as it approaches the maximum, so it may easily diverge during the update process (a really small step size is mandatory here). Other (convex) objective functions like the exponential error are expected to be more suitable because they are normally smoother around the extrema.
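
For reference, a possible Lua implementation of this objective is sketched below. The series resistance R and its value are assumptions on my side (the exact figure depends on the schematic of Figure 1), but the resonant peak around 159.15 Hz is preserved:

-- Sketch of the filter's magnitude response as an objective function.
-- L and C are the component values above; R is an assumed series resistance.
local L = 100e-3   -- 100 mH
local C = 10e-6    -- 10 uF
local R = 500      -- assumed series resistance, in ohms

-- var is a table with a single variable: the frequency in Hz
local function filter_magnitude(var)
  local f = var[1]
  local w = 2 * math.pi * f
  -- magnitude of the impedance of the ideal parallel LC tank
  local zlc = math.abs((w * L) / (1 - w * w * L * C))
  -- voltage divider formed by R and the tank
  return zlc / math.sqrt(R * R + zlc * zlc)
end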

Figure 2. Gradient ascent applied on the magnitude function of a parallel LC resonant band-pass filter.

Despite the high peakedness of the objective function, the gradient ascent method is able to find the maximum in a breeze. The complete code listing is shown in Figure 3. Regarding its computational complexity, the space is bounded by the size of the variables vector, but the time depends on the specific shape of the landscape, the size of the step, and so on.

Figure 3. Gradient ascent algorithm.

-- Gradient ascent algorithm.
--
-- PRE:
-- objfun - objective function to maximise (function).
-- var - variables, starting point vector (table).
-- step - step size (number).
--
-- POST:
-- Updates var with the max of objfun.
--
-- Loop while evaluation increases.

function gradient_ascent(objfun, var, step)
  local epsilon = 1e-4
  local feval = objfun(var)
  local grad = {}
  for i = 1, #var do
    table.insert(grad, 0)
  end
  local newvar = {}
  for i = 1, #var do
    table.insert(newvar, 0)
  end
  while (true) do
    local preveval = feval
    -- calc grad
    for i = 1, #var do
      local value = var[i]
      var[i] = value + epsilon
      local right = objfun(var)
      var[i] = value - epsilon
      local left = objfun(var)
      var[i] = value
      grad[i] = (right - left) / (2 * epsilon)
    end
    -- new var
    for i = 1, #var do
      newvar[i] = var[i] + step * grad[i]
    end
    -- eval
    feval = objfun(newvar)
    if (feval > preveval) then
      -- update var
      for i = 1, #var do
        var[i] = newvar[i]
      end
    else
      break
    end
  end
end
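
As a usage example (assuming the filter_magnitude objective sketched above; the starting point and step size below are picked by hand for this particular landscape):

local var = { 100 }                            -- starting frequency, in Hz
gradient_ascent(filter_magnitude, var, 10)     -- step size tuned by trial and error
print(var[1])                                  -- expected to climb towards ~159 Hz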

You must note, though, that this is a simple optimisation technique that works well if a single global extremum is present, but more refined tweaks need to be applied if the objective function gets more complex. In the end, gradient ascent is complete but non-optimal. A typical and effective way to add robustness and speed up the process is to make the step size $\alpha$ a time-dependent function (measured by the number of iterations in the loop) that settles the learning capability, pretty much like in the simulated annealing algorithm. Other strategies take into account the momentum of the learning process in order to speed it up at times, in addition to avoiding local maxima. There is also the venerable Newton-Raphson algorithm, which makes use of the Hessian matrix (involving second derivatives) to optimise the learning path, although it can be somewhat tedious to run because it involves matrix inversion. This is all a rich and endless source of PhD topics.
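
As a rough illustration (these snippets are assumptions of mine, not part of the original listing), the first two tweaks could look like this in Lua:

-- 1) Time-dependent step size: decay it with the iteration counter t
--    (the decay schedule below is arbitrary).
local function decayed_step(step0, t)
  return step0 / (1 + 0.01 * t)
end

-- 2) Momentum: reuse a fraction of the previous displacement when
--    computing the new candidate state (this would replace the "new var"
--    block in the loop of Figure 3; velocity starts as a table of zeros).
local function momentum_update(var, grad, velocity, step, beta)
  local newvar = {}
  for i = 1, #var do
    velocity[i] = beta * velocity[i] + step * grad[i]
    newvar[i] = var[i] + velocity[i]
  end
  return newvar
end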

Having a basic knowledge of such gradient-based methods, though, opens up a full range of possibilities, such as the use of Artificial Neural Networks (ANN). ANNs are very fashionable nowadays because of the disruption of deep learning. I know I’m getting a little ahead of myself with this technique, but it’s just so appealing for tackling so many problems that I can’t resist (note that an ANN is part of the A.I. Maker logo). In the next post, you’ll see how ANNs can be easily learnt and deployed by directly applying gradient-based methods on complex functions. Stay tuned!

A closer look at problem state encoding in genetic algorithms through electronics

Encoding problem state information in a genetic algorithm (GA) is one of the most cumbersome aspects of its deployment. A GA deals with binary vectors, so this is the format you must comply with in order to find optimum solutions with this technique. Depending on which encoding strategy you choose, the solution of the search process may differ, and/or the GA may take more or less time (and/or space) to reach it. In order to illustrate some of these issues, I will go through a complete example with an electronic circuit in this post. Specifically, the circuit under analysis will be a non-inverting amplifier built with an OP07 operational amplifier, see Figure 1. The goal of this discrete optimisation problem will be to find the optimum values of its two resistors Rf and Rg (over a finite set, so that they are available in the store and the circuit is implementable) that maximise the symmetric excursion of a 1 V peak input sinusoidal signal without clipping. Let’s synthesise this circuit with GA!

Figure 1. Schematic of a non-inverting amplifier. Source: Wikipedia.

First step: encode each resistor value alone into a binary array. A good approach is to represent it as the bits of an integer number, where the weight of each bit is directly related to a specific region of the resistive spectrum, i.e., the most significant bits always correspond to high resistor values (and vice versa), see Figure 2. I also tried to follow the colour-based encoding that resistors display, i.e., like a mantissa and an exponent, but it was pretty much a failure, which I suspect is due to the non-linear relation between bit weight and resistance value. Note that you must still check that the represented number is within the accepted range of resistor values. Otherwise, discard it by assigning a bad fitness score so that it is likely to disappear during the evolution process. In this example, I’ll be using the 10% standard resistor values, from 10 ohm to 820 kohm (so, 60 different values in total, 6 bits); a decoding sketch follows below.
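
A minimal decoding sketch of this index-based scheme is shown below (the table construction and the function name are mine; the post only fixes the value range and the 6-bit length):

-- Build the table of 10% standard values from 10 ohm to 820 kohm (60 values).
local E12 = { 1.0, 1.2, 1.5, 1.8, 2.2, 2.7, 3.3, 3.9, 4.7, 5.6, 6.8, 8.2 }
local values = {}
for decade = 1, 5 do                -- 10..82, 100..820, ..., 100k..820k
  for _, m in ipairs(E12) do
    table.insert(values, m * 10 ^ decade)
  end
end

-- Decode a 6-bit chromosome (table of 0/1, most significant bit first)
-- into a resistor value; returns nil for the 4 out-of-range codes, so the
-- caller can assign a bad fitness score.
local function decode_resistor(bits)
  local index = 0
  for _, b in ipairs(bits) do
    index = index * 2 + b
  end
  return values[index + 1]
end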

Figure 2. Resistor value encoding and crossover.

Next, encode the two resistor values in an array to create an initial population. To this end, append one resistor to the other to build an aggregate individual, like [Rf Rg], again see Figure 2. When two individuals have to be combined, the crossover point keeps the value of one resistor stable while evolving the other towards improvement, which makes perfect sense.
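
A possible single-point crossover over the aggregate [Rf Rg] individual (12 bits in total) is sketched below; the function name and signature are mine. Whatever the cut point, one of the two resistors is always inherited intact by each child while the other one is recombined:

-- Single-point crossover of two equally long bit arrays at position cut.
local function crossover(parent_a, parent_b, cut)
  local child_a, child_b = {}, {}
  for i = 1, #parent_a do
    if i <= cut then
      child_a[i], child_b[i] = parent_a[i], parent_b[i]
    else
      child_a[i], child_b[i] = parent_b[i], parent_a[i]
    end
  end
  return child_a, child_b
end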

Now, the output function of the non-inverting amplifier is given by:

$$v_{out} = \left(1 + \frac{R_f}{R_g}\right) v_{in}$$

and the fitness function used to evaluate the effectiveness of a solution is mainly driven by the inverse of the error of the symmetric excursion of $v_{out}$ (recall that greater scores are for better individuals):

$$\text{fitness} \propto \frac{1}{\left|\,20\,\text{V} - \Delta v_{out}\,\right|}, \qquad \Delta v_{out} = 2\left(1 + \frac{R_f}{R_g}\right)\cdot 1\,\text{V}$$

Taking into account that with a power supply of 15 V the OP07 saturates at about 12 V, let’s fix the maximum symmetric excursion of the output to 20 V (i.e., a $\pm 10$ V swing) to avoid signal clipping. Having all that set, you’re ready to start optimising. A look at the fitness surface that the GA will deal with, see Figure 3, shows that it will have a hard time coping with so many extrema. Note that all maxima are located on a line of slope 9, which indicates that the solution must maintain a ratio of 9 between Rf and Rg. Also note that resistors placed on the lower end of the resistance range tend to be more stable but consume more power, while the ones placed on the higher end of the range draw less current but are more unstable. Since none of these considerations are included in the fitness function, the GA procedure may end up on any of these local maxima, i.e., on any available pair of values with Rf/Rg close to 9.
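
A sketch of the fitness evaluation could then look as follows; the exact error form is my assumption (a 1 is added to the denominator just to avoid a division by zero at the exact target), and decode_resistor is the helper sketched earlier:

-- Fitness of an individual split into its two 6-bit halves.
local function fitness(bits_rf, bits_rg)
  local rf = decode_resistor(bits_rf)
  local rg = decode_resistor(bits_rg)
  if rf == nil or rg == nil then
    return 0                               -- out-of-range code: worst score
  end
  local vout_peak = 1 * (1 + rf / rg)      -- 1 V peak input, non-inverting gain
  local excursion = 2 * vout_peak          -- symmetric peak-to-peak swing
  return 1 / (1 + math.abs(20 - excursion))
end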

Figure 3. Fitness surface plot.

The performance of a brute-force exhaustive iterative search solution is taken as a reference here, which has a computational complexity in time of $O(v^r)$, where $v$ is the amount of possible resistor values (i.e., 60) and $r$ is the number of resistors in the circuit (i.e., 2). This yields a search space of 3600 states. No matter how fancy GAs may be, for such a small search space, exhaustive linear search is much faster (it takes less than a second, while the GA takes 5 secs). With 4 resistors, though, the search space is over 12M states and the two approaches take about 10 secs each. Beyond this bound, GA is able to perform better. Long gone are the days when engineers had to assemble the prototype circuits on a breadboard to optimise the numerical values of each of the components, see this video at 6:15.
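
For completeness, the reference exhaustive search mentioned above is straightforward to sketch (reusing the values table and the assumed error form from the earlier snippets, 60 x 60 = 3600 evaluations in total):

-- Brute-force search over every pair of standard resistor values.
local best_rf, best_rg, best_score = nil, nil, -math.huge
for _, rf in ipairs(values) do
  for _, rg in ipairs(values) do
    local score = 1 / (1 + math.abs(20 - 2 * (1 + rf / rg)))
    if score > best_score then
      best_rf, best_rg, best_score = rf, rg, score
    end
  end
end
print(best_rf, best_rg, best_score)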