# Algorithmic Differentiation Tools & Adjoint Solvers

Algorithmic Differentiation (AD) is a technique for differentiating computer code, i.e. for computing *∂F*/*∂x*_{1}, *∂F*/*∂x*_{2},... where *F* is given by a computer program, e.g.

```fortran
if (x1 < x2) then
   F = x1*x1 + x2*x2 + exp(x3*x3) + ...
else
   F = x1 + x2 + x3 + ...
end if
```

The derivatives are mathematically exact and computed to machine precision. Traditionally, derivative estimates would be obtained by finite differences (also called "bump and revalue"), which is slow and can be inaccurate. In contrast, AD (and adjoint AD in particular, AAD) gives precise derivative information at a fraction of the cost. NAG and expert partners provide a variety of services and software for Algorithmic Differentiation, including consulting, training courses, software tools, a compiler and algorithms for AD, and the embedding of these in client applications.
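As a small illustration of the idea, the first branch of the example above can be differentiated by forward-mode AD using dual numbers. The sketch below is a minimal, self-contained Python illustration, not NAG's dco; the `Dual` class and function names are invented for this example.

```python
import math

class Dual:
    """Dual number carrying (value, derivative) through each basic op."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        # sum rule
        return Dual(self.val + other.val, self.dot + other.dot)
    def __mul__(self, other):
        # product rule
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def exp(x):
    # chain rule for the special function exp
    return Dual(math.exp(x.val), math.exp(x.val) * x.dot)

def F(x1, x2, x3):
    # the first branch of the example above
    return x1 * x1 + x2 * x2 + exp(x3 * x3)

# seed x3 with derivative 1 to obtain dF/dx3 at the point (1, 2, 0.5)
d = F(Dual(1.0), Dual(2.0), Dual(0.5, 1.0))
# d.dot equals the analytic derivative 2*x3*exp(x3*x3) to machine precision
```

Seeding one input at a time gives one column of the gradient per evaluation, so forward mode costs O(*n*) runs for *n* inputs; adjoint (reverse) mode removes that factor, which is why it dominates when gradients are large.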

Here are some examples of what adjoint AD has enabled in various domains.

**Engineering**

Figure 1: Sensitivity of drag coefficient to the surface geometry of a car at high speed (left) and low speed (right)

AD enables sensitivity analyses of huge simulations, supporting shape optimization, intelligent design and comprehensive risk studies.

Figure 1 shows sensitivities of the drag coefficient to each point on a car's surface when it moves at high speed (left) and low speed (right). The simulation was performed with AD-enabled OpenFOAM built on top of the AD tool dco (see below for information about dco). The surface mesh had 5.5 million cells and the gradient vector was 18GB. (1)

The normal simulation on a top-end desktop took 44s, while the AD-enabled simulation took 273s. To obtain the same gradient information, on the same machine, by finite differences would take roughly 5 years.

The gradient information can now be used to optimize the shape of the car so as to reduce the drag.

(1) Towara M and Naumann U (2013). A discrete adjoint model for OpenFOAM. *Procedia Comp. Sci.* Volume 18.

**Scientific Modelling**

Figure 2: MITgcm sensitivities of zonal ocean water flow through the Drake Passage to changes in bottom topography.

AD helps our understanding of climate change and improves weather predictions.

Figure 2 shows the sensitivity of the amount of water flowing through the Drake passage to changes in the topography of the ocean floor. The simulation was performed with the AD-enabled MIT Global Circulation Model (MITgcm) run on a supercomputer. The ocean was meshed with 64,800 grid points. (2)

Obtaining the gradient through finite differences took a month and a half. The adjoint AD code obtained the gradient in less than 10 minutes.

The gradient information can be used to further refine climate prediction models and our understanding of global weather. For example, the flow shows high sensitivity over the South Pacific Ridge and the Indonesian Throughflow, even though these are far from the Drake Passage.

(2) Utke J, Naumann U, Wunsch C, Hill C, Heimbach P, Fagan M, Tallent N and Strout M. (2008). OpenAD/F: A modular, open-source tool for automatic differentiation of Fortran codes. *ACM Trans. Math. Softw*, **34(4)** 18:1-18:36.

**Finance**

AD and AAD are used in finance to obtain sensitivities of complex instruments quickly, enabling real-time risk management and hedging of quantities like xVA.

Here we show some results from a paper which studied two typical codes arising in finance: Monte Carlo and PDEs. (3)

**Monte Carlo**

| *n* | *f* | cfd | AD (tape size) | AD/*f* | cfd/AD |
|---|---|---|---|---|---|
| 34 | 0.5s | 29.0s | 3.0s (2.5MB) | 6.0x | 9.7x |
| 62 | 0.7s | 80.9s | 5.1s (3.2MB) | 7.3x | 15.9x |
| 142 | 1.5s | 423.5s | 12.4s (5.1MB) | 8.3x | 34.2x |
| 222 | 2.3s | 1010.7s | 24.4s (7.1MB) | 10.6x | 41.4x |

**PDE**

| *n* | *f* | cfd | AD (tape size) | AD/*f* | cfd/AD |
|---|---|---|---|---|---|
| 34 | 0.6s | 37.7s | 11.6s (535MB) | 19.3x | 3.3x |
| 62 | 1.0s | 119.5s | 18.7s (919MB) | 18.7x | 6.4x |
| 142 | 2.6s | 741.2s | 39s (2GB) | 15.0x | 19x |
| 222 | 4.1s | 1857.3s | 60s (3GB) | 14.6x | 31x |

Table 1: Run times and memory requirements as a function of gradient size *n* for Monte Carlo and PDE applications.

Table 1 shows the runtimes of a first-order adjoint code using dco vs. central finite differences on a typical finance application (option pricing under a local volatility model, 10K sample paths/spatial points and 360 time steps). The second column *f* is the normal runtime of the application, cfd is the runtime for central finite differences, and AD is the adjoint AD runtime along with the additional memory required (tape size). Calculations were run on a laptop, so only the relative runtimes AD/*f* and cfd/AD are meaningful; the latter shows the speedup of AD over finite differences.

In finance such derivative information is often used for hedging and risk calculations, so these gradients must be computed many times per day.

(3) du Toit J and Naumann U (2014). Adjoint algorithmic differentiation tool support for typical numerical patterns in computational finance. *NAG Technical Report TR3/14*.

AD computes mathematical derivatives of a computer program. Conceptually the process is straightforward:

- Computers can only add, subtract, multiply and divide floating point numbers
- Any computer program (operating on floating point numbers) simply consists of many of these fundamental operations strung together, along with calls to special functions (e.g. sin, cos, exp, etc.)
- It is simple to compute the derivatives of all these basic operations
- We can use the chain rule to combine these basic derivatives to get the derivative of our computer program

This process of differentiating a code can be applied recursively to generate 2nd, 3rd, 4th or higher derivatives of any computer program. Since the code is differentiated symbolically (albeit at a very low level), the derivatives are mathematically exact and are computed to machine accuracy.
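The chain-rule recipe above is exactly what adjoint (reverse-mode) AD implements: record each basic operation and its local derivatives on a tape during the forward run, then sweep the tape backwards to accumulate adjoints. Below is a minimal Python sketch; the `Var` class and `backward` helper are invented for illustration, and real tools such as dco are far more sophisticated.

```python
import math

class Var:
    """Tape node for reverse-mode (adjoint) AD: each node remembers
    its parents and the local partial derivative w.r.t. each."""
    def __init__(self, val, parents=()):
        self.val, self.parents, self.adj = val, parents, 0.0
    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.val + other.val, [(self, 1.0), (other, 1.0)])
    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.val * other.val,
                   [(self, other.val), (other, self.val)])

def exp(v):
    e = math.exp(v.val)
    return Var(e, [(v, e)])

def backward(out):
    """Sweep the tape backwards, accumulating adjoints via the chain rule.
    Plain DFS suffices here because every intermediate feeds one consumer;
    a general implementation replays the tape in reverse topological order."""
    out.adj = 1.0
    stack = [out]
    while stack:
        node = stack.pop()
        for parent, local in node.parents:
            parent.adj += node.adj * local
            stack.append(parent)

x1, x2, x3 = Var(1.0), Var(2.0), Var(0.5)
y = x1 * x1 + x2 * x2 + exp(x3 * x3)
backward(y)
# one reverse sweep yields all three partials at once:
# x1.adj = 2*x1, x2.adj = 2*x2, x3.adj = 2*x3*exp(x3*x3)
```

One forward run plus one reverse sweep produces all *n* partial derivatives at essentially a fixed multiple of the original runtime, independent of *n*; this is the source of the speedups reported in Table 1.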

Traditionally derivative estimates are computed through finite differences (also called “bumping”). Compared with these, AD offers three clear advantages:

- Finite differences can be tricky to get right: some knowledge of the function is often required
- Higher order finite differences can be arbitrarily bad: AD will give exact gradients of any order
- Adjoint AD can be orders of magnitude faster than bumping
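The first two points can be seen in a short experiment on the exp(x·x) term from the introductory example (the helper names `F`, `dF` and `cfd` are illustrative): central differences must trade truncation error (large step) against floating-point cancellation (small step), whereas AD has no step size to tune at all.

```python
import math

def F(x):
    return math.exp(x * x)          # one term of the intro example

def dF(x):
    return 2 * x * math.exp(x * x)  # analytic derivative, for reference

def cfd(f, x, h):
    """Central finite difference: O(h^2) truncation error for large h,
    catastrophic cancellation for small h."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.5
errs = [abs(cfd(F, x, h) - dF(x)) for h in (1e-1, 1e-5, 1e-12)]
# the error first falls, then rises again as h shrinks: the step size
# must be tuned per function, and higher-order derivatives make it worse
```

Here the sweet spot for *h* depends on the function and the evaluation point; AD sidesteps the problem entirely because it never subtracts nearby function values.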