Algorithmic Differentiation (AD) by operator overloading in C++

dco/c++ Userguide Chapter 1
Summary of Features

Contents

Introduction

We consider implementations of multivariate vector functions $$ f : D^{n} \times D^{n'} \rightarrow D^m \times D^{m'} : (\mathbf{y}, \mathbf{y}') = f(\mathbf{x},\mathbf{x}') $$ as computer programs over some base data type $D$ (for example, single or higher precision floating-point data, intervals, convex/concave relaxations, vectors/ensembles). (We assume that the arithmetic inside $f$ is completely defined (through overloading of the elemental functions; see below.) The $n$ active (also: independent) and $n'$ passive inputs are mapped onto $m$ active (also: dependent) and $m'$ passive outputs. The given implementation is assumed to be $k$ times continuously differentiable at all points of interest implying the existence and finiteness of the Jacobian $$ \nabla f = \nabla_\mathbf{x} f(\mathbf{x},\mathbf{x}') \equiv \frac{\partial \mathbf{y}}{\partial \mathbf{x}}(\mathbf{x},\mathbf{x}') \in D^{m \times n}, $$ the Hessian $$ \nabla^2 f = \nabla^2 f(\mathbf{x},\mathbf{x}') \equiv \frac{\partial^2 \mathbf{y}}{\partial \mathbf{x}^2}(\mathbf{x},\mathbf{x}') \in D^{m \times n \times n}, $$ if $k\geq2,$ and of potentially higher derivative tensors $$ \nabla^k f = \nabla^k f(\mathbf{x},\mathbf{x}') \equiv \frac{\partial^k \mathbf{y}}{\partial \mathbf{x}^k}(\mathbf{x},\mathbf{x}') \in D^{m \times \overset{k}{\underset{l=1}{\times}} n}. $$ We denote $$\nabla^k f=\left(\left [\nabla^k f \right ]_i^{j_1,\ldots,j_k} \right)_{i=0,\ldots,m-1}^{j_1,\ldots,j_k=0,\ldots,n-1},$$ and we use $*$ to denote the entire range of an index. For example, $\left [\nabla f \right ]_i^*$ denotes the $i$th row and $\left [\nabla f \right ]_*^j$ the $j$th column of the Jacobian, respectively. Algorithmic differentiation is implemented by overloading of elemental functions including the built-in functions and operators of C++ as well as user-defined higher-level elemental functions.

1 First-order Tangent Mode

1.1 Generic first-order scalar tangents

Generic first-order scalar tangent mode enables the computation of products of the Jacobian with vectors $\mathbf{x}^{(1)} \in D^n$ $$ \mathbf{y}^{(1)}=\nabla f \cdot \mathbf{x}^{(1)} \in D^m $$ through provision of a generic first-order scalar tangent data type over arbitrary base data types $D$.

1.2 Generic first-order vector tangents

Generic first-order vector tangent mode enables the computation of products of the Jacobian with matrices $X^{(1)} \in D^{n \times l}$ $$ Y^{(1)}=\nabla f \cdot X^{(1)} \in D^{m \times l} $$ through provision of a generic first-order vector tangent data type over arbitrary base data types $D.$

1.3 Preaccumulation through use of expression templates

Expression templates enable the generation of statically optimized gradient code at the level of individual assignments which can be beneficial for vector tangent mode.

2 First-order Adjoint Mode

2.1 Generic first-order scalar adjoints

Generic first-order scalar adjoint mode enables the computation of products of the transposed Jacobian with vectors $\mathbf{y}_{(1)} \in D^m$ $$ \mathbf{x}_{(1)}=\nabla f^T \cdot \mathbf{y}_{(1)} \in D^n $$ through provision of a generic first-order scalar adjoint data type over arbitrary base data types $D.$

2.2 Generic first-order vector adjoints

Generic first-order vector adjoint mode enables the computation of products of the transposed Jacobian with matrices $Y_{(1)} \in D^{m \times l}$ $$ X_{(1)}=\nabla f^T \cdot Y_{(1)} \in D^{n \times l} $$ through provision of a generic first-order scalar adjoint data type over arbitrary base data types $D.$

2.3 Global “blob” tape

Memory of the specified size is allocated and used for storing the tape without bound checks.

2.4 Global “chunk” tape

The tape grows in chunks up to the physical memory bound.

2.5 (Thread-)local tapes

Multiple “blob” or “chunk” tapes can be allocated, for example, to implement thread-safe adjoints.

2.6 Preaccumulation through use of expression templates

Expression templates enable the generation of statically optimized gradient code at the level of individual assignments. This preaccumulation can result in a decrease in tape memory requirement.

2.7 Repeated evaluation of tape

Tapes can be recorded at a given point and interpreted repeatedly.

2.8 Adjoint callback interface

The adjoint callback interface supports

2.9 User-defined local Jacobians interface

Externally preaccumulated partial derivatives can be inserted directly into the tape for use within subsequent interpretations.

3 Second- and Higher-order Tangent Mode

3.1 Generic second-order scalar tangents

Instantiation of the generic first-order scalar tangent data type with the generic first-order scalar tangent data type over a non-derivative base data type yields the second-order scalar tangent data type. It enables the computation of scalar projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its two domain dimensions of length $n$ as $$ \mathbf{y}^{(1,2)}=\langle \nabla^2 f,\mathbf{x}^{(1)},\mathbf{x}^{(2)}\rangle \equiv \left(\mathbf{x}^{{(1)}^T} \cdot \left [\nabla^2 f \right ]_i^{*,*} \cdot \mathbf{x}^{(2)} \right)_{i=0,\ldots,m-1} \in D^m $$ for $\mathbf{x}^{(1)}, \mathbf{x}^{(2)} \in D^n$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ becomes $O(n^2) \cdot \text{Cost}(f)$ with both $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ ranging independently over the Cartesian basis vectors in $D^n.$

3.2 Generic second-order vector tangents

Instantiation of the generic first-order vector tangent data type with the generic first-order vector tangent data type over a non-derivative base data type yields the second-order scalar tangent data type. It enables the computation of vector projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its two domain dimensions of length $n$ as $$ Y^{(1,2)}=\langle \nabla^2 f,X^{(1)},X^{(2)}\rangle \equiv \left(X^{{(1)}^T} \cdot \left [\nabla^2 f \right ]_i^{*,*} \cdot X^{(2)} \right)_{i=0,\ldots,m-1} \in D^{m \times l_1 \times l_2} $$ for $X^{(1)} \in D^{n \times l_1},$ $X^{(2)} \in D^{n \times l_2}$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ remains equal to $O(n^2) \cdot \text{Cost}(f)$ with both $X^{(1)}$ and $X^{(2)}$ set equal to the identities in $D^n.$

3.3 Other generic second-order tangents

Instantiation of the generic first-order vector tangent data type with the generic first-order scalar tangent data type over a non-derivative base data type yields a second-order tangent data type. Similarly, instantiation of the generic first-order scalar tangent data type with the generic first-order vector tangent data type over a non-derivative base data type yields a second-order tangent data type.

3.4 Generic third- and higher-order tangents

Instantiation of tangent types with $k$th-order tangent types yields $(k+1)$th-order tangent types.

4 Second- and Higher-order Adjoint Mode

4.1 Generic second-order scalar adjoints

Instantiation of the generic first-order scalar adjoint data type with the generic first-order scalar tangent data type over a non-derivative base data type yields a second-order scalar adjoint data type. It enables the computation of scalar projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its image and one of its domain dimensions as $$ \mathbf{x}_{(1)}^{(2)}=\langle \mathbf{y}_{(1)},\nabla^2 f,\mathbf{x}^{(2)}\rangle \equiv \left(\mathbf{y}_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \mathbf{x}^{(2)} \right)_{j=0,\ldots,n-1} \in D^n $$ for $\mathbf{y}_{(1)} \in D^m$ and $\mathbf{x}^{(2)} \in D^n$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ becomes $O(m\cdot n) \cdot \text{Cost}(f)$ with $\mathbf{y}_{(1)}$ and $\mathbf{x}^{(2)}$ ranging over the Cartesian basis vectors in $D^m$ and $D^n$, respectively.

4.2 Generic second-order vector adjoints

Instantiation of the generic first-order vector adjoint data type with the generic first-order vector tangent data type over a non-derivative base data type yields a second-order vector adjoint data type. It enables the computation of vector projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its image and one of its domain dimensions as $$ X_{(1)}^{(2)}=\langle Y_{(1)},\nabla^2 f,X^{(2)}\rangle \equiv \left(Y_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot X^{(2)} \right)_{j=0,\ldots,n-1} \in D^{l_1 \times n \times l_2} $$ for $Y_{(1)} \in D^{m \times l_1}$ and $X^{(2)} \in D^{n \times l_2}$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ remains equal to $O(m\cdot n) \cdot \text{Cost}(f)$ with $Y_{(1)}$ and $X^{(2)}$ set equal to the identities in $D^m$ and $D^n$, respectively.

4.3 Other generic second-order adjoints

Instantiation of the generic first-order vector adjoint data type with the generic first-order scalar tangent data type over a non-derivative base data type yields a second-order adjoint data type. Similarly, instantiation of the generic first-order scalar adjoint data type with the generic first-order vector tangent data type over a non-derivative base data type yields a second-order adjoint data type.

Symmetry of the Hessian in its two domain dimensions yields the following additional second-order adjoint data types: instantiation of a generic first-order tangent data type with a generic first-order adjoint data type over a non-derivative base data type yields a second-order adjoint data type for computing $$ \langle \mathbf{y}_{(2)}^{(1)},\nabla^2 f,\mathbf{x}^{(1)}\rangle \equiv \left(\mathbf{y}_{(2)}^{(1)^T} \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \mathbf{x}^{(1)} \right)_{j=0,\ldots,n-1} \in D^n. $$ Similarly, instantiation of a generic first-order adjoint data type with a generic first-order adjoint data type over a non-derivative base data type yields a second-order adjoint data type for computing $$ \langle \mathbf{x}_{(1,2)},\mathbf{y}_{(1)},\nabla^2 f\rangle \equiv \left(\mathbf{y}_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \mathbf{x}_{(1,2)} \right)_{j=0,\ldots,n-1} \in D^n. $$

4.4 Generic third- and higher-order adjoints

Instantiation of tangent types with $k$th-order adjoint types yields $(k+1)$th-order adjoint types. Similarly, instantiation of adjoint types with $k$th-order tangent or adjoint types yields $(k+1)$th-order adjoint types.