From PDEs to gradient flows for deterministic particle dynamics

Let us consider PDEs that describe the evolution of a density \(\rho_t(x)\) that evolves in time, with \(x\in \mathcal{R}^D\). We want to describe the temporal evolution of the density \(\rho_t(x)\), e.g., a density of particles at location \(x\).

One fundamental equation for this is the continuity equation, which prescribes how the the density \(\rho_t(x)\) evolves in time according to laws of mass conservation. In particular, the continuity equation expresses the conservation of mass by expressing that the time derivative of the density \(\rho_t(x)\) plus the divergence of the product of a velocity field \(v(x,t)\) and the density must vanish \[\begin{equation} \partial_t \rho_t(x) + \nabla \cdot \left( v(x,t) \rho_t(x) \right) = 0, \end{equation}\] given some initial condition \(\rho_0(x) = \rho^0(x)\). The velocity field in this equation prescribes a spatial transformation of the density as time evolves, i.e., how a density of particles starting from \(\rho^0(x)\) moves according to the velocity field \(v(x,t)\).

The continuity equation admits a useful discretisation in terms of particles. We can define an associated ordinary differential equation for their evolving positions. In fact, we can consider the evolution of \(N\) particles in the Euclidean space, evolving according to the ODE \[\begin{equation} \frac{dX_i(t)}{dt} = v(X_i(t),t) , \end{equation}\] given some initial conditions \(X_i(0)\).

There is a close correspondence between this system of ODEs (the particles) and the solutions of the PDE. In particular, if the velocity field is sufficiently nice (globally Lipschitz in space), then as long as the iniital conditions of the particles are drawn from the density representing the initial condition of the PDE, i.e. if we take the empirical measure at each of the particle locations, i.e., \[\begin{equation} \hat{\rho}_0 = \frac{1}{N} \sum^N_{i=1} \delta(x-X_i(0)) \xrightarrow{N \rightarrow \infty} \rho_0(x), \end{equation}\] then the evolving locations of the Dirac masses (particles) converges according to the Wasserstein metric to the continuum solution of the PDE, \[\begin{equation} \hat{\rho}_t = \frac{1}{N} \sum^N_{i=1} \delta(x-X_i(t)) \xrightarrow{N \rightarrow \infty} \rho_t(x). \end{equation}\]

Because of this close connection between the PDEs and the particle representation, a lot of PDEs have a natural discretisation in terms of particles. The PDE conserves mass, i.e. whatever the integral of the initial condition is, that integral will be preserved over time. And the particle representation preserves positivity.

In general this equation is a Wasserstein gradient flow for an arbitrary velocity field.

The difference between the transient empirical solution \(\hat{\rho}_t\) and the exact solution \(\rho_t\) within the time interval \(t \in[0,T]\) will be bounded by a constant weighted by the initial distance of the initial condition

\[\begin{equation} \mathcal{W}_2(\hat{\rho}_t,\rho_t) \leq C_{T,\| \nabla v\|_{\infty}} \mathcal{W}_2(\hat{\rho}_0, \rho_0). \end{equation}\]

Fokker Planck equations as gradient flows

An equation that is a Wasserstein gradient flow and has attracted a lot of interest in the last years is the Fokker-Planck equation. It describes the evolution of a density according to a drift term \[\begin{equation} \partial_t \rho_t(x) = \nabla \cdot \left( \nabla V \rho_t(x)\right) + \nabla \nabla \rho_t(x). \end{equation}\] This equation has an associated particle method, a stochastic one, because we have the diffusion term present \[\begin{equation} dX_t = -V(X_t)dt + \sqrt{2} dW_t. \end{equation}\] The empirical measure represented in terms of particles will converge almost surely to the solution of the PDE.

However, this PDE has a corresponding particle discretisation, where each particle evolves according to an Ordinary differential equation

[ prof flow ode ]

Since the density interacts with itself, the particles interact with each other through the second term.

This equation has a Wasserstein gradient flow structure. It is the gradient flow of the following energy \[\begin{equation} \mathcal{E}(\rho) = \int V(x) \rho(x) dx + \int \rho(x) \log \rho(x) dx \end{equation}\] it has an external potential term, and the second term is the negative entropy. The particles are going at a direction negative of the gradient of the energy landscape for conservative systems, where the potential is small, to make the energy smaller. The diffusion term forces the density to spread out, so this term makes the entropy bigger.

Fokker-Planck equation can be viewed as a gradient flow. The central point of this idea is to define a manifold on which the Fokker-Planck system is a dynamical system on the manifold and evolving according to its gradient.

By understanding the convexity properties of the function \(V(x)\) that represents the potential, can inform us about convexity properties of the energy landscape, and from there we can recover properties of the Fokker-Planck equation, i.e. contraction of solutions, exponential convergence to the equilibrium, etc.

The particle solution is not a Wasserstein gradient flow of this energy. The reason for this is that I could write the PDE as a continuity type of equation \[\begin{equation} \partial_t \rho_t(x) = \nabla \cdot \left[ \underbrace{( \nabla V + \frac{\nabla \rho}{\rho})}_{\text{velocity field:} v(x,t)} \rho \right] \end{equation}\] But this is a weird velocity field and for a general density the particle method will not be well defined. Due to the diffusion the instantaneous Dirac masses will not remain Dirac masses.

The Wasserstein space is the space of probability measures on \(\mathcal{R}^N\) with the metric induced by the Wasserstein distance.

The seminal work of Jordan-Kinderlehrer-Otto~ established the view of the Fokker–Planck equation as a gradient flow of the Kullback Leibler divergence functional on a probability space equipped with a Wasserstein metric. The solution of the Fokker–Planck equation with drift forces arising as a gradient of a potential \(V(x)\), i.e., \(f(x)=\nabla V(x)\) was identified as the gradient flow of the free energy with respect to the Wasserstein metric. For a gradient system, the free energy difference between two states \(\delta F\) amounts to the negative entropy production times the temperature \(\delta F = - \mathcal{T} \mathcal{S}\), where \(\mathcal{S}\) stands for the entropy production. Thus this formulation may be viewed as a maximum entropy principle for the Fokker Planck.

The Fokker-Planck equation can be viewed as as the gradient flow in the Wasserstein metric of the relative entropy functional \[\begin{equation} S(\rho) = \int_{\mathcal{R}^d} \rho(x) \log\left( \frac{\rho(x)}{e^{-V(x)}} \right)dx. \end{equation}\]

However, a major computational challenge of this known as JKO scheme is how to computationally efficiently compute the optimal transport cost.

The optimal transport yields geodesics in the Wasserstein space [cite Villani old and new]. The evolution of the probability density described by the Fokker–Planck equation amounts to the a curve in the Wasserstein space, the actual length of this curve is identified as the distance between the initial and the terminal points if this curve is a geodesic. All other curves that connect the two measures have larger dissipation. Optimal transport protocols correspond to geodesics in the Wasserstein space and can be employed in an equivalent definition of curvature.