Saturday, 20 February 2021

Thom's Gradient Conjecture for Parabolic Systems and the Yang-Mills Flow

Thom's gradient conjecture, proved by Kurdyka, Mostowski and Parusiński, asserts that convergent gradient flows of analytic functions on $\mathbb{R}^n$ cannot spiral forever. More precisely, the projection of the flow onto the unit sphere must converge.

In my paper linked below, I show that this result also holds for gradient flows of analytic functions on infinite dimensional Hilbert spaces, provided that the second derivative is a Fredholm operator. This is similar in spirit to L. Simon's extension of the Lojasiewicz inequality to the same setting. I also show that the result holds for geometric flows with a gauge symmetry, such as the Yang-Mills flow.

Thom's Gradient Conjecture for Parabolic Systems and the Yang-Mills Flow

An infinite dimensional curve selection lemma

Let $X \subset V$ be a semianalytic set with $0 \in \overline X$, i.e. there exists a sequence $x_n \in X$ with $x_n \to 0$. If we allow $V$ to be a finite dimensional Hilbert space for a moment, the curve selection lemma tells us that there exists an analytic curve $\gamma(t):[0,\varepsilon) \to V$ with $\gamma(0)=0$ and $\gamma([0,\varepsilon)) \subseteq X$.

Since the curve selection lemma is central to many proofs concerning semianalytic sets on finite dimensional spaces, it's interesting to consider when a similar result might hold for sets defined through inequalities involving analytic functions on infinite dimensional spaces.

The curve selection lemma often functions as a kind of compactness result that allows us to restrict attention to a one-dimensional curve. Like the Lojasiewicz inequality, it won't hold in general in infinite dimensions and this failure can be linked to the non-compactness of the unit sphere. For example, suppose we have $\mathcal{E}(u) = \|u\|^3 - c(u)\|u\|^2$. For an orthonormal basis $\{e_i\}$, we can arrange that the coefficient $c(e_i) \to 0$ as $i \to \infty$, as we cycle through the infinite number of dimensions available. Thus, the set $\{ \mathcal{E}(u) > 0 \}$ contains a sequence approaching the origin but contains no analytic curve emanating from the origin.
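A finite-dimensional truncation of this example can be checked numerically. The sketch below uses the hypothetical choice $c(e_i) = 1/(i+1)$; along the $i$-th axis we have $\mathcal{E}(te_i) = t^3 - c(e_i)t^2$, which is positive at $t = 2c(e_i)$ but negative for all sufficiently small $t > 0$:

```python
# Sketch of the counterexample, truncated to the first N basis directions,
# with the hypothetical coefficients c(e_i) = 1/(i+1) -> 0.
N = 50

def c(i):
    return 1.0 / (i + 1)

def E_along_axis(t, i):
    """E(t*e_i) = t^3 - c(e_i)*t^2 for the point t*e_i on the i-th axis."""
    return t**3 - c(i) * t**2

# The points x_i = 2*c(e_i)*e_i approach the origin and satisfy E(x_i) > 0 ...
ts = [2 * c(i) for i in range(N)]
assert all(E_along_axis(t, i) > 0 for i, t in enumerate(ts))
assert ts[-1] < 0.05  # ... and the sequence tends to 0 as N grows.

# ... yet along any fixed axis, E(t*e_i) < 0 for all small t > 0, so no
# analytic curve leaving the origin along an axis stays inside {E > 0}.
assert all(E_along_axis(1e-6, i) < 0 for i in range(N))
```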

First, note that the desired curve exists if and only if there exists at least one sequence $x_n \to 0$ with $x_n \in N \subset X$, where $N$ is a finite dimensional analytic manifold, since the ordinary curve selection lemma can then be applied.

We consider now the special case of a function with a Hessian that is elliptic.

Let $V$ be a Hilbert space and let $U \subseteq V$ be an open subset. Let $\mathcal{E} \in C^2(U)$ be an analytic function and assume that $0 \in U$ is a critical point, i.e. $\mathcal{E}'(0) = 0$. We suppose that $\mathcal{E}''(0)$ is a Fredholm operator, that is, it has finite-dimensional kernel and cokernel, and closed range. We also assume for convenience that $\mathcal{E}(0) = 0$. We define the set $W^\varepsilon = \{u:\mathcal{E}(u)\neq 0, \varepsilon\|\mathcal{E}_\theta\| \leq |\mathcal{E}_r| \}$, where $\mathcal{E}_r$ and $\mathcal{E}_\theta$ denote the radial and spherical components of the gradient $\mathcal{E}'$.

Let $P$ be the orthogonal projection onto $\ker \mathcal{E}''(0)$ and $P'$ the adjoint projection. We define the finite dimensional analytic manifold \[ S = \{u \in U \,|\, (I-P')\mathcal{E}'(u)=0 \}, \] and denote by $Q$ the nonlinear projection onto $S$ (see [1] for details). We have the following Taylor expansion. \[ \mathcal{E}(u) = \mathcal{E}(Qu) + \frac{1}{2}\langle \mathcal{E}''(Qu)(u-Qu),u-Qu \rangle + o(\|u-Qu\|^2). \] $\bf{Lemma}$ Define the set $K \subseteq U$ by $K = \{u \in U \,|\, \mathcal{E}(u) + \mathcal{H}(u) \ \sigma \ 0\}$, where $\sigma \in \{<,\leq,>,\geq\}$ and $\mathcal{H}$ is an analytic function consisting only of terms of order 3 and higher. Suppose $0 \in \overline{K \cap W^\varepsilon}$. Then there exists an analytic curve $\gamma(t):[0,\varepsilon) \to K \cap W^\varepsilon$ with $\gamma(0)=0$.

$\bf{Proof}$ Since $\mathcal{H}$ consists only of higher order terms, which can be absorbed into the higher order terms of $\mathcal{E}$, we assume that $\mathcal{H}=0$. We also assume for readability that $\sigma$ is $>$, since the other cases are analogous. We have \begin{align*} K & = \{u \in U \,|\, \mathcal{E}(Qu) + (\mathcal{E}(u) - \mathcal{E}(Qu)) > 0\} \\ & = \{u \in U \,|\, \mathcal{E}(Qu) + \frac{1}{2}\langle \mathcal{E}''(Qu)(u-Qu),u-Qu \rangle + o(\|u-Qu\|^2) > 0\}. \end{align*} From 12.15 of [1], we know that $\|(I-P')\mathcal{E}'\| \geq c\|u-Qu\|$. Then from the triangle inequality and the definition of $W^\varepsilon$, we know that \[ |\mathcal{E}_r| \geq c\|\mathcal{E}'\| \geq c\|u-Qu\| \;\; (*). \] We can assume that $\mathcal{E}(Qu) \leq 0$ in a neighbourhood of $0$, since otherwise we can apply the usual curve selection lemma to the finite dimensional manifold $S$. We can write the quadratic term as \[ \frac{1}{2}\langle \mathcal{E}''(Qu)\hat{u},\hat{u} \rangle \|u-Qu\|^2, \] where $\hat{u} = (u - Qu) / \| u - Qu \|$. If there exists $u_0$ such that the quadratic term is positive, then it is trivial to find the required curve. Thus, we may assume that \[ \frac{1}{2}\langle \mathcal{E}''(Qu)\hat{u},\hat{u} \rangle \leq 0 \] in $W^\varepsilon$. By assumption there exists a sequence $u_n \in K \cap W^\varepsilon$ with $u_n \to 0$. Since $\mathcal{E}(u_n) > 0$, the only remaining case is \[ \frac{1}{2}\langle \mathcal{E}''(Qu_n)\hat{u}_n,\hat{u}_n \rangle \to 0. \] Since the derivative must grow linearly along $V_1$, this can only happen if the radial component of $\mathcal{E}''(Qu_n)(u_n-Qu_n)$ is going to zero. This however violates $(*)$, since we are inside $W^\varepsilon$, completing the proof.

We remark that unlike in the finite dimensional case a curve selection lemma will not hold for the set $S$ outside of $W^\varepsilon$, as the Hessian cannot control the behaviour of the higher order terms where the linear growth in the derivative has no radial component. However, a curve selection lemma may hold for other expressions such as those involving the derivative $\mathcal{E}'$.

[1] Chill, R., Fasangova, E., Gradient Systems

Saturday, 6 February 2021

The Lojasiewicz inequality for non-analytic functions

A function $f:\mathbb{R}^n \to \mathbb{R}$ satisfies a Lojasiewicz inequality at $0$ if in a neighbourhood of $0$ we have \[ |\nabla f| \geq c|f|^\rho, \] for some $c>0$ and $\rho \in [\frac{1}{2},1)$. It is well-known that the Lojasiewicz inequality holds for analytic functions. While analyticity is sufficient for the Lojasiewicz inequality to hold, it is not necessary. Trivial examples like $f(x) = x^2 + e^{-1/x^2}$ (extended by $0$ at the origin) demonstrate this. What then is an appropriate weaker condition?
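For instance, with $f(x) = x^2 + e^{-1/x^2}$ one can check numerically that $|f'| \geq |f|^{1/2}$ near the origin, i.e. the Lojasiewicz inequality holds with $\rho = \frac{1}{2}$ (the constant $c = 1$ used below is an illustrative choice, not optimal):

```python
import math

def f(x):
    # x^2 plus a "flat" non-analytic perturbation
    if x == 0:
        return 0.0
    return x**2 + math.exp(-1.0 / x**2)

def fprime(x):
    if x == 0:
        return 0.0
    # d/dx e^{-1/x^2} = (2/x^3) e^{-1/x^2}
    return 2 * x + (2.0 / x**3) * math.exp(-1.0 / x**2)

# Lojasiewicz inequality with rho = 1/2 and c = 1: |f'(x)| >= |f(x)|^(1/2)
for k in range(2, 24):
    x = 2.0**(-k)
    assert abs(fprime(x)) >= abs(f(x))**0.5
```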

A function $f:\mathbb{R}^n \to \mathbb{R}$ is analytic at $0$ if it is locally equal to its Taylor series $T(x)$, i.e., $f(x)=T(x)$. For a non-analytic function let's write \[ f(x) = T(x) + \omega(x), \] where $\omega$ has a Taylor series which is identically zero. In other words, $\omega$ is the "non-analytic" part of the function. For the Lojasiewicz inequality to hold, $\omega$ need not be zero, and it is in fact only necessary that $\omega$ is dominated by the function's Taylor series in a certain sense.

To see this, observe that if the Lojasiewicz inequality does not hold, then for any sequences $c_n \to 0$ and $\rho_n \to 1$, we can find a sequence $x_n \to 0$ such that \[ |\nabla f(x_n)| < c_n|f(x_n)|^{\rho_n}. \] We can choose the sequence $x_n$ to converge to $0$ as fast as we like.

Let $\mathcal{C}$ be the set of smooth curves emanating from $0$, parameterised by arc length. Consider the sets \[ \mathcal{C}_{a,k}^\varepsilon = \{\gamma \in \mathcal{C}; |\nabla f(\gamma(t))| \geq at^k \; \forall \; t \in [0,\varepsilon) \}, \] \[ X_{a,k}^\varepsilon = \bigcup_{\gamma \in \mathcal{C}_{a,k}^\varepsilon} \gamma([0,\varepsilon)). \] Clearly the Lojasiewicz inequality holds inside any such set $X_{a,k}^\varepsilon$, even for a function which is not analytic. Thus the sequence $x_n$ is eventually outside $X_{a,k}^\varepsilon$ for any $k \in \mathbb{N}$ arbitrarily large and any $a,\varepsilon$ arbitrarily small. Intuitively, we might guess that the sequence $x_n$ is (in some appropriate sense) asymptoting to the analytic variety \[ V = \{x: \nabla T = 0\}. \]

If the sequence $x_n$ lies on an analytic curve through the origin, then on that curve we must have $\nabla T = 0$. From a previous post, we can arrange that the sequence $x_n$ is asymptoting to some analytic curve $\gamma$ faster than any given polynomial in $r$. On this curve we must have $\nabla T = 0$. It's natural to conjecture the following:

$\bf{Conjecture:}$ A non-analytic function $f = T + \omega:\mathbb{R}^n \to \mathbb{R}$ satisfies the Lojasiewicz inequality if and only if it satisfies the Lojasiewicz inequality on the set $\nabla T = 0$ (in a neighbourhood of 0).

To begin to prove this conjecture, note that the analytic variety $V$ admits a Whitney stratification into a finite number of analytic manifolds at $0$. Thus, we can assume that our sequence $x_n$ is asymptoting to an analytic manifold $M$ faster than any polynomial in $r$. As for a function, a curve can be written as the sum of an analytic part and a non-analytic part. Let $\lambda(t) = \gamma(t) + \bar \gamma(t)$, where $\gamma$ is an analytic curve inside $V$ and $\bar \gamma$ is a curve with Taylor series in $t$ identically zero. By assumption, the Lojasiewicz inequality holds on $\gamma$. If we can show that the Lojasiewicz inequality also holds on all curves that deviate from $V$ by smaller than polynomial terms, we are done.

Saturday, 24 October 2020

A subsequence converging to an analytic curve

This is a question that confronted me while I was trying to find a proof of a different result. Consider a sequence $x_n \in \mathbb{R}^m$ with $x_n \to 0$. Under what circumstances does a subsequence converge to an analytic curve $\gamma:[0,\varepsilon) \to \mathbb{R}^m$ with $\gamma(0)=0$? Let me make this notion precise: the sequence must converge according to all derivatives, that is, faster than any power of $r = \|x\|$. It should be noted that the eventual subsequence we extract need not lie on the curve, because of the existence of non-analytic functions which converge more quickly than any polynomial.

By the compactness of $S^{m-1}$ we can pass to a subsequence such that $s^1_n = x_n/\|x_n\|$ converges to some point $s_1 \in S^{m-1}$. So we begin to construct a curve by starting with \[ \gamma(t) = tv_1, \] where $v_1=s_1$. Next, let $s^2_n$ be the intersection of $\gamma(t) = v_1t +v_2^nt^2$ with $S^{m-1}$, where the vector $v_2^n$ is chosen such that $\gamma(t)$ contains $x_n$. Passing to a subsequence, we have $s^2_n \to s_2$. We then have the curve \[ \gamma(t) = v_1t +v_2t^2, \] where $v_2$ is chosen so that the curve intersects $S^{m-1}$ at $s_2$. We assume for the moment that such a $v_2$ exists, and examine this assumption shortly. Note that the distance from $x_n$ to the curve is bounded by $ct_n^2$, where $\gamma(t_n)$ is the point of the curve closest to $x_n$, and since $s^2_n \to s_2$, $c$ can be made as small as we like by truncating the sequence. Iterating this process, we arrive at a curve \[ \gamma(t) = v_1t +v_2t^2 + \ldots + v_kt^k, \] and a subsequence $x_n$ such that $d(x_n,\gamma(t_n)) \leq ct_n^k \leq cr_n^k$, where $r_n =\|x_n\|$.

Now we return to the question of whether the vector $v_2$ (and $v_3,\ldots,v_k$) actually exists. It can happen that as $s^2_n \to s_2$, $v_2^n$ becomes unbounded and consequently $v_2$ doesn't exist. This means that the sequence is converging to $v_1$ slower than $t^2$.
This corresponds to the case where the curve must be written as a Puiseux series, rather than a Taylor series. In this case, we multiply the Taylor series by $t$, i.e. we consider \[ \gamma(t) = v_1t^2 +v_2t^3 + \ldots + v_kt^{k+1}, \] and again try to construct $v_2$. It could happen that $v_2 = 0$, in which case we attempt to construct $v_3$, continuing to multiply by $t$ whenever a vector fails to exist. Eventually we will obtain the first two non-zero terms of our curve \[ \gamma(t) = v_1t^{1+l} + v_jt^{j+l} \] for some $j \geq 2$. Otherwise, all subsequences of the sequence $x_n$ would be asymptoting to $v_1$ slower than $r^\rho$ for every rational $\rho > 1$, which isn't possible (this is in contrast to a sequence asymptoting faster than any rational power, which is possible due to the existence of non-analytic functions). From here, we can construct all remaining terms using our original process, i.e. all remaining vectors $v_i$ will exist. Thus, after multiplying by $t$ enough times, we will eventually be able to construct a curve \[ \gamma(t) = v_1t^{1+l} +v_2t^{2+l} + \ldots + v_kt^{k+l}, \] which satisfies our requirements.
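The first two steps of this construction can be sketched numerically. In the toy example below the points are sampled from a known quadratic curve (and, for simplicity, the parameters $t_n$ are taken as known rather than recovered from $\|x_n\|$):

```python
import numpy as np

# Points x_n = v1*t_n + v2*t_n^2 in R^3 approaching the origin along an
# analytic curve, with v1 a unit vector.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 2.0, -1.0])
t_seq = 0.5 ** np.arange(5, 25)                 # t_n -> 0
xs = [v1 * t + v2 * t**2 for t in t_seq]

# Step 1: the normalised points s^1_n = x_n/||x_n|| converge to s_1 = v1.
dirs = [x / np.linalg.norm(x) for x in xs]
assert np.allclose(dirs[-1], v1, atol=1e-5)

# Step 2: the coefficients v2_n = (x_n - v1*t_n)/t_n^2, chosen so that the
# curve v1*t + v2_n*t^2 passes through x_n, converge to v2.
v2_est = [(x - v1 * t) / t**2 for x, t in zip(xs, t_seq)]
assert np.allclose(v2_est[-1], v2, atol=1e-6)
```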

Thursday, 10 September 2015

The two envelopes problem

This post concerns the two envelopes problem.

To summarise, suppose we know that one envelope contains some money, and another envelope contains twice that amount of money. However, we do not know which is which. We choose one of the envelopes, which contains an unknown amount $x$. There is a 50% chance that this is the larger envelope, and a 50% chance it is the smaller. Thus it would seem that the expected value for the other envelope should be $\frac{1}{2}(2x+\frac{1}{2}x) = \frac{5}{4}x > x$, so that switching always appears favourable. But since this argument would apply equally well to each envelope, it is obviously incorrect.

The problem is a little tricky, but the error with the argument is clear once you spot it.

The mistake is that one cannot talk about an expectation value in the absence of a prescribed probability distribution. Suppose someone puts some money in an envelope. What is the expected value for the amount of money in the envelope? It's clearly a nonsense question. If we were to assume every positive amount equally probable, the distribution could not be normalised and the expected value would be infinite. Similarly, suppose someone puts some money in one envelope, and twice that amount in another envelope. What is the expected value of the amount of money in either envelope? Again, no reasonable answer can be given. Therefore, having supposed that the first envelope contains $x$, the probability distribution for the second envelope is simply unknown. The only thing we can say about it is that values other than $x/2$ and $2x$ are not possible.

While it is true that the other envelope must contain either $x/2$ or $2x$, it is not true that each of these must be equally likely. In fact, if the person preparing the envelopes only ever chooses from one of two values, the possible values for the envelopes span only a factor of two, so $x/2$ and $2x$ cannot both be possible. Furthermore, the amount contained in the first envelope and whether it is the larger or the smaller of the two are not independent random variables. Although it is correct to say that $x$ has a 50% chance of being the smaller value, and a 50% chance of being the larger value, $x$ may be a different value in each case!
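The point can be made concrete with a quick simulation. Once an actual prior for the base amount is fixed (the four-point prior below is a hypothetical choice), always switching has exactly the same expected value as always keeping:

```python
import random

random.seed(0)

# Amounts (a, 2a), with the base amount a drawn from a fixed prior.
N = 100_000
keep_total = switch_total = 0.0
for _ in range(N):
    a = random.choice([1, 2, 4, 8])     # hypothetical prior on the base amount
    envelopes = [a, 2 * a]
    random.shuffle(envelopes)
    chosen, other = envelopes
    keep_total += chosen                # strategy 1: always keep
    switch_total += other               # strategy 2: always switch

# Both strategies have expectation E[a] * 3/2 = 3.75 * 1.5 = 5.625,
# so the simulated averages agree up to sampling noise.
assert abs(keep_total / N - switch_total / N) < 0.1
```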

In summary, one must begin with the (not independent) probability distributions for the two envelopes, before being able to talk about expectation values.

Tuesday, 25 August 2015

When does a smooth function fail to be analytic?

It's well known that not all smooth functions $f(x)$ are analytic, i.e. can be locally represented by a power series \[ f(x) = f(0) + f'(0)x + \frac{1}{2!}f''(0)x^2 + \frac{1}{3!}f^{(3)}(0)x^3 + \ldots \] The counterexample typically given is the function \[ f(x) = \left\{ \begin{array}{lr} 0 & : x \leq 0 \\ e^{-1/x} & : x > 0 \end{array} \right. \] However, I thought it would be worthwhile to make explicit why the failure occurs.

Suppose we are interested in the interval $[0,\varepsilon]$. An $n$th-order polynomial is a function whose $n$th derivative is constant. So it stands to reason that as long as a function's $n$th derivative doesn't change "too much" over this interval, the function should be well approximated by the first $n$ terms of its Taylor series.

Let's write \[ f^{(n)}(x) = f^{(n)}(0) + E_{n}(x), \] where $E_{n}(x)$ is the error function representing how much $f^{(n)}$ differs from constant. Then we can integrate to get \[ f^{(n-1)}(x) = f^{(n-1)}(0) + f^{(n)}(0)x + \int_0^x{E_n(t)dt}, \] where the error in the $(n-1)$th derivative is \[ \left| E_{n-1}(x) \right| = \left| \int_0^x{E_n(t)dt} \right| \leq \sup_{t \in [0,\varepsilon]}{\left| E_n(t) \right|}x. \] Iterating this process, we eventually find that the original function differs from its $n$th-order Taylor series by \[ \left| E_0(x) \right| \leq \frac{1}{n!} \sup_{t \in [0,\varepsilon]}{\left| E_n(t) \right|}x^n. \] For the function to be analytic, this expression needs to converge to zero as $n \to \infty$. Due to the $n!$ denominator, this can only fail if $E_n$, and hence the $n$th derivative, is growing extremely quickly as $n$ increases. In this sense, non-analytic functions are pathological.
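The bound $|E_0(x)| \leq \frac{1}{n!}\sup|E_n|\,x^n$ can be sanity-checked numerically. For $f = \sin$, every derivative is bounded by $1$, so the error of the degree-$(2n-1)$ Taylor polynomial should be at most $x^{2n+1}/(2n+1)!$ (a small sketch):

```python
import math

# For f = sin, all derivatives are bounded by 1, so the Taylor error
# decays like x^n / n!, consistent with the bound above.
def taylor_sin(x, n_terms):
    """Taylor polynomial of sin with n_terms terms (degree 2*n_terms - 1)."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n_terms))

x = 0.3
for n in range(1, 6):
    err = abs(math.sin(x) - taylor_sin(x, n))
    assert err <= x**(2*n + 1) / math.factorial(2*n + 1)
```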

We can observe this pathological behaviour for our counterexample by plotting a few of the derivatives. While the 3rd derivative (left) shoots up to the value of 30 within .15 of the origin, the 10th derivative (right) is already order $10^{14}$ approximately $.04$ from the origin!
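These plots can be reproduced without symbolic differentiation: for $x > 0$ one can check by induction that $f^{(n)}(x) = q_n(1/x)\,e^{-1/x}$, where the polynomials $q_n$ satisfy the recursion $q_{n+1}(y) = y^2\left(q_n(y) - q_n'(y)\right)$ with $q_0 = 1$. A sketch (the numerical thresholds in the final checks are deliberately loose):

```python
import math

def nth_derivative(n, x):
    """n-th derivative of e^{-1/x} for x > 0, via q_{n+1} = y^2 (q_n - q_n')."""
    q = [1.0]  # q_0 = 1, stored as coefficients: q[k] is the y^k coefficient
    for _ in range(n):
        dq = [k * q[k] for k in range(1, len(q))]          # coefficients of q'
        dq += [0.0] * (len(q) - len(dq))                   # pad to match q
        q = [0.0, 0.0] + [a - b for a, b in zip(q, dq)]    # y^2 * (q - q')
    y = 1.0 / x
    return sum(c * y**k for k, c in enumerate(q)) * math.exp(-y)

# Spot check: q_1 = y^2, so f'(x) = x^{-2} e^{-1/x}.
assert abs(nth_derivative(1, 0.5) - (1 / 0.25) * math.exp(-2)) < 1e-12
# The 3rd derivative is already sizeable at x = 0.15 ...
assert 10 < nth_derivative(3, 0.15) < 50
# ... while near x = 0.05 the 10th derivative is astronomically large.
assert max(abs(nth_derivative(10, 0.02 + 0.002 * i)) for i in range(20)) > 1e10
```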

Wednesday, 17 December 2014

Random walks on surfaces in Python

Here I describe a procedure for computing random walks on surfaces (that is, manifolds of dimension two). The procedure can be easily generalised to higher dimensions, but surfaces are more commonly encountered in practice.

Let $\vec x(u,v) = (x(u,v),y(u,v),z(u,v))$ represent a parameterised surface, and $(u_0,v_0)$ the coordinates of the initial position of a diffusing particle. We illustrate with the example of a torus of major radius $R$ and minor radius $r$, which can be parameterised by \[ \vec x = [\cos{u}(R+r\cos{v}),\sin{u}(R+r\cos{v}),r\sin{v}]. \] Of course, we cannot simply perform a random walk in the coordinate space $(u,v)$. Not only would this depend on the choice of parameterisation, but in general there exists no parameterisation for which a random walk in the coordinate space coincides with a random walk on the actual surface.

Instead, in order to perform a random walk on a surface, we need to find a pair of vectors $e_1(u,v), e_2(u,v)$ in the coordinate space which represent orthonormal directions on the surface. To do this, we begin by computing the Riemannian metric, which is defined by the following matrix: \[ g_{ij} = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix} = \begin{pmatrix} \frac{\partial \vec x}{\partial u} \cdot \frac{\partial \vec x}{\partial u} & \frac{\partial \vec x}{\partial u} \cdot \frac{\partial \vec x}{\partial v} \\ \frac{\partial \vec x}{\partial v} \cdot \frac{\partial \vec x}{\partial u} & \frac{\partial \vec x}{\partial v} \cdot \frac{\partial \vec x}{\partial v} \end{pmatrix} = \begin{pmatrix} (R + r\cos{v})^2 & 0 \\ 0 & r^2 \end{pmatrix}. \] Using the Gram-Schmidt process, we can take our vectors to be \begin{align*} e_1 & = \frac{1}{\sqrt{g_{11}}}(1,0),\\ e_2 & = \frac{1}{\sqrt{g}\sqrt{g_{11}}}(-g_{21},g_{11}), \end{align*} where $g$ is the determinant of the above matrix. In the specific case of the torus, this works out to be \begin{align*} e_1 & = ((R+r\cos{v})^{-1},0), \\ e_2 & = (0,r^{-1}). \end{align*} Now, all we have to do is choose at random a vector $e_0$ from the four vectors $\{\pm e_1, \pm e_2\}$, and the new coordinates of the particle will be given by \[ (u_1,v_1) = (u_0,v_0) + \delta e_0, \] where $\delta$ is a small number representing the step size.
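The procedure above can be sketched in a few lines (the parameters $R = 2$, $r = 1$ and step size $\delta = 0.01$ are illustrative choices):

```python
import numpy as np

R, r = 2.0, 1.0     # major and minor radius (illustrative values)
delta = 0.01        # step size
rng = np.random.default_rng(0)

def frame(u, v):
    """Orthonormal directions on the torus, expressed in (u, v) coordinates."""
    e1 = np.array([1.0 / (R + r * np.cos(v)), 0.0])
    e2 = np.array([0.0, 1.0 / r])
    return e1, e2

def embed(u, v):
    """The torus parameterisation x(u, v)."""
    return np.array([np.cos(u) * (R + r * np.cos(v)),
                     np.sin(u) * (R + r * np.cos(v)),
                     r * np.sin(v)])

u, v = 0.0, 0.0
path = [embed(u, v)]
for _ in range(1000):
    e1, e2 = frame(u, v)
    step = [e1, -e1, e2, -e2][rng.integers(4)]   # choose e_0 at random
    u = (u + delta * step[0]) % (2 * np.pi)
    v = (v + delta * step[1]) % (2 * np.pi)
    path.append(embed(u, v))

# Every point of the walk lies on the torus:
# (sqrt(x^2 + y^2) - R)^2 + z^2 = r^2.
for p in path:
    assert abs((np.hypot(p[0], p[1]) - R)**2 + p[2]**2 - r**2) < 1e-9
```

Note that each embedded step has length approximately $\delta$ regardless of where the particle sits on the surface, which is precisely what the orthonormal frame provides.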

You'll find the Python code here: