Lorenz's Maths Blog

Thursday, 2 October 2025

Thom's gradient conjecture - proof of the infinite dimensional homogeneous case

A key step in proving Thom's gradient conjecture for parabolic evolution equations is to show that it holds for a certain quadratic on an infinite dimensional space. Orthogonal to its finite dimensional kernel, the quadratic operator is bounded below, but not above. The boundedness below allows it to dominate the higher order terms orthogonal to the kernel. The lack of boundedness above is not an issue, since it is the direction of the gradient rather than its magnitude which is important for Thom's conjecture. In this note we provide a simplified proof for the quadratic case, and more generally for analytic homogeneous functions, under some additional restrictions.

Before the general finite dimensional case had been proven, Thom had already observed that the result holds for a homogeneous polynomial of any degree $m$. The argument is as follows. Let $f:\mathbb{R}^n \to \mathbb{R}$ be a homogeneous polynomial of degree $m \ge 2$. Write $x = r u$ with $r = \|x\| > 0$, $u \in \mathbb{S}^{n-1}$. By homogeneity and Euler’s identity, and recalling that the derivative of a differentiable homogeneous function is homogeneous of degree one less than the original function: \[ \nabla f(r u) = r^{m-1}\nabla f(u), \qquad \langle \nabla f(z), z \rangle = m f(z). \] Define the projected function $F := f|_{\mathbb{S}^{n-1}}$. The gradient flow $\dot{x} = -\nabla f(x)$ in these coordinates is \[ \dot{r} = - m f(u)\, r^{m-1}, \qquad \dot{u} = - r^{m-2}\, \nabla_{\mathbb{S}} F(u), \] where the second equation loses a power of $r$ since $u = x/r$. Consequently, $F$ is non-increasing along the flow: \[ \frac{d}{dt} F(u(t)) = \left\langle \nabla_{\mathbb{S}} F(u(t)), \, \dot{u}(t) \right\rangle = -\, r^{\,m-2} \, \|\nabla_{\mathbb{S}} F(u)\|^2 \;\le\; 0. \] Introduce a transformed time variable \[ s(t) := \int_{0}^{t} r(\tau)^{\,m-2}\, d\tau. \] Then in the transformed time, using the chain rule: \[ \frac{d u}{d s} = - \nabla_{\mathbb{S}} F(u), \qquad u(0) = u_0. \] Thus, the projected flow is the gradient flow of $F$ on the sphere. It's easy to see that the restriction to the unit sphere of an analytic function is also analytic. Then the standard argument of Lojasiewicz shows that the trajectory of the projection onto the sphere has finite length, which implies that Thom's conjecture holds in this case.

So which aspects of this proof go through when we switch to a function $E:H \to \mathbb{R}$ on an infinite dimensional Hilbert space? It fails when we need to use the Lojasiewicz inequality, which no longer holds in general even for an analytic function. To continue, we'll have to impose some additional restrictions.
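The polar form of the gradient flow used in Thom's argument can be checked numerically. The sketch below is only a sanity check under assumptions that are not in the argument itself: the cubic `f` on $\mathbb{R}^3$, the sample point `x0` and all function names are illustrative choices. It compares Euler's identity, the scaling of the gradient, and the closed-form equations for $\dot r$ and $\dot u$ against a direct differentiation of $r = \|x\|$ and $u = x/r$.

```python
import numpy as np

M = 3  # degree of homogeneity of the example below


def f(x):
    """Illustrative homogeneous cubic on R^3 (an assumed example)."""
    return x[0] ** 3 - 3.0 * x[0] * x[1] ** 2 + x[2] ** 3


def grad_f(x):
    """Exact gradient of f."""
    return np.array([
        3.0 * x[0] ** 2 - 3.0 * x[1] ** 2,
        -6.0 * x[0] * x[1],
        3.0 * x[2] ** 2,
    ])


def sphere_grad(u):
    """Gradient of F = f restricted to the unit sphere, at a unit vector u."""
    g = grad_f(u)
    return g - np.dot(g, u) * u  # remove the radial component


def polar_rhs_formula(x):
    """(r_dot, u_dot) from the closed-form polar equations."""
    r = np.linalg.norm(x)
    u = x / r
    return -M * f(u) * r ** (M - 1), -r ** (M - 2) * sphere_grad(u)


def polar_rhs_direct(x):
    """(r_dot, u_dot) obtained by differentiating r = |x| and u = x / r."""
    r = np.linalg.norm(x)
    u = x / r
    x_dot = -grad_f(x)
    r_dot = np.dot(x_dot, u)
    return r_dot, (x_dot - r_dot * u) / r


x0 = np.array([0.7, -1.2, 0.4])  # arbitrary nonzero sample point
r0 = np.linalg.norm(x0)
u0 = x0 / r0

euler_gap = abs(np.dot(grad_f(x0), x0) - M * f(x0))
scaling_gap = np.linalg.norm(grad_f(x0) - r0 ** (M - 1) * grad_f(u0))
r_dot_a, u_dot_a = polar_rhs_formula(x0)
r_dot_b, u_dot_b = polar_rhs_direct(x0)
r_gap = abs(r_dot_a - r_dot_b)
u_gap = np.linalg.norm(u_dot_a - u_dot_b)
print(euler_gap, scaling_gap, r_gap, u_gap)
```

All four printed gaps should be at the level of floating point rounding, since the identities are exact for any homogeneous polynomial.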

Define the zero set $Z := \{ x \in \mathbb{R}^n : f(x) = 0 \}$. Since $f$ is homogeneous, $Z$ has the "cone" property that if $u \in Z$, then $\lambda u \in Z$ for all $\lambda \in \mathbb{R}$. Analytic homogeneous functions also have the property that \[ |f(x)| \;\ge\; C\,\mathrm{dist}(x,Z)^{\,m} \] This is just the Lojasiewicz inequality, except that we know the exponent must be $m$, because the distance to the zero set scales like $r$, while the left hand side scales like $r^m$.
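For one concrete quadratic the growth bound can be verified by hand and by machine. The sketch below assumes the example $f(x,y) = x^2 - y^2$ (my choice for illustration), whose zero set is the pair of lines $y = \pm x$, so that $\mathrm{dist}(p,Z) = \min(|x-y|,|x+y|)/\sqrt{2}$ and $|f| = |x-y|\,|x+y| \ge 2\,\mathrm{dist}(p,Z)^2$. It checks the inequality with $C = 2$, $m = 2$ on random points, and checks that both sides scale like $\lambda^m$.

```python
import numpy as np


def f(p):
    """Example quadratic f(x, y) = x^2 - y^2 (an assumed illustration)."""
    return p[..., 0] ** 2 - p[..., 1] ** 2


def dist_to_zero_set(p):
    """Distance to Z = {y = x} union {y = -x}."""
    x, y = p[..., 0], p[..., 1]
    return np.minimum(np.abs(x - y), np.abs(x + y)) / np.sqrt(2.0)


rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(10_000, 2))

C, m = 2.0, 2
lhs = np.abs(f(points))
rhs = C * dist_to_zero_set(points) ** m
worst_margin = np.min(lhs - rhs)  # should be >= 0 up to rounding

# Homogeneity: scaling the point by lam scales both sides by lam**m.
lam = 3.0
scaling_ok = np.allclose(np.abs(f(lam * points)), lam ** m * lhs) and np.allclose(
    C * dist_to_zero_set(lam * points) ** m, lam ** m * rhs
)
print(worst_margin, scaling_ok)
```

This only illustrates the inequality for this particular $f$; it says nothing about the constant or the exponent for other functions.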

Returning to trying to prove Thom's conjecture for analytic homogeneous functions in infinite dimensions, we know this growth condition will no longer hold in general. But let's try imposing this as a condition and see if it's sufficient to complete the proof. Because $Z$ is a cone, this is equivalent to the bound on $\mathbb{S}^{n-1}$ \[ |F(u)| \;\ge\; C\,\mathrm{dist}(u,\Sigma)^{\,m}, \qquad (u \in \mathbb{S}^{n-1}), \tag{1} \] where $\Sigma := Z \cap \mathbb{S}$. That is, we get the same growth inequality for the projected flow on the sphere.

This growth inequality for the projected flow is a different but equivalent form of the Lojasiewicz inequality. Thus, we've again succeeded in showing the Lojasiewicz inequality for the projected flow.

Finally, because of the loss of compactness when moving to infinite dimensions, we'll also need to assume a Palais-Smale condition to ensure that the flow on the sphere converges to a unique point. Thus we assume that any sequence $(u_k)\subset \mathbb{S}$ with $F(u_k)$ bounded and $\|\nabla_{\mathbb{S}}F(u_k)\|\to 0$ admits a convergent subsequence in $\mathbb{S}$. In cases where the zero set on the sphere $\Sigma$ is compact (this occurs when the zero set $Z$ of $f$ is finite dimensional for example), the Palais-Smale condition should follow from our distance growth condition (1). Of course, this is exactly the case when trying to prove Thom's conjecture for parabolic evolution equations, as the Hessian has a finite dimensional kernel.

It's important to note that our growth condition (1) is a lower bound only, and we need not assume any boundedness above.

Sunday, 18 August 2024

Thom's gradient conjecture for nonlinear evolution equations - plagiarism and academic dishonesty

At one point in time, I had the idea to generalise Thom's gradient conjecture to parabolic evolution equations on infinite dimensional spaces. Good ideas for papers are hard to come by, and many academics end up working on overly specific or esoteric results that aren't that substantial simply to generate papers. For example, many papers focus on studying a single example of an equation, rather than a broad class of equations. But my idea was novel, interesting, and led to very general results about all parabolic evolution equations - not merely a specific one. I had to give up my spare time on evenings and weekends to work on the paper, as I no longer work in academia and I'm not paid to do pure mathematics research.

I uploaded my work to the Arxiv on the 29th of April, 2021.

Thom's Gradient Conjecture Paper

So I was astonished recently to come across a paper by Beomjun Choi and Pei-Ken Hung, uploaded on 27th May 2024, with almost exactly the same title and abstract as my own paper. Like my paper, they use the Lyapunov–Schmidt reduction approach to reduce the problem to the finite dimensional case. Beomjun Choi is from the department of mathematics at Pohang University of Science and Technology in Korea, and Pei-Ken Hung is from the department of mathematics at the University of Illinois Urbana-Champaign in the United States.

Someone Steals my Thom's Gradient Conjecture Paper

Even more astonishing was that, despite its profound similarity to my own paper, it made absolutely no mention of my paper at all. My paper is not mentioned in the abstract, it's not mentioned in the introduction, and it's not mentioned in the extensive references.

I then noticed that the same two authors also uploaded another paper, 14 months earlier, on Thom's gradient conjecture in infinite dimensions called "Asymptotics for slowly converging evolution equations". Again totally without attribution despite the fact that some of the ideas were taken from my own preprint.

Earlier paper also borrowing from my ideas without attribution

On Beomjun Choi's personal page, he notes his preprint as follows:

Thom's gradient conjecture for nonlinear evolution equations, with P.-K. Hung. (2024). This is major improvement over the previous preprint 'Asymptotics for slowly converging evolution equations' with P.-K. Hung.

In other words, he mentions how his latest preprint draws upon his own previous preprint, improving and completing the attempted results in that paper. Yet he makes no mention of how both preprints heavily draw on ideas from my own preprint.

This is quite simply disgraceful.

Before beginning to write a paper, the first thing a scientist or mathematician must do is a literature review to make sure no one has already published the result. Otherwise, you'll spend months or years doing again something that has already been done. Work that copies or duplicates other people's work can't be published, and academics need publications to get jobs. But even besides that, one begins with a literature review simply to find all the relevant papers which might be useful in their work.

Beomjun Choi and Pei-Ken Hung decided to write a paper on the topic of Thom's gradient conjecture in infinite dimensions. A search on the Arxiv for "Thom's Gradient Conjecture" at that time would have shown my paper in the number one position. How could they remain totally unaware that this was my idea, and I had already written a paper on it?

I decided to contact the authors.

"We have indeed read your preprint when we wrote our paper," they said.

Indeed, they didn't stumble upon exactly the same idea as me by coincidence. Rather, they lifted the idea from my own work, benefiting from my own mind, time and labour.

But they didn't just "read my paper", as they put it. They got the whole idea to generalise Thom's gradient conjecture to infinite dimensional parabolic evolution equations, and to do so using Lyapunov-Schmidt reduction, from my paper. They stole enormously from my work by taking my idea, and chose not to even mention that they had taken this idea from me. Among an extensive list of 58 references, the paper they got the whole idea from is not even mentioned. It takes a considerable investment of time to prepare a paper like the one they have. Why did they devote so much time to my idea, rather than one of the many great ideas of their own? Clearly, they believed my idea was much better than any of their own ideas. So much so that they were prepared to risk any reputational damage that might result from appropriating someone else's preprint. Yet I wasn't deserving of any acknowledgement or credit in their paper? Not even a footnote in small font?

Beomjun and Pei-Ken both work full time as pure mathematics researchers, and there's two of them. It's likely that the pair of them worked full time on their paper for somewhere between 1 and 3 years. It's pretty embarrassing that they need to steal ideas from an individual who can only do research with a small amount of spare time outside of work.

The authors also told me that they believed there was an error in my paper. They apparently believed that, if there is a mistake in a mathematician's preprint, another mathematician may swoop in, write their own very similar paper, and get 100% of the credit for the result on account of being technically the first over the finish line with a totally correct paper. This is not a point of view I subscribe to. If I hadn't written and uploaded this preprint, their paper simply would not exist today. Period. So my contribution to their paper is substantial. In fact, when a professor provides an idea for a paper to his PhD student or postdoc, he or she is often not only an author but first author, even if the student did almost all of the work and ironed out all the details.

Mathematicians and other researchers regularly use preprints to share early and unfinished work with other researchers. It's simply taken as a matter of integrity that other researchers will not attempt to finish the work first and steal credit from the person who had the good idea for the paper in the first place.

Now, there are a few different kinds of errors in mathematics. When you write 20 or 30 pages of symbolic calculations, it's likely there are some small mistakes and typographical errors. But these mistakes are not considered to invalidate the work, as they are easily fixed. Next, there are things that are indeed substantially wrong, but also do not really invalidate the paper as the author can usually fix the argument or find an alternative, "plugging the gap". Sometimes, it is obvious that the overall thrust of the proof is correct, even if some of the details need fixing. Finally, there are errors that are absolutely critical, where the whole methodology of the paper simply will not work, and the authors need to put the entire paper in the bin and go back to the drawing board.

So where on the spectrum did the error they had identified in my paper lie? The error was simply in one of the details (which I've since corrected) - and the overall approach of using Lyapunov–Schmidt reduction to reduce the problem to the finite dimensional case is sound. Indeed, they used it themselves after getting the idea from my paper.

Now, I knew when I uploaded the paper that there might be an oversight. I lacked the time to proofread it extensively, and not working as an academic anymore, I didn't have coauthors to check it, nor a pair of peer reviewers to do further checking. (I decided not to submit it to a journal, as the editorial process and reformatting requirements are quite time consuming, and I don't have a lot to gain). But, I didn't think it mattered. By uploading a draft, I was still sharing my idea with the mathematical community, and if anyone took an interest in the paper and communicated to me any issues, I could fix those issues at that time. Isn't the purpose of the Arxiv partly to allow people to upload draft papers, which might not be in their final state yet? Indeed, the behaviour of these authors totally undermines the ability of researchers to discuss their work with the community until it is totally complete and published, for fear of having their idea stolen. I believe the academic community refers to this activity as "scooping" someone else's paper. If I continue to upload preprints that might not be perfected yet, perhaps Beomjun and Pei-Ken or others like them will continue to swoop in, publish their own versions of my papers and congratulate themselves on getting all the credit. Contrary to what these two authors believe, a paper that is unfinished or not as yet correct is not worthless. On the contrary, the final version of any paper will draw heavily on earlier incorrect iterations it passed through. By uploading draft papers, an author is valuably participating in the academic process.

There are two courtesies that are common in research. The first is that, if the idea for the paper came from someone else, the authors will usually thank such and such for suggesting the topic of the paper. The second is that, if you find an error in someone's paper, you notify them so they can address it. The author will then add a note in their paper thanking such and such for pointing out that there was an error in a previous version. In general, people will often have an acknowledgement section where they will say things like, "I'd like to thank such and such for interesting conversations..." It is a long standing and important academic tradition to acknowledge other contributing persons.

Had they contacted me to point out any error, I would have set about rectifying it and thanked them in my paper for pointing it out. Yet they never contacted me, either about an error, or to express interest in the work I had done. Nor to suggest working together on this or a related paper. Had I been unable to correct the paper by myself in the limited amount of time I could commit to it, perhaps they could have become coauthors and shared credit. Since I was the one who had the idea for the paper, and had already put a lot of time into it, it seems eminently reasonable that I should be one of the authors on any subsequent publication.

I can guess why they never contacted me to point out the error or discuss collaboration. They didn't want to tip me off to the fact that they were attempting to scoop my work, until they had already completed theirs.

Instead, they began quietly preparing their own paper on the exact same topic. Their mentality is that, if they can get the first completely "correct" proof published, all the credit goes to them, rather than me. I do not agree with this petty and destructive view of the world, where one may steal an idea from someone else, complete it first, and become the winner. Eventually, they uploaded a paper that made absolutely no mention of the debt they owed me from having lifted the entire idea for this project out of my own work. Instead, they're going to head off to a journal with a paper on exactly the same topic, which they weren't the ones to think of, and making no mention of my considerable efforts on this project, as if the whole idea ought to belong to them.

There are similarities to the incident with the Chinese mathematicians who claimed that there were "gaps" in Perelman's proof of the Poincare conjecture, and rushed off to publication with the gaps filled, hoping to claim the lion's share of credit on account of being first to totally complete the proof, as if Perelman had little to do with it. Now it could be that Perelman's work didn't really have a flaw, whereas my paper has a flaw which ought to be fixed. Nonetheless, I do not agree that people may conduct themselves in a manner to maximise their own reputations, with no regard for their fellow scientists. That they may, upon seeing someone else's allegedly unfinished work, swoop in and finish it and obtain all the credit on account of being technically the first to have a correct and finished paper. Perelman was quoted as saying, "I can’t say I’m outraged. Other people do worse. Of course, there are many mathematicians who are more or less honest. But almost all of them are conformists. They are more or less honest, but they tolerate those who are not honest".

The idea to generalise Thom's conjecture to parabolic evolution equations in infinite dimensions was mine alone. It's clear that they got the idea for the paper by lifting it straight out of my own. Having no respect for me or my work, these two "scholars" have intended to take all credit for my idea, not even acknowledging my work.

They simply pretend my paper doesn't exist.

Other mathematicians will have whatever opinions they might have. But to me, this behaviour is not acceptable.

And it doesn't make me regret leaving academia one bit.

Actually, they remind me of the penguins from an Attenborough documentary, who construct their own nests by stealing stones from the nests of other penguins when they're not looking.

Pebble stealing penguins

The sole goal of these researchers (and more like them), is to increase the length of their own publication list. Not respecting colleagues. Not working together towards a common goal. Just focusing on increasing their own status at the expense of everyone else's. If this is how the academic mathematics community intends to treat my contribution, which I make with my very limited spare time, I'll likely stop writing papers and do something else with that time. This is bad news for Beomjun Choi and Pei-Ken Hung, as they're apparently reliant on me for ideas.

A maths academic conducting research:

UPDATE:

The two authors have replied to me to make the following claim: "The infinite dimensional Thom’s conjecture is an open problem and well-known in the community. It was not your idea to consider it."

That's surprising, because when I decided to work on this topic I found no existing research, or I would have cited it. They have provided four references to prove that the topic is well-known. Below, I show that not one of these references mentions Thom's conjecture for parabolic evolution equations. Not one of them mentions Thom's conjecture in an infinite dimensional context. All four papers simply cite the original finite dimensional version of Thom's conjecture to prove results that are entirely unrelated to the topic of the paper they plagiarized from me.

Arnold Thom Gradient Conjecture for the arrival time

This is the only one of the four papers that has the remotest similarity to the topic of their own (and my) paper. And the similarity is remote. They are studying the solution of a particular elliptic equation. Not a parabolic equation, not the class of all parabolic equations - a single elliptic equation. They're not even studying the elliptic equation itself, but the function that solves it. They show that, if one were to for some reason consider the gradient flow of this finite-dimensional solution function, it would satisfy the original finite dimensional Thom conjecture. At no point do they consider Thom's conjecture for any parabolic evolution equation or any infinite dimensional context. Nowhere do they suggest Thom's conjecture for all parabolic evolution equations in infinite dimensions, much less study it. Interestingly, this paper is not cited by Choi and Hung in their own paper. If they consider this to be their primary source for this "well-known" idea, why have they not cited it? The remaining three papers aren't cited either. Probably because they have no relevance whatsoever to Thom's conjecture in infinite dimensions.

Gradient flow of the norm squared of a moment map

Searching this paper for the word "Thom", there is only one match, in the references. Finding the place in the text where this reference is cited, we find a discussion with no relationship whatsoever to Thom's conjecture for parabolic equations.

The Ricci flow for simply connected nilmanifolds

This paper mentions Thom's conjecture (the finite dimensional one) in only one place, in a discussion with no relationship whatsoever to Thom's conjecture for parabolic equations on infinite dimensional spaces.

The Ricci flow in a class of solv manifolds

Searching this paper for the word "Thom", there is only one match, in the references. Finding the place in the text where this reference is cited, we find a discussion with no relationship whatsoever to Thom's conjecture for parabolic equations on infinite dimensional spaces.

Saturday, 20 February 2021

Thom's Gradient Conjecture for Parabolic Systems and the Yang-Mills Flow

Thom's gradient conjecture, proved in this paper, asserts that convergent gradient flows of analytic functions on $\mathbb{R}^n$ cannot spiral forever. More precisely, the projection of the flow onto the unit sphere must converge.

In my paper linked below, I show that this result holds also for gradient flows of analytic functions on infinite dimensional Hilbert spaces, provided that the second derivative is a Fredholm operator. This is similar in spirit to the extension by L. Simon of the Lojasiewicz inequality to the same domain. I also show that the result holds for geometric flows with a gauge symmetry, such as the Yang-Mills flow.

Thom's Gradient Conjecture for Parabolic Systems and the Yang-Mills Flow

An infinite dimensional curve selection lemma

Let $X \subset V$ be a semianalytic set with $0 \in \overline X$, i.e. there exists a sequence $x_n \in X$ with $x_n \to 0$. If we allow $V$ to be a finite dimensional Hilbert space for a moment, the curve selection lemma tells us that there exists an analytic curve $\gamma(t):[0,\varepsilon) \to V$ with $\gamma(0)=0$ and $\gamma([0,\varepsilon)) \subseteq X$.

Since the curve selection lemma is central to many proofs concerning semianalytic sets on finite dimensional spaces, it's interesting to consider when a similar result might hold for sets defined through inequalities involving analytic functions on infinite dimensional spaces.

The curve selection lemma often functions as a kind of compactness result that allows us to restrict attention to a one-dimensional curve. Like the Lojasiewicz inequality, it won't hold in general in infinite dimensions and this failure can be linked to the non-compactness of the unit sphere. For example, suppose we have $\mathcal{E}(u) = \|u\|^3 - c(u)\|u\|^2$. For an orthonormal basis $\{e_i\}$, we can arrange that the coefficient $c(e_i) \to 0$ as $i \to \infty$, as we cycle through the infinite number of dimensions available. Thus, the set $\{ \mathcal{E}(u) > 0 \}$ contains a sequence approaching the origin but contains no analytic curve emanating from the origin.

First, note that the desired curve exists if and only if there exists at least one sequence $x_n \to 0$ with $x_n \in N \subset X$, where $N$ is a finite dimensional analytic manifold, since the ordinary curve selection lemma can then be applied.

We consider now the special case of a function with a Hessian that is elliptic.

Let $V$ be a Hilbert space and let $U \subseteq V$ be an open subset. Let $\mathcal{E} \in C^2(U)$ be an analytic function and assume that $0 \in U$ is a critical point, i.e. $\mathcal{E}'(0) = 0$. We suppose that $\mathcal{E}''(0)$ is a Fredholm operator, that is, it has finite-dimensional kernel and cokernel, and closed range. We also assume for convenience that $\mathcal{E}(0) = 0$. We define the set $W^\varepsilon = \{u:\mathcal{E}(u)\neq 0, \varepsilon\|\mathcal{E}_\theta\| \leq |\mathcal{E}_r| \}$.

Let $P$ be the orthogonal projection onto $\ker \mathcal{E}''(0)$ and $P'$ the adjoint projection. We define the finite dimensional analytic manifold \[ S = \{u \in U| (I-P')\mathcal{E}'(u)=0 \}, \] and denote by $Q$ the nonlinear projection onto $S$ (see [1] for details). We have the following Taylor series. \[ \mathcal{E}(u) = \mathcal{E}(Qu) + \frac{1}{2}\langle \mathcal{E}''(Qu)(u-Qu),u-Qu \rangle + o(\|u-Qu\|^3). \] $\bf{Lemma}$ Define the set $K \subseteq U$ by $K = \{u \in U | \mathcal{E}(u) + \mathcal{H}(u) \ \sigma \ 0\}$, where $\sigma \in \{<,\leq,>,\geq\}$, and $\mathcal{H}$ is an analytic function consisting only of terms of order 3 and higher. Suppose $0 \in Cl(K \cap W^\varepsilon)$. Then there exists an analytic curve $\gamma(t):[0,\varepsilon) \to K \cap W^\varepsilon$ with $\gamma(0)=0$.

$\bf{Proof}$ Since $\mathcal{H}$ consists only of higher order terms which can be incorporated into the higher order terms of $\mathcal{E}$, we assume that $\mathcal{H}=0$. We also assume for readability that $\sigma$ is $>$, since the other cases are analogous. We have \begin{align*} K & = \{u \in U | \mathcal{E}(Qu) + (\mathcal{E}(u) - \mathcal{E}(Qu)) > 0\} \\ & = \{u \in U | \mathcal{E}(Qu) + \frac{1}{2}\langle \mathcal{E}''(Qu)(u-Qu),u-Qu \rangle + o(3) > 0\}. \end{align*} From 12.15 of [1], we know that $\|(I-P')\mathcal{E}'\| \geq c\|u-Qu\|$. Then from the triangle inequality and the definition of $W^\varepsilon$, we know that \[ |\mathcal{E}_r| \geq c\|\mathcal{E}'\| \geq c\|u-Qu\| \;\; (*). \] We can assume that $\mathcal{E}(Qu) \leq 0$ in a neighbourhood of $0$, since otherwise we can apply the usual curve selection lemma to the finite dimensional manifold $S$. We can write the quadratic term as \[ \frac{1}{2}\langle \mathcal{E}''(Qu)\hat{u},\hat{u} \rangle \|u-Qu\|^2, \] where $\hat{u} = (u - Qu) / \| u - Qu \|$. If there exists $u_0$ such that the quadratic term is positive, then it is trivial to find the required curve. Thus, we may assume that \[ \frac{1}{2}\langle \mathcal{E}''(Qu)\hat{u},\hat{u} \rangle \leq 0 \] in $W^\varepsilon$. By assumption there exists a sequence $u_n \in K \cap W^\varepsilon$ with $u_n \to 0$. Since $\mathcal{E}(u_n) > 0$, the only remaining case is \[ \frac{1}{2}\langle \mathcal{E}''(Qu_n)\hat{u}_n,\hat{u}_n \rangle \to 0. \] Since the derivative must grow linearly along $V_1$, this can only happen if the radial component of $\mathcal{E}''(Qu_n)(u_n-Qu_n)$ is going to zero. This however violates $(*)$, since we are inside $W^\varepsilon$.

We remark that unlike in the finite dimensional case a curve selection lemma will not hold for the set $S$ outside of $W^\varepsilon$, as the Hessian cannot control the behaviour of the higher order terms where the linear growth in the derivative has no radial component. However, a curve selection lemma may hold for other expressions such as those involving the derivative $\mathcal{E}'$.

[1] Chill, R., Fasangova, E., Gradient Systems

Saturday, 6 February 2021

The Lojasiewicz inequality for non-analytic functions

A function $f:\mathbb{R}^n \to \mathbb{R}$ satisfies a Lojasiewicz inequality at $0$ if in a neighbourhood of $0$ we have \[ |\nabla f| \geq c|f|^\rho, \] for some $c>0$ and $\rho \in [\frac{1}{2},1)$. It is well-known that the Lojasiewicz inequality holds for analytic functions. While analyticity is sufficient for the Lojasiewicz inequality to hold, it is not necessary. Trivial examples like $f(x) = x^2 + e^{-1/x^2}$ (with $f(0)=0$) demonstrate this. What then is an appropriate weaker condition?
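As a rough numerical illustration of a non-analytic function satisfying the inequality, the sketch below takes the one dimensional example $f(x) = x^2 + e^{-1/x^2}$ (the flat term $e^{-1/x^2}$ is my assumed choice of a non-analytic perturbation) and evaluates the ratio $|f'(x)| / |f(x)|^{1/2}$ on a grid near the origin. A short calculation suggests the ratio stays at or above $2$ for $|x| \le \sqrt{2}$, so the inequality should hold there with $\rho = \tfrac12$ and any $c \le 2$.

```python
import numpy as np


def f(x):
    """x^2 plus a flat (non-analytic at 0) perturbation; assumed example."""
    return x ** 2 + np.exp(-1.0 / x ** 2)


def df(x):
    """Exact derivative of f for x != 0."""
    return 2.0 * x + (2.0 / x ** 3) * np.exp(-1.0 / x ** 2)


# Grid on both sides of the origin, excluding x = 0 itself.
grid = np.linspace(1e-3, 0.5, 2000)
xs = np.concatenate([-grid, grid])

rho = 0.5
ratio = np.abs(df(xs)) / np.abs(f(xs)) ** rho
min_ratio = ratio.min()
print(min_ratio)
```

A grid check like this is of course not a proof; it only confirms that the candidate constants are plausible on the sampled points.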

A function $f:\mathbb{R}^n \to \mathbb{R}$ is analytic at $0$ if it is locally equal to its Taylor series $T(x)$, i.e., $f(x)=T(x)$. For a non-analytic function let's write \[ f(x) = T(x) + \omega(x), \] where $\omega$ has a Taylor series which is identically zero at the origin. In other words, $\omega$ is the "non-analytic" part of the function. For the Lojasiewicz inequality to hold, $\omega$ need not be zero, and it is in fact only necessary that $\omega$ is dominated by the function's Taylor series in a certain sense.

To see this, observe that if the Lojasiewicz inequality does not hold, then for any sequences $c_n \to 0$ and $\rho_n \to 1$, we can find a sequence $x_n \to 0$ such that \[ |\nabla f(x_n)| < c_n|f(x_n)|^{\rho_n}. \] We can choose the sequence $x_n$ to converge to $0$ as fast as we like.

Let $\mathcal{C}$ be the set of smooth curves emanating from $0$, parameterised by arc length. Consider the sets \[ \mathcal{C}_{a,k}^\varepsilon = \{\gamma \in \mathcal{C}; |\nabla f(\gamma(t))| \geq at^k \; \forall \; t \in [0,\varepsilon) \}, \] \[ X_{a,k}^\varepsilon = \cup_{\gamma \in \mathcal{C}_{a,k}^\varepsilon} \gamma([0,\varepsilon)). \] Clearly the Lojasiewicz inequality holds inside any such set $X_{a,k}^\varepsilon$, even for a function which is not analytic. Thus the sequence $x_n$ is eventually outside $X_{a,k}^\varepsilon$ for any $k \in \mathbb{N}$ arbitrarily large and any $a,\varepsilon$ arbitrarily small. Intuitively, we might guess that the sequence $x_n$ is (in some appropriate sense) asymptoting to the analytic variety \[ Z(\nabla T) = \{x: \nabla T(x) = 0\}. \]

If the sequence $x_n$ lies on an analytic curve through the origin, then on that curve we must have $\nabla T = 0$. The analytic variety $Z(\nabla T)$ admits a Whitney stratification into a finite number of analytic manifolds at $0$. We hope that we can arrange that the sequence $x_n$ is asymptoting to $Z(\nabla T)$ faster than any given polynomial in $r$. From a previous post, we know this is not true for an arbitrary sequence. Since $T$ and $\nabla T$ are analytic they satisfy Lojasiewicz inequalities, one form of which is \[ \|\nabla T(x)\| \ge C\, \mathrm{dist}(x,Z(\nabla T))^\alpha, \] \[ |T(x)| \ge C\, \mathrm{dist}(x,Z(T))^\alpha. \] Note that $Z(T)$ contains $Z(\nabla T)$ so we can use $Z(T)$ for both. We also use suboptimal constants in exchange for the simplicity of having the same constants in both inequalities.

$\bf{Theorem:}$ A non-analytic function $f = T + \omega:\mathbb{R}^n \to \mathbb{R}$ satisfies the Lojasiewicz inequality if the flat or non-analytic part $\omega$ satisfies

\[ \lim_{x \to 0} \frac{\,|\omega(x)| + \|\nabla \omega(x)\|\,}{\operatorname{dist}(x,Z(T))^N} \;=\; 0 \] for all positive integers $N$. The same Lojasiewicz exponent as for $T$ may be used.

It's important to realise that this condition is not satisfied by just any function with zero Taylor series at the origin.

To derive the gradient inequality for $f=T+\omega$ under this assumption, fix a small parameter $\eta>0$. By our earlier argument, we can restrict attention to a polynomial neighbourhood $\mathcal{H}$ of $Z(T)$ (so that $\operatorname{dist}(x,Z(T))\le r^k$ for some $k$). Then using our growth condition on $\omega$ and $\nabla \omega$ and the Lojasiewicz inequalities for $T$ and $\nabla T$, we can achieve that for all $x\in \mathcal{H}$, \[ |\omega(x)| \le \eta\,|T(x)|, \qquad \|\nabla\omega(x)\| \le \eta\,\|\nabla T(x)\|. \] On $\mathcal{H}$ the triangle inequality for $f$ implies \[ |f(x)| = |T(x)+\omega(x)| \le |T(x)| + |\omega(x)| \le (1+\eta)\,|T(x)|, \] and for the gradients one has \[ \|\nabla f(x)\| = \|\nabla T(x) + \nabla\omega(x)\| \ge \|\nabla T(x)\| - \|\nabla\omega(x)\| \ge (1-\eta)\,\|\nabla T(x)\|. \] Using the other form of the Lojasiewicz inequality for the analytic function $T$, there exist constants $c>0$ and $\rho \in \bigl[\tfrac{1}{2},1\bigr)$ such that \[ \|\nabla T(x)\| \ge c\,|T(x)|^{\rho} \] for $x$ sufficiently close to the origin. Combining this estimate with the inequalities on $\mathcal{H}$ yields \[ \|\nabla f(x)\| \ge (1-\eta)\,c\,|T(x)|^{\rho} \ge (1-\eta)\,c\,(1+\eta)^{-\rho}\,|f(x)|^{\rho}. \] Hence $f$ satisfies a Lojasiewicz inequality on $\mathcal{H}$: \[ \|\nabla f(x)\| \ge c'\,|f(x)|^{\rho},\quad x \in \mathcal{H}, \] where the modified constant is \[ c' = \frac{(1-\eta)\,c}{(1+\eta)^{\rho}}. \] Since $\eta$ may be chosen arbitrarily small, the constant $c'$ can be made as close to $c$ as desired.
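The chain of inequalities above can be traced on a one dimensional toy case. The sketch below assumes $T(x) = x^2$ (so $|T'| = 2|T|^{1/2}$, giving $c = 2$, $\rho = \tfrac12$) and the flat part $\omega(x) = e^{-1/x^2}$; these choices, the value $\eta = 0.1$ and the interval $(0, 0.3]$ are all illustrative assumptions. It checks the two domination conditions, computes $c'$ from the formula, and checks the resulting inequality for $f = T + \omega$ on the grid.

```python
import numpy as np


def T(x):
    return x ** 2


def dT(x):
    return 2.0 * x


def omega(x):
    """Flat perturbation with zero Taylor series at 0 (assumed example)."""
    return np.exp(-1.0 / x ** 2)


def domega(x):
    return (2.0 / x ** 3) * np.exp(-1.0 / x ** 2)


eta, c, rho = 0.1, 2.0, 0.5
xs = np.linspace(1e-3, 0.3, 3000)  # stand-in for the neighbourhood H

# Domination of the flat part by the Taylor part, value and gradient.
dominated = bool(
    np.all(np.abs(omega(xs)) <= eta * np.abs(T(xs)))
    and np.all(np.abs(domega(xs)) <= eta * np.abs(dT(xs)))
)

# Modified constant from the derivation.
c_prime = (1.0 - eta) * c / (1.0 + eta) ** rho

f_vals = T(xs) + omega(xs)
df_vals = dT(xs) + domega(xs)
holds = bool(np.all(np.abs(df_vals) >= c_prime * np.abs(f_vals) ** rho))
print(dominated, c_prime, holds)
```

In one dimension $Z(T) = \{0\}$, so this toy case does not exercise the restriction to a polynomial neighbourhood of $Z(T)$; it only illustrates how $\eta$ feeds into $c'$.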

Saturday, 24 October 2020

A subsequence converging to an analytic curve

This is a question that confronted me while I was trying to find a proof of a different result. Consider a sequence $x_n \in \mathbb{R}^m$ with $x_n \to 0$. Under what circumstances does a subsequence converge to an analytic curve $\gamma:[0,\varepsilon) \to \mathbb{R}^m$ with $\gamma(0)=0$? Let me make this notion precise: the sequence must converge according to all derivatives, that is, the distance from $x_n$ to the curve must tend to zero faster than any power of $r_n = \|x_n\|$. It should be noted that the subsequence we eventually extract need not lie on the curve, because there exist nonanalytic functions which converge to zero more quickly than any polynomial.

By the compactness of $S^{m-1}$ we can pass to a subsequence such that $s^1_n = x_n/\|x_n\|$ converges to some point $s_1 \in S^{m-1}$. So we begin to construct a curve by starting with \[ \gamma(t) = tv_1, \] where $v_1=s_1$. Next, let $s^2_n$ be the intersection of $\gamma(t) = v_1t +v_2^nt^2$ with $S^{m-1}$, where the vector $v_2^n$ is chosen such that $\gamma(t)$ passes through $x_n$. Passing to a subsequence, we have $s^2_n \to s_2$. We then have the curve \[ \gamma(t) = v_1t +v_2t^2, \] where $v_2$ is chosen so that the curve intersects $S^{m-1}$ at $s_2$. We assume for the moment that such a $v_2$ exists, and examine this assumption shortly. Note that the distance from $x_n$ to the curve is bounded by $ct_n^2$, where $\gamma(t_n)$ is the point of the curve closest to $x_n$, and since $s^2_n \to s_2$, $c$ can be made as small as we like by truncating the sequence. Iterating this process, we arrive at a curve \[ \gamma(t) = v_1t +v_2t^2 + \ldots + v_kt^k, \] and a subsequence $x_n$ such that $d(x_n,\gamma(t_n)) \leq ct_n^k \leq cr_n^k$, where $r_n = \|x_n\|$.

Now we return to the question of whether the vector $v_2$ (and $v_3,\ldots,v_k$) actually exists. It can happen that as $s^2_n \to s_2$, $v_2^n$ becomes unbounded and consequently $v_2$ doesn't exist. This means that the sequence is converging to $v_1$ slower than $t^2$.
This corresponds to the case where the curve must be written as a Puiseux series, rather than a Taylor series. In this case, we multiply the Taylor series by $t$, i.e. we consider \[ \gamma(t) = v_1t^2 +v_2t^3 + \ldots + v_kt^{k+1}, \] and again try to construct $v_2$. It could happen that $v_2 = 0$, in which case we attempt to construct $v_3$, continuing to multiply by $t$ whenever a vector fails to exist. Eventually we will obtain the first two non-zero terms of our curve \[ \gamma(t) = v_1t^{1+l} + v_jt^{j+l} \] for some $j \geq 2$, where $l$ is the number of times we have multiplied by $t$. Otherwise, all subsequences of the sequence $x_n$ must be asymptoting to $v_1$ slower than any rational power $\rho$ of $r$, $\rho > 1$, $\rho \to 1$. From here, we can construct all remaining terms using our original process, i.e. all remaining vectors $v_i$ will exist. Thus, after multiplying by $t$ enough times, we will eventually be able to construct a curve \[ \gamma(t) = v_1t^{1+l} +v_2t^{2+l} + \ldots + v_kt^{k+l}, \] which satisfies our requirements.

However, it's unfortunately possible for $x_n$ to be asymptoting to $v_1$ slower than any rational power of $r$. In cases where more information is known about the sequence $x_n$, it may be possible to pass to looking for a two-dimensional manifold that the sequence is converging to. If the pathological case is encountered again, pass to looking for a three-dimensional manifold, and so on until the dimension of the space is reached.
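The first two steps of the construction can be illustrated numerically. The following is only a sketch under assumptions of my own: the test sequence lies near the curve $tv_1 + t^2v_2$ in $\mathbb{R}^2$ with $v_1 = (1,0)$, $v_2 = (0,1)$, perturbed by the flat term $e^{-1/t}$, and the parameter is normalised by $t_n = \langle x_n, v_1\rangle$, which is one convenient choice and not the only one. All function names are hypothetical.

```python
import math

def sample_point(t):
    """Hypothetical test sequence: a point near t*v1 + t^2*v2 with
    v1 = (1, 0), v2 = (0, 1), plus a flat perturbation exp(-1/t)."""
    return (t, t * t + math.exp(-1.0 / t))

def unit(v):
    """Project a non-zero vector of R^2 onto the unit circle."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def second_coefficient(x, v1):
    """Solve x = t*v1 + t^2*v2_n for v2_n, with the assumed
    normalisation t = <x, v1>."""
    t = x[0] * v1[0] + x[1] * v1[1]
    return ((x[0] - t * v1[0]) / t**2, (x[1] - t * v1[1]) / t**2)

# Step 1: the directions x_n/|x_n| approximate v1; a far-out term of the
# sequence stands in for the limit.
v1 = unit(sample_point(1e-6))

# Step 2: the coefficients v2_n should settle down towards v2 as n grows.
for n in (10, 100, 1000):
    print(n, second_coefficient(sample_point(1.0 / n), v1))
```

In this example the flat perturbation is invisible to the coefficients in the limit, which matches the remark that the subsequence need not lie on the curve itself.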

Thursday, 10 September 2015

The two envelopes problem

This post concerns the two envelopes problem.

To summarise, suppose we know that one envelope contains some money, and another envelope contains twice that amount of money. However, we do not know which is which. We choose one of the envelopes, which contains an unknown amount $x$. There is a 50% chance that this is the larger envelope, and a 50% chance it is the smaller. Thus it would seem that the expected value for the other envelope should be $\frac{1}{2}(2x+\frac{1}{2}x) = \frac{5}{4}x > x$, so that it would always be advantageous to switch. But since this argument would apply equally well to each envelope, it is obviously incorrect.

The problem is a little tricky, but the error with the argument is clear once you spot it.

The mistake is that one cannot talk about an expectation value in the absence of a prescribed probability distribution. Suppose someone puts some money in an envelope. What is the expected value for the amount of money in the envelope? It's clearly a nonsense question. If we assume every positive amount has equal probability, then the expected value would seem to be $\infty / 2$. Similarly, suppose someone puts some money in one envelope, and twice that amount in another envelope. What is the expected value of the amount of money in either envelope? Again, no reasonable answer can be given. Therefore, having supposed that the first envelope contains $x$, the probability distribution for the second envelope is simply unknown. The only thing we can say about it is that values other than $x/2$ and $2x$ are not possible.

While it is true that the other envelope must contain either $x/2$ or $2x$, it is not true that each of these must be equally likely. In fact, if the person preparing the envelopes only ever chooses from one of two values, the possible values for the envelopes span only a factor of two, so $x/2$ and $2x$ cannot both be possible. Furthermore, the amount contained in the first envelope and whether it is the larger or the smaller of the two envelopes are not independent random variables. Although it is correct to say that $x$ has a 50% chance of being the smaller value, and a 50% chance of being the larger value, $x$ may be a different value in each case!
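To make the dependence concrete, here is a small exact computation for one particular prior. The prior is purely an assumption for illustration: the smaller amount is 1, 2, 4 or 8 with equal probability, and we pick one of the two envelopes at random. The names `joint` and `prob_other_is_larger` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical prior: the smaller amount is 1, 2, 4 or 8, equally likely.
smaller_amounts = [1, 2, 4, 8]

# Joint distribution over (amount in chosen envelope, amount in the other).
joint = defaultdict(Fraction)
for a in smaller_amounts:
    p = Fraction(1, len(smaller_amounts)) * Fraction(1, 2)
    joint[(a, 2 * a)] += p  # we happened to pick the smaller envelope
    joint[(2 * a, a)] += p  # we happened to pick the larger envelope

# Unconditional expectations of the two envelopes.
expected_chosen = sum(p * x for (x, _), p in joint.items())
expected_other = sum(p * y for (_, y), p in joint.items())

def prob_other_is_larger(x):
    """P(other envelope holds 2x | chosen envelope holds x)."""
    total = sum(p for (cx, _), p in joint.items() if cx == x)
    larger = sum(p for (cx, cy), p in joint.items()
                 if cx == x and cy == 2 * x)
    return larger / total

print("E[chosen] =", expected_chosen, " E[other] =", expected_other)
for x in (1, 2, 16):
    print("x =", x, " P(other is larger) =", prob_other_is_larger(x))
```

Under this prior the two envelopes have the same expected value, while the conditional probability that the other envelope is the larger one depends on the observed amount $x$ and is not always 50%.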

In summary, one must begin with the (not independent) probability distributions for the two envelopes, before being able to talk about expectation values.