Lecture 5 ed

lecture 5

In this chapter, we always assume that \(X = Y = \boldsymbol{R}^n, c(x,y) = \frac{ |x-y|^2}{2}\).

Then have

\[ I[\pi] = \int_{\boldsymbol{R}^n \times \boldsymbol{R}^n}\frac{ |x-y|^2}{2} \; \rm d \pi \]

We assume \(\mu, \nu\) are two Borel measure in \(\boldsymbol{R}^n\) , and \(\mu, \nu\) has finite second moment, which means

\[ M_2 = \int_\boldsymbol{R}^n \frac{ |x|^2}{2} \, \rm d \mu(x) + \frac{ |y|^2}{2} \, \rm d \nu(y) < \infty \]

from this condition, we have

\[ I[\pi] < +\infty \quad \forall \pi \in \Pi(\mu, \nu) \]

because

\[ I[\pi] = \int_{\boldsymbol{R}^n \times \boldsymbol{R}^n}\frac{ |x-y|^2}{2} \, \rm d \pi \leq \int \frac{ |x|^2}{2} + \frac{ |y|^2}{2} \, \rm d \pi \leq M_2 \]

5.1 The Dual problem(2.1.2)

From previous course, we obtain

\[ \begin{align} \inf_{\pi \in \Pi} I[\pi] &= \sup_{\Phi_c} J(\varphi, \psi) \\ \text{and} \quad (\varphi, \psi) \in \Phi_c &\Leftrightarrow \varphi(x) + \psi(y) \leq \frac{ |x-y|^2}{2} \\ &\Leftrightarrow \langle x,y \rangle \leq \left[ \frac{ |x|^2}{2} - \varphi(x) \right] + \left[ \frac{ |y|^2}{2} - \psi(y) \right] \end{align} \]

for convienence, we consider

\[ \widetilde{\varphi}(x) = \frac{|x|^2}{2} - \varphi(x) ; \quad \widetilde{\psi}(x) = \frac{|y|^2}{2} - \psi(y) \]

rewrite \(\inf I[\pi],\; \sup J[\varphi, \psi]\), we obtain:

\[ \begin{align} \inf I[\pi] &= \inf_{\pi \in \Pi} \int \frac{|x-y|^2}{2} \, \rm d \pi \\ &= \inf_{\pi \in \Pi} \int \frac{|x|^2}{2} + \frac{|y|^2}{2} - \langle x, y \rangle \, \rm d \pi \\ &= \inf_{\pi \in \Pi} M_c - \int \langle x, y \rangle \, \rm d \pi \\ &= M_c - \sup_{\pi \in \Pi} \int \langle x, y \rangle \, \rm d \pi \\ \sup_{\Phi_c} J &= M_c - \inf \lbrace J \left( \widetilde{\varphi}(x), \widetilde{\psi}(y) \right) : \left(\widetilde{\varphi}(x), \widetilde{\psi}(y) \right) \in \widetilde\Phi \rbrace \\ \text{where} \; \widetilde\Phi &= \lbrace ( \widetilde{\varphi}, \widetilde{\psi} ) \in L^1 \times L^1, \langle x, y \rangle \leq \widetilde{\varphi}(x) + \widetilde{\psi}(y) \; \text{for} \; \mu, \nu \; \text{a.e.} \rbrace \end{align} \]

then

\[ \bbox[yellow]{K-dual \Leftrightarrow \sup_{\pi \in \Pi (\mu, \nu)} \int \langle x, y \rangle \, \rm d \pi = \inf_{\varphi, \psi \in \widetilde\Phi } J (\varphi, \psi)} \]


5.2 Double Convexification Trick

recalling:

\[ \begin{align} \varphi &\to \varphi^c(y) = \inf \{ c(x,y)-\varphi(x) \} \\ & \to \varphi^{cc}(x) = \inf \{ c(x,y) - \varphi^c(y)\} \end{align} \]

assume \((\varphi, \psi) \in \widetilde{\Phi}\), we have for \(\nu-a.e. \; y \in Y:\)

\[ \psi(y) \geq \sup_{x} \{ \langle x, y \rangle -\varphi(x) \} = \varphi^\ast(y) \\ \implies J(\varphi, \psi) \geq J(\varphi, \varphi^\ast) \quad and\; (\varphi, \varphi^\ast) \in \widetilde{\Phi} \]

for \(\mu-a.e. \; x \in X:\)

\[ \varphi(x) \geq \sup_{y} \{ \langle x, y \rangle -\varphi^\ast(y) \} = \varphi^{\ast\ast}(y) \\ \implies J(\varphi, \psi) \geq J(\varphi, \varphi^\ast) \geq J(\varphi^{\ast\ast}, \varphi^\ast) \quad and\; (\varphi^{\ast\ast}, \varphi^\ast) \in \widetilde{\Phi} \]

so we conclude that

\[ \inf_{(\varphi, \psi) \in \widetilde{\Phi} } J(\varphi, \psi) \geq \inf_{\varphi \in L^1(\rm d \mu) } J(\varphi^{\ast\ast}, \varphi^\ast) \]

and it is worthy to point out that \(\varphi^{\ast\ast}, \varphi^\ast\) have special properties: they are convex lower semi-continuous functions.

5.3 Digression: Some facts from convex analysis (p54)

5.3.1 Definition

  • A function is called proper if it is not identically \(\infty\).

  • A proper function \(\varphi : \boldsymbol{R}^n \to \boldsymbol{R} \cup \{ -\infty\}\) is called convex if \[ \varphi( \alpha x + (1-\alpha)y) \leq \alpha \varphi(x) + (1-\alpha)\varphi(y) \quad \forall \alpha \in [0,1] \] it is strictly convex if the equality in above implies \(x = y\) or \(\alpha = 0 \; \text{or} \; 1\)

  • \(\rm {Dom}(\varphi)\) as the set of points where \(\varphi\) is finite.

    Then we have:

    • if \(\varphi\) is proper convex \(\implies \rm{Dom} (\varphi)\) is a nonempty convex set.
    • \(\rm{Dom} (\varphi)\) may be open/closed, but we have its boundary \(\partial \rm{Dom}(\varphi)\) is a small set.

we always assume \(\varphi\) is proper and convex in later discussion.

5.3.2 Differentiability

if \(\varphi\) is proper and convex in \(\boldsymbol{R}^n\) , then it is continuous in \(\rm{Dom} (\varphi)\) and is a lip function. By using Rademacher’s theorem, \(\nabla \varphi\) is almost everywher exists in \(\rm{Int}(\rm{Dom(\varphi)})\). Furthermore, the set where \(\nabla \varphi\) not exist is a small set.

5.3.3 The graph lies above its tangent

For all points \(x\) where \(\varphi\) is differentiable, we have

\[ \varphi(z) - \varphi(x) \geq \nabla \varphi (x) (z-x) \quad \forall z \in \boldsymbol{R}^n \]

In particular, \(\nabla \varphi\) is monotone: whenever \(\varphi\) is differentiable at \(x, z\) ,

\[ \langle \nabla \varphi(x) - \nabla \varphi(z), x -z \rangle \geq 0 \]

5.3.4 Subdifferentiability

The subdifferential of \(\varphi\) at point \(x\) is a set which denote by \(\partial \varphi(x)\):

\[ y \in \partial \varphi(x) \Leftrightarrow \forall z \in \boldsymbol{R}^n : \varphi(z) \geq \varphi(x)+ \langle y, z-x\rangle \]

so \(x \mapsto \partial \varphi(x)\) is a multivalued function take values on \(\boldsymbol{R}^n\)

We shall identify the subdifferntial mapping \(\partial \varphi\) with its graph.

  • We have \(\varphi\) convex \(\implies\) the graph of \(\varphi=\{ (x,y) \in \mathbb{R}^n \times \mathbb{R} | y \geq \varphi(x)\}\) is convex. From Hahn-Banach theorem, we have \(\partial \varphi\) is always nonempty.
  • Especially, if \(\varphi\) is differentiable, then \(\partial \varphi\) is a singleton set, the point is \(\nabla \varphi(x)\).
  • If \(\varphi\) is lower semi-continuous, then \(\partial \varphi\) is continuous i.e. 

\[ \left. \begin{align} x_k &\to x \\ y_k \left(\in \partial \varphi(x_k) \right) &\to y \end{align} \right. \implies y \in \partial \varphi(x) \]

> normal cone

5.3.5 Monotonicity

It is an immediate consequence of the definition of subdifferential that \(\partial \varphi\) is monotone:

\[ \forall y_1 \in \partial \varphi(x_1), y_2 \in \partial \varphi (x_2): \langle y_2 -y_1, x_2 - x_1 \rangle \geq 0 \]

5.3.6 Conjugate functions

For any proper function \(\varphi: \mathbb{R}^n \to \mathbb{R} \cup \{\infty \}\), we define its convex conjugate function, or Legendre transformation, by

\[\varphi^\ast(y) = \sup _{x \in \mathbb{R}^n} \langle x, y \rangle - \varphi (x) \]

Then, \(\varphi^\ast\) is a proper convex lower semi-continuous convex function.

From the definition,

\[ \forall x, y \in \mathbb{R}^n: \varphi(x) + \varphi^\ast(y) \geq \langle x, y \rangle \]

Theorem 5.1 (Characterization of Subdifferential) Let \(\varphi\) be a proper convex lower semi-continuous function in \(\mathbb{R}^n\). Then, for any \(x,y \in \mathbb{R}^n\), we have

\[ \varphi(x) + \varphi^\ast(y) = \langle x, y \rangle \Leftrightarrow y \in \partial \varphi(x) \Leftrightarrow x \in \partial \varphi^\ast(y) \]

Proof. \[ \begin{align} \varphi(x) + \varphi^\ast(y) = \langle x, y \rangle &\Leftrightarrow \langle x, y \rangle \geq \varphi(x) + \varphi^\ast(y) \\ &\Leftrightarrow \langle x, y \rangle \geq \varphi(x) + \langle y, z \rangle - \varphi(z) , \forall z \in \mathbb{R}^n\\ &\Leftrightarrow \varphi(z) \geq \varphi(x) + \langle y, z-x \rangle \\ &\Leftrightarrow y \in \partial \varphi(x) \\ \end{align} \]

The second \(\Leftrightarrow\) will be proved later.

5.3.7 Regularization

5.3.8 Duality and lower semi-continuity

Proposition 5.1 (Legendre duality for lower semi-continuous convex function) Let \(\varphi: \mathbb{R}^n \to \mathbb{R} \cup \{+ \infty\}\) is proper, then the following are equivalent:

(1). \(\varphi\) is a lower semi-continuous convex function

(2). there exists a proper function \(\psi\) such that \(\psi^\ast = \varphi\)

(3). \(\varphi^{\ast\ast} = \varphi\)

Proof. \((3) \to (2) \to (1)\) is obvious.

\((1) \to (3)\)

  • Since \(\varphi(x) \geq \langle x, y \rangle - \varphi^\ast(y), \; \forall y\), we have

    \[\varphi(x) \geq \sup_y \langle x, y \rangle - \varphi^\ast(y) = \varphi^{\ast\ast}\]

    So we only need to prove \(\varphi^{\ast\ast} \geq \varphi\):

  • First we assume \(x \in \rm{Int}(\rm{Dom}\varphi)\), since \(\partial \varphi(x) \neq \emptyset\), we can choose \(y_x \in \partial \varphi(x)\).

    Then, we have

    \[\langle x, y_x \rangle = \varphi(x) + \varphi^\ast(y_x) \implies \varphi^{\ast\ast}(x) \geq_{y = y_x} \langle x, y_x \rangle -\varphi^\ast(y) = \varphi(x) \]

    So \(\varphi^{\ast\ast}(x) = \varphi(x)\) on \(\rm{Int}(\rm{Dom}\varphi)\).

  • To treat the general case, we can regularize \(\varphi\) by inf convolution:

    Define \[f_\epsilon = \frac{ |x|^2}{2\epsilon}, \quad \varphi_\epsilon = \varphi\square f_\epsilon\]

    Then we have \[ \forall x \in \mathbb{R}^n \; \varphi(x) = \lim_{\epsilon \to 0} \varphi_\epsilon (x), \quad \rm{Dom}\varphi_\epsilon = \mathbb{R}^n, \quad \varphi_\epsilon \leq \varphi \]

    From previous discussion, we know \(\varphi_\epsilon^{\ast\ast} = \varphi_\epsilon\)

    Since \[ \varphi_\epsilon \leq \varphi \implies \varphi_\epsilon^\ast \geq \varphi^\ast \implies \varphi_\epsilon^{\ast\ast} \leq \varphi^{\ast\ast} \\ \implies \varphi^{\ast\ast}(x) \geq \liminf_{\epsilon \to 0} \varphi_\epsilon^{\ast\ast} = \liminf_{\epsilon \to 0} \varphi_\epsilon = \varphi(x) \\ \]