Stein’s method, logarithmic Sobolev and transport inequalities
Arturo Jaramillo and HongJuan Zhou
University of Kansas
November 2017
Arturo Jaramillo and HongJuan Zhou (University of Kansas) November 2017 1 / 35
Introduction
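For $d \ge 1$, let $\gamma(dx)$ denote the standard Gaussian measure on $\mathbb{R}^d$.
Theorem (Classical logarithmic Sobolev inequality for $\gamma$)
For every probability measure $\nu$ of the form $\nu(dx) = h(x)\gamma(dx)$, with $h : \mathbb{R}^d \to \mathbb{R}_+$, the relative entropy and the Fisher information of $\nu$ with respect to $\gamma$, defined by
\[
H(\nu|\gamma) := \int_{\mathbb{R}^d} h(x)\log(h(x))\,\gamma(dx), \qquad I(\nu|\gamma) := \int_{\mathbb{R}^d} \frac{|\nabla h(x)|^2}{h(x)}\,\gamma(dx),
\]
satisfy
\[
H(\nu|\gamma) \le \frac{1}{2}\, I(\nu|\gamma).
\]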
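As a numerical sanity check of the inequality, the following sketch evaluates both sides in dimension 1 by Gauss-Hermite quadrature. The density h(x) = (1 + x^2)/2 is an illustrative choice made here, not an example from the talk, and the helper name `gauss_mean` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Nodes/weights for the weight exp(-x^2/2); dividing by sqrt(2*pi)
# turns weighted sums into expectations under gamma = N(0, 1).
nodes, weights = hermegauss(80)
weights = weights / np.sqrt(2.0 * np.pi)

def gauss_mean(values):
    """Approximate the integral against gamma of a function sampled at `nodes`."""
    return float(np.sum(weights * values))

# Illustrative density w.r.t. gamma (assumption of this sketch): h(x) = (1 + x^2) / 2.
h = (1.0 + nodes**2) / 2.0
dh = nodes  # derivative of h

mass = gauss_mean(h)                 # total mass of nu, equal to 1
entropy = gauss_mean(h * np.log(h))  # H(nu | gamma)
fisher = gauss_mean(dh**2 / h)       # I(nu | gamma)

print(f"mass = {mass:.6f}, H = {entropy:.4f}, I/2 = {fisher / 2:.4f}")
```

The quadrature is exact for the mass and only approximate for the other two integrals, so the comparison is indicative rather than a proof.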
Introduction
Objective: Prove a sharper version of the logarithmic Sobolev inequality that includes the so-called “Stein discrepancy”, a quantity measuring how close a measure is to the standard d-dimensional Gaussian distribution.
Note: in the sequel, we will assume that ν(dx) = h(x)γ(dx).
Preliminaries
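A matrix-valued map $\tau_\nu : \mathbb{R}^d \to \mathbb{R}^{d\times d}$ is said to be a Stein kernel for $\nu$ if, for every smooth $\varphi : \mathbb{R}^d \to \mathbb{R}$,
\[
\int_{\mathbb{R}^d} x\cdot\nabla\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \langle \tau_\nu(x), \mathrm{Hess}[\varphi](x)\rangle_{HS}\,\nu(dx),
\]
where $\langle A, B\rangle_{HS} := \mathrm{tr}(A^*B)$ is the Hilbert-Schmidt inner product on $\mathbb{R}^{d\times d}$.
Remark
- The matrix $\tau_\nu(x)$ can be taken to be symmetric.
- In the case where $\nu = \gamma$, we can take $\tau_\nu(x) := \mathrm{Id}$ (the identity matrix), since
\[
\int_{\mathbb{R}^d} x\cdot\nabla\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \Delta\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \langle \mathrm{Id}, \mathrm{Hess}[\varphi](x)\rangle_{HS}\,\nu(dx).
\]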
Improved log-Sobolev inequality
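Whenever $\tau_\nu$ exists, we define the Stein discrepancy of $\nu$ with respect to $\gamma$ as
\[
S(\nu|\gamma) := \left(\int_{\mathbb{R}^d} \|\tau_\nu(x) - \mathrm{Id}\|_{HS}^2\,\nu(dx)\right)^{1/2}.
\]
The main result of the talk is the following.
Theorem (Improved logarithmic Sobolev inequality, HSI)
\[
H(\nu|\gamma) \le \frac{1}{2}\, S(\nu|\gamma)^2 \log\left(1 + \frac{I(\nu|\gamma)}{S(\nu|\gamma)^2}\right).
\]
In the sequel, we will assume that $0 < S(\nu|\gamma),\, I(\nu|\gamma) < \infty$.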
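To see how the HSI bound sits between the entropy and the classical bound, here is a minimal sketch for the centered one-dimensional Gaussian $\nu = N(0, \sigma^2)$. The closed forms used (constant Stein kernel $\tau_\nu = \sigma^2$, hence $S^2 = (\sigma^2-1)^2$, $H = \frac12(\sigma^2 - 1 - \log\sigma^2)$ and $I = (\sigma^2-1)^2/\sigma^2$) are worked out here for illustration and are not taken from the slides; the function names are hypothetical.

```python
import math

def gaussian_functionals(var):
    """H, I and S^2 of nu = N(0, var) relative to gamma = N(0, 1), dimension 1."""
    entropy = 0.5 * (var - 1.0 - math.log(var))
    fisher = (var - 1.0) ** 2 / var
    stein_sq = (var - 1.0) ** 2  # constant Stein kernel tau = var
    return entropy, fisher, stein_sq

def hsi_bound(fisher, stein_sq):
    """Right-hand side of the HSI inequality."""
    return 0.5 * stein_sq * math.log(1.0 + fisher / stein_sq)

def lsi_bound(fisher):
    """Right-hand side of the classical logarithmic Sobolev inequality."""
    return 0.5 * fisher

for var in (0.25, 0.5, 2.0, 10.0):
    H, I, S2 = gaussian_functionals(var)
    print(f"var={var:5.2f}  H={H:.4f}  HSI={hsi_bound(I, S2):.4f}  LSI={lsi_bound(I):.4f}")
```

Since $\log(1+x) \le x$, the HSI bound never exceeds the classical one; the loop only illustrates how large the gap can be.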
Basic results of the Stein kernel
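Assume that $\tau_\nu = (\tau_\nu^{i,j})_{1\le i,j\le d}$ exists and is symmetric. Recall that $\tau_\nu$ satisfies
\[
\int_{\mathbb{R}^d} x\cdot\nabla\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \langle \tau_\nu(x), \mathrm{Hess}[\varphi](x)\rangle_{HS}\,\nu(dx). \tag{1}
\]
Thus, for fixed $1 \le i, j \le d$, we can take $\varphi(x) = x_i$ and $\varphi(x) = x_i x_j$ in (1), in order to obtain
\[
\int x\,\nu(dx) = 0 \quad\text{and}\quad \int x_i x_j\,\nu(dx) = \int \tau_\nu^{i,j}(x)\,\nu(dx).
\]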
The Ornstein-Uhlenbeck semigroup
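Let $(P_t)_{t\ge 0}$ denote the Ornstein-Uhlenbeck semigroup on $\mathbb{R}^d$, with infinitesimal generator
\[
Lf = \Delta f - x\cdot\nabla f, \quad \text{for } f \in C^2(\mathbb{R}^d;\mathbb{R}).
\]
It is well known that $P_t$ can be written as
\[
P_t f(x) = \int_{\mathbb{R}^d} f\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\gamma(dy).
\]
This expression is called Mehler's formula. From it, we can easily obtain
\[
\nabla P_t f = e^{-t} P_t(\nabla f).
\]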
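Mehler's formula is easy to evaluate numerically, which gives a quick check of the commutation relation between the gradient and the semigroup. The sketch below works in dimension 1 with the test function f(x) = x^3; both the test function and the helper name `ou_semigroup` are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite rule for the standard Gaussian law (weights normalised to sum to 1).
nodes, weights = hermegauss(40)
weights = weights / np.sqrt(2.0 * np.pi)

def ou_semigroup(f, t, x):
    """P_t f(x) in dimension 1, computed from Mehler's formula by quadrature."""
    shifted = np.exp(-t) * x + np.sqrt(1.0 - np.exp(-2.0 * t)) * nodes
    return float(np.sum(weights * f(shifted)))

def f(x):
    return x**3

def df(x):
    return 3.0 * x**2

t, x, eps = 0.7, 1.3, 1e-5

# Left side: derivative of x -> P_t f(x), by a central finite difference.
lhs = (ou_semigroup(f, t, x + eps) - ou_semigroup(f, t, x - eps)) / (2.0 * eps)
# Right side: e^{-t} P_t(f')(x).
rhs = np.exp(-t) * ou_semigroup(df, t, x)

print(f"d/dx P_t f = {lhs:.8f},  e^(-t) P_t f' = {rhs:.8f}")
```

For polynomial test functions the quadrature is exact up to rounding, so the two printed values should agree up to the finite-difference error.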
The Ornstein-Uhlenbeck semigroup
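By using Mehler's formula, as well as an integration by parts argument, we can show that
\[
P_t(\nabla f)(x) = \frac{1}{\sqrt{1-e^{-2t}}} \int_{\mathbb{R}^d} y\, f\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\gamma(dy).
\]
The generator $L$ satisfies the following integration by parts formula:
\[
\int_{\mathbb{R}^d} f(x)\,Lg(x)\,\gamma(dx) = -\int_{\mathbb{R}^d} \nabla f(x)\cdot\nabla g(x)\,\gamma(dx).
\]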
Formulas for I(ν|γ)
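The Fisher information $I(\nu|\gamma)$ can be written in terms of $L$ as follows:
\[
I(\nu|\gamma) = \int_{\mathbb{R}^d} \frac{|\nabla h(x)|^2}{h(x)}\,\gamma(dx) = \int_{\mathbb{R}^d} |\nabla \log h(x)|^2\, h(x)\,\gamma(dx) = -\int_{\mathbb{R}^d} \big(L \log h(x)\big)\, h(x)\,\gamma(dx).
\]
Thus, by setting $v := \log h$, we get
\[
I(\nu|\gamma) = -\int_{\mathbb{R}^d} Lv(x)\,\nu(dx).
\]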
Organization of the proof
Define $v_t := \log P_t h$ and $\nu_t(dx) := P_t h(x)\,\gamma(dx)$. Replacing $h$ by $P_t h$ and using the symmetry of $P_t$ in the previous expressions, we get
\[
I(\nu_t|\gamma) = -\int_{\mathbb{R}^d} L P_t v_t(x)\,\nu(dx).
\]
To prove the HSI, we use the integrated de Bruijn formula
\[
H(\nu|\gamma) = \int_0^\infty I(\nu_t|\gamma)\,dt.
\]
The result follows from two different types of bounds for $I_\gamma(P_t h) := I(\nu_t|\gamma)$, depending on whether $t \approx 0$ or $t \approx \infty$.
Decay of I(νt |γ) and S(νt |γ)
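The following results are the main ingredients for the proof of the HSI.
Theorem
For every $t > 0$,
\[
I(\nu_t|\gamma) \le e^{-2t}\, I(\nu_0|\gamma),
\]
and
\[
I(\nu_t|\gamma) \le \frac{e^{-4t}}{1-e^{-2t}}\, \|\tau_\nu - \mathrm{Id}\|_{2,\nu}^2 = \frac{e^{-4t}}{1-e^{-2t}}\, S(\nu_0|\gamma)^2.
\]
Moreover, the Stein discrepancy satisfies
\[
S(\nu_t|\gamma) \le e^{-2t}\, S(\nu_0|\gamma).
\]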
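Proof of the HSI inequality
Using the previous bounds, we have that for every $u > 0$,
\[
H(\nu|\gamma) = \int_0^u I_\gamma(P_t h)\,dt + \int_u^\infty I_\gamma(P_t h)\,dt
\le I(\nu|\gamma)\int_0^u e^{-2t}\,dt + S(\nu|\gamma)^2 \int_u^\infty \frac{e^{-4t}}{1-e^{-2t}}\,dt
\]
\[
\le \frac{1}{2}\, I(\nu|\gamma)\,(1-e^{-2u}) + \frac{1}{2}\, S(\nu|\gamma)^2\left(-e^{-2u} - \log(1-e^{-2u})\right).
\]
Optimizing in $u$ (computations are easier if we define $r := 1-e^{-2u}$), we obtain the result.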
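The optimization step can be spelled out as follows. This is a routine calculus sketch added for convenience, using the shorthand $I = I(\nu|\gamma)$, $S = S(\nu|\gamma)$ and the auxiliary function $F$, which is notation introduced here.

```latex
With $r = 1 - e^{-2u} \in (0,1)$, the bound above reads
\[
H(\nu|\gamma) \le F(r) := \frac{1}{2}\Big[\, I\,r + S^2\,(r - 1 - \log r) \Big].
\]
The function $F$ is convex, and $F'(r) = \frac{1}{2}\big[\, I + S^2 - S^2/r \,\big]$ vanishes at
\[
r^* = \frac{S^2}{I + S^2} \in (0,1),
\]
so that
\[
F(r^*) = \frac{1}{2}\Big[\, S^2 - S^2 + S^2 \log\frac{I + S^2}{S^2} \Big]
       = \frac{1}{2}\, S^2 \log\Big(1 + \frac{I}{S^2}\Big).
\]
```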
Bound for I(νt |γ), when t is large
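We have that
\[
I_\gamma(P_t h) = -\int_{\mathbb{R}^d} L P_t v_t(x)\,\nu(dx) = -\int_{\mathbb{R}^d} \big[\Delta P_t v_t(x) - x\cdot\nabla P_t v_t(x)\big]\,\nu(dx)
= \int_{\mathbb{R}^d} \langle \tau_\nu(x) - \mathrm{Id}, \mathrm{Hess}(P_t v_t)(x)\rangle_{HS}\,\nu(dx).
\]
To rewrite the Hessian, notice that
\[
\partial_{i,j} P_t v_t(x) = e^{-2t} P_t(\partial_{i,j} v_t)(x)
= \frac{e^{-2t}}{\sqrt{1-e^{-2t}}} \int_{\mathbb{R}^d} y_i\, \frac{\partial v_t}{\partial x_j}\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\gamma(dy).
\]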
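Bound for I(νt|γ), when t is large
From here it follows that
\[
\int_{\mathbb{R}^d} \langle \tau_\nu(x) - \mathrm{Id}, \mathrm{Hess}(P_t v_t)(x)\rangle_{HS}\,\nu(dx)
= \frac{e^{-2t}}{\sqrt{1-e^{-2t}}} \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \Big[(\tau_\nu(x) - \mathrm{Id})\,y \cdot \nabla v_t\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\Big]\,\nu(dx)\,\gamma(dy).
\]
This implies, after two suitable applications of the Cauchy-Schwarz inequality, that
\[
I_\gamma(P_t h) \le \frac{e^{-2t}}{\sqrt{1-e^{-2t}}} \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} |(\tau_\nu(x) - \mathrm{Id})\,y| \times \left|\nabla v_t\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\right|\,\nu(dx)\,\gamma(dy)
\]
\[
\le \frac{e^{-2t}}{\sqrt{1-e^{-2t}}} \left(\int_{\mathbb{R}^d} \|\tau_\nu(x) - \mathrm{Id}\|_{HS}^2\,\nu(dx) \int_{\mathbb{R}^d} P_t|\nabla v_t|^2(x)\,\nu(dx)\right)^{1/2}.
\]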
Bound for I(νt |γ), when t is large
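Thus, since
\[
\int_{\mathbb{R}^d} P_t|\nabla v_t|^2(x)\,\nu(dx) = \int_{\mathbb{R}^d} P_t|\nabla v_t|^2(x)\, h(x)\,\gamma(dx) = \int_{\mathbb{R}^d} |\nabla v_t(x)|^2\, P_t h(x)\,\gamma(dx) = I_\gamma(P_t h),
\]
we get that
\[
I_\gamma(P_t h) \le \frac{e^{-2t}}{\sqrt{1-e^{-2t}}} \left(\int_{\mathbb{R}^d} \|\tau_\nu(x) - \mathrm{Id}\|_{HS}^2\,\nu(dx)\right)^{1/2} I_\gamma(P_t h)^{1/2},
\]
which implies the desired inequality.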
Sketch of the proof of S(νt |γ) ≤ e−2tS(ν0|γ)
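The idea consists in finding a Stein kernel for $\nu_t$. This is obtained using integration by parts, and is given by
\[
\tau_{\nu_t}(x) := e^{-2t}\, \frac{P_t(h\,\tau_\nu)(x)}{P_t h(x)} + (1-e^{-2t})\,\mathrm{Id}.
\]
Therefore,
\[
\int_{\mathbb{R}^d} \|\tau_{\nu_t}(x) - \mathrm{Id}\|_{HS}^2\,\nu_t(dx) \le e^{-4t} \int_{\mathbb{R}^d} \frac{\|P_t[h(\tau_\nu - \mathrm{Id})](x)\|_{HS}^2}{P_t h(x)}\,\gamma(dx).
\]
By the Cauchy-Schwarz inequality,
\[
\|P_t[h(\tau_\nu - \mathrm{Id})](x)\|_{HS}^2 \le P_t\big[h\,\|\tau_\nu - \mathrm{Id}\|_{HS}^2\big](x)\; P_t h(x).
\]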
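Consequently,
\[
\int_{\mathbb{R}^d} \|\tau_{\nu_t}(x) - \mathrm{Id}\|_{HS}^2\,\nu_t(dx) \le e^{-4t} \int_{\mathbb{R}^d} P_t\big[h\,\|\tau_\nu - \mathrm{Id}\|_{HS}^2\big](x)\,\gamma(dx)
\le e^{-4t} \int_{\mathbb{R}^d} h(x)\,\|\tau_\nu(x) - \mathrm{Id}\|_{HS}^2\,\gamma(dx)
= e^{-4t} \int_{\mathbb{R}^d} \|\tau_\nu - \mathrm{Id}\|_{HS}^2\,\nu(dx),
\]
which gives the desired inequality.
Remark: The Stein kernel $\tau_{\nu_t}$ admits the probabilistic representation
\[
\tau_{\nu_t}(x) = E\big[\,e^{-2t}\tau_\nu(F) + (1-e^{-2t})\,\mathrm{Id} \;\big|\; F_t = x\,\big], \quad \nu_t(dx)\text{-a.e.},
\]
where, on some probability space $(\Omega, \mathcal{F}, P)$, $F$ has distribution $\nu$ and $F_t := e^{-t}F + \sqrt{1-e^{-2t}}\,Z$, with $Z$ a standard $d$-dimensional Gaussian vector independent of $F$.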
Introduction
Objective:
- Give an improved form of exponential decay of entropy.
- Apply Stein's discrepancy in deriving concentration inequalities.
- Explore the relationship between transport distances and Stein's discrepancy. The WSH inequality, an improvement of the Talagrand quadratic transportation cost inequality, provides a sharper bound on the Wasserstein distance $W_2$, which involves Stein's discrepancy and relative entropy. Finally, we bound the $W_p$ distance by Stein's discrepancy.
Exponential decay of entropy from HSI
The classical logarithmic Sobolev inequality ensures the exponential decay of the relative entropy
\[
H(\nu_t|\gamma) \le e^{-2t}\, H(\nu|\gamma)
\]
along the O-U semigroup, i.e., $d\nu_t = P_t h\,d\gamma$.
Now, applying HSI produces a reinforcement of this exponential decay under the finiteness of the Stein discrepancy.
Corollary
Let $\nu$ have Stein discrepancy $S(\nu|\gamma) = S$. For any $t \ge 0$,
\[
H(\nu_t|\gamma) \le \frac{e^{-4t}}{e^{-2t} + \frac{1-e^{-2t}}{S^2}\, H(\nu|\gamma)}\; H(\nu|\gamma) \le \frac{e^{-4t}}{1-e^{-2t}}\; S^2(\nu|\gamma).
\]
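The corollary can be compared with the classical decay on a case where everything is explicit. The sketch below assumes $\nu = N(0, \sigma^2)$ in dimension 1, for which $\nu_t = N(0, e^{-2t}\sigma^2 + 1 - e^{-2t})$ along the O-U flow and $S^2(\nu|\gamma) = (\sigma^2-1)^2$; these closed forms and the function names are illustrative assumptions, not part of the talk.

```python
import math

def entropy_gaussian(var):
    """H(N(0, var) | N(0, 1)) in dimension 1."""
    return 0.5 * (var - 1.0 - math.log(var))

def decay_bounds(var, t):
    """Return (H(nu_t|gamma), improved bound, classical bound, Stein-only bound), t > 0."""
    a = math.exp(-2.0 * t)
    var_t = a * var + (1.0 - a)   # variance of nu_t along the O-U flow
    H0 = entropy_gaussian(var)
    S2 = (var - 1.0) ** 2         # squared Stein discrepancy of N(0, var)
    improved = a * a / (a + (1.0 - a) * H0 / S2) * H0
    classical = a * H0
    stein_only = a * a / (1.0 - a) * S2
    return entropy_gaussian(var_t), improved, classical, stein_only

for t in (0.1, 0.5, 2.0):
    Ht, improved, classical, stein_only = decay_bounds(2.0, t)
    print(f"t={t:3.1f}  H_t={Ht:.5f}  improved={improved:.5f}  "
          f"classical={classical:.5f}  stein_only={stein_only:.5f}")
```

For large t the improved bound decays like $e^{-4t}$ rather than $e^{-2t}$, which is the reinforcement the corollary refers to.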
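Sketch of proof: Applying the HSI inequality to $\nu_t$ (together with the decay of the Stein discrepancy) implies that
\[
H(\nu_t|\gamma) \le \frac{e^{-4t} S^2}{2} \log\left(1 + \frac{e^{4t}\, I(\nu_t|\gamma)}{S^2}\right).
\]
Set $U(t) = \frac{e^{4t}}{S^2}\, H(\nu_t|\gamma)$. Since $\frac{d}{dt}H(\nu_t|\gamma) = -I(\nu_t|\gamma)$, we get
\[
e^{2U} - 1 - 4U \le -U',
\]
and hence, using $e^{2U} \ge 1 + 2U + 2U^2$,
\[
-2U + 2U^2 \le -U'.
\]
Setting $V(t) = e^{-2t}U(t)$, we get $2e^{2t}V^2(t) \le -V'(t)$, so that, after integration,
\[
e^{2t} - 1 \le \frac{1}{V(t)} - \frac{1}{V(0)}.
\]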
Stein discrepancy and concentration inequalities
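For the standard Gaussian measure $\gamma$ and any 1-Lipschitz function $u : \mathbb{R}^d \to \mathbb{R}$ with mean zero,
\[
\gamma(u \ge r) \le e^{-r^2/2},
\]
or equivalently, $\|u\|_{p,\gamma} := \left(\int_{\mathbb{R}^d} |u|^p\,d\gamma\right)^{1/p} \le C\sqrt{p}$ for $p \ge 1$. Now let $\nu$ have Stein kernel $\tau_\nu$. Do we have a similar result for $\|u\|_{p,\nu}$?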
Theorem (Moment bounds and Stein discrepancy)
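Let $\nu$ have Stein kernel $\tau_\nu$. There exists a positive constant $C$ such that for any 1-Lipschitz function $u : \mathbb{R}^d \to \mathbb{R}$ with $\int_{\mathbb{R}^d} u\,d\nu = 0$, and every $p \ge 2$,
\[
\left(\int_{\mathbb{R}^d} |u|^p\,d\nu\right)^{1/p} \le C\left(S_p(\nu|\gamma) + \sqrt{p}\left(\int_{\mathbb{R}^d} \|\tau_\nu\|_{op}^{p/2}\,d\nu\right)^{1/p}\right).
\]
Here, the $p$-Stein discrepancy is given by
\[
S_p(\nu|\gamma) = \left(\int_{\mathbb{R}^d} \|\tau_\nu - \mathrm{Id}\|_{HS}^p\,d\nu\right)^{1/p}.
\]
Taking into account that $\|\tau_\nu\|_{op} \le 1 + \|\tau_\nu - \mathrm{Id}\|_{HS}$, we have
\[
\|u\|_{p,\nu} \le C\left(S_p(\nu|\gamma) + \sqrt{p} + \sqrt{p}\,\sqrt{S_p(\nu|\gamma)}\right).
\]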
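An example illustrating the theorem
Consider a centered random variable $X$ on a probability space with values in $\mathbb{R}^d$. Let $X_1, \dots, X_n$ be independent copies of $X$. Assume that $X$ has law $\nu$ admitting a Stein kernel $\tau_\nu$. Set $T_n = \frac{1}{\sqrt{n}} \sum_{k=1}^n X_k$. A Stein kernel $\tau_{\nu_n}$ of the law $\nu_n$ of $T_n$ is
\[
\tau_{\nu_n} = E\left(\frac{1}{n}\sum_{k=1}^n \tau_\nu(X_k) \,\Big|\, T_n\right).
\]
Hence,
\[
S_p(\nu_n|\gamma) \le E\left(\Big\|\frac{1}{n}\sum_{k=1}^n (\tau_\nu(X_k) - \mathrm{Id})\Big\|_{HS}^p\right)^{1/p} \le K_p\, n^{-1/2}\, S_p(\nu|\gamma),
\]
which follows from Rosenthal's inequality, with $K_p = O(p)$.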
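For any 1-Lipschitz function $u : \mathbb{R}^d \to \mathbb{R}$ such that $E(u(T_n)) = 0$, by Theorem 5,
\[
\|u(T_n)\|_{L^p} \le C\sqrt{p}\left(1 + n^{-1/2}\sqrt{p}\,S_p + n^{-1/4}\sqrt{p\,S_p}\right).
\]
By Markov's inequality, one can deduce a concentration inequality for $T_n$. For example, if $S_p = O(p^\alpha)$ for some $\alpha > 0$, then $\|u(T_n)\|_{L^p} \le C\sqrt{p}$ for any $p \le n^{\frac{1}{2\alpha+2}}$. Then
\[
P(|u(T_n)| \ge r) \le \left(\frac{C\sqrt{p}}{r}\right)^p,
\]
with $p \sim \frac{r^2}{4C^2}$. Optimizing in $p$ gives
\[
P(u(T_n) \ge r) \le C' e^{-r^2/C'},
\]
for all $0 \le r \le r_n$, where $r_n \sim n^{\frac{1}{4\alpha+4}}$.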
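Recall the theorem's conclusion:
\[
\left(\int_{\mathbb{R}^d} |u|^p\,d\nu\right)^{1/p} \le C\left(S_p(\nu|\gamma) + \sqrt{p}\left(\int_{\mathbb{R}^d} \|\tau_\nu\|_{op}^{p/2}\,d\nu\right)^{1/p}\right).
\]
Sketch of the proof: Set $\phi(t) = \int_{\mathbb{R}^d} (P_t u)^{2q}\,d\nu$. Differentiating along the semigroup and using the definition of the Stein kernel $\tau_\nu$ yields
\[
\phi'(t) = 2q\int (P_t u)^{2q-1}\,\langle \mathrm{Id} - \tau_\nu, \mathrm{Hess}(P_t u)\rangle_{HS}\,d\nu
- 2q(2q-1)\int (P_t u)^{2q-2}\,\langle \tau_\nu, \nabla P_t u \otimes \nabla P_t u\rangle_{HS}\,d\nu.
\]
Using $|\nabla u| \le 1$ and $|\nabla P_t u| \le e^{-t}$, we bound
\[
-\phi'(t) \le c_1(t,q)\int |P_t u|^{2q-1}\,\|\tau_\nu - \mathrm{Id}\|_{HS}\,d\nu + c_2(t,q)\int |P_t u|^{2q-2}\,\|\tau_\nu\|_{op}\,d\nu.
\]
From Young's inequality, one can deduce
\[
-\phi'(t) \le C(t)\,\phi(t) + D(t).
\]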
Wasserstein distance and Stein discrepancy
Theorem (Wasserstein distance and Stein discrepancy)
For every centered probability measure $\nu$ on $\mathbb{R}^d$,
\[
W_2(\nu,\gamma) \le S(\nu|\gamma).
\]
Remark: The measure $\nu$ is not assumed to admit a density w.r.t. Lebesgue measure on $\mathbb{R}^d$.
Sketch of proof:
Step 1: Assume $d\nu = h\,d\gamma$, and let $v_t = \log P_t h$ and $d\nu_t = P_t h\,d\gamma$. Then, from a result of Otto and Villani (2000),
\[
\frac{d^+}{dt} W_2(\nu,\nu_t) \le \left(\int_{\mathbb{R}^d} |\nabla v_t|^2\,d\nu_t\right)^{1/2}.
\]
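Then
\[
W_2(\nu,\gamma) \le \int_0^\infty \left(\int_{\mathbb{R}^d} |\nabla v_t|^2\,d\nu_t\right)^{1/2} dt \le S(\nu|\gamma)\int_0^\infty \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\,dt = S(\nu|\gamma),
\]
since the last integral equals $\big[\sqrt{1-e^{-2t}}\big]_0^\infty = 1$.
Step 2: For the general case, we use a regularization procedure. Namely, fix $\varepsilon > 0$ and introduce $F_\varepsilon = e^{-\varepsilon}F + \sqrt{1-e^{-2\varepsilon}}\,Z$, where $F$ and $Z$ are independent with laws $\nu$ and $\gamma$.
- The distribution $\nu_\varepsilon$ of $F_\varepsilon$ admits a smooth density $h_\varepsilon$ w.r.t. $\gamma$.
- $\nu_\varepsilon$ has a Stein kernel $\tau_{\nu_\varepsilon}(x) = E\big(e^{-2\varepsilon}\tau_\nu(F) + (1-e^{-2\varepsilon})\,\mathrm{Id} \,\big|\, F_\varepsilon = x\big)$.
- $S(\nu_\varepsilon|\gamma) \le e^{-2\varepsilon} S(\nu|\gamma)$.
- As $\varepsilon \to 0$, $F_\varepsilon \to F$ in $L^2$, so $W_2(\nu_\varepsilon,\gamma) \to W_2(\nu,\gamma)$.
Hence
\[
W_2(\nu,\gamma) = \lim_{\varepsilon\to 0} W_2(\nu_\varepsilon,\gamma) \le \limsup_{\varepsilon\to 0} S(\nu_\varepsilon|\gamma) \le S(\nu|\gamma).
\]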
Talagrand inequality and WSH inequality
The Talagrand quadratic transportation cost inequality bounds the Wasserstein distance using the relative entropy:
\[
W_2^2(\nu,\gamma) \le 2H(\nu|\gamma).
\]
Applying the HSI inequality produces the following.
Theorem (Gaussian WSH inequality)
Let $d\nu = h\,d\gamma$ be a centered probability measure on $\mathbb{R}^d$ with smooth density $h$ w.r.t. $\gamma$. Assume that $S(\nu|\gamma)$ and $H(\nu|\gamma)$ are positive and finite. Then
\[
W_2(\nu,\gamma) \le S(\nu|\gamma)\,\arccos\left(e^{-\frac{H(\nu|\gamma)}{S^2(\nu|\gamma)}}\right).
\]
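A small numerical comparison of the WSH and Talagrand bounds, again on the explicit family $\nu = N(0, \sigma^2)$ in dimension 1. The closed forms used here ($W_2(\nu,\gamma) = |\sigma - 1|$, $S(\nu|\gamma) = |\sigma^2 - 1|$, $H(\nu|\gamma) = \frac12(\sigma^2 - 1 - \log\sigma^2)$) and the function names are assumptions of this sketch rather than material from the slides.

```python
import math

def wsh_quantities(var):
    """W2, S and H of nu = N(0, var) against gamma = N(0, 1), dimension 1."""
    w2 = abs(math.sqrt(var) - 1.0)               # W2 between centered 1D Gaussians
    stein = abs(var - 1.0)                       # S(nu|gamma) for the kernel tau = var
    entropy = 0.5 * (var - 1.0 - math.log(var))  # H(nu|gamma)
    return w2, stein, entropy

def wsh_bound(stein, entropy):
    """Right-hand side of the WSH inequality."""
    return stein * math.acos(math.exp(-entropy / stein**2))

def talagrand_bound(entropy):
    """Right-hand side of Talagrand's inequality, as a bound on W2."""
    return math.sqrt(2.0 * entropy)

for var in (0.2, 0.5, 2.0, 8.0):
    w2, S, H = wsh_quantities(var)
    print(f"var={var:4.1f}  W2={w2:.4f}  WSH={wsh_bound(S, H):.4f}  "
          f"Talagrand={talagrand_bound(H):.4f}")
```

Because $\arccos(e^{-r}) \le \sqrt{2r}$ for $r \ge 0$, the WSH bound is never worse than the Talagrand bound.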
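Sketch of proof: By the HSI inequality and the decay of Stein's discrepancy,
\[
H(\nu_t|\gamma) \le \frac{1}{2}\, S^2(\nu|\gamma) \log\left(1 + \frac{I(\nu_t|\gamma)}{S^2(\nu|\gamma)}\right).
\]
Exponentiating both sides,
\[
\sqrt{I(\nu_t|\gamma)} \le \frac{I(\nu_t|\gamma)}{S(\nu|\gamma)\sqrt{e^{\frac{2H(\nu_t|\gamma)}{S^2(\nu|\gamma)}} - 1}}.
\]
By the result of Otto and Villani (2000), namely that the derivative of the Wasserstein distance is bounded by the square root of the Fisher information,
\[
\frac{d^+}{dt} W_2(\nu,\nu_t) \le \frac{-\frac{d}{dt}H(\nu_t|\gamma)}{S(\nu|\gamma)\sqrt{e^{\frac{2H(\nu_t|\gamma)}{S^2(\nu|\gamma)}} - 1}}
= -\frac{d}{dt}\left(S(\nu|\gamma)\,\arccos\left(e^{-\frac{H(\nu_t|\gamma)}{S^2(\nu|\gamma)}}\right)\right).
\]
Integrating between $t = 0$ and $t = \infty$ yields the result.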
HWI Inequality and the comparison with HSI inequality
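Otto and Villani (2000) give the HWI inequality, which states that, for all $d\nu = h\,d\gamma$,
\[
H(\nu|\gamma) \le W_2(\nu,\gamma)\sqrt{I(\nu|\gamma)} - \frac{1}{2}\, W_2^2(\nu,\gamma).
\]
Q: Can we produce an inequality involving $H$, $W_2$, $I$ and $S$? Here is a possible way to carry out the computation. Writing $\mathrm{Ent}_\gamma(h) = H(\nu|\gamma)$ and $I_\gamma(h) = I(\nu|\gamma)$, for $0 < u \le t$,
\[
\mathrm{Ent}_\gamma(h) = \int_0^t I_\gamma(P_s h)\,ds + \mathrm{Ent}_\gamma(P_t h)
\le I_\gamma(h)\int_0^u e^{-2s}\,ds + S^2(\nu|\gamma)\int_u^t \frac{e^{-4s}}{1-e^{-2s}}\,ds + \frac{e^{-2t}}{2(1-e^{-2t})}\, W_2^2(\nu,\gamma),
\]
which follows from the proof idea of the HSI inequality and the reverse Talagrand inequality along the semigroup.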
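Evaluating the integrals and setting $\alpha = 1-e^{-2u} \le 1-e^{-2t} = \beta$,
\[
H(\nu|\gamma) \le \frac{1}{2} \inf_{0<\alpha\le\beta\le 1} \Phi(\alpha,\beta),
\]
where
\[
\Phi(\alpha,\beta) = \alpha\, I(\nu|\gamma) + (\alpha - \log\alpha)\, S^2(\nu|\gamma) + \frac{1-\beta}{\beta}\, W_2^2(\nu,\gamma) + (\log\beta - \beta)\, S^2(\nu|\gamma).
\]
When $\alpha = \beta$, HWI is obtained. When $\beta = 1$, HSI is obtained.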
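The two special cases can be checked directly. The following is a short calculus sketch added for convenience, with the shorthand $I = I(\nu|\gamma)$, $S = S(\nu|\gamma)$, $W_2 = W_2(\nu,\gamma)$.

```latex
Case $\beta = 1$: the $W_2$ term vanishes and
\[
\Phi(\alpha, 1) = \alpha I + (\alpha - 1 - \log\alpha)\, S^2,
\]
which is minimised at $\alpha = S^2/(I + S^2)$, giving
$\tfrac{1}{2}\Phi = \tfrac{1}{2} S^2 \log\big(1 + I/S^2\big)$ (HSI).

Case $\alpha = \beta$: the $S^2$ terms cancel and
\[
\Phi(\alpha, \alpha) = \alpha I + \frac{1-\alpha}{\alpha}\, W_2^2,
\]
which is minimised at $\alpha = W_2/\sqrt{I}$; this value lies in $(0,1]$ because
$W_2^2 \le 2H \le I$ by the Talagrand and logarithmic Sobolev inequalities. This gives
\[
\tfrac{1}{2}\Phi = W_2\sqrt{I} - \tfrac{1}{2}\, W_2^2 \quad \text{(HWI)}.
\]
```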
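HWI vs. HSI
Consider the probability measure $d\nu_n(x) = \rho_n(x)\,dx$, where
\[
\rho_n(x) = \frac{1}{\sqrt{2\pi}}\left((1-a_n)\,e^{-x^2/2} + n\,a_n\,e^{-n^2x^2/2}\right),
\]
$a_n \in [0,1]$ and $a_n = o\big(\frac{1}{\log n}\big)$. A direct computation shows that $H(\nu_n|\gamma) \to 0$. Moreover,
- $I(\nu_n|\gamma) = \int_{\mathbb{R}} \frac{\rho_n'(x)^2}{\rho_n(x)}\,dx - 1 \sim n^2 a_n$.
- $S^2(\nu_n|\gamma) = \int_{\mathbb{R}} (\tau_n(x) - 1)^2 \rho_n(x)\,dx \le a_n \to 0$.
- $W_2(\nu_n,\gamma) \le \sqrt{a_n}$. Also, $W_2(\nu_n,\gamma) \le c\,a_n$ for some constant $c > 0$.
- The bound of HWI: $W_2(\nu_n,\gamma)\sqrt{I(\nu_n|\gamma)} - \frac{1}{2}W_2^2(\nu_n,\gamma) \sim n\,a_n^{3/2} \to \infty$.
- The bound of HSI: $S^2(\nu_n|\gamma)\log\Big(1 + \frac{I(\nu_n|\gamma)}{S^2(\nu_n|\gamma)}\Big) \sim 2a_n\log n \to 0$.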
Wp distance and Stein discrepancy
Proposition (Wp distance and Stein discrepancy)
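Let $\nu$ be a centered probability measure on $\mathbb{R}^d$ with Stein kernel $\tau_\nu$ in the sense that $\int_{\mathbb{R}^d} x\,\phi\,d\nu = \int_{\mathbb{R}^d} \tau_\nu \nabla\phi\,d\nu$ for every smooth test function $\phi$. For every $p \ge 1$, set
\[
\|\tau_\nu - \mathrm{Id}\|_{p,\nu} = \left(\sum_{i,j=1}^d \int_{\mathbb{R}^d} |\tau_\nu^{ij} - \delta_{ij}|^p\,d\nu\right)^{1/p}.
\]
(1) Let $p \in [1,2)$. Then
\[
W_p(\nu,\gamma) \le C_p\, d^{1-1/p}\, \|\tau_\nu - \mathrm{Id}\|_{p,\nu}.
\]
(2) Let $p \in [2,\infty)$. If $\nu$ has finite moments of order $p$, then
\[
W_p(\nu,\gamma) \le C_p\, d^{1-2/p}\, \|\tau_\nu - \mathrm{Id}\|_{p,\nu}.
\]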
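Idea of the proof: As usual, write $v_t = \log P_t h$ and $d\nu_t = P_t h\,d\gamma$. A version of $\nabla v_t$, $t > 0$, is given by
\[
\nabla v_t(x) = \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\, E\big((\tau_\nu(F) - \mathrm{Id})\,Z \,\big|\, F_t = x\big)
= \frac{e^{-2t}}{\sqrt{1-e^{-2t}}} \left(E\Big(\sum_{j=1}^d (\tau_\nu^{ij}(F) - \delta_{ij})\,Z_j \,\Big|\, F_t = x\Big)\right)_{1\le i\le d},
\]
where $F$ and $Z$ are independent with laws $\nu$ and $\gamma$ respectively, and $F_t = e^{-t}F + \sqrt{1-e^{-2t}}\,Z$. Moreover,
\[
W_p(\nu,\gamma) \le \int_0^\infty \left(\int_{\mathbb{R}^d} |\nabla v_t|^p\,d\nu_t\right)^{1/p} dt.
\]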
Bibliography
Ledoux, Michel and Nourdin, Ivan and Peccati, Giovanni (2015). Stein’s method, logarithmic Sobolev and transport inequalities. Geometric and Functional Analysis. 1 256–306.