  • Deep Exponential Families

    Yusuf Hakan Kalayci

    Bogazici University

    [email protected]

    March 22, 2018

    Yusuf Hakan Kalayci (BOUN) Short title March 22, 2018 1 / 21

  • Overview

    1 Exponential Family
        Definition
        Sufficient Statistics
        Mean and Variance

    2 Deep Exponential Families
        What is DEF
        Modeling Document
        Example

    3 Inference
        Variational Inference
        Black Box Variational Inference
        Inference on DEFs

  • Exponential Family

    Definition

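    A probability density function $p(x \mid \theta)$, $x \in \mathcal{X} \subset \mathbb{R}^q$, which is labeled by $\theta \in \Theta \subset \mathbb{R}^k$, is said to belong to the k-parameter exponential family if it is of the form

    \[ p(x \mid \theta) = h(x) \cdot \exp\Big[ \sum_{j=1}^{k} \eta_j(\theta) \cdot T_j(x) - B(\theta) \Big] \]

    where

    $\eta_j, B$ are real valued: $\Theta \to \mathbb{R}$,

    $T_j, h$ are real valued: $\mathbb{R}^q \to \mathbb{R}$.

    We also say that the family is regular whenever $\mathcal{X}$ does not depend on $\theta$, and non-regular otherwise.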

  • Canonical Form

    Re-parametrization of $\eta = \eta(\theta)$ as the natural parameter gives us the following density function

    \[ p(x \mid \eta) = h(x) \cdot \exp\big[ \eta^T T(x) - A(\eta) \big] \]

    here:

    $\eta = (\eta_1, \dots, \eta_k)^T$ : natural parameter

    $T(x) = (T_1(x), \dots, T_k(x))^T$ : sufficient statistics

    $A(\eta) = \log \underbrace{\int \int \cdots \int}_{q \text{ times}} h(x) \cdot \exp\big[ \eta^T T(x) \big] \, dx$ : logarithm of the normalization constant, i.e. the log normalizer.

  • Bernoulli Distribution

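    \[ p(x \mid \pi) = \pi^x (1 - \pi)^{1 - x} = \exp\Big[ \log\Big( \frac{\pi}{1 - \pi} \Big) x + \log(1 - \pi) \Big] \tag{1} \]

    $\eta = \log\big( \frac{\pi}{1 - \pi} \big)$

    $T(x) = x$

    $A(\eta) = -\log(1 - \pi) = \log(1 + e^{\eta})$

    $h(x) = 1$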
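
    As a quick sanity check of the decomposition above, the following minimal Python sketch compares the standard Bernoulli mass function with its canonical form. The function names are illustrative and not part of the slides.

```python
import math

def bernoulli_pmf(x, pi):
    """Standard form: pi^x * (1 - pi)^(1 - x) for x in {0, 1}."""
    return pi ** x * (1.0 - pi) ** (1 - x)

def natural_parameter(pi):
    """eta = log(pi / (1 - pi)), the log-odds."""
    return math.log(pi / (1.0 - pi))

def bernoulli_canonical(x, eta):
    """Canonical form h(x) * exp(eta * T(x) - A(eta)) with h(x) = 1, T(x) = x."""
    log_normalizer = math.log1p(math.exp(eta))  # A(eta) = log(1 + e^eta)
    return math.exp(eta * x - log_normalizer)

# Both forms should agree for every x and every pi in (0, 1).
for pi in (0.1, 0.5, 0.9):
    eta = natural_parameter(pi)
    for x in (0, 1):
        assert math.isclose(bernoulli_pmf(x, pi), bernoulli_canonical(x, eta))
```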

  • Sufficient Statistics

    Definition

    A statistic is any function on the sample space that is not a function of the parameter.

    $T(x)$ is sufficient for $\theta$ if there is no information in $x$ regarding $\theta$ beyond that in $T(x)$.

    Bayesian approach:

    \[ p(\theta \mid T(x), x) = p(\theta \mid T(x)) \]

    Frequentist approach:

    \[ p(x \mid T(x), \theta) = p(x \mid T(x)) \]
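
    To make the frequentist statement concrete, here is a small Python sketch under an assumption that is not on the slide: the data are an i.i.d. Bernoulli sample and the statistic is T(x) = sum(x). The conditional probability of a sample given its sum is computed by brute-force enumeration and turns out not to depend on the parameter.

```python
import math
from itertools import product

def conditional_given_sum(x, theta):
    """p(x | T(x), theta) for i.i.d. Bernoulli(theta) data with T(x) = sum(x)."""
    n, t = len(x), sum(x)

    def joint(y):
        # Probability of one particular binary sequence y.
        return theta ** sum(y) * (1.0 - theta) ** (len(y) - sum(y))

    # All sequences that share the same value of the statistic.
    same_statistic = [y for y in product((0, 1), repeat=n) if sum(y) == t]
    return joint(x) / sum(joint(y) for y in same_statistic)

sample = (1, 0, 1, 0)
values = [conditional_given_sum(sample, theta) for theta in (0.2, 0.5, 0.8)]
```

    Every entry of `values` equals one over the number of length-4 binary sequences with two ones, whatever value of theta is used.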

  • Mean and Variance

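    \[
    \begin{aligned}
    \frac{\partial A}{\partial \eta^T}
      &= \frac{\partial}{\partial \eta^T} \Big\{ \log \int h(x) \cdot \exp\{ \eta^T T(x) \} \, \nu(dx) \Big\} \\
      &= \frac{\int T(x) h(x) \cdot \exp\{ \eta^T T(x) \} \, \nu(dx)}{\int h(x) \cdot \exp\{ \eta^T T(x) \} \, \nu(dx)} \\
      &= \int T(x) h(x) \cdot \exp\{ \eta^T T(x) - A(\eta) \} \, \nu(dx) \\
      &= E[T(X)].
    \end{aligned}
    \tag{2}
    \]

    \[ \frac{\partial^2 A}{\partial \eta \, \partial \eta^T} = Var[T(X)] \tag{3} \]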
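
    Identities (2) and (3) can be checked numerically. The sketch below is an illustration only: it uses the Bernoulli log normalizer A(η) = log(1 + e^η) and central finite differences, with a step size chosen by hand.

```python
import math

def log_normalizer(eta):
    """A(eta) = log(1 + e^eta) for the Bernoulli family."""
    return math.log1p(math.exp(eta))

def first_derivative(f, x, step=1e-4):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + step) - f(x - step)) / (2.0 * step)

def second_derivative(f, x, step=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + step) - 2.0 * f(x) + f(x - step)) / step ** 2

pi = 0.3
eta = math.log(pi / (1.0 - pi))

mean_estimate = first_derivative(log_normalizer, eta)       # should be close to E[X] = pi
variance_estimate = second_derivative(log_normalizer, eta)  # should be close to Var[X] = pi * (1 - pi)
```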

  • Deep Exponential Families

    Deep Exponential Families are:

    flexible family of distributions,

    hierarchical latent variable model,

    built from layers using exponential family distributions,

    designed to capture hidden patterns from coarse to fine grained.

  • Deep Exponential Families

    for each data point $x_n$, there are $L$ layers of hidden variables $\{z_{n,1}, \dots, z_{n,L}\}$,

    each $z_{n,l} = \{z_{n,l,1}, \dots, z_{n,l,K_l}\}$,

    each weight $W_l$ is shared across data and is a collection of $K_l$ vectors with $K_{l+1}$ dimensions.

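  • Deep Exponential Families (incomplete)

    \[
    \begin{aligned}
    z_{n,L,k} &\sim \mathrm{expfam}_L(\eta) \\
    z_{n,l,k} &\sim \mathrm{expfam}_l\big( g_l(w_{l,k}^T z_{n,l+1}) \big) \\
    x_{n,i} &\sim \mathrm{expfam}_1\big( g_1(w_{0,i}^T z_{n,1}) \big) \\
    W &\sim \mathrm{expfam}(\xi)
    \end{aligned}
    \]

    Using the exponential family identity for the mean of the sufficient statistics, we can say:

    \[ E[T(z_{n,l,k})] = \nabla_\eta A\big( g_l(w_{l,k}^T z_{n,l+1}) \big) \]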

  • Modeling Documents

    Aim: clustering words into topics in a hierarchical and Bayesian manner (probability distributions over topics) in order to analyze large volumes of text.

    We can frame document modeling with a DEF as follows:

    documents: vectors of term counts,

    topics: first latent layer,

    super topics: second latent layer,

    concepts: third latent layer.

  • Poisson DEF Example

    Canonical exponential family form of the Poisson distribution:

    \[ p(z) = (z!)^{-1} \cdot \exp(\eta z - \exp(\eta)) \]

    \[ E[Z] = \exp(\eta) \]

    The link function is chosen to be the logarithm. The conditional distribution is then:

    \[ p(z_{l,k} \mid z_{l+1}, w_{l,k}) = (z_{l,k}!)^{-1} \cdot \exp\big( \log(z_{l+1}^T w_{l,k}) \, z_{l,k} - z_{l+1}^T w_{l,k} \big) \]

    Since $A$ is the exponential function,

    \[ E[z_{l,k}] = \nabla_\eta A\big( \log(z_{l+1}^T w_{l,k}) \big) = z_{l+1}^T w_{l,k} \]

    In the case of document modeling, the value of $z_{n,2,k}$ represents "how many times super topic $k$ is represented in the $n$th example".
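
    The Poisson layers above can be simulated top-down (ancestral sampling). The following Python sketch is a minimal illustration, not the authors' implementation. Its assumptions go beyond the slides: gamma-distributed weights, a fixed Poisson rate for the top layer, Knuth's multiplication method as the Poisson sampler (suitable for small rates only), and hypothetical names and sizes throughout.

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's multiplication method; assumes a small, non-negative rate."""
    if rate <= 0.0:
        return 0
    threshold = math.exp(-rate)
    count, running_product = 0, rng.random()
    while running_product > threshold:
        count += 1
        running_product *= rng.random()
    return count

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sample_layer(parent, weights, rng):
    """Unit k is Poisson with mean parent^T w_k (log link, so the mean is the dot product)."""
    return [sample_poisson(dot(parent, w_k), rng) for w_k in weights]

def gamma_weights(num_vectors, dim, rng, shape=0.3, scale=1.0):
    """Assumed weight prior: independent gamma entries, which keeps every rate non-negative."""
    return [[rng.gammavariate(shape, scale) for _ in range(dim)]
            for _ in range(num_vectors)]

def sample_document(weights, top_size, top_rate, rng):
    """weights = [W_0, ..., W_{L-1}]; returns (x, [z_1, ..., z_L])."""
    layers = [[sample_poisson(top_rate, rng) for _ in range(top_size)]]  # z_L
    for w_l in reversed(weights):          # W_{L-1} first, W_0 last
        layers.append(sample_layer(layers[-1], w_l, rng))
    x = layers.pop()                       # the last layer sampled is the observed counts
    layers.reverse()                       # layers[0] = z_1, ..., layers[-1] = z_L
    return x, layers

rng = random.Random(0)
K1, K2, V = 4, 2, 6                        # hypothetical layer and vocabulary sizes
W1 = gamma_weights(K1, K2, rng)            # K_1 vectors of dimension K_2
W0 = gamma_weights(V, K1, rng)             # V vectors of dimension K_1
x, z = sample_document([W0, W1], top_size=K2, top_rate=1.0, rng=rng)
```

    The weights are created once and passed in, which mirrors the statement that each $W_l$ is shared across data; calling `sample_document` repeatedly with the same weights produces further documents from the same model.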

  • Variational Inference

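    In VI, we seek to minimize the KL divergence from an approximate distribution $q_\lambda$ to the posterior of our model.

    Minimizing

    \[ KL(q_\lambda \| p) = E_q[\log q(z \mid \lambda)] - E_q[\log p(z \mid x)] \]

    is equivalent to maximizing

    \[ \mathcal{L}(q_\lambda) = E_q[\log p(x, z) - \log q(z \mid \lambda)], \]

    because the two quantities sum to $\log p(x)$, which does not depend on $\lambda$.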
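
    The equivalence can be verified on a toy problem. In the Python sketch below, the joint probabilities are made-up numbers for a single fixed observation and a latent variable with three states; the check is that log p(x) minus the objective equals the KL divergence to the posterior.

```python
import math

# Hypothetical values of p(x, z) for one fixed observation x and z in {0, 1, 2}.
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)                      # p(x)
posterior = [p / evidence for p in joint]  # p(z | x)

def elbo(q):
    """E_q[log p(x, z) - log q(z)] for a discrete q."""
    return sum(qz * (math.log(pz) - math.log(qz))
               for qz, pz in zip(q, joint) if qz > 0)

def kl_to_posterior(q):
    """KL(q || p(z | x)) for a discrete q."""
    return sum(qz * (math.log(qz) - math.log(post))
               for qz, post in zip(q, posterior) if qz > 0)

q = [0.5, 0.3, 0.2]
gap = math.log(evidence) - elbo(q)  # should equal KL(q || posterior)
```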

  • Black Box Variational Inference

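    Aim: find $\nabla_\lambda \mathcal{L}$ without requiring gradients of the model, then take a step in the direction of that gradient to maximize $\mathcal{L}$.

    Key observations:

    write $\nabla_\lambda \mathcal{L}$ as an expectation under $q$:

    \[ \nabla_\lambda \mathcal{L} = E_q\big[ \nabla_\lambda \log q(z \mid \lambda) \, (\log p(x, z) - \log q(z \mid \lambda)) \big] \]

    estimate this expectation with Monte Carlo sampling:

    \[ \nabla_\lambda \mathcal{L} \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \log q(z_s \mid \lambda) \, (\log p(x, z_s) - \log q(z_s \mid \lambda)) \]

    where $z_s \sim q(z \mid \lambda)$.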
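
    A minimal Python sketch of this estimator follows. It is an illustration under assumptions that are not in the slides: a single binary latent variable, a Bernoulli variational distribution parameterized by its logit λ, and made-up values for log p(x, z). Because z has only two states, the exact gradient can be computed by enumeration and compared with the Monte Carlo estimate.

```python
import math
import random

# Hypothetical log-joint values log p(x, z) for z in {0, 1} and a fixed x.
LOG_JOINT = {0: math.log(0.1), 1: math.log(0.3)}

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def log_q(z, lam):
    """log q(z | lambda) for a Bernoulli with logit lambda."""
    pi = sigmoid(lam)
    return math.log(pi) if z == 1 else math.log(1.0 - pi)

def score(z, lam):
    """d/d lambda of log q(z | lambda), which is z - sigmoid(lambda)."""
    return z - sigmoid(lam)

def bbvi_gradient(lam, num_samples, rng):
    """Monte Carlo estimate: average of score * (log p - log q) over samples from q."""
    pi = sigmoid(lam)
    total = 0.0
    for _ in range(num_samples):
        z = 1 if rng.random() < pi else 0
        total += score(z, lam) * (LOG_JOINT[z] - log_q(z, lam))
    return total / num_samples

def exact_gradient(lam):
    """The same expectation computed by enumerating both values of z."""
    pi = sigmoid(lam)
    return sum(prob * score(z, lam) * (LOG_JOINT[z] - log_q(z, lam))
               for z, prob in ((0, 1.0 - pi), (1, pi)))

estimate = bbvi_gradient(0.0, 20000, random.Random(0))
```

    Note that only evaluations of log p(x, z) are needed; no derivative of the model is taken anywhere.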

  • The log-derivative Trick

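    \[ \nabla_\lambda \log q(z \mid \lambda) = \frac{\nabla_\lambda q(z \mid \lambda)}{q(z \mid \lambda)} \]

    We can show that $E_q[\nabla_\lambda \log q(z \mid \lambda)] = 0$ and then

    \[
    \begin{aligned}
    \nabla_\lambda \mathcal{L}
      &= \nabla_\lambda E_q[\log p(x, z) - \log q(z \mid \lambda)] \\
      &= \nabla_\lambda \int q(z \mid \lambda) \cdot (\log p(x, z) - \log q(z \mid \lambda)) \, dz \\
      &= \int \nabla_\lambda q(z \mid \lambda) \cdot (\log p(x, z) - \log q(z \mid \lambda) - 1) \, dz \\
      &= \int q(z \mid \lambda) \cdot \big[ \nabla_\lambda \log q(z \mid \lambda) \cdot (\log p(x, z) - \log q(z \mid \lambda) - 1) \big] \, dz \\
      &= E_q\big[ \nabla_\lambda \log q(z \mid \lambda) \cdot (\log p(x, z) - \log q(z \mid \lambda)) \big] - E_q[\nabla_\lambda \log q(z \mid \lambda)]
    \end{aligned}
    \tag{4}
    \]

    The second expectation is zero, which leaves the BBVI gradient.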
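
    The zero-mean property of the score used in this derivation follows in a few lines from the log-derivative identity, assuming the gradient and the integral can be interchanged (a regularity condition the slides do not state):

```latex
\begin{aligned}
E_q[\nabla_\lambda \log q(z \mid \lambda)]
  &= \int q(z \mid \lambda) \, \frac{\nabla_\lambda q(z \mid \lambda)}{q(z \mid \lambda)} \, dz \\
  &= \int \nabla_\lambda q(z \mid \lambda) \, dz \\
  &= \nabla_\lambda \int q(z \mid \lambda) \, dz
   = \nabla_\lambda 1 = 0 .
\end{aligned}
```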

  • Stochastic Optimization

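    Let $f(x)$ be a function to be maximized,

    $h_t(x) \sim H(x)$ be a realization of a random variable with $E[H(x)] = \nabla_x f(x)$,

    $\rho_t$ be the learning rate.

    Then the update

    \[ x_{t+1} \leftarrow x_t + \rho_t h_t(x_t) \]

    converges to a maximum point of $f(x)$ when the learning rate schedule follows the Robbins-Monro conditions

    \[ \sum_{t=1}^{\infty} \rho_t = \infty, \qquad \sum_{t=1}^{\infty} \rho_t^2 < \infty. \]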
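
    As an illustration of the update rule, the Python sketch below maximizes a made-up objective f(x) = -(x - 3)^2 from noisy gradients. The objective, the Gaussian noise, and the schedule ρ_t = 1/t (one common choice that satisfies both Robbins-Monro conditions) are assumptions for the example.

```python
import random

def noisy_gradient(x, rng):
    """Unbiased estimate of f'(x) = -2 (x - 3), corrupted by zero-mean Gaussian noise."""
    return -2.0 * (x - 3.0) + rng.gauss(0.0, 1.0)

def stochastic_ascent(x0, num_steps, rng):
    """Iterate x <- x + rho_t * h_t(x) with rho_t = 1 / t."""
    x = x0
    for t in range(1, num_steps + 1):
        rho_t = 1.0 / t  # sum of rho_t diverges, sum of rho_t^2 is finite
        x = x + rho_t * noisy_gradient(x, rng)
    return x

estimate = stochastic_ascent(x0=0.0, num_steps=5000, rng=random.Random(0))
```

    After enough steps `estimate` should sit close to the maximizer x = 3, even though no single gradient evaluation is exact.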

  • Inference on DEFs

    For the approximate distribution $q$, we use the mean field variational family.

    \[ q(z, W) = q(W_0) \prod_{l=1}^{L-1} q(W_l) \prod_{n=1}^{N} q(z_{n,l}) \]

    Each $q(W_l)$ and $q(z_{n,l})$ is fully factorized, and each factor is an exponential family distribution matching the corresponding layer, i.e.

    \[ q(z_{n,l,k}) = \mathrm{expfam}_l(z_{n,l,k}, \lambda_{n,l,k}) \]

  • BBVI on DEFs

    To update the variational parameters $\lambda_{n,l,k}$ using BBVI, we need to calculate the following gradient for each coordinate.

    \[ \nabla_{\lambda_{n,l,k}} \mathcal{L} = E_q\big[ \nabla_{\lambda_{n,l,k}} \log q(z_{n,l,k}) \, (\log p_{n,l,k}(x, z, W) - \log q(z_{n,l,k})) \big] \]

    The only quantity that we need to calculate for a DEF is $\log p_{n,l,k}(x, z, W)$, which collects the terms in the log-joint that contain $z_{n,l,k}$:

    \[ \log p_{n,l,k}(x, z, W) = \log p(z_{n,l,k} \mid z_{n,l+1}, w_{l,k}) + \log p(z_{n,l-1} \mid z_{n,l}, W_{l-1}) \]

    Similarly, for the weights this will be

    \[ \log p_{l,i,j}(x, z, W) = \log p(w_{l,i,j} \mid \xi_{l,i,j}) + \log p(z_{l,j} \mid z_{l+1}, w_{l,j}) \]
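
    For the Poisson DEF from the earlier example, the local term for a latent unit can be sketched in Python as below. This is an illustrative reading of the formula, assuming Poisson conditionals with a log link at both the layer itself and the layer beneath it, and strictly positive rates; the function and argument names are hypothetical.

```python
import math

def log_poisson(count, rate):
    """log of (count!)^{-1} * exp(count * log(rate) - rate); assumes rate > 0."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def local_log_joint(k, z_layer, z_above, z_below, w_layer, w_below):
    """Terms of the log-joint that contain z_{n,l,k}.

    z_layer: z_{n,l}, z_above: z_{n,l+1}, z_below: z_{n,l-1},
    w_layer[k]: w_{l,k} (length K_{l+1}), w_below[j]: w_{l-1,j} (length K_l).
    """
    # log p(z_{n,l,k} | z_{n,l+1}, w_{l,k})
    parent_term = log_poisson(z_layer[k], dot(z_above, w_layer[k]))
    # log p(z_{n,l-1} | z_{n,l}, W_{l-1}): every unit below depends on z_{n,l,k}
    child_term = sum(log_poisson(z_j, dot(z_layer, w_j))
                     for z_j, w_j in zip(z_below, w_below))
    return parent_term + child_term

value = local_log_joint(
    k=0,
    z_layer=[1, 2], z_above=[1], z_below=[3],
    w_layer=[[0.5], [1.5]], w_below=[[1.0, 1.0]],
)
```

    Inside a BBVI loop, this value would be evaluated at each sample drawn from q and multiplied by the score of $q(z_{n,l,k})$.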

  • References

    Ranganath, Tang, Charlin, Blei (2016)
    Deep Exponential Families
    https://www.cs.princeton.edu/~rajeshr/papers/def_aistats.pdf

    Ranganath, Gerrish, Blei (2013)
    Black Box Variational Inference
    https://arxiv.org/pdf/1401.0118.pdf

    Robbins, Monro (1951)
    A Stochastic Approximation Method
    https://projecteuclid.org/euclid.aoms/1177729586

  • The End
