Mathematical Statistics Team (stat.sys.i.kyoto-u.ac.jp/.../05/20180304e_poster.pdf)


Mathematical Statistics Team
Hidetoshi Shimodaira

• Motivation: assessing the confidence of each obtained cluster
• Consider approximately unbiased p-values as frequentist confidence measures
• Null hypothesis: the obtained cluster is NOT true
• Example: lung data (73 tumors, 916 genes; Garber et al., 2001)

• Two asymptotic theories

Key point: Multiscale Bootstrap (Shimodaira, 2002; 2004)

✓ Low computational cost: O(B) bootstrap replicates per scale.
✓ The double bootstrap method has the same accuracy but a much higher computational cost of O(B²).

Y. Terada and H. Shimodaira, Selective inference via multiscale bootstrap

Algorithm 1: Computing approximately unbiased p-values

1: Specify several $n' \in \mathbb{N}$ values, and set $\sigma^2 = n/n'$. Set the number of bootstrap replicates $B$, say, 1000.

2: For each $n'$, perform bootstrap resampling to generate $Y^*$ $B$ times, and compute $\alpha_{\sigma^2}(H|y) = C_H/B$ and $\alpha_{\sigma^2}(S|y) = C_S/B$ by counting the frequencies $C_H = \#\{Y^* \in H\}$ and $C_S = \#\{Y^* \in S\}$. (We actually work on $X^*_{n'}$ instead of $Y^*$.) Compute $\psi_{\sigma^2}(H|y) = \sigma\,\bar\Phi^{-1}(\alpha_{\sigma^2}(H|y))$ and $\psi_{\sigma^2}(S|y) = \sigma\,\bar\Phi^{-1}(\alpha_{\sigma^2}(S|y))$.

3: Estimate the parameters $\beta_H(y)$ and $\beta_S(y)$ by fitting the models

$$\psi_{\sigma^2}(H|y) = \varphi_H(\sigma^2 \mid \beta_H) \quad\text{and}\quad \psi_{\sigma^2}(S|y) = \varphi_S(\sigma^2 \mid \beta_S),$$

respectively. The parameter estimates are denoted by $\hat\beta_H(y)$ and $\hat\beta_S(y)$. If there are several candidate models, apply the above to each of them and choose the best model by its AIC value.

4: Approximately unbiased p-values for selective inference ($p_{\mathrm{SI}}$) and non-selective inference ($p_{\mathrm{AU}}$) are computed by one of (A) and (B) below.

(A) Extrapolate $\psi_{\sigma^2}(H|y)$ and $\psi_{\sigma^2}(S|y)$ to $\sigma^2 = -1$ and $0$, respectively, by

$$z_H = \varphi_H(-1 \mid \hat\beta_H(y)) \quad\text{and}\quad z_S = \varphi_S(0 \mid \hat\beta_S(y)),$$

and then compute the p-values

$$p_{\mathrm{SI}}(H|S, y) = \frac{\bar\Phi(z_H)}{\bar\Phi(z_H + z_S)} \quad\text{and}\quad p_{\mathrm{AU}}(H|y) = \bar\Phi(z_H).$$

(B) Specify $k \in \mathbb{N}$ and $\sigma_0^2, \sigma_{-1}^2 > 0$ (e.g., $k = 3$ and $\sigma_{-1}^2 = \sigma_0^2 = 1$). Extrapolate $\psi_{\sigma^2}(H|y)$ and $\psi_{\sigma^2}(S|y)$ to $\sigma^2 = -1$ and $0$, respectively, by

$$z_{H,k} = \varphi_{H,k}(-1 \mid \hat\beta_H(y), \sigma_{-1}^2) \quad\text{and}\quad z_{S,k} = \varphi_{S,k}(0 \mid \hat\beta_S(y), \sigma_0^2),$$

where the Taylor polynomial approximation of $\varphi_H$ at $\tau^2 > 0$ with $k$ terms is

$$\varphi_{H,k}(\sigma^2 \mid \hat\beta_H(y), \tau^2) = \sum_{j=0}^{k-1} \frac{(\sigma^2 - \tau^2)^j}{j!} \left.\frac{\partial^j \varphi_H(\sigma^2 \mid \hat\beta_H(y))}{\partial(\sigma^2)^j}\right|_{\sigma^2 = \tau^2},$$

and that of $\varphi_S$ is defined similarly. Then compute the p-values

$$p_{\mathrm{SI},k}(H|S, y) = \frac{\bar\Phi(z_{H,k})}{\bar\Phi(z_{H,k} + z_{S,k})} \quad\text{and}\quad p_{\mathrm{AU},k}(H|y) = \bar\Phi(z_{H,k}).$$

[Figure: model fitting of $\psi_{\sigma^2}$ versus $\sigma^2$, with extrapolation to the selective p-value and the non-selective p-value, for non-smooth surfaces (such as cones and polyhedra).]

The hypothesis is tested only for the clusters that appear in the obtained tree, so we need selective inference for pvclust.

Pvclust (Suzuki and Shimodaira, 2006)
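The bootstrap part of what pvclust computes can be sketched as follows: resample the features (genes), recluster the samples each time, and count how often a cluster of interest reappears. The data, the naive single-linkage routine, and all names below are illustrative stand-ins, not the R package's internals.

```python
# Minimal sketch of a cluster's bootstrap probability ("bp" in pvclust):
# frequency with which the cluster reappears under feature resampling.
import random

def single_linkage_clusters(points):
    """Return every cluster (frozenset of sample indices) formed while
    repeatedly merging the two closest clusters (single linkage)."""
    clusters = [frozenset([i]) for i in range(len(points))]
    seen = set(clusters)
    def dist(a, b):   # min squared distance between the two clusters
        return min(sum((points[i][k] - points[j][k]) ** 2
                       for k in range(len(points[0])))
                   for i in a for j in b)
    while len(clusters) > 1:
        a, b = min(((c1, c2) for i, c1 in enumerate(clusters)
                    for c2 in clusters[i + 1:]),
                   key=lambda ab: dist(*ab))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        seen.add(a | b)
    return seen

rng = random.Random(1)
# 6 samples x 20 features: samples 0-2 centered at 0, samples 3-5 at 3
data = [[rng.gauss(0 if s < 3 else 3, 1) for _ in range(20)]
        for s in range(6)]
target = frozenset([0, 1, 2])        # cluster whose support we assess

B, hits = 200, 0
for _ in range(B):
    cols = [rng.randrange(20) for _ in range(20)]   # resample features
    boot = [[row[c] for c in cols] for row in data]
    hits += target in single_linkage_clusters(boot)
print(hits / B)   # bootstrap probability of the cluster {0, 1, 2}
```

The multiscale/selective machinery above refines exactly this quantity: it repeats the count at several resample sizes n′ and extrapolates, instead of reading off the naive frequency at n′ = n.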

[Figure: pvclust dendrogram of the lung data (sample labels include Adeno, SCLC, SCC, LCLC, normal, fetal_lung, node, combined), annotated at each cluster with si (selective inference), au (approximately unbiased), and bp (bootstrap probability) values and edge numbers; height axis from 0.0 to 1.0.]

↑ Hypothesis region

Selection region ↓

Theorem (large-sample theory): If the boundary surfaces of H and S are smooth and “nearly parallel”, then the proposed p-value is second-order accurate, with error $O(n^{-1})$.

Theorem (nearly flat surfaces): If the boundary surfaces are “nearly flat” (approaching flat, but allowed to be non-smooth, such as cones and polyhedra), then the proposed p-value is justified as unbiased, with error $O(\lambda^2)$.

Algorithm : Computing approx. unbiased selective p-values

(1) Large sample theory with “nearly parallel surfaces”

(2) Asymptotic theory of “nearly flat surfaces”

$$h(u) = h_0 + h_i u_i + h_{ij} u_i u_j + h_{ijk} u_i u_j u_k + \cdots,$$

$$h_0 = O(1),\quad h_i = O(n^{-1/2}),\quad h_{ij} = O(n^{-1/2}),\quad h_{ijk} = O(n^{-1}),\ \ldots$$

$(u, v)$ are $O(\sqrt{n})$, but $h_0 = O(n^{1/2})$ and $h_i = O(1)$ are multiplied by $O(n^{-1/2})$.

$$\sup_{u \in \mathbb{R}^m} |h(u)| = O(\lambda), \quad \int_{\mathbb{R}^m} |h(u)|\, du < \infty, \quad \int_{\mathbb{R}^m} |(\mathcal{F}h)(\omega)|\, d\omega < \infty,$$

as $n \to \infty$ and $\lambda \to 0$ (Shimodaira 2008).

Smooth surface

Non-smooth surface

(Pham, Sheridan and Shimodaira, arXiv:1704.06017)

• What drives the growth of real-world networks across diverse fields?
• We use two interpretable and universal mechanisms: (1) Talent: the intrinsic ability of a node to attract connections (fitness $\eta_i$), and (2) Experience: the preferential attachment (PA) function $A_k$, governing the extent to which having more connections makes a node more (or less) attractive for forming new connections in the future.

Selective inference (Terada and Shimodaira, arXiv:1711.00949)

Key finding: Although both talent and experience contribute to the growth process, the ratios of their contributions vary greatly.

Probabilistic Multi-view Graph Embedding (PMvGE)

Theorem (universal approximation): inner product + neural networks ≈ arbitrary similarity + arbitrary transformation

(Mercer's theorem + the universal approximation property of neural networks)

[Figure: a neural network maps the data vector (in view $d$) to a feature vector; the strength of association ($\geq 0$) is modeled through inner products.]

Advantages:
(1) PMvGE with neural networks is proved to be highly expressive.
(2) PMvGE approximately and non-linearly generalizes CDMCA (Shimodaira 2016, Neural Networks), which already generalizes various existing methods such as CCA, LPP, and spectral graph embedding.
(3) Likelihood-based estimation of the neural networks is efficiently computed by mini-batch stochastic gradient descent.
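The modeling idea, schematically: each view gets its own network mapping data vectors into a shared space, and the nonnegative strength of association between items is driven by the inner product of their embeddings. The tiny random-weight MLPs below are placeholders assuming nothing about the paper's actual architecture or training; they only show the structure of the similarity model.

```python
# Schematic of a PMvGE-style similarity model: view-specific networks map
# data vectors into one embedding space; association strength (>= 0) is
# exp(inner product) of the two embeddings.
import math, random

rng = random.Random(0)

def make_mlp(d_in, d_hid, d_out):
    """Random two-layer MLP with tanh hidden units; returns a function."""
    W1 = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(d_hid)]
    W2 = [[rng.gauss(0, 0.5) for _ in range(d_hid)] for _ in range(d_out)]
    def f(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    return f

f_text = make_mlp(d_in=5, d_hid=8, d_out=3)    # view 1: word features
f_image = make_mlp(d_in=7, d_hid=8, d_out=3)   # view 2: image features

def association(x_text, x_image):
    """Nonnegative association strength between items from two views."""
    y, z = f_text(x_text), f_image(x_image)
    return math.exp(sum(a * b for a, b in zip(y, z)))

w = association([0.1, -0.2, 0.3, 0.0, 0.5], [0.2] * 7)
print(w > 0)
```

In the actual method the network weights would be fit by maximizing a likelihood built from these association strengths over observed many-to-many links, which is what makes mini-batch SGD applicable.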

Probability that node $v_i$ gets a new edge at time $t$ $\propto A_{k_i(t)} \times \eta_i$ (experience × talent).
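A minimal simulation of this growth rule, assuming the common power-law form A(k) = k^α for the PA function and log-normal fitness values (both illustrative choices, not estimates from real data):

```python
# Grow a network where a new node attaches to an existing node v_i with
# probability proportional to A(k_i) * eta_i: experience (degree-based
# preferential attachment) times talent (node fitness).
import random

rng = random.Random(42)
alpha = 1.0                                    # PA exponent: A(k) = k^alpha
eta = [rng.lognormvariate(0, 0.5) for _ in range(200)]   # node fitness

degree = [1, 1]                                # start from a single edge 0--1
for new in range(2, 200):
    weights = [max(k, 1) ** alpha * eta[i] for i, k in enumerate(degree)]
    target = rng.choices(range(len(degree)), weights=weights)[0]
    degree[target] += 1
    degree.append(1)                           # new node arrives with one edge

print(max(degree), sum(degree))                # hubs emerge; sum = 2 * edges
```

PAFit solves the inverse problem: given the observed growth history, it estimates A_k and the eta_i jointly, which is how the talent-versus-experience ratios are measured.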

Multimodal Eigenwords (Fukui, Oshikiri and Shimodaira, Textgraphs 2017)

[Figure: multimodal vector arithmetic examples, e.g. (image) − “brown” + “white” = (image), and (image) − “day” + “night” = (image).]

• A multimodal word embedding that jointly embeds words and corresponding visual information

• We employed Cross-Domain Matching Correlation Analysis (CDMCA; Shimodaira 2016) to extend a CCA-based word embedding (Dhillon et al. 2015) so that it can handle complex associations

• Feature learning via graph embedding

• Feature vectors reflect both semantic and visual similarities

• The method enables multimodal vector arithmetic between images and words

Our proposed method:

Multimodal vector arithmetic:
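The arithmetic itself is straightforward once words and images live in one embedding space: add and subtract vectors, then look up the nearest neighbor. The 2-D toy vectors below are hand-made for illustration; in the actual method they come from the learned graph embedding.

```python
# Toy multimodal vector arithmetic: answer an analogy query by vector
# addition/subtraction followed by nearest-neighbor lookup.
import math

emb = {                        # shared embedding space (toy coordinates)
    "brown":      [1.0, 0.0],
    "white":      [0.0, 1.0],
    "brown_bear": [1.0, 2.0],  # stand-in for an image vector
    "polar_bear": [0.0, 3.0],  # stand-in for an image vector
}

def nearest(v, exclude=()):
    """Nearest entry to vector v by Euclidean distance."""
    return min((k for k in emb if k not in exclude),
               key=lambda k: math.dist(emb[k], v))

# (brown bear image) - "brown" + "white"  ->  nearest item in the space
q = [a - b + c for a, b, c in
     zip(emb["brown_bear"], emb["brown"], emb["white"])]
print(nearest(q, exclude={"brown_bear"}))   # -> polar_bear
```

Excluding the query item itself from the search is the usual convention in word-analogy evaluation, since the query vector is often its own nearest neighbor.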

Mathematical Statistics Team, Shimodaira Lab, Graduate School of Informatics, Kyoto University

• Inductive inference, resampling methods, information geometry
• Generalization error under missing data, covariate shift, etc.
• Multi-modal data representation, graph embedding, and multivariate analysis
• Network growth mechanisms (preferential attachment and fitness)
• Phylogenetics, gene expression (hierarchical clustering)
• Image and word embedding (search and reasoning)

Hidetoshi Shimodaira (Team Leader @ Kyoto Univ), Kazuki Fukui (RA), Akifumi Okuno (RA), Tetsuya Hada (RA), Masaaki Inoue (RA), Geewook Kim (RA), Thong Pham (Postdoctoral Researcher @ AIP), Tomoharu Iwata (Visiting Scientist @ NTT), Yoshikazu Terada (Visiting Scientist @ Osaka Univ), Shinpei Imori (Visiting Scientist @ Hiroshima Univ), Kei Hirose (Visiting Scientist @ Kyushu Univ)

Statistics and Machine Learning: Methodology and Applications

(Okuno, Hada and Shimodaira, arXiv:1802.04630)

Multi-view feature learning with many-to-many associations via neural networks for predicting new associations

PAFit: an R Package for Estimating Preferential Attachment and Node Fitness in Temporal Complex Networks

Talent vs. Experience

[Figure: relative contributions of talent (fitness) and experience (PA function) estimated for several real-world networks.]
