andoni beyondlsh mmdsmmds-data.org/presentations/2014/andoni_mmds14.pdf ·...

Beyond Locality Sensitive Hashing Alex Andoni (Microsoft Research) Joint with: Piotr Indyk (MIT), Huy L. Nguyen (Princeton), Ilya Razenshteyn (MIT)


Post on 13-Aug-2020


Page 1:

Beyond Locality Sensitive Hashing

Alex Andoni (Microsoft Research)

Joint with: Piotr Indyk (MIT), Huy L. Nguyen (Princeton), Ilya Razenshteyn (MIT)

Page 2:

Nearest Neighbor Search (NNS)

Page 3:

Motivation

• Generic setup:
  • Points model objects (e.g. images)
  • Distance models (dis)similarity measure
• Application areas:
  • machine learning: k-NN rule
  • image/video/music recognition, deduplication, bioinformatics, etc.
• Distance can be:
  • Hamming, Euclidean, …
• Primitive for other problems:
  • find the similar pairs, clustering…

000000 011100 010100 000100 010100 011111

000000 001100 000100 000100 110100 111111
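As a concrete instance of the setup on this slide, the following minimal Python sketch treats the two 36-bit strings shown above as points in Hamming space and finds a nearest neighbor by linear scan. The names `hamming` and `nearest_neighbor` are illustrative and not taken from the talk; the linear scan is only the naive baseline that NNS data structures try to beat.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(dataset: Sequence[str], query: str) -> str:
    """Exact nearest neighbor by linear scan: O(n * d) time per query."""
    return min(dataset, key=lambda p: hamming(p, query))


# The two bit strings from the slide, with the group separators removed.
a = "000000 011100 010100 000100 010100 011111".replace(" ", "")
b = "000000 001100 000100 000100 110100 111111".replace(" ", "")

print(hamming(a, b))  # prints 4
```

The two strings differ in one bit in each of the second, third, fifth, and sixth groups, which gives the distance of 4.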

Page 4:

Approximate NNS

[Figure labels: q, r, p, cr]
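The formal statement on this slide did not survive extraction. For orientation, the standard formulation of the approximate near neighbor problem is sketched below; it is the textbook definition, not text recovered from the slide. The symbols $P$ (dataset), $n$ (its size), and $\operatorname{dist}$ are assumed notation, while $q$, $p$, $r$, and $cr$ match the figure labels.

```latex
\text{Given } P \subset X \text{ with } |P| = n,\ \text{a radius } r > 0,\ \text{and an approximation factor } c > 1:
\\[4pt]
\text{for a query } q,\ \text{if there exists } p^{*} \in P \text{ with } \operatorname{dist}(p^{*}, q) \le r,
\ \text{return some } p \in P \text{ with } \operatorname{dist}(p, q) \le c\,r .
```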

Page 5:

Locality-Sensitive Hashing [Indyk-Motwani’98]

[Figure labels: q, p, 1, “not-so-small”]
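The body of this slide is missing from the transcript. The standard definition of a locality-sensitive hash family from [Indyk-Motwani’98] is reproduced here as a reference point; the symbols $\mathcal{H}$, $P_1$, $P_2$, and $\rho$ are the usual textbook notation and are assumed rather than recovered from the slide.

```latex
\begin{aligned}
\Pr_{h \in \mathcal{H}}\bigl[h(p) = h(q)\bigr] &\ge P_1 && \text{if } \operatorname{dist}(p, q) \le r, \\
\Pr_{h \in \mathcal{H}}\bigl[h(p) = h(q)\bigr] &\le P_2 && \text{if } \operatorname{dist}(p, q) > c\,r, \\
\rho &= \frac{\log (1/P_1)}{\log (1/P_2)} < 1 .
\end{aligned}
```

In this formulation the collision probability $P_1$ for near points only needs to be "not-so-small" relative to $P_2$; the exponent $\rho$ then governs the data structure, with space roughly $n^{1+\rho}$ and query time roughly $n^{\rho}$.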

Page 6:

Locality sensitive hash functions [Indyk-Motwani’98]
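The slide's own construction is not preserved in the transcript. As a hedged illustration, the sketch below implements bit sampling, the classic hash family for Hamming space: each hash function reads a few randomly chosen coordinates, so two points at Hamming distance t in dimension d agree on a single sampled bit with probability 1 - t/d. The class name `BitSamplingLSH` and the parameters `k`, `num_tables`, and `seed` are illustrative choices, not names from the talk.

```python
import random
from collections import defaultdict


class BitSamplingLSH:
    """Hamming-space LSH index: several hash tables, each keyed on k sampled bits."""

    def __init__(self, dim, k, num_tables, seed=0):
        rng = random.Random(seed)
        # Each table's hash function concatenates k randomly chosen bit positions.
        self.positions = [[rng.randrange(dim) for _ in range(k)]
                          for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _key(self, j, point):
        return "".join(point[i] for i in self.positions[j])

    def insert(self, point):
        for j, table in enumerate(self.tables):
            table[self._key(j, point)].append(point)

    def query(self, q, max_dist):
        """Return a colliding point within max_dist of q, or None.

        The search is randomized: it may return None even when a close
        point exists, if that point collides with q in no table.
        """
        for j, table in enumerate(self.tables):
            for p in table.get(self._key(j, q), []):
                if sum(x != y for x, y in zip(p, q)) <= max_dist:
                    return p
        return None


index = BitSamplingLSH(dim=8, k=3, num_tables=4)
index.insert("00000000")
print(index.query("00000000", 0))  # an identical point always collides
```

Larger `k` makes far points collide less often, and more tables raise the chance that a near point collides at least once; the standard analysis balances the two to get the $n^{\rho}$ query time.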

Page 7:

Algorithms and Lower Bounds

Table columns: Space, Time, Comment, Reference (only the Reference entries survived extraction):

[IM’98]
[PTW’08, PTW’10]
[IM’98]
[DIIM’04, AI’06]
[MNP’06]
[OWZ’11]
[PTW’08, PTW’10]
[MNP’06]
[OWZ’11]

Page 8:

LSH is tight…

leave the rest to cell-probe lower bounds?

Page 9:

Main Result

Page 10:

A look at LSH lower bounds [O’Donnell-Wu-Zhou’11]

Page 11:

Why not NNS lower bound?

Page 12:

Our algorithm: intuition

Page 13:

Nice Configuration: “sparsity”

Page 14:

Reduction: into spherical LSH

Page 15:

Two-level algorithm

Page 16:

Details

Page 17:

Practice

• Practice uses data-dependent partitions!
• “wherever theoreticians suggest to use random dimensionality reduction, use PCA”
• Lots of variants
  • Trees: kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees…
• no guarantees: e.g., are deterministic
• Is there a better way to do partitions in practice?
• Why do PCA-trees work?
  • [Abdullah-A-Kannan-Krauthgamer]: if have more structure
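To make the contrast between random and data-dependent partitions concrete, here is a minimal pure-Python sketch of a single tree split. A PCA-tree style split projects onto the top principal direction of the data, while an rp-tree style split projects onto a random direction. The helper names (`top_principal_direction`, `random_direction`, `split_by_direction`) and the toy dataset are hypothetical illustrations, not code or data from the talk, and the power iteration is just one simple way to approximate the top principal component.

```python
import math
import random


def top_principal_direction(points, iters=100):
    """Approximate the top principal component by power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - mean[i] for i in range(d)] for p in points]
    v = [1.0 / math.sqrt(d)] * d  # fixed start vector, for reproducibility
    for _ in range(iters):
        # w = (X^T X) v, computed without forming the d x d matrix.
        proj = [sum(x[i] * v[i] for i in range(d)) for x in centered]
        w = [sum(proj[j] * centered[j][i] for j in range(n)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v


def random_direction(d, rng):
    """Uniformly random unit vector (data-independent, as in rp-trees)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def split_by_direction(points, direction):
    """Split points at the median of their projections onto `direction`."""
    proj = [sum(a * b for a, b in zip(p, direction)) for p in points]
    median = sorted(proj)[len(proj) // 2]
    left = [p for p, t in zip(points, proj) if t < median]
    right = [p for p, t in zip(points, proj) if t >= median]
    return left, right


# Toy data stretched along the first axis, with small spread on the second.
points = [(float(x), 0.1 * ((x * 7) % 3 - 1)) for x in range(-10, 11)]
pca_dir = top_principal_direction(points)
rnd_dir = random_direction(2, random.Random(0))
print(pca_dir, split_by_direction(points, pca_dir))
print(rnd_dir, split_by_direction(points, rnd_dir))
```

On this toy data the principal direction lines up with the long axis, so the PCA split cuts the data where it is most spread out; a random direction carries no such guarantee for a particular dataset, which is the practical appeal of data-dependent partitions.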


Page 18:

Finale