The solr power

Tareque Hossain, Sr. Software Engineer

Posted on 26-Jan-2015

DESCRIPTION

Motivation for using solr as a NoSQL backend

TRANSCRIPT

Page 1: The solr power

Tareque Hossain, Sr. Software Engineer

The Power

Page 2: The solr power

What about it?

• We always associate solr with searching
• solr can also serve as your non-relational data layer

Page 3: The solr power

NoSQL? solr?

Page 4: The solr power
Page 5: The solr power

Why solr?

• Hey, solr is already part of my stack
• I love solr
• It’s fast, scalable, and there are some great Python interfaces out there

Page 6: The solr power

When would you consider it?

• You have a DB that’s frequently read and infrequently written
• You want robust search & filtering on your data
• You want to leverage the faceting feature
• You want a decently scalable data layer

Page 7: The solr power

What’s not so cool?

• Doesn’t support transactions
• Not all SQL queries can be translated into solr queries
• Generating indices can take a long time
• Searching and indexing at the same time brings down performance

Page 8: The solr power

But...

• You don’t have to give up your relational data layer
• Create a non-relational layer on top of your relational data layer
• Get the best of both worlds
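One way to read "a non-relational layer on top of your relational data layer": keep the relational database as the source of truth and periodically flatten its rows into one denormalized document per survey response for solr to index. This is a minimal sketch under assumptions: the `respondent` and `answer` tables, their columns, and the `_i`/`_s`/`_b` field suffixes are hypothetical, an in-memory SQLite database stands in for the real relational store, and the actual indexing call is left out.

```python
import sqlite3


def build_documents(conn):
    """Flatten relational rows into one solr-style document per respondent.

    Table and field names are hypothetical examples.
    """
    docs = {}
    # One row per respondent: becomes the document's metadata fields.
    for rid, age, profession in conn.execute(
        "SELECT id, age, profession FROM respondent ORDER BY id"
    ):
        docs[rid] = {"id": str(rid), "age_i": age, "profession_s": profession}
    # One row per chosen answer: becomes a boolean field set to true.
    for rid, field in conn.execute("SELECT respondent_id, field FROM answer"):
        docs[rid][field + "_b"] = True
    return list(docs.values())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE respondent (id INTEGER PRIMARY KEY, age INTEGER, profession TEXT);
        CREATE TABLE answer (respondent_id INTEGER, field TEXT);
        INSERT INTO respondent VALUES (1, 54, 'rheumatologist'), (2, 61, 'gp');
        INSERT INTO answer VALUES (1, 'q12_r0_c0'), (2, 'q12_r1_c2');
        """
    )
    for doc in build_documents(conn):
        print(doc)
```

Writes still go to the relational database first; re-running an export like this (or an incremental variant of it) keeps solr in sync, which suits the "frequently read, infrequently written" case.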

Page 9: The solr power

So what’s the use case?

• We deal with medical survey data
• Say:
  – About 300 multiple-choice questions
  – Responses can be multi-dimensional
  – 7000+ different answer choices per question
  – 2000+ respondents per survey
  – 15+ surveys and growing

Page 10: The solr power

What a survey question looks like

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid Arthritis  Traumatic Arthritis  Psoriatic Arthritis  Other
Less than a year ago  ☑               ☐                     ☐                    ☐                    ☐
More than a year ago  ☐               ☐                     ☑                    ☐                    ☐

Page 11: The solr power

Storing a single response

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid Arthritis  Traumatic Arthritis  Psoriatic Arthritis  Other
Less than a year ago  1               0                     0                    0                    0
More than a year ago  0               0                     1                    0                    0
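A grid question like this one can be flattened into one boolean field per (row, column) cell, which is what the 1/0 matrix shows. This sketch assumes a hypothetical naming scheme `q<question>_r<row>_c<column>_b` and a hypothetical question id of 12; the talk does not say how the fields were actually named.

```python
ROWS = ["Less than a year ago", "More than a year ago"]
COLUMNS = ["Osteoarthritis", "Rheumatoid Arthritis", "Traumatic Arthritis",
           "Psoriatic Arthritis", "Other"]


def encode_grid(question_id, checked):
    """Turn the checked (row, column) cells of one grid question into flat
    boolean fields, one per cell (hypothetical q<q>_r<row>_c<col>_b names).
    Unchecked cells are stored as False, matching the 0s in the matrix."""
    fields = {}
    for r in range(len(ROWS)):
        for c in range(len(COLUMNS)):
            fields[f"q{question_id}_r{r}_c{c}_b"] = (r, c) in checked
    return fields


# The response from the slide: Osteoarthritis less than a year ago,
# Traumatic Arthritis more than a year ago.
response = encode_grid(12, {(0, 0), (1, 2)})
print(sum(response.values()), "of", len(response), "cells checked")
```

Storing every cell, not just the checked ones, keeps each document's field set uniform, so a filter or facet on any cell behaves the same across all documents.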

Page 12: The solr power

Aggregating over 2000 responses

When were you diagnosed with the following types of Arthritis?

                      Osteoarthritis  Rheumatoid Arthritis  Traumatic Arthritis  Psoriatic Arthritis  Other
Less than a year ago  63              155                   19                   27                   268
More than a year ago  190             46                    8                    213                  325
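Each cell of the aggregated table is the number of documents whose boolean field for that cell is true. A plain-Python reference for that count can be handy for checking facet results in tests; it is not how solr computes facets internally. Field names follow a hypothetical `q<question>_r<row>_c<column>_b` scheme, and the three documents are made-up examples.

```python
from collections import Counter


def aggregate(documents):
    """Count, for every boolean field ending in "_b", how many documents
    have it set to true: the same number a facet on that field reports
    for the value "true"."""
    counts = Counter()
    for doc in documents:
        for field, value in doc.items():
            if field.endswith("_b") and value is True:
                counts[field] += 1
    return counts


docs = [
    {"id": "1", "q12_r0_c0_b": True, "q12_r1_c2_b": True},
    {"id": "2", "q12_r0_c0_b": True, "q12_r1_c2_b": False},
    {"id": "3", "q12_r0_c1_b": True},
]
print(aggregate(docs))
```

Doing this in application code means loading every document for every request; pushing the counting into solr's faceting is what avoids that.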

Page 13: The solr power

The Document Structure

• Each survey response = one solr document
• Up to 3000 boolean variables per document indicating chosen answers
• Added meta information: age, profession, interests
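A sketch of a single document: an id, the respondent metadata, and the boolean answer fields. The field names are assumptions; the `_i`, `_s`, `_ss` (multi-valued string) and `_b` suffixes match dynamic fields declared in solr's example schema, so a schema based on it should accept them without explicit field definitions. A JSON list of such documents can typically be posted to solr's JSON update handler; the request itself is not shown.

```python
import json


def make_document(response_id, age, profession, interests, answer_fields):
    """Build one solr document: metadata plus boolean answer fields.

    Field names are hypothetical; the suffixes follow the dynamic-field
    conventions of solr's example schema (*_i, *_s, *_ss, *_b).
    """
    doc = {
        "id": str(response_id),
        "age_i": age,
        "profession_s": profession,
        "interests_ss": list(interests),  # multi-valued string field
    }
    doc.update(answer_fields)  # e.g. {"q12_r0_c0_b": True, ...}
    return doc


doc = make_document(
    1001, 54, "rheumatologist", ["research", "teaching"],
    {"q12_r0_c0_b": True, "q12_r1_c2_b": True, "q12_r0_c1_b": False},
)
payload = json.dumps([doc])  # a list of documents, as sent for indexing
print(payload)
```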

Page 14: The solr power

Querying

• Filter by age, interest, profession
• Facet across boolean fields
• Result: which group of people chose which group of answers
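The filter-then-facet query can be expressed with standard solr request parameters: `fq` narrows the respondents, and one `facet.field` per boolean field asks for its true/false counts. This sketch only builds the parameters; the field names (`age_i`, `profession_s`, the answer fields) are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode


def facet_params(answer_fields, min_age=None, max_age=None, profession=None):
    """Build solr /select parameters: filter respondents, then facet on
    every boolean answer field. rows=0 because only counts are needed."""
    params = [("q", "*:*"), ("rows", "0"), ("wt", "json"), ("facet", "true")]
    if min_age is not None or max_age is not None:
        lo = "*" if min_age is None else min_age
        hi = "*" if max_age is None else max_age
        params.append(("fq", f"age_i:[{lo} TO {hi}]"))  # range filter
    if profession is not None:
        params.append(("fq", f'profession_s:"{profession}"'))
    # One facet.field per boolean field; each yields counts for true/false.
    params.extend(("facet.field", f) for f in answer_fields)
    return params


fields = ["q12_r0_c0_b", "q12_r1_c2_b"]
query = urlencode(facet_params(fields, min_age=40, max_age=60,
                               profession="rheumatologist"))
print("/select?" + query)
```

With thousands of `facet.field` parameters the query string gets long, so sending the same parameters as a form-encoded POST body to `/select` is usually safer than a GET.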

 

Page 15: The solr power

Why solr is awesome...

• Faceting across boolean fields uses very little memory
• Combining 3000 fields for 2000 documents takes 1 to 2 ms
• Allowed us to reduce API response time from a variable 2 to 15 seconds (sucked!) to an almost constant ~50 ms

Page 16: The solr power

Good to know...

• sunburnt: awesome Python solr interface, github.com/tow/sunburnt
• Programmatic querying as well as raw queries
• Supports most advanced solr options
• If you only require facets, specify rows=0
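With `rows=0` the response carries no documents, only counts. Solr's JSON response writer, with its default `json.nl=flat`, returns each facet field as a flat list alternating value and count, so pulling out the "true" counts takes a little unpacking. The response in this sketch is hand-written to show that shape; the numbers are illustrative and the field names are hypothetical.

```python
import json


def true_counts(solr_json):
    """Return {field: number of documents where the field is true} from a
    facet response. With json.nl=flat, counts arrive as a flat list:
    ["false", 1937, "true", 63]."""
    facet_fields = json.loads(solr_json)["facet_counts"]["facet_fields"]
    result = {}
    for field, flat in facet_fields.items():
        pairs = dict(zip(flat[0::2], flat[1::2]))  # value -> count
        result[field] = pairs.get("true", 0)
    return result


# Hand-written response of the shape a rows=0 facet query returns.
sample = json.dumps({
    "response": {"numFound": 2000, "start": 0, "docs": []},
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {
            "q12_r0_c0_b": ["false", 1937, "true", 63],
            "q12_r1_c3_b": ["false", 1787, "true", 213],
        },
    },
})
print(true_counts(sample))  # {'q12_r0_c0_b': 63, 'q12_r1_c3_b': 213}
```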

Page 17: The solr power

Questions?

• wisertogether.com
• slideshare.net/tarequeh/the-solr-power
• @tarequeh