what is hydra?

Post on 20-Aug-2015


WHAT IS HYDRA? Findability Day 2012

Hydra is technology

Hydra brings structure

What is unstructured data?

•  A linguistic excuse?

News articles

Plain text that contains invaluable metadata for search, such as:

•  Title

•  Author byline

•  Lead paragraph
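As a rough illustration of how such fields might be recovered from plain text, here is a minimal Python sketch. It is not Hydra code: the function name `extract_article_fields`, the blank-line block layout and the "By <name>" byline convention are all assumptions made for the example.

```python
import re


def extract_article_fields(text):
    """Split a plain-text news article into title, author and lead paragraph.

    Assumes (hypothetically) that blocks are separated by blank lines, the
    first block is the title, and an optional byline block starts with "By".
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    fields = {"title": None, "author": None, "lead": None}
    if not blocks:
        return fields

    fields["title"] = blocks[0]
    rest = blocks[1:]

    # Assumed byline convention: a single-line block of the form "By <name>".
    if rest:
        match = re.match(r"By\s+(.+)$", rest[0], flags=re.IGNORECASE)
        if match:
            fields["author"] = match.group(1).strip()
            rest = rest[1:]

    # The first remaining block is treated as the lead paragraph;
    # internal line breaks are collapsed into single spaces.
    if rest:
        fields["lead"] = " ".join(rest[0].split())
    return fields
```

Real articles rarely follow one layout, so a production pipeline would likely need per-source rules; the point is only that "unstructured" text often carries recoverable structure.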

Hydra is about your data

•  Enrich your documents with metadata, to power your search

    •  Language detection

    •  Sentiment analysis

    •  Headline extraction

    •  Regular expression matching and extraction

•  Filter out unwanted documents

•  Collect statistics

•  Export to Staging environments
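The enrich, filter and collect-statistics steps listed above can be pictured as small functions chained over each document. The Python sketch below is a conceptual illustration under stated assumptions, not Hydra's API: the stage names (`extract_emails`, `drop_short`), the dict-based document shape and the `run_pipeline` helper are all hypothetical.

```python
import re
from collections import Counter

# Deliberately simple e-mail pattern, good enough for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(doc):
    """Regex extraction stage: add matched e-mail addresses as metadata."""
    doc["emails"] = EMAIL_RE.findall(doc.get("text", ""))
    return doc


def drop_short(doc, min_words=3):
    """Filter stage: return None to discard documents that are too short."""
    if len(doc.get("text", "").split()) < min_words:
        return None
    return doc


def run_pipeline(docs, stages):
    """Run every document through the stages in order, counting outcomes."""
    stats = Counter()
    kept = []
    for doc in docs:
        stats["seen"] += 1
        current = dict(doc)  # work on a copy so the input is left untouched
        for stage in stages:
            current = stage(current)
            if current is None:  # a stage filtered the document out
                stats["filtered"] += 1
                break
        else:
            kept.append(current)
            stats["kept"] += 1
    return kept, stats
```

A call such as `run_pipeline(docs, [drop_short, extract_emails])` returns the enriched documents together with simple counters for how many were seen, kept and filtered.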

Before Hydra


Hydra scales

Hydra Design Objectives

Scalability

•  Possible to connect any number of processing machines

Fault tolerance

•  Failure of a stage affects only a single document

•  Failures can be automatically detected

Robustness

•  Stages and nodes are completely independent (no domino effect)

Development ease

•  Allow test-driven pipeline development
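To make the fault-tolerance objective concrete, the following Python sketch shows one way a stage failure could be confined to a single document and recorded for later detection. It illustrates the idea only and says nothing about how Hydra implements it; `run_isolated` and `add_word_count` are hypothetical names.

```python
def run_isolated(docs, stages):
    """Apply stages to each document; a failing stage only fails that document."""
    processed, failed = [], []
    for doc in docs:
        current = dict(doc)  # copy, so a half-processed document is never leaked
        try:
            for stage in stages:
                current = stage(current)
        except Exception as exc:
            # Record which stage failed on which document so the failure
            # can be detected automatically, then move on to the next one.
            failed.append(
                {"doc": doc, "stage": stage.__name__, "error": repr(exc)}
            )
            continue
        processed.append(current)
    return processed, failed


def add_word_count(doc):
    """Example stage: raises KeyError when the document has no "text" field."""
    doc["word_count"] = len(doc["text"].split())
    return doc
```

Because each stage is a plain function from document to document, it can be unit tested on its own with a handful of sample documents, which is the kind of test-driven workflow the last objective describes.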

 

What about Hadoop and Big Data?

Use cases for document enrichment

•  PageRank

•  Analytics

Hadoop & Map/Reduce advantages

•  Huge scalability

•  Ability to work on the entire document set at once

Hadoop & Map/Reduce drawbacks

•  Batch processing

•  Time-to-index

Hydra integrated with Hadoop    

•  Blue: first round of indexing only

•  Red: second round of indexing

•  Purple: all documents

Hydra in summary

Hydra

•  can chew through almost anything

•  has many heads

•  regenerates

•  scales

Hydra is Open Source

•  Other committers

•  The role of Findwise

For more information:

•  http://www.findwise.com/hydra

•  http://findwise.github.com/Hydra

•  Email: joel.westberg@findwise.com

Joel Westberg joel.westberg@findwise.com

@joelwes
