A near real time search and alert engine powered by SolR Lucene


Upload: visibium

Post on 19-May-2015


DESCRIPTION

The trade-off between scale and update rate that search engines face on the Web 2.0. How enhanced indexing and smart filtering enable near-real-time engines. SolR Lucene ultra-fast search server and the user-defined "websphere" (feeds and filters).

TRANSCRIPT


A near‐real‐time search and alert service based on SolR Lucene

April 2013

www.visibium.com

The need

What’s new with NFC technology?

What is said about my competitors?

What is said by my competitors?

What’s said about my brand?

What’s said about key executives of my company?

What’s said about my last marketing campaign?

What’s said about my product launch?

What’s said about my last ad campaign?

Industry watch

Competition watch

Brand protection

Campaign impact analysis

I need to continuously search the Web 2.0 on certain topics.

I know where to look.

I know what I’m looking for…

… and I want to get an alert when new matching content is posted.

Within minutes, not the day after.

The Problem

I need to continuously search the Web 2.0 on certain topics.

I want to get an alert when new matching content is posted…

… within minutes, not the day after.

Some websites take days to get indexed by the major search engines (Google, Bing, Yahoo!…).

Alert services are only as good as their indexing rate. A day, not a minute, is the norm (except for breaking news and weather alerts).

Real “real‐time search” engines (OneRiot, Wowd, Crowdeye, Collecta) failed as the technology involved massive R&D costs.

Google closed its real‐time search service in 2011.

The State of the Union

… within minutes, not the day after.

Narrow look, deep digging

Broad look, shallow digging

Social Web Monitoring & Trending solutions
• Look at big chunks of the Web
• Detect trends, mood, new topics, influencers, etc.
• Typically can’t single out contributions that match a user‐defined query.

Near‐real‐time search engines
• Typically look at the most popular content feeds, and run indexing at frequent intervals (hence the near‐real‐time)
• Some offer powerful query tools to users.
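
Indexing "at frequent intervals" can be sketched as an incremental pass that only picks up what was published since the previous pass. This is a minimal illustration, not the way any particular engine works: the function name `index_new_items`, the `(published_at, text)` tuple shape and the plain list standing in for the index are all assumptions.

```python
from datetime import datetime, timedelta


def index_new_items(feed_items, index, last_indexed_at):
    """Index items published after last_indexed_at.

    Returns the new high-water mark, i.e. the most recent publication
    time seen so far. `index` is a plain list standing in for a real index.
    """
    newest = last_indexed_at
    for published_at, text in feed_items:
        if published_at > last_indexed_at:
            index.append(text)
            newest = max(newest, published_at)
    return newest


# Two indexing passes over a growing feed (made-up data).
start = datetime(2013, 4, 1, 12, 0)
feed = [
    (start + timedelta(minutes=1), "first post"),
    (start + timedelta(minutes=2), "second post"),
]
index = []

mark = index_new_items(feed, index, start)      # first pass
feed.append((start + timedelta(minutes=5), "third post"))
mark = index_new_items(feed, index, mark)       # second pass, only the new item

print(index)
```

The shorter the interval between passes, the "nearer" to real time the index is, as long as each pass finishes before the next one is due.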

Let’s dig deep

Digging deep is about using powerful query tools, which require full‐text indexing (among other things).

Full‐text indexing carries a trade‐off between scale and update rate.

The less data, the “nearer” real time.

So…

Let’s dig deeper

So… two directions for a nearer real time:

• Enhanced indexing
• Smart selection of data to index
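
The trade-off between scale and update rate can be made concrete with a little arithmetic. The sketch below is only an illustration under a deliberately simple assumption (indexing time grows linearly with the number of documents); the function name `refresh_latency_seconds` and every figure are hypothetical, not measurements from the service.

```python
def refresh_latency_seconds(docs_per_cycle: int, docs_indexed_per_second: float) -> float:
    """Time needed to index one batch of new documents.

    Assumed model: latency is simply volume divided by throughput.
    """
    if docs_indexed_per_second <= 0:
        raise ValueError("indexing throughput must be positive")
    return docs_per_cycle / docs_indexed_per_second


# Assumed figures, for illustration only.
THROUGHPUT = 500.0          # documents indexed per second
UNFILTERED_BATCH = 600_000  # new documents per cycle when everything is indexed
FILTERED_BATCH = 30_000     # new documents per cycle after selecting what to index

unfiltered = refresh_latency_seconds(UNFILTERED_BATCH, THROUGHPUT)
filtered = refresh_latency_seconds(FILTERED_BATCH, THROUGHPUT)

print(f"unfiltered: {unfiltered / 60:.0f} min per cycle")  # 20 min
print(f"filtered:   {filtered / 60:.0f} min per cycle")    # 1 min
```

Under this model the two directions map onto the two variables: enhanced indexing raises the throughput, while smart selection lowers the volume.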

Enhanced indexing

What do Apple, Netflix, Wikipedia, LinkedIn, eBay and Twitter have in common?

Enhanced indexing with SolR Lucene

Picking the right tools for the job

Limiting the indexed data

Basic architecture

Content feeds
• Twitter public stream (fire hose)
• Twitter private feeds
• Facebook updates
• Syndicated content (RSS)
• Blogs, forums
• News

SEARCH
• Watch
• Queries

Matching results
• Alerts
• Dispatch
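
The basic flow (feeds, search with saved watch queries, matching results, alerts) can be sketched with a toy in-memory inverted index. This is a simplified stand-in written for illustration, not SolR Lucene and not the service's code: `Item`, `TinyIndex`, `run_watches` and the sample data are all hypothetical names and values.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One piece of content pulled from a feed (assumed shape)."""
    item_id: str
    source: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lower-case the text, split on whitespace and strip punctuation."""
    return {w.strip(".,;:!?\"'()#").lower() for w in text.split()} - {""}


class TinyIndex:
    """Toy full-text index: term -> ids of the items containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.items: dict[str, Item] = {}

    def add(self, item: Item) -> None:
        self.items[item.item_id] = item
        for term in tokenize(item.text):
            self.postings[term].add(item.item_id)

    def search(self, query: str) -> set[str]:
        """Return ids of items containing every term of the query (AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = ids if result is None else result & ids
        return set(result)


def run_watches(index, watches, already_alerted):
    """Run every saved query; return (watch name, item) pairs not alerted before."""
    alerts = []
    for name, query in watches.items():
        for item_id in sorted(index.search(query)):
            if (name, item_id) not in already_alerted:
                already_alerted.add((name, item_id))
                alerts.append((name, index.items[item_id]))
    return alerts


index = TinyIndex()
index.add(Item("t1", "twitter", "New NFC payment terminal announced"))
index.add(Item("b1", "blog", "Our summer marketing campaign"))

watches = {"nfc-watch": "nfc payment"}
sent = set()
first = run_watches(index, watches, sent)   # one alert for item t1
second = run_watches(index, watches, sent)  # nothing new, so no alert
print([(name, item.item_id) for name, item in first], second)
```

Remembering which (watch, item) pairs were already dispatched is what turns a repeated search into an alert service: each pass only reports new matches.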

Selective architecture

Content feeds: Twitter public stream (fire hose), Twitter private feeds, Facebook updates, syndicated content (RSS), blogs, forums, news

FILTERS
• Geo (e.g. local search engine)
• Audience (e.g. most popular)
• Buzz (e.g. #tags)

Filtered data index

SEARCH
• Watch
• Queries

Matching results
• Alerts
• Dispatch
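
The filter stage can be sketched as a set of predicates applied to feed items before anything reaches the index. The sketch is an assumption-laden illustration: the `FeedItem` fields, the three filter factories and the choice to combine filters with a logical AND are all hypothetical, chosen only to mirror the Geo, Audience and Buzz examples.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class FeedItem:
    """Hypothetical feed item; the field names are assumptions."""
    text: str
    country: str = ""
    followers: int = 0
    hashtags: frozenset = frozenset()


Filter = Callable[[FeedItem], bool]


def geo_filter(countries: set) -> Filter:
    """Keep items coming from one of the given countries."""
    return lambda item: item.country in countries


def audience_filter(min_followers: int) -> Filter:
    """Keep items whose author has at least min_followers followers."""
    return lambda item: item.followers >= min_followers


def buzz_filter(tags: set) -> Filter:
    """Keep items carrying at least one of the given hashtags (case-insensitive)."""
    wanted = {t.lower() for t in tags}
    return lambda item: bool(wanted & {t.lower() for t in item.hashtags})


def select_for_indexing(items: Iterable[FeedItem], filters: list) -> Iterator[FeedItem]:
    """Yield only the items accepted by every filter; the rest is never indexed."""
    for item in items:
        if all(accept(item) for accept in filters):
            yield item


stream = [
    FeedItem("NFC launch in Paris", "FR", 12_000, frozenset({"NFC"})),
    FeedItem("NFC rumour", "US", 50, frozenset({"nfc"})),
    FeedItem("Lunch photo", "FR", 90_000, frozenset({"food"})),
]
filters = [geo_filter({"FR"}), audience_filter(1_000), buzz_filter({"nfc"})]

kept = list(select_for_indexing(stream, filters))
print([item.text for item in kept])  # ['NFC launch in Paris']
```

Because rejected items never reach the index, the volume to index per cycle shrinks, which is what brings the update rate closer to real time.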

Smart selection of data to index

User‐defined filters

Content feeds: Twitter public stream (fire hose), Twitter private feeds, Facebook updates, syndicated content (RSS), blogs, forums, news

FILTERS
• User‐defined filters

Filtered data index

SEARCH
• Watch
• Queries
• Refined queries (reprocessing)

Matching results
• Alerts
• Dispatch
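
One possible reading of "refined queries (reprocessing)" is that a narrower query is re-run over results that already matched a broader one, without going back to the feeds. The sketch below illustrates that reading only; the interpretation, the function names `matches` and `reprocess`, and the sample texts are assumptions.

```python
from typing import AbstractSet


def matches(text: str, required: AbstractSet[str],
            excluded: AbstractSet[str] = frozenset()) -> bool:
    """True if text contains every required term and none of the excluded ones.

    Terms are expected in lower case; the text is lower-cased before comparison.
    """
    words = {w.strip(".,;:!?\"'()#").lower() for w in text.split()}
    return required <= words and not (excluded & words)


def reprocess(results: list, required: AbstractSet[str],
              excluded: AbstractSet[str] = frozenset()) -> list:
    """Apply a refined query to results that already matched a broader query."""
    return [text for text in results if matches(text, required, excluded)]


corpus = [
    "Acme launch event tonight",
    "Acme launch rumour denied",
    "Acme hires new CFO",
    "Weather update",
]

# Broad watch first, then a refinement applied to its results only.
broad = [text for text in corpus if matches(text, {"acme"})]
refined = reprocess(broad, {"acme", "launch"}, excluded={"rumour"})

print(len(broad), refined)
```

Reprocessing works on a small, already relevant set, so a user can tighten a watch and see the effect immediately instead of waiting for the next indexing cycle.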

Visibium

• A near‐real‐time search and alert service
• User‐defined feeds and filters
• Full‐text indexing
• Advanced queries
• Refined search reprocessing
• Powered by SolR Lucene

Monitor the slice of the web you really care about

© Visibium, 2011‐2013

www.visibium.com