Keynote: Solr: Past, Present & Future
Solr: Past, Present & Future
Yonik Seeley, LucidWorks
Origins of Solr
• CNET was driven to find alternatives to a discontinued commercial enterprise search product
• Plan A: ATOMICS (Apache TO MySQL In CNET Search)
  – Standalone server speaking XML over HTTP
  – Meet the majority of “search” needs
  – http://conferences.oreillynet.com/cs/mysqluc2005/view/e_sess/7066
• Plan B: “Something based on Lucene”
  – Started Summer 2004
  – First prototype called “Fusion”, later renamed SOLAR (Search On Lucene And Resin)
Origins of the first Solr admin UI
New admin UI
Timeline (up to 1.4)
[Timeline figure. Milestones: initial prototype; CNET production; CNET contributes Solr to ASF; Solr graduates from Incubator. Features along the timeline: simple faceting; replication; highlighting, dismax; spellchecking, CSV, Luke; MLT, Update Request Processors; QParsers, Search Components; multi-core; Distributed Search; Data Import Handler; JMX; Statistics Component; Java replication; Terms and TermVector Components; multi-select faceting; dynamic clustering. Release markers: 1.0, 1.1, 1.2, 1.3, 1.4, 3.1, 4.0.]
Solr 4
• SolrCloud
  – Distributed Indexing
  – No single points of failure
  – Near Real Time friendly (push replication)
• NoSQL feature set
  – Update Durability
  – Real-time get
  – Atomic Updates
  – Optimistic Concurrency
• Pseudo-join, Pivot Faceting, Pseudo-fields, etc.
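The atomic-update and optimistic-concurrency bullets can be made concrete with a small sketch that only builds a Solr 4 JSON update request without sending it. It assumes a core named `collection1` on the default port, hypothetical field names (`price`, `copies_sold`, `cat`), and a schema with the update log enabled and stored fields, which atomic updates rely on. The `set`/`inc`/`add` modifiers and the `_version_` convention (greater than 1: must match the stored version; 1: the document must exist; negative: it must not exist) are Solr's. The helper functions are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: a Solr 4 core named "collection1" on the default port.
UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def atomic_update(doc_id, set_fields=None, inc_fields=None, add_fields=None, version=None):
    """Hypothetical helper: build one JSON atomic-update document.

    "set", "inc" and "add" are Solr atomic-update modifiers; fields not
    mentioned keep their stored values. If version is given it is sent as
    _version_, which makes the update conditional (optimistic concurrency).
    """
    doc = {"id": doc_id}
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}
    for name, value in (inc_fields or {}).items():
        doc[name] = {"inc": value}
    for name, value in (add_fields or {}).items():
        doc[name] = {"add": value}
    if version is not None:
        doc["_version_"] = version
    return doc


def build_update_request(docs, url=UPDATE_URL):
    """Hypothetical helper: wrap update documents in a POST request (not sent here)."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


if __name__ == "__main__":
    doc = atomic_update(
        "book1",
        set_fields={"price": 12.5},
        inc_fields={"copies_sold": 1},
        add_fields={"cat": "fiction"},
        version=1,  # 1 = "the document must already exist"
    )
    req = build_update_request([doc])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request with `urllib.request.urlopen(req)` against a live server would either apply the partial update or, if the version check fails, raise an `HTTPError` carrying status 409 (Conflict).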
What search solution/version are you currently using?
Recent Enhancements
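Document Routing
• numShards=4 router=compositeId: the 32-bit hash range is divided into four equal ranges, one per shard (shard1 to shard4):
  – 00000000-3fffffff
  – 40000000-7fffffff
  – 80000000-bfffffff
  – c0000000-ffffffff
• Indexing: id = BigCo!doc5 hashes (MurmurHash3) to 1f27 3c71; the upper 16 bits (1f27) come from the route key BigCo, the lower 16 bits (3c71) from doc5
• Querying: q=my_query shard.keys=BigCo! only needs the hash range 1f270000 to 1f27ffff, which falls in shard1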
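A minimal sketch of the compositeId idea, assuming (as in the example above) that the route key supplies the upper 16 bits of the hash and the document key the lower 16 bits, and assuming the UTF-8 bytes of each part are hashed with MurmurHash3 x86_32 and seed 0. It uses unsigned values for readability, whereas Solr itself works with signed 32-bit ints, and it ignores refinements such as custom bit counts. All function names are hypothetical, and the sketch makes no claim about the concrete hash that BigCo produces.

```python
def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmurhash3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant; returns an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & 0xFFFFFFFF
    block_end = len(data) & ~0x3  # end of the last full 4-byte block

    for i in range(0, block_end, 4):  # body: little-endian 4-byte blocks
        k1 = int.from_bytes(data[i:i + 4], "little")
        k1 = (_rotl32((k1 * c1) & 0xFFFFFFFF, 15) * c2) & 0xFFFFFFFF
        h1 = _rotl32(h1 ^ k1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & 0xFFFFFFFF

    tail = data[block_end:]  # 0 to 3 leftover bytes
    if tail:
        k1 = 0
        for shift, byte in enumerate(tail):
            k1 |= byte << (8 * shift)
        k1 = (_rotl32((k1 * c1) & 0xFFFFFFFF, 15) * c2) & 0xFFFFFFFF
        h1 ^= k1

    h1 ^= len(data)  # finalization mix (fmix32)
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & 0xFFFFFFFF
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & 0xFFFFFFFF
    h1 ^= h1 >> 16
    return h1


def composite_id_hash(doc_id: str) -> int:
    """Hypothetical helper: upper 16 bits from the route key, lower 16 from the doc key."""
    if "!" not in doc_id:
        return murmurhash3_x86_32(doc_id.encode("utf-8"))
    route_key, doc_key = doc_id.split("!", 1)
    upper = murmurhash3_x86_32(route_key.encode("utf-8")) & 0xFFFF0000
    lower = murmurhash3_x86_32(doc_key.encode("utf-8")) & 0x0000FFFF
    return upper | lower


def route_key_range(shard_key: str) -> tuple:
    """Hypothetical helper: hash range covered by a query-side key such as "BigCo!"."""
    route_key = shard_key.split("!", 1)[0]
    upper = murmurhash3_x86_32(route_key.encode("utf-8")) & 0xFFFF0000
    return upper, upper | 0x0000FFFF


# Four equal quarters of the hash range (numShards=4), as unsigned values.
QUARTERS = [(q << 30, ((q + 1) << 30) - 1) for q in range(4)]


def quarter_for(h: int) -> tuple:
    """Return the (low, high) quarter that contains hash h."""
    return next(r for r in QUARTERS if r[0] <= h <= r[1])
```

Because every `BigCo!...` id shares the same upper 16 bits, all of them land in the 65,536-value slice returned by `route_key_range("BigCo!")`. Each quarter boundary is a multiple of 0x10000, so that slice never straddles two shards. This is why a query with `shard.keys=BigCo!` only has to visit one shard.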
Seamless Online Shard Splitting
[Diagram: Shard1, Shard2 and Shard3, each with a leader and a replica; Shard2 is being split into sub-shards Shard2_0 and Shard2_1.]
1. New sub-shards are created in the “construction” state
2. The leader starts forwarding applicable updates, which are buffered by the sub-shards
3. The leader's index is split and installed on the sub-shards
4. The sub-shards apply the buffered updates, then become “active” leaders, and the old shard becomes “inactive”
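The four splitting steps can be modelled as a toy, in-memory state machine. This is not Solr code: the classes, their methods and the "split at the midpoint of the parent's hash range" rule are hypothetical, chosen only to make the ordering of the steps visible (in Solr the operation is started through the Collections API's SPLITSHARD action). Documents are reduced to id-to-hash entries.

```python
from dataclasses import dataclass, field


@dataclass
class SubShard:
    """Hypothetical sub-shard covering the inclusive hash range [lo, hi]."""
    name: str
    lo: int
    hi: int
    state: str = "construction"  # step 1: sub-shards start in "construction"
    buffer: list = field(default_factory=list)
    docs: dict = field(default_factory=dict)  # doc id -> hash

    def owns(self, h: int) -> bool:
        return self.lo <= h <= self.hi


@dataclass
class ParentShard:
    """Hypothetical shard leader that is being split."""
    name: str
    lo: int
    hi: int
    docs: dict = field(default_factory=dict)
    state: str = "active"
    children: list = field(default_factory=list)

    def create_sub_shards(self):
        """Step 1: create two sub-shards, each covering half the hash range."""
        mid = self.lo + (self.hi - self.lo) // 2
        self.children = [SubShard(f"{self.name}_0", self.lo, mid),
                         SubShard(f"{self.name}_1", mid + 1, self.hi)]
        return self.children

    def update(self, doc_id: str, h: int):
        """Step 2: index locally and forward to the owning sub-shard's buffer."""
        self.docs[doc_id] = h
        for child in self.children:
            if child.owns(h):
                child.buffer.append((doc_id, h))

    def install_split_index(self):
        """Step 3: split the leader's current index by hash range."""
        for doc_id, h in self.docs.items():
            for child in self.children:
                if child.owns(h):
                    child.docs[doc_id] = h

    def activate_sub_shards(self):
        """Step 4: replay buffered updates, activate sub-shards, retire the parent."""
        for child in self.children:
            for doc_id, h in child.buffer:
                child.docs[doc_id] = h  # replay is idempotent: same id overwrites
            child.buffer.clear()
            child.state = "active"
        self.state = "inactive"
```

Buffering (step 2) before the index split (step 3) is what keeps the operation online. Updates that arrive while the index is being copied are not lost, and replaying them is harmless because a later write to the same id simply overwrites the earlier one.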
Cloud Enhancements
• Request forwarding
  – In a multi-collection cluster, any node can handle/forward requests for any collection
• Collection Aliases
  http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=northeast&collections=NY,NJ,PA,CT,ME,MA,NH,RI,VT
• Coming Soon: Shard Aliases
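The CREATEALIAS request above is easy to mistype by hand. A small sketch that assembles it with the standard library, assuming the slide's base URL; `create_alias_url` is a hypothetical helper, and only the parameter names come from the slide.

```python
from urllib.parse import urlencode


def create_alias_url(base: str, alias: str, collections: list) -> str:
    """Hypothetical helper: build a Collections API CREATEALIAS request URL."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # safe="," keeps the comma-separated collection list readable (no %2C).
    return f"{base}/admin/collections?{urlencode(params, safe=',')}"


NORTHEAST = ["NY", "NJ", "PA", "CT", "ME", "MA", "NH", "RI", "VT"]
url = create_alias_url("http://localhost:8983/solr", "northeast", NORTHEAST)
print(url)
```

Once the alias exists, a query sent to `/solr/northeast/select` is expected to be spread across all nine collections, so clients never need to know the individual collection names.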
Schema REST API
• Restlet is now integrated with Solr
• Get a specific field
  curl http://localhost:8983/solr/schema/fields/price
  {"field":{ "name":"price", "type":"float", "indexed":true, "stored":true }}
• Get all fields
  curl http://localhost:8983/solr/schema/fields
• Get Entire Schema!
  curl http://localhost:8983/solr/schema
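A small client-side sketch of the read endpoints, assuming the slide's base URL and a JSON response shaped like the one shown (a real response may also carry a `responseHeader`, which this parser ignores). The helper names are hypothetical; only the URL paths come from the slide. The parser is exercised offline against the slide's sample response.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8983/solr"  # base URL as used on the slide


def field_url(base: str, field_name: str) -> str:
    """Hypothetical helper: URL of a single field in the Schema REST API."""
    return f"{base}/schema/fields/{field_name}"


def parse_field(body: str) -> dict:
    """Extract the field definition from a /schema/fields/<name> response."""
    return json.loads(body)["field"]


def get_field(base: str, field_name: str, timeout: float = 5.0) -> dict:
    """Fetch one field definition from a running Solr (needs network access)."""
    with urllib.request.urlopen(field_url(base, field_name), timeout=timeout) as resp:
        return parse_field(resp.read().decode("utf-8"))


# Offline check against the response shown on the slide.
SAMPLE = '{"field":{ "name":"price", "type":"float", "indexed":true, "stored":true }}'
price = parse_field(SAMPLE)
print(price["type"], price["indexed"])  # float True
```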
Dynamic Schema
• Add a new field (Solr 4.4)
  curl -XPUT http://localhost:8983/solr/schema/fields/strength -d '{"type":"float", "indexed":"true"}'
• Works in distributed (cloud) mode too!
• Future: More schemaless
  – Reality: there is no such thing for Lucene-based systems
  – Type guessing for fields we haven't seen before
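The same PUT as the curl command can be built with Python's standard library, so the JSON body is generated rather than quoted by hand. The sketch assumes the slide's base URL. It is also assumed that the target core uses a managed, mutable schema, since a classic hand-edited schema.xml cannot be changed through this API. `add_field_request` is a hypothetical helper, and the request is only constructed, not sent.

```python
import json
import urllib.request


def add_field_request(base: str, name: str, props: dict) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a PUT that adds field `name`."""
    return urllib.request.Request(
        f"{base}/schema/fields/{name}",
        data=json.dumps(props).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = add_field_request(
    "http://localhost:8983/solr", "strength", {"type": "float", "indexed": "true"}
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a live Solr with a mutable schema.
```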
Future
• Greater scalability
• More “NoSQL”
  – More ways to update & manipulate documents
• Analytics
  – More powerful faceting, functions, statistics
• Improved Relational queries
• More dynamic (settings & configuration)
• Continued focus on ease of use
Thank You!