Reactive Elasticsearch with Akka Streams

Reactive Elasticsearch with Akka Streams. The 20th #渋谷java. Naoki Takezoe (@takezoen), BizReach, Inc

Upload: takezoe

Post on 21-Jan-2018


TRANSCRIPT

Page 1: Reactive Elasticsearch with Akka Streams

Reactive Elasticsearch with Akka Streams

The 20th #渋谷java

Naoki Takezoe @takezoen

BizReach, Inc

Page 2: Reactive Elasticsearch with Akka Streams

Motivation

● Real-time indexing in Elasticsearch
● Resilience of the indexing process
● Automatic tuning of indexing load

Page 3: Reactive Elasticsearch with Akka Streams

Basic structure

Source (Read) → Flow → Sink (Write)

← Pull from downstream
→ Push from upstream

Page 4: Reactive Elasticsearch with Akka Streams

Akka Streams

● An implementation of Reactive Streams based on Scala and Akka
  ○ Java API is also available

● Akka HTTP
  ○ Covers HTTP server and client applications

● Alpakka
  ○ Provides various Akka Streams connectors

Page 5: Reactive Elasticsearch with Akka Streams

Example

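import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

// Needed to run a stream (Akka 2.5-style setup)
implicit val system: ActorSystem = ActorSystem("example")
implicit val materializer: ActorMaterializer = ActorMaterializer()

// Source
val source = Source(1 to 100)

// Sink
val sink = Sink.foreach { x: Int => println(x) }

// Just connect Source and Sink directly
source.runWith(sink)

// Put Flow between Source and Sink
source.map(x => x * 2).runWith(sink)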

Page 6: Reactive Elasticsearch with Akka Streams

Elasticsearch connector is available in Alpakka!

Page 7: Reactive Elasticsearch with Akka Streams

ElasticsearchSource

val source = ElasticsearchSource.typed[Book](
  "source",                       // Index name
  "book",                         // Type name
  """{"match_all": {}}""",        // Query
  ElasticsearchSourceSettings(5)  // Buffer size
)

Page 8: Reactive Elasticsearch with Akka Streams

Implementation of ElasticsearchSource

● Scroll Elasticsearch and buffer documents
● Push docs when pulled from downstream
● Read the next window when the buffer is empty
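
The buffering behaviour listed above can be sketched in plain Scala, without Akka or Elasticsearch. This is only a rough model, not the connector's actual code: `WindowedReader`, `fetchWindow` and `fetchCount` are hypothetical names, and `fetchWindow(offset, size)` stands in for one scroll request.

```scala
// Plain-Scala model of a scroll-style source (no Akka, no Elasticsearch).
// `fetchWindow` is a hypothetical stand-in for one scroll request.
final class WindowedReader[A](windowSize: Int, fetchWindow: (Int, Int) => Seq[A]) {
  private val buffer = scala.collection.mutable.Queue.empty[A]
  private var offset = 0
  private var exhausted = false
  var fetchCount = 0 // number of windows requested so far, for illustration

  // Called once per downstream pull: returns the next document, if any.
  def pull(): Option[A] = {
    if (buffer.isEmpty && !exhausted) {
      // Buffer is empty, so read the next window.
      val window = fetchWindow(offset, windowSize)
      fetchCount += 1
      if (window.isEmpty) exhausted = true
      else {
        offset += window.size
        buffer ++= window
      }
    }
    if (buffer.isEmpty) None else Some(buffer.dequeue())
  }
}

object WindowedReaderDemo {
  // Twelve fake documents served in windows of five.
  val docs: Vector[String] = (1 to 12).map(i => s"doc-$i").toVector

  def fetch(offset: Int, size: Int): Seq[String] = docs.slice(offset, offset + size)

  // Pulls until the reader is exhausted; returns the documents and the window count.
  def readAll(): (List[String], Int) = {
    val reader = new WindowedReader[String](5, fetch)
    val out = Iterator
      .continually(reader.pull())
      .takeWhile(_.isDefined)
      .collect { case Some(doc) => doc }
      .toList
    (out, reader.fetchCount)
  }
}
```

In this model nothing is fetched until the first pull, and a new window is requested only once the previous one has been fully handed out.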

Page 9: Reactive Elasticsearch with Akka Streams

ElasticsearchSink

val sink = ElasticsearchSink.typed[Book](
  "sink",                       // Index name
  "book",                       // Type name
  ElasticsearchSinkSettings(5)  // Buffer size
)

Page 10: Reactive Elasticsearch with Akka Streams

Implementation of ElasticsearchSink

● Pull from upstream until the buffer is full
● Index buffered docs with a bulk request
● Run these processes in parallel
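
A minimal plain-Scala sketch of the buffer-then-bulk idea, assuming a hypothetical `bulkIndex` function in place of a real bulk request. `BufferingWriter` and `demand` are illustrative names, and the parallel execution mentioned above is deliberately left out to keep the model small.

```scala
// Plain-Scala model of a buffering sink (no Akka, no Elasticsearch).
// `bulkIndex` is a hypothetical stand-in for one bulk request.
final class BufferingWriter[A](bufferSize: Int, bulkIndex: Seq[A] => Unit) {
  private val buffer = scala.collection.mutable.ArrayBuffer.empty[A]

  // How many more documents the writer is willing to pull right now.
  def demand: Int = bufferSize - buffer.size

  // Accept one document; flush as soon as the buffer is full.
  def push(doc: A): Unit = {
    buffer += doc
    if (buffer.size >= bufferSize) flush()
  }

  // Send whatever is buffered as a single bulk request (also used at end of stream).
  def flush(): Unit =
    if (buffer.nonEmpty) {
      bulkIndex(buffer.toList)
      buffer.clear()
    }
}

object BufferingWriterDemo {
  // Pushes 12 documents through a writer with buffer size 5 and records each bulk call.
  def run(): List[List[Int]] = {
    val batches = scala.collection.mutable.ListBuffer.empty[List[Int]]
    val writer = new BufferingWriter[Int](5, docs => { batches += docs.toList; () })
    (1 to 12).foreach(writer.push)
    writer.flush() // end of stream: send the remaining documents
    batches.toList
  }
}
```

With a buffer size of 5, twelve documents end up in three bulk calls: two full batches and one final partial batch.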

Page 12: Reactive Elasticsearch with Akka Streams

TODO

● Automatic pull amount tuning
  ○ By looking at metrics of the Elasticsearch cluster

● Partial retry
  ○ Retry failed documents when a bulk request partially fails

● Sliced scroll
  ○ Elasticsearch 5.x supports sliced scroll, which makes it possible to scroll in parallel
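
The partial retry item could look roughly like the following plain-Scala sketch. It is not an implementation from the connector: `indexWithRetry`, `retriesLeft` and `bulkIndex` are hypothetical, and `bulkIndex` is assumed to return one success flag per document, loosely modelling the per-item results of a bulk response.

```scala
// Plain-Scala model of partial retry (no Akka, no Elasticsearch).
// `bulkIndex` is a hypothetical bulk call that returns one success flag per document.
object PartialRetry {
  // Resends only the failed documents; returns those that still failed at the end.
  @scala.annotation.tailrec
  def indexWithRetry[A](docs: Seq[A], retriesLeft: Int)(bulkIndex: Seq[A] => Seq[Boolean]): Seq[A] = {
    if (docs.isEmpty) docs
    else {
      val results = bulkIndex(docs)
      val failed = docs.zip(results).collect { case (doc, false) => doc }
      if (failed.isEmpty || retriesLeft == 0) failed
      else indexWithRetry(failed, retriesLeft - 1)(bulkIndex)
    }
  }
}

object PartialRetryDemo {
  // Returns the documents that never succeeded and the list of bulk requests sent.
  def run(): (Seq[Int], List[Seq[Int]]) = {
    val requests = scala.collection.mutable.ListBuffer.empty[Seq[Int]]

    // Fake bulk call: even-numbered docs fail on the first request only.
    def fakeBulk(docs: Seq[Int]): Seq[Boolean] = {
      val firstRequest = requests.isEmpty
      requests += docs
      docs.map(d => !(firstRequest && d % 2 == 0))
    }

    val stillFailed = PartialRetry.indexWithRetry(Seq(1, 2, 3, 4), retriesLeft = 2)(fakeBulk)
    (stillFailed, requests.toList)
  }
}
```

In the demo the second request carries only the two documents that failed the first time, which is the point of retrying partially instead of resending the whole batch.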