Flink Technology Stack and Its Applicable Scenarios @ 2017/05 [email protected]


Who am I

• @

• Sohu -> Alibaba -> Huawei

• Works on high-performance computing, Spark, and Flink


Flink

Streaming Compute Frameworks

Flink

Apache Gearpump (incubating) -Intel

Amazon Kinesis Streams

Aliyun StreamCompute

Azure stream-analytics

Apache Edgent (incubating) -IBM

-Twitter


Flink

• Library: CEP (complex event processing), Table & SQL (for both DataStream and DataSet), ML (machine learning), Gelly (graph processing)
• API: DataStream API (stream processing), DataSet API (batch processing)
• Core: Runtime (distributed streaming dataflow)
• Deploy: Local (single JVM), Cluster (YARN, Mesos, Standalone), Cloud (GCE, EC2)
• Connectors: Kafka, Cassandra, Elasticsearch, Flume, RabbitMQ, Kinesis, Redis, Twitter

Open source community status

[Figure: bar chart comparing Spark and Flink by committers, contributors, commits, PRs, stars, and forks; y-axis from 0 to 2000]

Flink example

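import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class WordWithCount(word: String, count: Long)

val env = StreamExecutionEnvironment.getExecutionEnvironment

// Data source (hostname and port are defined elsewhere)
val windowCounts = env
  .socketTextStream(hostname, port, '\n')
  // Transform functions
  .flatMap { w => w.split("\\s") }
  .map { w => WordWithCount(w, 1) }
  .keyBy("word")
  .timeWindow(Time.seconds(5))
  .sum("count")

windowCounts.print()

// Execution
env.execute("Socket Window WordCount")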

Flink architecture

Flink Inside

• build graph: streaming graph
• StreamGraph: sources, sinks, streamNodes, transform
• ExecutionGraph
• resources: node 1, node 2, node 3 (each with a slot group and tasks)

Runtime: Simple Runtime of Flink

[Figure: a job graph with two sources, two operators with parallelism 4 (subtasks 5.1 to 5.4 and 6.1 to 6.4), and sink and union nodes. The JobManager (scheduler, resource manager) places the subtasks into slot groups on the TaskManagers.]

State backend and checkpoint

Picture is from Flink community.

Persist the running state:
1. memory (default 5 MB)
2. fs: supports HDFS and local directories
3. RocksDB: KV store; see RocksDB issue-1988 (performance problem)

windows

• Tumbling Window

• Sliding Window

• Session Window

• Global Window

.window(TumblingEventTimeWindows.of(Time.seconds(5))).<window function>(...)

.window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))

.window(EventTimeSessionWindows.withGap(Time.minutes(10)))

[Figure: time-based windows. Tumbling: Window 1 to Window 4, back to back. Sliding: Window 1 to Window 3, overlapping and offset by the slide. Session: Window 1 to Window 4, separated by gaps.]
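The window types above differ mainly in how a timestamp is mapped to a window. The following is a minimal sketch in plain Scala, not Flink's implementation: `WindowAssignSketch`, `tumblingStart`, and `sessions` are names made up for illustration, timestamps are assumed to be non-negative milliseconds with no window offset, and treating two elements exactly one gap apart as the same session is a choice made for this sketch.

```scala
object WindowAssignSketch {
  // Start of the tumbling window containing `ts`.
  // Assumes ts >= 0 and no window offset.
  def tumblingStart(ts: Long, size: Long): Long = ts - (ts % size)

  // Groups already-sorted timestamps into sessions: a new session starts
  // when the distance to the previous element is larger than `gap`.
  def sessions(sorted: Seq[Long], gap: Long): List[List[Long]] =
    sorted.foldLeft(List.empty[List[Long]]) {
      // `cur` is the open session, newest element first
      case (cur :: done, ts) if ts - cur.head <= gap => (ts :: cur) :: done
      case (acc, ts) => List(ts) :: acc
    }.map(_.reverse).reverse
}
```

With a 5-second tumbling window, `tumblingStart(12345L, 5000L)` lands in the window starting at 10000; with a 10-second gap, elements at 0 s, 1 s, and 2 s form one session and elements at 20 s and 21 s form another.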

Stream SQL

• SQL: high-level language
• Table API: declarative DSL
• DataStream/DataSet API: core API
• Runtime (stateful stream processing): low-level building block

Calcite in Flink SQL

SELECT STREAM CEIL(rowtime TO HOUR) AS rowtime,
       productId,
       COUNT(*) AS c,
       SUM(units) AS units
FROM Orders
GROUP BY CEIL(rowtime TO HOUR), productId;

rowtime   productId  c   units
11:00:00  20         3   12
11:00:00  3          2   3
12:00:00  4          4   44
..        ..         ..  ..
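To make the semantics of the query above concrete, here is a minimal sketch in plain Scala that computes the same hourly aggregation over a finite batch of orders. It is an illustration only and does not use Calcite or Flink: `HourlyOrdersSketch`, `Order`, `Agg`, and `ceilToHour` are hypothetical names, and rowtime is assumed to be a millisecond timestamp.

```scala
object HourlyOrdersSketch {
  case class Order(rowtimeMs: Long, productId: Int, units: Int)
  case class Agg(hourMs: Long, productId: Int, c: Long, units: Long)

  private val HourMs = 3600000L

  // CEIL(rowtime TO HOUR): round up to the next full hour
  // (a timestamp that is already on the hour is unchanged).
  def ceilToHour(ts: Long): Long = ((ts + HourMs - 1) / HourMs) * HourMs

  // GROUP BY CEIL(rowtime TO HOUR), productId with COUNT(*) and SUM(units)
  def aggregate(orders: Seq[Order]): Seq[Agg] =
    orders
      .groupBy(o => (ceilToHour(o.rowtimeMs), o.productId))
      .map { case ((hour, pid), os) =>
        Agg(hour, pid, os.size.toLong, os.map(_.units.toLong).sum)
      }
      .toSeq
      .sortBy(a => (a.hourMs, a.productId))
}
```

The streaming query differs in that it emits each hourly row once that hour is complete, rather than after all input has been read.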

[Figure: timelines contrasting a stream (2016.11.11 to now) with a table (1970.01.01 to 2016.11.01, and now)]

SELECT STREAM * FROM OrderA WHERE amount > 2

• Calcite: parse tree, Table
• optimize: logical/physical plan
• function text, DataStream
• execute(): Task; Janino, Java object
• runtime operator

Flink SQL

• 1. Project, Filter, Union, UDAGG
  2. Row-Window (over-window)
  3. Group-Window
  4. Calcite 1.12.0: HOP/SESSION/TUMBLE; Flink
• 1. join (PR)
  2. SQL: sort/limit, inner-query, distinct (PR)
  3. dynamic table: Retraction (FLINK-6047)

Our work plan

• Security enhancement

• Stream SQL development

• Performance optimization

Security enhancement

• Authentication: client and JM, JM and TM, TM and TM, and with Hadoop/ZK/Kafka

• Secure transport: SSL enabled

• Web security headers

• ZK ACL

Total fixed: nearly one hundred issues

Sliding window

window(SlidingEventTimeWindows.of(Time.seconds(20), Time.seconds(5)))

20 / 5 = 4: each element is assigned to 4 windows
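The 20/5 arithmetic can be checked with a small sketch in plain Scala that lists every sliding window containing a given timestamp. It follows the usual "walk back from the latest window start" approach and is not Flink's code: `SlidingAssignSketch` and `windowStarts` are hypothetical names, and timestamps are assumed to be non-negative milliseconds with no window offset.

```scala
object SlidingAssignSketch {
  // Start timestamps of all windows [start, start + size) that contain `ts`,
  // newest window first.
  def windowStarts(ts: Long, size: Long, slide: Long): List[Long] = {
    val lastStart = ts - (ts % slide) // latest window that already covers ts
    Iterator
      .iterate(lastStart)(_ - slide)  // step back one slide at a time
      .takeWhile(_ > ts - size)       // stop once the window no longer covers ts
      .toList
  }
}
```

For a 20-second window sliding every 5 seconds, `windowStarts(22000L, 20000L, 5000L)` yields four starts, so every element is stored and aggregated four times. That is why the size-to-slide ratio matters for performance.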

Others

• Exactly-once

• Fault tolerance

• Stateful

• Gelly

• FlinkML

• Storm-compatible API

• RM: YARN, Mesos, Kubernetes, Docker

Use cases

• Real-time risk control

• ETL

• Real-time analysis

• Anti-fraud

• Cloud service

Real-time risk control

• Millisecond-level latency, less than 100 ms

• Nearly a million events per second

• Rules: Stream API, Window API, SQL

• Response through the API Gateway: user connection -> request (source data) -> Flink processing (distributed) -> response (sink data) -> routing to the origin connection -> user connection, flush data
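The request/response path in the last bullet depends on the gateway remembering which connection each request came from. Below is a minimal single-process sketch in plain Scala of that routing step. It rests on several assumptions: `GatewayRoutingSketch`, `Gateway`, `Request`, `Response`, and the `process` function (a stand-in for the distributed Flink job) are all hypothetical, and a real gateway would also need timeouts and thread safety.

```scala
import scala.collection.mutable

object GatewayRoutingSketch {
  case class Request(connectionId: String, payload: String)
  case class Response(connectionId: String, verdict: String)

  // Hypothetical stand-in for the Flink job: source data in, sink data out.
  def process(req: Request): Response =
    Response(req.connectionId, if (req.payload.contains("risky")) "REJECT" else "ACCEPT")

  class Gateway {
    // Open user connections, keyed by id, each with a callback that flushes data.
    private val connections = mutable.Map.empty[String, String => Unit]

    def open(connectionId: String)(flush: String => Unit): Unit =
      connections(connectionId) = flush

    // Routes a response back to its origin connection and closes it.
    // Returns false if the connection is unknown or already answered.
    def route(resp: Response): Boolean =
      connections.remove(resp.connectionId) match {
        case Some(flush) => flush(resp.verdict); true
        case None        => false
      }
  }
}
```

Carrying the connection id through the job keeps the Flink side stateless with respect to connections; only the gateway needs the lookup table.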

Anti-fraud

• Based on rules

• Based on an offline analysis model

• Graph frameworks: GraphX, Graphite, Neo4j, Gelly

• community detection

• GN

• Label Propagation Algorithm

• KCore Subgraph

• Local Expansion

• Particle Competition

• Game Theory

• PageRank

Cloud service

• AWS: Kinesis

• Aliyun: StreamCompute

• Azure: stream-analytics

StreamSQL

Serverless

Mesos/Kubernetes

What I imagine:
1. Offline models used online
2. API Gateway
3. The cloud hides the details
4. Stream ML and graph

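[Figure: FlinkML and Gelly train a model from a DataSet of training data; a DataStream source x is scored with predict() on the model and written to a sink. Users reach the Stream API, CEP, and SQL through an API Gateway and an SDK, inside a cloud VPC.]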

My translated book


Welcome to Huawei.
1. UED
2. Scala developer
shij[email protected]