An Exploration of Near-Real-Time Massive Data Analysis System Architecture
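Outline
• Offline vs. near-real-time
• Lessons from other systems
• Highlights of each of those systems
• Core technical challenges
• Where the data platform should start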
Data eXchange Platform | zhouchen.zm
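Offline vs. Near-Real-Time
| Offline massive-data queries | Near-real-time massive-data queries |
| --- | --- |
| Offline batch queries | Interactive queries |
| Response time in hours or days | Response time in seconds or minutes |
| Row storage | Columnar storage |
| Batch read / batch write | Read-only |
| Heavy data exchange over the network | As little data exchange over the network as possible |
| Supports all types of joins | Supports only large-table-to-small-table joins |
| Supports UDFs | Generally no UDF support |
| Mostly used internally | Easy to productize for external users |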
Puzzle
Can near-real-time replace offline?
Similar Technologies
• Google Dremel
• Cloudera Impala
• Apache Drill
• LinkedIn SenseiDB
• Google PowerDrill (in-memory)
• UC Berkeley Shark (in-memory)
Hive
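• SQL:
INSERT INTO TABLE pv_users
SELECT pv.pageid, u.age
FROM page_view pv JOIN user u ON (pv.userid = u.userid);

page_view
| pageid | userid | time |
| --- | --- | --- |
| 1 | 111 | 9:08:01 |
| 2 | 111 | 9:08:13 |
| 1 | 222 | 9:08:14 |

user
| userid | age | gender |
| --- | --- | --- |
| 111 | 25 | female |
| 222 | 32 | male |

page_view X user = pv_users
| pageid | age |
| --- | --- |
| 1 | 25 |
| 2 | 25 |
| 1 | 32 |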
Map
page_view rows are emitted as (userid, <1, pageid>):
| key | value |
| --- | --- |
| 111 | <1,1> |
| 111 | <1,2> |
| 222 | <1,1> |

user rows are emitted as (userid, <2, age>):
| key | value |
| --- | --- |
| 111 | <2,25> |
| 222 | <2,32> |

Shuffle/Sort
Pairs are grouped by key, one group per reducer input:
| key | value |
| --- | --- |
| 111 | <1,1> |
| 111 | <1,2> |
| 111 | <2,25> |

| key | value |
| --- | --- |
| 222 | <1,1> |
| 222 | <2,32> |

Reduce
Output for key 111:
| pageid | age |
| --- | --- |
| 1 | 25 |
| 2 | 25 |

Output for key 222:
| pageid | age |
| --- | --- |
| 1 | 32 |
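The Map, Shuffle/Sort, and Reduce steps of the page_view/user join can be sketched in a few lines of Python. This is only an illustration of the tagged reduce-side join shown on the slides, not Hive's actual implementation; the function names are made up for this sketch, and tag 1 / tag 2 mark page_view and user rows as in the slide data.

```python
from collections import defaultdict

# Sample rows copied from the slides.
page_view = [(1, 111, "9:08:01"), (2, 111, "9:08:13"), (1, 222, "9:08:14")]
user = [(111, 25, "female"), (222, 32, "male")]


def map_phase(page_view_rows, user_rows):
    """Emit (userid, (tag, value)); tag 1 = page_view, tag 2 = user."""
    for pageid, userid, _time in page_view_rows:
        yield userid, (1, pageid)
    for userid, age, _gender in user_rows:
        yield userid, (2, age)


def shuffle_sort(pairs):
    """Group map output by key; sort keys and the values inside each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(values) for key, values in sorted(groups.items())}


def reduce_phase(groups):
    """For each key, pair every page_view value with every user value."""
    for _userid, values in groups.items():
        pageids = [v for tag, v in values if tag == 1]
        ages = [v for tag, v in values if tag == 2]
        for pageid in pageids:
            for age in ages:
                yield pageid, age


pv_users = list(reduce_phase(shuffle_sort(map_phase(page_view, user))))
print(pv_users)
```

Every row of both tables crosses the network during the shuffle, which is why this style of join is expensive for interactive queries.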
Dremel: Nested Columnar Storage
Dremel: Multi-Level Distributed Query Tree
Dremel: Approximate Computation
Dremel: Miscellaneous
• "In situ": data does not need to be imported
• Read-only datasets
Impala
• Frontend
– Parser
– Planner
• Backend
– Coordinator
– ExecNode Tree
• StateStore
• Sparrow
Dremel vs Impala
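• Similarities
– Nested columnar storage
– Distributed aggregation
– Neither requires importing data
• Differences
– Dremel is closed source and commercialized as BigQuery; Impala is open source under the Apache License (APL)
– Impala claims it can read data directly from DataNodes
– Impala uses LLVM for just-in-time optimization in its backend
– Impala has no frontend load balancing
– Dremel lets users choose to discard the long tail, which greatly improves response time
– Dremel uses a multi-level distributed query tree; Impala uses an MPP-like architecture
– Dremel is mature; Impala is unstable, and in our tests it did not reach its advertised performance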
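To see why discarding the long tail helps so much, consider a toy model in which all tasks run in parallel and the query returns once a chosen percentage of them have finished. The latencies below are hypothetical numbers chosen for illustration, and `response_time` is a name invented for this sketch; it says nothing about how Dremel implements the cutoff.

```python
def response_time(task_latencies, percent=100):
    """Time until `percent` % of parallel tasks have finished."""
    ordered = sorted(task_latencies)
    # Integer ceiling of percent * n / 100, with at least one task required.
    needed = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[needed - 1]


# Hypothetical workload: 98 fast tasks, one slightly slow task, one straggler.
latencies = [1.0] * 98 + [1.2, 30.0]

wait_for_all = response_time(latencies, 100)  # dominated by the straggler
wait_for_99 = response_time(latencies, 99)    # straggler is ignored
print(wait_for_all, wait_for_99)
```

The price is an approximate answer, since the rows held by the discarded tasks are missing from the result.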
PowerDrill
• In-memory
• Composite range partitioning
• Two-level dictionary
• Highly efficient compression
• Record reordering
• Approximate count distinct
PowerDrill: Two-Level Dictionary
SELECT search_string, COUNT(*) as c FROM data
WHERE search_string IN ("la redoute", "voyages sncf")
GROUP BY search_string
ORDER BY c DESC LIMIT 10;
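A rough sketch of how a two-level dictionary can answer this query: a global dictionary maps each distinct value to a global id, and each chunk keeps a small chunk dictionary (chunk id to global id) plus one chunk id per row. Chunks whose dictionary contains none of the wanted global ids are skipped without touching their rows. The sample data, the chunking, and the function names are assumptions made for this illustration, not PowerDrill's actual layout.

```python
from bisect import bisect_left
from collections import Counter


def build(chunks):
    """Encode one string column that has already been split into chunks."""
    global_dict = sorted({v for chunk in chunks for v in chunk})
    gid = {v: i for i, v in enumerate(global_dict)}
    encoded = []
    for chunk in chunks:
        chunk_dict = sorted({gid[v] for v in chunk})   # chunk id -> global id
        cid = {g: i for i, g in enumerate(chunk_dict)}
        elements = [cid[gid[v]] for v in chunk]        # one chunk id per row
        encoded.append((chunk_dict, elements))
    return global_dict, encoded


def count_in(global_dict, encoded, wanted, limit=10):
    """COUNT(*) per value for values IN `wanted`, ordered by count descending."""
    wanted_gids = set()
    for v in wanted:
        i = bisect_left(global_dict, v)                # binary search
        if i < len(global_dict) and global_dict[i] == v:
            wanted_gids.add(i)
    counts, skipped = Counter(), 0
    for chunk_dict, elements in encoded:
        hits = {c: g for c, g in enumerate(chunk_dict) if g in wanted_gids}
        if not hits:
            skipped += 1                               # whole chunk is skipped
            continue
        for c in elements:
            if c in hits:
                counts[hits[c]] += 1
    return [(global_dict[g], n) for g, n in counts.most_common(limit)], skipped


chunks = [
    ["amazon", "ebay", "amazon"],
    ["la redoute", "voyages sncf", "la redoute"],
    ["la redoute", "cheap flights"],
]
global_dict, encoded = build(chunks)
result, skipped = count_in(global_dict, encoded, ["la redoute", "voyages sncf"])
print(result, skipped)
```

Because rows store only small integer ids, the row data compresses well and the filter is evaluated on dictionaries rather than on strings.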
PowerDrill vs Dremel
| | PowerDrill | Dremel |
| --- | --- | --- |
| Storage | In-memory | On-disk |
| Format | Columnar, composite range partitioning | Columnar; can also process row-oriented data |
| ETL | Data must be loaded | No loading required |
| Response time | Milliseconds to seconds | Seconds to minutes |
| Scale | TB | PB |
| Compression | Heavily optimized compression | Plain dictionary compression or none |
| I/O efficiency | Skips the vast majority of I/O (Google: 92.41%) | Can skip unused columns |
SenseiDB
• Full-text indexing
• Streaming incremental data updates
• Fast response
• Faceting, group-by, aggregation
• Production-grade
• Open source under the Apache License (APL)
Columnar Storage
• Fewer columns to scan
• Easier to compress
Puzzle
One file per column, vs.
all columns stored column-wise in a single file, vs.
one file per Column Family, vs.
rows split horizontally into row groups, with each group stored column-wise in one file:
which scheme is better?
Sequential Scans vs. Random I/O
If a table has 10,000 rows averaging 100 bytes each, randomly seeking to a single row is no better than scanning the whole table.
• Disk devices
• Query selectivity and its relationship to indexes
• Query optimizer
• Random I/O on distributed file systems
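A back-of-the-envelope model makes the 10,000-row example concrete. The disk parameters below (10 ms per seek, 100 MB/s sequential throughput) are assumed round numbers for a spinning disk, not figures from the slides, and the function names are invented for this sketch.

```python
ROWS = 10_000
ROW_BYTES = 100
SEEK_SECONDS = 0.010                 # assumed average seek + rotational delay
THROUGHPUT = 100 * 1000 * 1000       # assumed sequential bytes per second


def scan_seconds(rows=ROWS):
    """One seek to the start of the table, then read every row sequentially."""
    return SEEK_SECONDS + rows * ROW_BYTES / THROUGHPUT


def random_read_seconds(n_rows):
    """One seek per fetched row, plus the time to transfer that row."""
    return n_rows * (SEEK_SECONDS + ROW_BYTES / THROUGHPUT)


full_scan = scan_seconds()
two_lookups = random_read_seconds(2)
print(full_scan, two_lookups)
```

Under these assumptions the whole 1 MB table is read in about the time of two seeks, so an index only pays off when the query selects a very small fraction of the rows of a much larger table.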
I/O on Distributed File Systems
• Cost of a sequential scan
– Sequential disk I/O (can be optimized with prefetching)
– Packet transfer
– CRC32C checksum
• Cost of random I/O
– Locating the block via the ChunkMaster/NameNode (the location can be cached locally on the client)
– TCP handshake with the ChunkServer/DataNode
– Random disk I/O
– Packet transfer
– CRC32C checksum
TIPS: if the target position of a random I/O seek is less than the TCP window size away, the data is read from the local cache
Reconstructing Rows
• Random I/O cost
• Memory cost
• Network cost of reading across machines
Compression
• General-purpose compression: Deflate, Snappy, LZO, Gzip, LZ4, LZMA
• Specialized dictionary encoding
• Entropy coding: Huffman coding, arithmetic coding
• Common-substring compression
– For sorted strings
– Prefix trees, suffix trees
• Integer column compression
– VInt family, Google Group VarInt
– Simple9, Simple16
– PForDelta, NewPFD
– Golomb code, Rice code
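As a small illustration of the VInt family, the sketch below encodes non-negative integers 7 bits per byte (low bits first, high bit set when more bytes follow) and combines that with delta encoding for a sorted integer column. The byte order shown is one common convention; specific formats differ in detail, and the function names are chosen for this example only.

```python
def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, least significant first."""
    if n < 0:
        raise ValueError("varint needs a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    """Decode a concatenation of varints back into a list of ints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def compress_sorted(ids):
    """Delta-encode a non-decreasing id list, then varint-encode the gaps."""
    out, prev = bytearray(), 0
    for x in ids:
        out += encode_varint(x - prev)
        prev = x
    return bytes(out)


def decompress_sorted(data):
    ids, total = [], 0
    for gap in decode_varints(data):
        total += gap
        ids.append(total)
    return ids


ids = [1000000, 1000003, 1000010, 1000300]
packed = compress_sorted(ids)
print(len(packed), decompress_sorted(packed) == ids)
```

Four values that would take 16 bytes as fixed 32-bit integers fit in 7 bytes here, because the gaps between sorted values are small.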
TFile & Zebra
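Pros and Cons of Zebra
• Pros
– Follows the C-Store idea: frequently used columns are combined into one group, which helps when reconstructing rows
• Cons
– Column Groups are hard for domain designers to define
– Scans across groups still require remote reads
– Some columns that the query does not use are still read and parsed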
RCFile
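Pros and Cons of RCFile
• Pros
– Thanks to local similarity, the compression ratio is 10~20% higher than SequenceFile
– Lazy decompression saves CPU time
• Cons
– Filter operations cannot skip I/O
– Does not exploit the fact that all values in a column share one type to apply more efficient compression algorithms
PS: columns that the query does not use are still read; they are just not decompressed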
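The row-group idea and the "read but not decompressed" behaviour can be mimicked with a toy layout: rows are split into groups, each column of a group is compressed on its own, and a reader decompresses only the columns it needs. This is a simplified sketch using zlib and newline-joined strings, not the real RCFile format; the group size and function names are assumptions.

```python
import zlib

ROW_GROUP_SIZE = 2  # hypothetical; real row groups hold far more rows


def write_row_groups(rows, n_cols, group_size=ROW_GROUP_SIZE):
    """Split rows into groups; compress each column of a group separately."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        cols = []
        for c in range(n_cols):
            raw = "\n".join(str(r[c]) for r in chunk).encode("utf-8")
            cols.append(zlib.compress(raw))
        groups.append(cols)
    return groups


def read_column(groups, col):
    """Return one column's values and how many buffers were decompressed."""
    values, decompressed = [], 0
    for cols in groups:                      # the whole row group is fetched
        raw = zlib.decompress(cols[col])     # only this column is decompressed
        decompressed += 1
        values.extend(raw.decode("utf-8").split("\n"))
    return values, decompressed


rows = [(1, 111, "9:08:01"), (2, 111, "9:08:13"), (1, 222, "9:08:14")]
groups = write_row_groups(rows, n_cols=3)
pageids, n_decompressed = read_column(groups, 0)
print(pageids, n_decompressed, sum(len(g) for g in groups))
```

Only 2 of the 6 compressed buffers are decompressed for this query, which is the CPU saving; the I/O saving is absent because every row group is still fetched in full.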
InfoBright File Format
• Column-oriented storage
• Every 64K rows are stored as one Data Pack
• Each Data Pack has a Data Pack Node that records the pack's min, max, count, and sum
• A new file is started (rotated) once the current one exceeds 2 GB
InfoBright Architecture
InfoBright Query Example
SELECT SUM(B) FROM T WHERE A > 6;
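One way to picture how per-pack min, max, count, and sum help with this query: packs of column A whose max is at most 6 are skipped, packs whose min is above 6 are answered from the stored sum of the matching B pack, and only the remaining packs have to be opened. The sketch below uses 4-row packs and made-up data instead of 64K-row packs, ignores NULLs, and uses invented names; it illustrates the idea rather than InfoBright's engine.

```python
from dataclasses import dataclass


@dataclass
class PackNode:
    """Statistics kept for one Data Pack."""
    min: int
    max: int
    count: int
    sum: int


def make_packs(values, pack_rows):
    """Split a column into packs and compute the node for each pack."""
    packs = []
    for i in range(0, len(values), pack_rows):
        data = values[i:i + pack_rows]
        packs.append((PackNode(min(data), max(data), len(data), sum(data)), data))
    return packs


def sum_b_where_a_gt(a_packs, b_packs, threshold):
    """SELECT SUM(B) FROM T WHERE A > threshold, using pack statistics."""
    total, opened = 0, 0
    for (a_node, a_data), (b_node, b_data) in zip(a_packs, b_packs):
        if a_node.max <= threshold:
            continue                       # no row can match: skip the pack
        if a_node.min > threshold:
            total += b_node.sum            # every row matches: use stored sum
            continue
        opened += 1                        # mixed pack: scan its rows
        total += sum(b for a, b in zip(a_data, b_data) if a > threshold)
    return total, opened


A = [1, 2, 3, 4, 7, 8, 9, 10, 5, 6, 7, 8]
B = [10, 10, 10, 10, 1, 2, 3, 4, 100, 200, 300, 400]
total, opened = sum_b_where_a_gt(make_packs(A, 4), make_packs(B, 4), 6)
print(total, opened)
```

Of the three packs, only the last one is actually scanned; the first is skipped and the second is answered from metadata alone.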
Job Scheduling
• Higher throughput
• Lower latency
• Data locality awareness
• Load balancing
• Speculative execution
• Rerun on failure
• Sparrow
SQL Challenge #1: Join
• Joins cause heavy data exchange over the network
• Only large-table-to-small-table joins are supported
– In Google BigQuery, the small table in a join must not exceed 8 MB after compression
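A large-table-to-small-table join avoids the shuffle by copying the small table to every node and probing it locally. The sketch below simulates that with an in-memory hash table and two partitions of the large table; the partitioning, sample rows, and function name are illustrative assumptions, and join keys in the small table are assumed to be unique.

```python
def broadcast_hash_join(big_partitions, small_table):
    """Join (pageid, userid) rows against a small (userid, age) table."""
    # Built once and shipped to every node; this must fit in memory.
    lookup = dict(small_table)
    result = []
    for partition in big_partitions:       # each partition stays on its node
        for pageid, userid in partition:
            if userid in lookup:           # local probe, no network exchange
                result.append((pageid, lookup[userid]))
    return result


page_view_partitions = [
    [(1, 111), (2, 111)],                  # rows held by node 1
    [(1, 222), (3, 333)],                  # rows held by node 2
]
user_ages = [(111, 25), (222, 32)]

joined = broadcast_hash_join(page_view_partitions, user_ages)
print(joined)
```

Only the small table travels over the network, which is why systems of this kind cap its size.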
SQL Challenge #2: Distinct Aggregation
• Distinct aggregation also causes heavy data exchange over the network; it is the second-hardest problem after Join
• Google BigQuery supports only count(distinct c)
Google interview question:
Given one year of Google search logs and a machine with limited memory, can you estimate the number of unique queries in that year with a single pass over the data?
• Hash Table
• Bitmap
• Linear Probabilistic Counter
• HyperLogLog
Big Data Counting: How To Count A Billion Distinct Objects Using Only 1.5KB Of Memory
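Of the approaches listed, the linear probabilistic counter is the simplest to sketch: hash every item to one bit of an m-bit bitmap and estimate the distinct count as -m * ln(fraction of bits still zero). The class below is a minimal single-pass illustration; the bitmap size, the choice of BLAKE2b as hash function, and the sample data are assumptions made for this example.

```python
import hashlib
import math


class LinearCounter:
    """Single-pass approximate distinct counter backed by an m-bit bitmap."""

    def __init__(self, m_bits):
        self.m = m_bits
        self.bits = bytearray((m_bits + 7) // 8)

    def add(self, item):
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        pos = int.from_bytes(digest, "big") % self.m
        self.bits[pos >> 3] |= 1 << (pos & 7)

    def estimate(self):
        ones = sum(bin(b).count("1") for b in self.bits)
        zeros = self.m - ones
        if zeros == 0:
            raise ValueError("bitmap saturated; use a larger m_bits")
        return -self.m * math.log(zeros / self.m)


counter = LinearCounter(1 << 16)           # 8 KB of memory
for _ in range(2):                         # every query appears twice
    for i in range(10_000):
        counter.add(f"query-{i}")

estimate = counter.estimate()
print(round(estimate))
```

Duplicates set bits that are already set, so they never change the estimate. Bitmaps from different machines can be merged with a bitwise OR, which keeps the network exchange small.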
SQL Challenge #3: Top K
• Top K is the third challenge for near-real-time systems
• Graham Cormode and S. Muthukrishnan. An improved data stream summary: The Count-Min sketch and its applications. pages 29–38. 2004.
• Cheqing Jin, Weining Qian, Chaofeng Sha, Jeffrey X. Yu, and Aoying Zhou. Dynamically maintaining frequent items over a data stream. In CIKM ’03: Proceedings of the twelfth international conference on Information and knowledge management, pages 287–294, New York, NY, USA, 2003. ACM.
• Ahmed Metwally, Divyakant Agrawal, and Amr Abbadi. Efficient computation of frequent and top-k elements in data streams. pages 398–412. 2005.
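The Space-Saving algorithm from the Metwally et al. paper keeps a fixed number of counters: a monitored item is incremented, and an unmonitored item replaces the item with the smallest counter and inherits that count plus one. The sketch below is a simplified version that finds the minimum with a linear scan instead of the paper's stream-summary structure; the stream and capacity are made-up illustration values.

```python
def space_saving(stream, capacity):
    """Approximate item counts using at most `capacity` counters."""
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
        else:
            # Evict the item with the smallest counter; the newcomer
            # inherits that count (possible overestimate) plus one.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts


def top_k(stream, k, capacity):
    counts = space_saving(stream, capacity)
    return sorted(counts.items(), key=lambda kv: -kv[1])[:k]


# Hypothetical stream: "a" 40 times, "b" 20 times, 20 items seen once each.
stream = []
for i in range(20):
    stream += ["a", "a", "b", f"rare{i}"]

top2 = top_k(stream, k=2, capacity=5)
print(top2)
```

Counts can only be overestimated, never underestimated, and the counters always sum to the stream length, so any item more frequent than (stream length / capacity) is guaranteed to be among the monitored items.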
Bitmap Index
• First commercial system: Model 204, P. O’Neil, 1987
• Faster and easier to build than a B-Tree
• Efficient queries through bitwise operations
– A < 2 : b0 OR b1
– A > 2 : b3 OR b4 OR b5
• Efficient multi-dimensional queries: AND, OR, and NOT over bitvectors
• Efficient count and other aggregations
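The bitwise queries above can be tried out with Python integers standing in for bitvectors: one bitvector per distinct value, with bit i set when row i holds that value. The column contents and helper names below are invented for this sketch; a production index would use compressed bitmaps.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i marks row i."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitvector):
    """List the row numbers whose bits are set."""
    return [i for i in range(bitvector.bit_length()) if (bitvector >> i) & 1]


# Hypothetical columns: A takes values 0..5, G is a second dimension.
A = [0, 3, 1, 5, 2, 4, 1, 3]
G = ["m", "f", "f", "m", "f", "m", "m", "f"]
b = build_bitmap_index(A)
g = build_bitmap_index(G)

less_than_2 = b.get(0, 0) | b.get(1, 0)                   # A < 2 : b0 OR b1
greater_than_2 = b.get(3, 0) | b.get(4, 0) | b.get(5, 0)  # A > 2 : b3 OR b4 OR b5
both = greater_than_2 & g.get("f", 0)                     # A > 2 AND G = 'f'
count = bin(greater_than_2).count("1")                    # COUNT(*) WHERE A > 2

print(rows_of(less_than_2), rows_of(greater_than_2), rows_of(both), count)
```

Counting matches is just a population count on the resulting bitvector, so no rows need to be fetched for the aggregation.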
Bitmapped Group-set Index
SELECT c.mktsegment, o.order_id
FROM orders o, customers c
WHERE o.cust_id = c.cust_id
GROUP BY c.mktsegment, o.order_id
Bitmapped Join Index
SELECT o.customer_id, l.unit_price * l.quantity
FROM lineitems l, orders o
WHERE l.order_id = o.order_id
Current State of the Group and Near-Real-Time
• Structured data
• Typical data-warehouse star/snowflake schemas
• ETL is currently required
• Better suited than Google to interactive analysis products and data products
• 冰火鸟 has nothing in this area yet
• Integration with ODPS
– CFile
– Metadata
– Scheduling
References
• [Dremel] Dremel: Interactive Analysis of Web-Scale Datasets
• [PowerDrill] Processing a Trillion Cells per Mouse Click
• [RCFile] RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
• [InfoBright] BrightHouse A New Database Engine based on Rough Sets
Discussion
Author: 周忱 (Zhou Chen) | Data Platform, DXP | Weibo: @MinZhou