Pinot: Realtime Distributed OLAP Datastore
Posted on 18-Feb-2017
Agenda
• Pinot @ LinkedIn - Current
• Pinot - Architecture
• Pinot Operations
• Pinot @ LinkedIn - Future
Tuesday, August 18, 15
Pinot @ LinkedIn
• 100B documents
• 1B documents ingested per day
• 100M queries per day
• Latency in the tens of milliseconds
• 30 tables in production, 250 × 3 standard app nodes
Key features
• SQL-like interface
• Columnar storage and indexing
• Real-time data load
(S)QL: Filters and Aggregations
SELECT count(*) FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND 'day' >= 15949 AND 'day' <= 15963 AND paid = 'y' AND action = 'stop'
(S)QL: Group By
SELECT count(*) FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND 'day' >= 15949 AND 'day' <= 15963 AND paid = 'y'
GROUP BY action
(S)QL: ORDER BY and LIMIT
SELECT * FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND entityId = 1000 AND action = 'start'
ORDER BY creationTime DESC LIMIT 1
What's not supported
• JOIN: unpredictable performance
• NOT A SOURCE OF TRUTH
• Mutation
Pinot
• Data flow
• Query Execution
• How to use/operate
• Pinot @ LinkedIn - Future
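Pinot Architecture
[Diagram: Raw Data flows into Kafka and Hadoop; Realtime servers consume from Kafka, Historical servers load data built in Hadoop; Queries go to the Broker; Helix manages the cluster]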
Pinot Segment layout: Other techniques
• Indexes: Inverted index, Bitmap, RoaringBitmap
• Compression: Dictionary Encoding, P4Delta
• Multi-valued columns, skip lists
• HyperLogLog for unique counts
• T-Digest for percentiles and quantiles
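The inverted index and dictionary encoding bullets can be illustrated with a minimal Python sketch for a single column. It assumes a hypothetical `DictEncodedColumn` class and uses plain Python integers as bitsets where Pinot would use a bitmap structure such as RoaringBitmap; it is not Pinot's actual segment format.

```python
from bisect import bisect_left


class DictEncodedColumn:
    """Hypothetical single-column layout: sorted dictionary + dict ids + bitmap inverted index."""

    def __init__(self, values):
        # Dictionary: sorted distinct values, each stored exactly once.
        self.dictionary = sorted(set(values))
        # Forward index: each document stores a small integer dict id, not the raw value.
        self.dict_ids = [bisect_left(self.dictionary, v) for v in values]
        # Inverted index: one bitmap per dict id; bit i is set if doc i has that value.
        # Python ints stand in for RoaringBitmap here.
        self.bitmaps = [0] * len(self.dictionary)
        for doc_id, dict_id in enumerate(self.dict_ids):
            self.bitmaps[dict_id] |= 1 << doc_id

    def value(self, doc_id):
        """Decode a document's value through the dictionary."""
        return self.dictionary[self.dict_ids[doc_id]]

    def eq(self, value):
        """Bitmap of doc ids where column == value (0 if the value is absent)."""
        i = bisect_left(self.dictionary, value)
        if i < len(self.dictionary) and self.dictionary[i] == value:
            return self.bitmaps[i]
        return 0


def doc_ids(bitmap):
    """Expand a bitmap into a sorted list of doc ids."""
    out, doc_id = [], 0
    while bitmap:
        if bitmap & 1:
            out.append(doc_id)
        bitmap >>= 1
        doc_id += 1
    return out


action = DictEncodedColumn(["start", "stop", "stop", "start", "stop"])
paid = DictEncodedColumn(["y", "n", "y", "y", "y"])
# An AND of two equality filters becomes a bitwise AND of their bitmaps.
matches = action.eq("stop") & paid.eq("y")
print(action.dictionary, action.dict_ids)  # ['start', 'stop'] [0, 1, 1, 0, 1]
print(doc_ids(matches))                    # [2, 4]
```

Because each distinct value is stored once and documents hold only small integer ids, repetitive columns compress well, and combining equality filters never touches the raw values.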
Pinot Query Execution: Distributed
[Diagram: Brokers, Helix, and Servers holding replicated segments S1, S2, S3]
1. Query
2. Fetch routing table from Helix
3. Scatter request
4. Process request and send response
5. Gather response
6. Return response
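The six steps above amount to a scatter-gather. A minimal Python sketch, assuming a hypothetical static routing table in place of Helix and in-memory segments in place of servers; the routing table lists each segment once so that replicated segments are not counted twice.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical routing table (as a broker might read it from Helix):
# each server is asked to process a disjoint subset of segments.
ROUTING_TABLE = {
    "server1": ["S1", "S3"],
    "server2": ["S2"],
}

# Hypothetical segment contents: rows of (action, paid).
SEGMENTS = {
    "S1": [("stop", "y"), ("start", "y")],
    "S2": [("stop", "y"), ("stop", "n")],
    "S3": [("stop", "y")],
}


def process_on_server(segment_names, predicate):
    """Step 4: a server evaluates count(*) on its segments and returns a partial result."""
    return sum(1 for name in segment_names for row in SEGMENTS[name] if predicate(row))


def broker_count(predicate):
    # Step 2: fetch the routing table (a static dict here instead of Helix).
    routing = ROUTING_TABLE
    # Step 3: scatter the request to every server in parallel.
    with ThreadPoolExecutor(max_workers=len(routing)) as pool:
        futures = [pool.submit(process_on_server, segs, predicate) for segs in routing.values()]
        # Step 5: gather the partial counts.
        partials = [f.result() for f in futures]
    # Step 6: merge and return; for count(*) the merge is a sum.
    return sum(partials)


print(broker_count(lambda row: row == ("stop", "y")))  # 3
```

The merge step depends on the aggregation: a sum for count(*), but a group-by query would merge per-key maps instead.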
Pinot Query Execution: Single Node Architecture
[Diagram components: Planner, Execution Engine, Inverted Index, Bitmap Index, Column Format]
Pinot Query Execution: Single Node Architecture
SELECT campaignId, sum(clicks) FROM TableA
WHERE accountId = 121011 AND 'day' >= 15949
GROUP BY campaignId
Columns: accountId, day, campaignId, clicks
[Diagram: Pinot Segments (data sources) -> Filter Operator -> matching doc ids -> Projection Operator -> (campaignId, clicks) tuples -> Aggregation Group By Operator -> Combine Operator]
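The operator chain for this query can be sketched in Python as below. Segment contents, column values, and function names are hypothetical; the sketch only shows how doc ids flow from the filter into projection, a per-segment group-by, and the final combine.

```python
from collections import defaultdict

# Hypothetical columnar segments: each maps column name -> list of values.
SEGMENTS = [
    {"accountId": [121011, 121011, 5], "day": [15950, 15940, 15955],
     "campaignId": [1, 2, 1], "clicks": [10, 7, 3]},
    {"accountId": [121011, 121011], "day": [15960, 15961],
     "campaignId": [2, 1], "clicks": [4, 6]},
]


def filter_operator(seg):
    """WHERE accountId = 121011 AND day >= 15949 -> matching doc ids."""
    return [i for i, (acct, day) in enumerate(zip(seg["accountId"], seg["day"]))
            if acct == 121011 and day >= 15949]


def projection_operator(seg, doc_ids):
    """Read only the columns the query needs, for matching docs only."""
    return [(seg["campaignId"][i], seg["clicks"][i]) for i in doc_ids]


def aggregation_group_by_operator(tuples):
    """sum(clicks) GROUP BY campaignId within one segment."""
    groups = defaultdict(int)
    for campaign_id, clicks in tuples:
        groups[campaign_id] += clicks
    return groups


def combine_operator(partials):
    """Merge per-segment group-by results into a single result."""
    combined = defaultdict(int)
    for partial in partials:
        for key, total in partial.items():
            combined[key] += total
    return dict(combined)


partials = [
    aggregation_group_by_operator(projection_operator(seg, filter_operator(seg)))
    for seg in SEGMENTS
]
print(combine_operator(partials))  # {1: 16, 2: 4}
```

Since the filter runs first, the projection reads only two columns and only for matching documents, which is where columnar storage pays off.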
Cluster Management: Deployment
[Diagram: Controller, Helix, Brokers, Servers]
• Brokers and servers register themselves in Helix
• All servers start with no use-case-specific configuration
Onboarding a new use case
[Diagram: a Create Table command goes to the Controller, which tags three servers and one broker in Helix with XLNT]
Create Table command: TableName XLNT_T1, Tag XLNT, Servers 3, Brokers 1
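A minimal sketch of tag-based onboarding, assuming hypothetical `Instance` and `create_table` names and a simple rule that untagged instances are free capacity; Pinot's actual controller logic and request format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A broker or server as registered in the cluster (hypothetical model)."""
    name: str
    tags: set = field(default_factory=set)


def create_table(table_name, tag, num_servers, num_brokers, servers, brokers):
    """Hypothetical controller step: tag idle servers and brokers for a new table."""
    # Instances start with no use-case-specific configuration, so untagged
    # instances are treated as free capacity.
    free_servers = [s for s in servers if not s.tags]
    free_brokers = [b for b in brokers if not b.tags]
    # Validate both pools before tagging anything, so a failure leaves no partial state.
    if len(free_servers) < num_servers or len(free_brokers) < num_brokers:
        raise RuntimeError("not enough untagged instances for " + table_name)
    chosen_servers = free_servers[:num_servers]
    chosen_brokers = free_brokers[:num_brokers]
    for inst in chosen_servers + chosen_brokers:
        inst.tags.add(tag)
    return {
        "tableName": table_name,
        "tag": tag,
        "servers": [s.name for s in chosen_servers],
        "brokers": [b.name for b in chosen_brokers],
    }


servers = [Instance(f"server{i}") for i in range(1, 6)]
brokers = [Instance("broker1"), Instance("broker2")]
result = create_table("XLNT_T1", "XLNT", 3, 1, servers, brokers)
print(result["servers"], result["brokers"])  # ['server1', 'server2', 'server3'] ['broker1']
```

Isolating a use case by tag means a noisy tenant only affects the instances carrying its tag.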
Segment Assignment
[Diagram: Upload Segment S2 -> Controller; copies of segments S1, S2, S3 are spread across the servers; Helix and Brokers track the assignment]
Table config: TableName XLNT_T1, Copies 2
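One way to place `Copies` replicas of each uploaded segment is to pick the least-loaded distinct servers. The Python sketch below assumes that policy and a hypothetical `assign_segment` helper; the talk does not say which placement strategy Pinot uses.

```python
def assign_segment(segment, copies, assignment):
    """Hypothetical: place `copies` replicas of `segment` on the least-loaded servers.

    `assignment` maps server name -> list of segments it hosts (tagged servers only).
    """
    if copies > len(assignment):
        raise ValueError("more copies requested than servers available")
    # Sort by current load, breaking ties by name so the result is deterministic.
    targets = sorted(assignment, key=lambda s: (len(assignment[s]), s))[:copies]
    for server in targets:
        assignment[server].append(segment)
    return targets


assignment = {"server1": [], "server2": [], "server3": []}
for seg in ["S1", "S2", "S3"]:
    assign_segment(seg, copies=2, assignment=assignment)
print(assignment)
# {'server1': ['S1', 'S2'], 'server2': ['S1', 'S3'], 'server3': ['S2', 'S3']}
```

Each segment lands on two distinct servers, so losing any single server still leaves one copy of every segment.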
Pinot - Fault tolerance/Elasticity
• AUTO recovery mode: automatically redistribute segments on failure/addition of new nodes
• Custom mode: run in degraded mode until the node is restarted/replaced
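AUTO recovery can be sketched as re-homing the replicas that lived on a failed server while keeping each segment's copies on distinct servers. The helper name and least-loaded policy are assumptions for illustration, and only the failure case is shown, not node addition.

```python
def redistribute_on_failure(assignment, failed):
    """Hypothetical AUTO-mode rebalance: re-home replicas from a failed server.

    Each orphaned replica goes to the least-loaded surviving server that does not
    already hold a copy of that segment, so copies stay on distinct servers.
    """
    orphaned = assignment.pop(failed)
    for segment in orphaned:
        candidates = [s for s in assignment if segment not in assignment[s]]
        if not candidates:
            raise RuntimeError(f"no server can take another copy of {segment}")
        target = min(candidates, key=lambda s: (len(assignment[s]), s))
        assignment[target].append(segment)
    return assignment


assignment = {
    "server1": ["S1", "S2"],
    "server2": ["S1", "S3"],
    "server3": ["S2", "S3"],
    "server4": [],
}
print(redistribute_on_failure(assignment, "server3"))
# {'server1': ['S1', 'S2'], 'server2': ['S1', 'S3'], 'server4': ['S2', 'S3']}
```

In Custom mode, by contrast, nothing moves: the surviving copy of S2 and S3 serves queries alone until server3 comes back or is replaced.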
Pinot vs Druid
• Architecture
  Druid: Realtime + Offline, or Realtime only
  Pinot: Realtime + Offline
  Why it matters: with Realtime only, consistency is hard and schema evolution/bootstrap is hard
• Inverted index
  Druid: always on, on all columns (fixed)
  Pinot: configurable on a per-column basis
  Why it matters: allows a trade-off between scanning vs. inverted index + scanning; more data can fit in a given memory size
• Data organization
  Druid: N/A
  Pinot: sorts data
  Why it matters: organizing data provides speed and better compression, and removes the need for an inverted index
• Smart pre-materialization
  Druid: N/A
  Pinot: star-tree
  Why it matters: allows a trade-off between latency and space
• Query execution layer
  Druid: fixed plan
  Pinot: split into planning and execution
  Why it matters: smart choices can be made at runtime based on metadata/query
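The claim that sorting removes the need for an inverted index can be illustrated with a short Python sketch: on the column a segment is sorted by, matching documents form one contiguous range found with two binary searches. The column values and helper name are hypothetical.

```python
from bisect import bisect_left, bisect_right


def sorted_column_range(sorted_values, low, high):
    """Doc-id range [start, end) whose value lies in [low, high].

    Only valid for the column the segment is sorted on: matching docs are
    contiguous, so two binary searches replace an inverted-index lookup and
    the result is a (start, end) pair rather than a bitmap.
    """
    return bisect_left(sorted_values, low), bisect_right(sorted_values, high)


# Hypothetical segment sorted on accountId.
account_ids = [7, 7, 121011, 121011, 121011, 200000]
start, end = sorted_column_range(account_ids, 121011, 121011)
print(start, end)  # 2 5
```

A (start, end) pair is also far smaller than a bitmap, and long runs of identical sorted values compress well, which matches the "speed/better compression" point in the comparison.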
Pinot - Future
• Documentation & tooling
• In progress: consistency among realtime replicas
• Improve cost to serve: leverage SSDs, partial pre-materialization
• ThirdEye: business metrics monitoring