SQL Rally 2013: Columnstore Indexes
Columnstore indexes
• Column Store vs. Row Store
• Columnstore benefits
• Columnstore indexes
• CS indexes Internals
• Adding data to Columnstore index
Row Store and Column Store
In row store, data is stored tuple by tuple.
In column store, data is stored column by column.
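The two layouts can be illustrated with a toy sketch. This is not how SQL Server stores pages; the table contents, column names, and function names below are made up for illustration only.

```python
# Toy illustration only: the same three-row table held in both layouts.
# Table contents and column names are hypothetical.
rows = [
    (1, "Ann", "Moscow"),
    (2, "Bob", "Kyiv"),
    (3, "Eve", "Moscow"),
]

# Row store: data is kept tuple by tuple.
row_store = list(rows)

# Column store: data is kept column by column (one list per column).
column_names = ("id", "name", "region")
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}


def names_in_region_row_store(region):
    # Every whole tuple is touched, including the unused id column.
    return [name for (_id, name, reg) in row_store if reg == region]


def names_in_region_column_store(region):
    # Only the two columns the query needs are touched.
    regions = column_store["region"]
    names = column_store["name"]
    return [names[i] for i, reg in enumerate(regions) if reg == region]
```

Both functions return the same answer; the difference is how much data each has to read to get there.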
Row Store and Column Store
Most queries do not process all the attributes of a particular relation.
SELECT c.name, c.address FROM Customers c WHERE c.region = 'Moscow'
[Figure: Customers table with columns id, name, city, state, age, address]
Row Store and Column Store
So column stores are suitable for read-mostly, read-intensive, large data repositories
Row Store:
(+) Easy to add/modify a record
(-) Might read in unnecessary data
Column Store:
(+) Only need to read in relevant data
(-) Tuple writes require multiple accesses
Compression
Trades I/O for CPU
Higher data value locality in column stores
Techniques such as run length encoding far more useful
Schemes
Null Suppression
Dictionary encoding
Run Length encoding
Bit-Vector encoding
Heavyweight schemes
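Two of the schemes listed above, run length encoding and dictionary encoding, can be sketched in a few lines. This is a minimal illustration of the general techniques, not the VertiPaq implementation; the function names and sample data are assumptions made for the example.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def dictionary_encode(values):
    """Replace each value with a small integer id, assigned in order of first appearance."""
    dictionary = {}
    ids = []
    for value in values:
        ids.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, ids


# Hypothetical column: values of one attribute sit next to each other,
# so repeats are common and runs are long.
state_column = ["CA", "CA", "CA", "NY", "NY", "CA"]
encoded_runs = run_length_encode(state_column)
state_dictionary, state_ids = dictionary_encode(state_column)
```

Because a column holds values of a single attribute, repeated neighbours are far more likely than in a row layout, which is why run length encoding pays off more in a column store.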
Columnar storage structure
Uses VertiPaq compression
[Figure: pages in a row store vs. pages in a column store, for columns C1 to C6]
Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes
Improved Data Warehouse Query performance
Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets
Performance improvements for “typical” data warehouse queries from 10x to 100x
Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables
Good Candidates for Columnstore Indexing
Table candidates:
Very large fact tables (for example – billions of rows)
Larger dimension tables (millions of rows) with compression friendly column data
If unsure, it is easy to create a columnstore index and test the impact on your query workload
Query candidates (against table with a columnstore index):
Scan versus seek (columnstore indexes don’t support seek operations)
Aggregated results far smaller than table size
Joins to smaller dimension tables
Filtering on fact / dimension tables – star schema pattern
Sub-set of columns (being selective in columns versus returning ALL columns)
Creating a columnstore index
T-SQL
SSMS
Defining the Columnstore Index
Columnstore index is nonclustered (secondary)
Base table can be clustered index or heap
One CS index per table
Multiple other nonclustered (B-tree) indexes allowed
But may not be needed
CS index must be partition-aligned if table is partitioned
[Figure labels: Indexed view; Filtered index; Base table (Clustered index OR Heap); Nonclustered index; Nonclustered index; Nonclustered columnstore index]
Column Segments and Dictionaries
[Figure: columns C1 to C6 divided into sets of about 1M rows; labels: Column Segment, segment 1 … segment N, dictionaries]
Memory management
SELECT C2, SUM(C4) FROM T GROUP BY C2;
[Figure: column segments of T.C1, T.C2, T.C3, and T.C4]
• Memory management is automatic
• Columnstore is persisted on disk
• Needed columns fetched into memory
• Columnstore segments flow between disk and memory
Look inside Columnstore Indexes
xVelocity
Microsoft SQL Server family of memory-optimized and in-memory technologies
xVelocity In-Memory Analytics Engine
xVelocity Memory-Optimized Columnstore Indexes
The xVelocity engine is designed with 3 principles in mind:
Performance, Performance, Performance!
How Are These Performance Gains Achieved?
Two complementary technologies:
Storage
Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row).
New “batch mode” execution
Vector-based query execution capability
Data can then be processed in batches versus row-by-row
Depending on filtering and other factors, a query may also benefit by “segment elimination” - bypassing million row chunks (segments) of data, further reducing I/O
Batch mode processing
Process ~1000 rows at a time
Vector operators implemented
Greatly reduced CPU time (7 to 40X)
[Figure: batch object made of column vectors and a bitmap of qualifying rows]
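The idea of a batch object (column vectors plus a bitmap of qualifying rows) can be sketched as follows. This is a conceptual model only, not SQL Server's execution engine. The batch size follows the "~1000 rows" figure above, and all function and column names are assumptions made for the example.

```python
BATCH_SIZE = 1000  # roughly the batch size quoted on the slide


def batches(columns, batch_size=BATCH_SIZE):
    """Yield batch objects: a slice of each column vector plus a bitmap of qualifying rows."""
    row_count = len(next(iter(columns.values())))
    for start in range(0, row_count, batch_size):
        vectors = {
            name: col[start:start + batch_size] for name, col in columns.items()
        }
        size = len(next(iter(vectors.values())))
        yield {"vectors": vectors, "qualifying": [True] * size}


def filter_batch(batch, column, predicate):
    """Filter operator: clear bitmap bits instead of copying surviving rows."""
    vector = batch["vectors"][column]
    batch["qualifying"] = [
        q and predicate(v) for q, v in zip(batch["qualifying"], vector)
    ]
    return batch


def sum_batch(batch, column):
    """Aggregate one column vector, honouring the qualifying-rows bitmap."""
    vector = batch["vectors"][column]
    return sum(v for q, v in zip(batch["qualifying"], vector) if q)


def filtered_sum(columns, filter_column, predicate, sum_column):
    """Run filter + sum one batch at a time rather than one row at a time."""
    total = 0
    for batch in batches(columns):
        total += sum_batch(filter_batch(batch, filter_column, predicate), sum_column)
    return total


# Hypothetical data: 2500 rows, so three batches (1000, 1000, 500).
example_columns = {
    "c2": [i % 3 for i in range(2500)],
    "c4": list(range(2500)),
}
example_total = filtered_sum(example_columns, "c2", lambda v: v == 0, "c4")
```

In a real engine the gain comes from amortising per-operator overhead across the whole batch and from tight loops over contiguous vectors; plain Python only shows the shape of the data flow.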
Segment Elimination
column_id  segment_id  min_data_id  max_data_id
1          1           20120101     20120131
1          2           20120115     20120215
1          3           20120201     20120228
• Segment (rowgroup) = 1 million row chunk
• Min, Max kept for each column in a segment
• Scans can skip segments based on this
select Date, count(*) from dbo.Purchase where Date >= 20120201 group by Date
Segment 1 is skipped: its max_data_id (20120131) is below 20120201
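The skip decision for the query on this slide can be sketched using the min/max values from the segment table. This is a simplified model that handles only a `>=` predicate; the function names are assumptions made for the example.

```python
# Min/max metadata copied from the slide's table for column_id 1.
SEGMENTS = [
    {"segment_id": 1, "min_data_id": 20120101, "max_data_id": 20120131},
    {"segment_id": 2, "min_data_id": 20120115, "max_data_id": 20120215},
    {"segment_id": 3, "min_data_id": 20120201, "max_data_id": 20120228},
]


def segments_to_scan(segments, lower_bound):
    """Keep segments that may contain a value >= lower_bound."""
    return [s["segment_id"] for s in segments if s["max_data_id"] >= lower_bound]


def segments_skipped(segments, lower_bound):
    """Segments whose max is below the bound cannot hold a qualifying row."""
    return [s["segment_id"] for s in segments if s["max_data_id"] < lower_bound]


# WHERE Date >= 20120201
to_scan = segments_to_scan(SEGMENTS, 20120201)
skipped = segments_skipped(SEGMENTS, 20120201)
```

Segment 2 overlaps the filter range, so it still has to be scanned even though some of its rows will not qualify.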
Columnstore format + batch mode Variations
Columnstore indexing alone + traditional row mode in Query Processor
Columnstore indexing + batch mode in Query Processor
Columnstore indexing + hybrid of batch and traditional row mode in Query Processor
Plan operators supported in batch mode
Filter
Project
Scan
Local hash (partial) aggregation
Hash inner join
(Batch) hash table build
Query processing with Columnstore Indexes
Maintaining Data in a Columnstore Index
Once built, the table becomes “read-only” and INSERT/UPDATE/DELETE/MERGE is no longer allowed
ALTER INDEX REBUILD / REORGANIZE not allowed
How can I modify index data?
Drop columnstore index / make modifications / add columnstore index
UNION ALL (but be sure to validate performance)
Partition switches (IN and OUT)
Insert data into table with Columnstore Index
Summary
SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios
10x to 100x performance improvement depending on the schema and query
I/O reduction and memory savings through columnstore compressed storage
CPU reduction with batch versus row processing, further I/O reduction if segment elimination occurs
Easy to deploy and requires less management than some legacy ROLAP or OLAP methods
No need to create intermediate tables, aggregates, pre-processing and cubes
Interoperability with partitioning
Resources
Columnar Storage in SQL Server 2012 (PDF)
SQL Server Columnstore Performance Tuning
Inside the SQL Server 2012 Columnstore Indexes
24 HOP Russia 2013 – Dmitry Pilyugin (video - rus)
SQL Server Columnstore Performance Tuning (video)
Denis Reznik
Senior Database Architect at The Frayman Group
Microsoft SQL Server MVP
@denisreznik
http://reznik.uneta.com.ua
SQL SERVER 2012 - COLUMNSTORE INDEXES