sql rally 2013 columnstore indexes

COLUMNSTORE INDEXES SQL Server 2012 Denis Reznik The Frayman Group [email protected]


Post on 14-Dec-2014


Page 1: Sql rally 2013   columnstore indexes

COLUMNSTORE INDEXES

SQL Server 2012

Denis Reznik, The Frayman Group, [email protected]

Page 2: Sql rally 2013   columnstore indexes

Columnstore indexes

• Column Store vs. Row Store

• Columnstore benefits

• Columnstore indexes

• CS indexes Internals

• Adding data to Columnstore index

Page 3: Sql rally 2013   columnstore indexes

Row Store and Column Store

In row store, data is stored tuple by tuple.

In column store, data is stored column by column.

Page 4: Sql rally 2013   columnstore indexes

Row Store and Column Store

Most queries do not process all the attributes of a particular relation.

SELECT c.name, c.address FROM Customers c WHERE c.region = 'Moscow'

[Figure: Customers table columns: id, name, city, state, age, address]

Page 5: Sql rally 2013   columnstore indexes

Row Store and Column Store

So column stores are suitable for read-mostly, read-intensive, large data repositories.

Row store:

(+) Easy to add/modify a record

(-) Might read in unnecessary data

Column store:

(+) Only need to read in relevant data

(-) Tuple writes require multiple accesses

Page 6: Sql rally 2013   columnstore indexes

Compression

Trades I/O for CPU

Higher data value locality in column stores

Techniques such as run-length encoding are far more useful

Schemes

Null Suppression

Dictionary encoding

Run Length encoding

Bit-Vector encoding

Heavyweight schemes


Page 7: Sql rally 2013   columnstore indexes

Columnar storage structure

Uses VertiPaq compression

[Figure: the same columns C1 to C6 laid out on pages as a row store and as a column store]

Page 8: Sql rally 2013   columnstore indexes


Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes


Page 9: Sql rally 2013   columnstore indexes

Improved Data Warehouse Query performance

Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets

Performance improvements for “typical” data warehouse queries from 10x to 100x

Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables


Page 10: Sql rally 2013   columnstore indexes

Good Candidates for Columnstore Indexing

Table candidates:

Very large fact tables (for example – billions of rows)

Larger dimension tables (millions of rows) with compression friendly column data

If unsure, it is easy to create a columnstore index and test the impact on your query workload

Query candidates (against table with a columnstore index):

Scans rather than seeks (columnstore indexes don’t support seek operations)

Aggregated results far smaller than table size

Joins to smaller dimension tables

Filtering on fact / dimension tables – star schema pattern

Subset of columns (selecting specific columns rather than returning ALL columns)
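
A query with the shape described above might look like the following sketch. The table and column names (dbo.FactSales, dbo.DimDate, dbo.DimProduct and their columns) are hypothetical and are not taken from the talk; they only illustrate a star-schema scan that filters, joins to small dimensions, touches a few columns and returns a small aggregated result.

```sql
-- Hypothetical star schema: one large fact table, two small dimension tables.
-- Only four fact columns are touched, and the result is a handful of rows.
SELECT d.CalendarYear,
       p.Category,
       SUM(f.SalesAmount) AS TotalSales,
       COUNT(*)           AS SalesRows
FROM dbo.FactSales AS f
INNER JOIN dbo.DimDate    AS d ON d.DateKey    = f.DateKey
INNER JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
WHERE d.CalendarYear = 2012          -- filter applied through a dimension
GROUP BY d.CalendarYear, p.Category; -- aggregate far smaller than the fact table
```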


Page 11: Sql rally 2013   columnstore indexes

Creating a columnstore index

T-SQL

SSMS
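
For the T-SQL route, a minimal sketch of the statement is shown below. The table dbo.FactSales, its columns and the index name are assumptions made for illustration; the CREATE NONCLUSTERED COLUMNSTORE INDEX statement itself is the SQL Server 2012 syntax.

```sql
-- Hypothetical fact table; list the columns that warehouse queries read.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales
ON dbo.FactSales
(
    DateKey,
    ProductKey,
    CustomerKey,
    Quantity,
    SalesAmount
);
```

Because only one columnstore index is allowed per table, it is common to include every column that analytical queries may touch rather than to design it for one specific query.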


Page 12: Sql rally 2013   columnstore indexes

Defining the Columnstore Index

Columnstore index is nonclustered (secondary)

Base table can be clustered index or heap

One CS index per table

Multiple other nonclustered (B-tree) indexes allowed

But may not be needed

CS index must be partition-aligned if table is partitioned

[Figure labels: indexed view, filtered index, base table (clustered index OR heap), nonclustered indexes, nonclustered columnstore index]

Page 13: Sql rally 2013   columnstore indexes

Column Segments and Dictionaries

[Figure: columns C1 to C6 divided into sets of about 1M rows; each column within a set is a column segment (segment 1 to segment N), shown together with dictionaries]
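
The segments and dictionaries can be inspected through catalog views. The sketch below assumes a table named dbo.Purchase that already has a columnstore index; sys.column_store_segments and sys.column_store_dictionaries are the catalog views, and they are joined to sys.partitions on hobt_id.

```sql
-- One row per column segment: how many rows it holds and its size on disk.
SELECT p.partition_number,
       s.column_id,
       s.segment_id,
       s.row_count,
       s.encoding_type,
       s.on_disk_size
FROM sys.column_store_segments AS s
INNER JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.Purchase')   -- hypothetical table
ORDER BY s.column_id, s.segment_id;

-- One row per dictionary used by the columns of the same index.
SELECT d.column_id,
       d.dictionary_id,
       d.entry_count,
       d.on_disk_size
FROM sys.column_store_dictionaries AS d
INNER JOIN sys.partitions AS p ON p.hobt_id = d.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.Purchase')
ORDER BY d.column_id, d.dictionary_id;
```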

Page 14: Sql rally 2013   columnstore indexes

Memory management

SELECT C2, SUM(C4) FROM T GROUP BY C2;

[Figure: segments of columns T.C1 to T.C4; the query above needs only T.C2 and T.C4]

• Memory management is automatic

• Columnstore is persisted on disk

• Needed columns fetched into memory

• Columnstore segments flow between disk and memory

Page 15: Sql rally 2013   columnstore indexes


Look inside Columnstore Indexes


Page 16: Sql rally 2013   columnstore indexes

xVelocity

Microsoft SQL Server family of memory-optimized and in-memory technologies

xVelocity In-Memory Analytics Engine

xVelocity Memory-Optimized Columnstore Indexes

The xVelocity engine is designed with 3 principles in mind:

Performance, Performance, Performance!

Page 17: Sql rally 2013   columnstore indexes

How Are These Performance Gains Achieved?

Two complementary technologies:

Storage

Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row).

New “batch mode” execution

Vector-based query execution capability

Data can then be processed in batches versus row-by-row

Depending on filtering and other factors, a query may also benefit by “segment elimination” - bypassing million row chunks (segments) of data, further reducing I/O


Page 18: Sql rally 2013   columnstore indexes

Batch mode processing

Process ~1000 rows at a time

Vector operators implemented

Greatly reduced CPU time (7 to 40X)

[Figure: a batch object made of column vectors plus a bitmap of qualifying rows]

Page 19: Sql rally 2013   columnstore indexes

Segment Elimination

column_id  segment_id  min_data_id  max_data_id
1          1           20120101     20120131
1          2           20120115     20120215
1          3           20120201     20120228

• Segment (rowgroup) = 1 million row chunk

• Min, Max kept for each column in a segment

• Scans can skip segments based on this

select Date, count(*) from dbo.Purchase where Date >= 20120201 group by Date

Segment 1 is skipped: its max_data_id (20120131) is below the filter value 20120201.
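
The per-segment minimum and maximum that drive this decision are exposed in sys.column_store_segments. The query below is a sketch that assumes dbo.Purchase carries the columnstore index and that the filtered column is column_id 1, as in the slide's table. Note that min_data_id and max_data_id are data ids, so for some encodings they may not be the raw column values.

```sql
-- Segments whose max_data_id is below the filter value cannot contain
-- qualifying rows for "Date >= 20120201" and are candidates for elimination.
SELECT s.segment_id,
       s.min_data_id,
       s.max_data_id,
       CASE WHEN s.max_data_id < 20120201
            THEN 'can be skipped'
            ELSE 'must be scanned'
       END AS elimination
FROM sys.column_store_segments AS s
INNER JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.Purchase')   -- table name from the slide
  AND s.column_id = 1                            -- assumed position of the Date column
ORDER BY s.segment_id;
```

Overlapping ranges, such as segments 1 and 2 in the table above, reduce how many segments can be skipped, which is why loading data in the order of the filtered column tends to help.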

Page 20: Sql rally 2013   columnstore indexes

Columnstore format + batch mode Variations

Columnstore indexing alone + traditional row mode in Query Processor

Columnstore indexing + batch mode in Query Processor

Columnstore indexing + hybrid of batch and traditional row mode in Query Processor


Page 21: Sql rally 2013   columnstore indexes

Plan operators supported in batch mode

Filter

Project

Scan

Local hash (partial) aggregation

Hash inner join

(Batch) hash table build


Page 22: Sql rally 2013   columnstore indexes


Query processing with Columnstore Indexes
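
One way to observe the effect on a query, sketched here under the assumption that dbo.Purchase has a nonclustered columnstore index, is to run the same statement twice: once as is, and once with the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint so that the optimizer falls back to the row store. This is not necessarily the demo shown in the talk.

```sql
SET STATISTICS TIME ON;   -- report CPU and elapsed time for each statement

-- 1) The optimizer is free to use the columnstore index.
SELECT [Date], COUNT(*) AS PurchaseCount
FROM dbo.Purchase
WHERE [Date] >= 20120201
GROUP BY [Date];

-- 2) Same query, but the columnstore index is ignored.
SELECT [Date], COUNT(*) AS PurchaseCount
FROM dbo.Purchase
WHERE [Date] >= 20120201
GROUP BY [Date]
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

SET STATISTICS TIME OFF;
```

With the actual execution plan enabled in SSMS, the operator properties show whether each operator ran in Batch or Row execution mode.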


Page 23: Sql rally 2013   columnstore indexes

Maintaining Data in a Columnstore Index

Once built, the table becomes “read-only” and INSERT/UPDATE/DELETE/MERGE is no longer allowed

ALTER INDEX REBUILD / REORGANIZE not allowed

How can I modify index data?

Drop columnstore index / make modifications / add columnstore index

UNION ALL (but be sure to validate performance)

Partition switches (IN and OUT)
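
The first two workarounds can be sketched as follows. All object names here (the columns PurchaseId and Amount, the index ncci_Purchase, the table dbo.Purchase_Delta and the view dbo.PurchaseAll) are hypothetical; only dbo.Purchase and its Date column appear in the slides. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Workaround 1: drop the columnstore index, modify the data, recreate the index.
DROP INDEX ncci_Purchase ON dbo.Purchase;

INSERT INTO dbo.Purchase (PurchaseId, [Date], Amount)
VALUES (1000001, 20120301, 49.90);

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Purchase
ON dbo.Purchase (PurchaseId, [Date], Amount);
GO

-- Workaround 2: keep new rows in a small writable row-store table and
-- expose both tables through a UNION ALL view.
CREATE TABLE dbo.Purchase_Delta
(
    PurchaseId INT   NOT NULL,
    [Date]     INT   NOT NULL,
    Amount     MONEY NOT NULL
);
GO

CREATE VIEW dbo.PurchaseAll
AS
SELECT PurchaseId, [Date], Amount FROM dbo.Purchase        -- read-only, columnstore
UNION ALL
SELECT PurchaseId, [Date], Amount FROM dbo.Purchase_Delta; -- writable, row store
GO
```

Queries against the view should be checked carefully, since the UNION ALL can change the plan shape and the execution mode compared with querying dbo.Purchase directly.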

Page 24: Sql rally 2013   columnstore indexes


Insert data into table with Columnstore Index
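
A partition switch is the usual way to add data without dropping the index. The sketch below is an illustration under several assumptions: dbo.Purchase is partitioned on Date, partition 3 is empty and covers March 2012, and the staging table, the source table dbo.Purchase_Load, the column list and all index and constraint names are hypothetical.

```sql
-- Staging table with the same columns as the target. It must live on the
-- same filegroup as the target partition, and if dbo.Purchase has a
-- clustered index the staging table needs a matching one.
CREATE TABLE dbo.Purchase_Staging
(
    PurchaseId INT   NOT NULL,
    [Date]     INT   NOT NULL,
    Amount     MONEY NOT NULL,
    -- The check constraint proves that every row fits the target partition.
    CONSTRAINT CK_Purchase_Staging_Date
        CHECK ([Date] >= 20120301 AND [Date] < 20120401)
);

-- Load the new rows while the staging table is still writable.
INSERT INTO dbo.Purchase_Staging (PurchaseId, [Date], Amount)
SELECT PurchaseId, [Date], Amount
FROM dbo.Purchase_Load;            -- hypothetical source of new data

-- Build the same columnstore index that exists on the target table.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Purchase_Staging
ON dbo.Purchase_Staging (PurchaseId, [Date], Amount);

-- Metadata-only operation: the staging table becomes partition 3.
ALTER TABLE dbo.Purchase_Staging
SWITCH TO dbo.Purchase PARTITION 3;
```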


Page 25: Sql rally 2013   columnstore indexes

Summary

SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios

10x to 100x performance improvement depending on the schema and query

I/O reduction and memory savings through columnstore compressed storage

CPU reduction with batch versus row processing, further I/O reduction if segment elimination occurs

Easy to deploy and requires less management than some legacy ROLAP or OLAP methods

No need to create intermediate tables, aggregates, pre-processing and cubes

Interoperability with partitioning

Page 26: Sql rally 2013   columnstore indexes

Resources

Columnar Storage in SQL Server 2012 (PDF)

SQL Server Columnstore Performance Tuning

Inside the SQL Server 2012 Columnstore Indexes

24 HOP Russia 2013 – Dmitry Pilyugin (video - rus)

SQL Server Columnstore Performance Tuning (video)


Page 27: Sql rally 2013   columnstore indexes

Denis Reznik

Senior Database Architect at The Frayman Group

Microsoft SQL Server MVP

[email protected]

@denisreznik

http://reznik.uneta.com.ua

SQL SERVER 2012 - COLUMNSTORE INDEXES