SQL Rally 2013: Columnstore Indexes

Post on 14-Dec-2014

TRANSCRIPT

COLUMNSTORE INDEXES

SQL Server 2012

Denis Reznik

The Frayman Group

denisreznik@live.ru

Columnstore indexes

• Column Store vs. Row Store

• Columnstore benefits

• Columnstore indexes

• CS indexes Internals

• Adding data to Columnstore index

Row Store and Column Store

In row store, data is stored tuple by tuple.

In column store, data is stored column by column.

Row Store and Column Store

Most queries do not process all the attributes of a particular relation.

SELECT c.name, c.address FROM Customers c WHERE c.region = 'Moscow'

[Diagram: table columns id, name, city, state, age, address]

Row Store and Column Store

So column stores are suitable for read-mostly, read-intensive, large data repositories

Row Store:

(+) Easy to add/modify a record

(-) Might read in unnecessary data

Column Store:

(+) Only need to read in relevant data

(-) Tuple writes require multiple accesses

Compression

Trades I/O for CPU

Higher data value locality in column stores

Techniques such as run-length encoding are far more useful

Schemes

Null Suppression

Dictionary encoding

Run Length encoding

Bit-Vector encoding

Heavyweight schemes

Columnar storage structure

Uses VertiPaq compression

[Diagram: row store pages compared with column store pages for columns C1 to C6]

Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes

Improved Data Warehouse Query performance

Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets

Performance improvements for “typical” data warehouse queries from 10x to 100x

Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables

Good Candidates for Columnstore Indexing

Table candidates:

Very large fact tables (for example – billions of rows)

Larger dimension tables (millions of rows) with compression friendly column data

If unsure, it is easy to create a columnstore index and test the impact on your query workload

Query candidates (against table with a columnstore index):

Scan versus seek (columnstore indexes don’t support seek operations)

Aggregated results far smaller than table size

Joins to smaller dimension tables

Filtering on fact / dimension tables – star schema pattern

Sub-set of columns (being selective in columns versus returning ALL columns)
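To make these traits concrete, here is a minimal sketch of a query with the shape described above: a scan of a fact table, a join to a small dimension, a filter, and an aggregate over a few columns. All object names (dbo.FactSales_Star, dbo.DimStore_Star, ncci_FactSales_Star) and the sample rows are hypothetical and exist only for illustration. A real candidate table would hold millions or billions of rows.

```sql
-- Hypothetical star schema; every object name here is illustrative only.
CREATE TABLE dbo.DimStore_Star
(
    StoreKey int         NOT NULL PRIMARY KEY,
    Region   varchar(50) NOT NULL
);
CREATE TABLE dbo.FactSales_Star
(
    DateKey  int   NOT NULL,
    StoreKey int   NOT NULL,
    Quantity int   NOT NULL,
    Amount   money NOT NULL
);
GO

-- Load data first: in SQL Server 2012 the table becomes read-only
-- once the columnstore index exists.
INSERT INTO dbo.DimStore_Star (StoreKey, Region)
VALUES (1, 'North'), (2, 'South');
INSERT INTO dbo.FactSales_Star (DateKey, StoreKey, Quantity, Amount)
VALUES (20120105, 1, 2, 20.00),
       (20120210, 2, 1, 15.00),
       (20120212, 1, 3, 45.00);
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales_Star
    ON dbo.FactSales_Star (DateKey, StoreKey, Quantity, Amount);
GO

-- Candidate pattern: scan (no seek), join to a small dimension,
-- filter on the fact table, aggregate, and touch only a subset of columns.
SELECT d.Region,
       SUM(f.Amount)   AS TotalAmount,
       SUM(f.Quantity) AS TotalQuantity
FROM dbo.FactSales_Star AS f
JOIN dbo.DimStore_Star  AS d ON d.StoreKey = f.StoreKey
WHERE f.DateKey >= 20120201
GROUP BY d.Region;
```

On a table this small the optimizer may well ignore the columnstore index; the sketch only shows the query shape worth testing against a large fact table.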

Creating a columnstore index

T-SQL

SSMS
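As a minimal sketch of the T-SQL route, the statements below build a nonclustered columnstore index on a small demo table. The table dbo.FactSales_Demo, its columns, and the index name ncci_FactSales_Demo are hypothetical and were chosen only for illustration.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.FactSales_Demo
(
    DateKey    int   NOT NULL,
    ProductKey int   NOT NULL,
    StoreKey   int   NOT NULL,
    Quantity   int   NOT NULL,
    Amount     money NOT NULL
);
GO

-- SQL Server 2012 syntax: name the columns the index should cover.
-- Listing every column that queries touch lets those queries be
-- answered from the columnstore index alone.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales_Demo
    ON dbo.FactSales_Demo (DateKey, ProductKey, StoreKey, Quantity, Amount);
GO
```

In SSMS the same result should be reachable from Object Explorer by right-clicking the table's Indexes node and choosing New Index, then Non-Clustered Columnstore Index.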

Defining the Columnstore Index

Columnstore index is nonclustered (secondary)

Base table can be clustered index or heap

One CS index per table

Multiple other nonclustered (B-tree) indexes allowed

But may not be needed

CS index must be partition-aligned if table is partitioned

[Diagram: base table (clustered index OR heap) with nonclustered indexes and a nonclustered columnstore index; other labels: indexed view, filtered index]

Column Segments and Dictionaries

[Diagram: columns C1 to C6 split into sets of about 1M rows; labels: column segment, segment 1 to segment N, dictionaries]
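Segments and dictionaries can be inspected through catalog views. The sketch below lists them for every columnstore index in the current database; it assumes nothing beyond the sys.column_store_segments, sys.column_store_dictionaries and sys.partitions views, and returns no rows if the database has no columnstore index.

```sql
-- One row per column segment: how many rows it holds, how it is
-- encoded, and how much space it takes on disk.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       s.column_id,
       s.segment_id,
       s.row_count,
       s.encoding_type,
       s.on_disk_size
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
ORDER BY table_name, s.column_id, s.segment_id;

-- One row per dictionary used by dictionary-encoded columns.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       d.column_id,
       d.dictionary_id,
       d.entry_count,
       d.on_disk_size
FROM sys.column_store_dictionaries AS d
JOIN sys.partitions AS p ON p.hobt_id = d.hobt_id
ORDER BY table_name, d.column_id, d.dictionary_id;
```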

Memory management

SELECT C2, SUM(C4) FROM T GROUP BY C2;

[Diagram: column segments of T.C1, T.C2, T.C3 and T.C4; the query needs only T.C2 and T.C4]

• Memory management is automatic

• Columnstore is persisted on disk

• Needed columns fetched into memory

• Columnstore segments flow between disk and memory

Look inside Columnstore Indexes

xVelocity

Microsoft SQL Server family of memory-optimized and in-memory technologies

xVelocity In-Memory Analytics Engine

xVelocity Memory-Optimized Columnstore Indexes

The xVelocity engine is designed with 3 principles in mind:

Performance, Performance, Performance!

How Are These Performance Gains Achieved?

Two complementary technologies:

Storage

Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row).

New “batch mode” execution

Vector-based query execution capability

Data can then be processed in batches versus row-by-row

Depending on filtering and other factors, a query may also benefit by “segment elimination” - bypassing million row chunks (segments) of data, further reducing I/O

Batch mode processing

Process ~1000 rows at a time

Vector operators implemented

Greatly reduced CPU time (7 to 40X)

[Diagram: batch object containing column vectors and a bitmap of qualifying rows]

Segment Elimination

column_id  segment_id  min_data_id  max_data_id
1          1           20120101     20120131
1          2           20120115     20120215
1          3           20120201     20120228

• Segment (rowgroup) = 1 million row chunk

• Min, Max kept for each column in a segment

• Scans can skip segments based on this

select Date, count(*) from dbo.Purchase where Date >= 20120201 group by Date

Segment 1 is skipped: its max_data_id (20120131) is below the filter value 20120201.
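The per-segment minimum and maximum shown above come from segment metadata. As a hedged sketch, the query below reads that metadata for the dbo.Purchase table used in the slide's query. It returns no rows if that table does not exist or has no columnstore index. How min_data_id and max_data_id map to actual column values depends on the segment's encoding, so treat them as raw values only for simple integer columns such as the date key here.

```sql
-- Min/max metadata per column segment for one table.
-- dbo.Purchase is the table name from the slide's example query.
SELECT s.column_id,
       s.segment_id,
       s.min_data_id,
       s.max_data_id,
       s.row_count
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.Purchase')
ORDER BY s.column_id, s.segment_id;
```

Overlapping ranges between segments, like segments 1 and 2 above, reduce how many segments a filter can skip. Loading data in an order that matches the usual filter column tends to keep the ranges narrow.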

Columnstore format + batch mode Variations

Columnstore indexing alone + traditional row mode in Query Processor

Columnstore indexing + batch mode in Query Processor

Columnstore indexing + hybrid of batch and traditional row mode in Query Processor

Plan operators supported in batch mode

Filter

Project

Scan

Local hash (partial) aggregation

Hash inner join

(Batch) hash table build
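One way to see what the columnstore index and batch mode contribute is to run the same query with and without the index. The sketch below does that with the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint. The table dbo.Purchase_Batch, the index name, and the sample rows are hypothetical. On a table this small SQL Server will most likely stay in row mode, so the sketch shows the method rather than a representative timing.

```sql
-- Hypothetical demo table; names and rows are illustrative only.
CREATE TABLE dbo.Purchase_Batch
(
    DateKey int   NOT NULL,
    Amount  money NOT NULL
);
GO
INSERT INTO dbo.Purchase_Batch (DateKey, Amount)
VALUES (20120105, 10.00), (20120105, 12.50), (20120210, 30.00);
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Purchase_Batch
    ON dbo.Purchase_Batch (DateKey, Amount);
GO

SET STATISTICS TIME ON;
GO

-- Run 1: the optimizer is free to use the columnstore index.
SELECT DateKey, COUNT(*) AS Purchases, SUM(Amount) AS Total
FROM dbo.Purchase_Batch
GROUP BY DateKey;

-- Run 2: same query, but the columnstore index is ignored,
-- so the plan falls back to the row store.
SELECT DateKey, COUNT(*) AS Purchases, SUM(Amount) AS Total
FROM dbo.Purchase_Batch
GROUP BY DateKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
GO

SET STATISTICS TIME OFF;
GO
```

With the actual execution plan enabled, each operator's properties include an execution mode (Row or Batch), which shows whether a given operator from the list above really ran in batch mode.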

Query processing with Columnstore Indexes

Maintaining Data in a Columnstore Index

Once built, the table becomes “read-only” and INSERT/UPDATE/DELETE/MERGE is no longer allowed

ALTER INDEX REBUILD / REORGANIZE not allowed

How can I modify index data?

Drop columnstore index / make modifications / add columnstore index

UNION ALL (but be sure to validate performance)

Partition switches (IN and OUT)
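The partition-switch option can be sketched as follows: load new rows into a staging table, build a matching columnstore index on it, then switch it into an empty partition of the main table. All object names (pf_DemoMonth, ps_DemoMonth, dbo.Purchase_Part, dbo.Purchase_Stage and the index names), the boundary values, and the sample rows are hypothetical.

```sql
-- Partition 1: before 20120201, partition 2: February 2012,
-- partition 3: 20120301 and later.
CREATE PARTITION FUNCTION pf_DemoMonth (int)
    AS RANGE RIGHT FOR VALUES (20120201, 20120301);
GO
CREATE PARTITION SCHEME ps_DemoMonth
    AS PARTITION pf_DemoMonth ALL TO ([PRIMARY]);
GO

-- Main table, partitioned by date key, with an aligned columnstore index.
CREATE TABLE dbo.Purchase_Part
(
    DateKey int   NOT NULL,
    Amount  money NOT NULL
) ON ps_DemoMonth (DateKey);
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Purchase_Part
    ON dbo.Purchase_Part (DateKey, Amount);
GO

-- Staging table: same columns, same filegroup, and a CHECK constraint
-- that matches the boundaries of the target partition.
CREATE TABLE dbo.Purchase_Stage
(
    DateKey int   NOT NULL,
    Amount  money NOT NULL,
    CONSTRAINT ck_Purchase_Stage_DateKey
        CHECK (DateKey >= 20120201 AND DateKey < 20120301)
) ON [PRIMARY];
GO

-- Load while the staging table is still writable.
INSERT INTO dbo.Purchase_Stage (DateKey, Amount)
VALUES (20120205, 10.00), (20120217, 25.50);
GO

-- The staging table needs the same columnstore index before the switch.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Purchase_Stage
    ON dbo.Purchase_Stage (DateKey, Amount);
GO

-- Switch IN: a metadata-only operation into the empty partition 2.
ALTER TABLE dbo.Purchase_Stage
    SWITCH TO dbo.Purchase_Part PARTITION 2;
GO
```

Switching OUT is the reverse: ALTER TABLE ... SWITCH PARTITION n TO an empty table with a matching structure, after which that table's columnstore index can be dropped and the rows modified.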

Insert data into table with Columnstore Index

Summary

SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios

10x to 100x performance improvement depending on the schema and query

I/O reduction and memory savings through columnstore compressed storage

CPU reduction with batch versus row processing, further I/O reduction if segment elimination occurs

Easy to deploy and requires less management than some legacy ROLAP or OLAP methods

No need to create intermediate tables, aggregates, pre-processing and cubes

Interoperability with partitioning

Resources

Columnar Storage in SQL Server 2012 (PDF)

SQL Server Columnstore Performance Tuning

Inside the SQL Server 2012 Columnstore Indexes

24 HOP Russia 2013 – Dmitry Pilyugin (video - rus)

SQL Server Columnstore Performance Tuning (video)

Denis Reznik

Senior Database Architect at The Frayman Group

Microsoft SQL Server MVP

denisreznik@live.ru

@denisreznik

http://reznik.uneta.com.ua

SQL SERVER 2012 - COLUMNSTORE INDEXES
