c-store: class overview spring, 2009


C-Store: Class Overview

Spring, 2009
Jianlin Feng
School of Software, SUN YAT-SEN UNIVERSITY
Feb 27, 2009


C-Store: A Column-Oriented DBMS

Instructor: Jianlin Feng (冯剑琳)
Office: Lab Center B111
Teaching: Friday (2-3 and 4-5), D202
Teaching Style:
• Try to present the basic ideas in a clear and unified manner
• Be your guide, if you like

Email: [email protected]


C-Store: Class Motivation

We are doing software! A database management system (DBMS) is computer software that manages databases.
• 3 Turing Award winners since 1966
• Oracle, DB2, SQL Server

Want to be a software architect, not a naïve coder?
• Learn from top software developers
• Learn from open source code
• Understand system design and implementation better


C-Store’s Father: Michael Stonebraker
• A former Professor at Berkeley, an Adjunct Professor at M.I.T.
• ACM Software System Award, 1988
• INGRES (developed by undergraduates), POSTGRES, Mariposa, C-Store
• ACM SIGMOD Innovation Award, 1994
• National Academy of Engineering, 1998


C-Store: The Home Page
http://db.lcs.mit.edu/projects/cstore/
C-Store: A Column-Oriented DBMS
• download: Source code
• overview: Project description
• papers: Publications
• people: Who are we?

The C-Store project is a collaboration among MIT, Yale, Brandeis University, Brown University, and UMass Boston.

Commercialized C-Store: Vertica


Course Work: Assignments and Course Project
• Reading papers
  Each student will be individually responsible for writing a short summary of every paper.
• Reading source code
• Team work
  Teams of 5 students; a related project of your choice, or one specified by the instructor
• Giving presentations


An example summary

LRVM (Satyanarayanan et al.)

Good points:

1) Providing an abstraction of a greatly needed behavior (transactions) makes system code implementation much easier: this stuff is useful.

2) Returns to the UNIX mentality of small and simple building blocks.

3) Performance analysis (Rmem/Pmem) is very applicable to the stated domain (fs metadata).

Bad points:

1) It would have been nice if they had explicitly stated that set-range can be called multiple times within a transaction; they only comment on it in 5.2 when discussing optimizations (for overlapping region specification).

2) It's unclear why the throughputs are almost equivalent for sequential access even though their CPU utilizations differ greatly. This seems to contradict their scalability concern, as it would seem both systems are IO bound as opposed to CPU bound; given the rate of CPU improvement, IO would seem to be the greater concern. Of course, it's still good that the very simple RVM performs better.


The Starting Point

C-Store: A Column Oriented DBMS

Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik.

VLDB, pages 553-564, 2005.


C-Store: the Column Store Project

Row Store or Column Store?

[Figure: relations or tables, with Record 1, Record 2, Record 3 as rows and Column 1, Column 2, Column 3 as columns]
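To make the row-versus-column question concrete, here is a minimal Python sketch of the two physical orders for the same 3 x 3 relation. It assumes a toy in-memory table; the names `rows`, `row_store`, and `column_store` are hypothetical and only show the ordering of values, not C-Store's actual on-disk format.

```python
# A toy relation: 3 records (rows) x 3 columns, as in the figure.
columns = ["col1", "col2", "col3"]
rows = [
    (1, "a", 10.0),  # Record 1
    (2, "b", 20.0),  # Record 2
    (3, "c", 30.0),  # Record 3
]

# Row store: the values of each record are laid out contiguously (row-major).
row_store = [value for record in rows for value in record]

# Column store: the values of each column are laid out contiguously (column-major).
column_store = {name: [record[i] for record in rows] for i, name in enumerate(columns)}

print(row_store)             # [1, 'a', 10.0, 2, 'b', 20.0, 3, 'c', 30.0]
print(column_store["col2"])  # ['a', 'b', 'c']
```

Reading one whole record is a contiguous slice of `row_store`; reading one whole column is a contiguous list in `column_store`.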


Example of a Relation


The History: Relational Model
Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.

Physical Data Independence
Row Store vs. Column Store on the same Conceptual Model: Relation
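Physical data independence means a query written against the relation should not care which physical layout sits underneath. The following is a minimal sketch of that idea, assuming a hypothetical `Table` interface with two hypothetical implementations, `RowTable` and `ColumnTable`; a real DBMS enforces this separation through its query language and catalog, not a Python class hierarchy.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence, Tuple


class Table(ABC):
    """Logical view: a relation with named columns (the conceptual model)."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = list(columns)

    @abstractmethod
    def insert(self, record: Tuple[Any, ...]) -> None: ...

    @abstractmethod
    def project(self, name: str) -> List[Any]:
        """Return all values of one column, in insertion order."""


class RowTable(Table):
    """Physical choice 1: one tuple per record."""

    def __init__(self, columns: Sequence[str]) -> None:
        super().__init__(columns)
        self.records: List[Tuple[Any, ...]] = []

    def insert(self, record):
        self.records.append(tuple(record))

    def project(self, name):
        i = self.columns.index(name)
        return [r[i] for r in self.records]


class ColumnTable(Table):
    """Physical choice 2: one list per column."""

    def __init__(self, columns: Sequence[str]) -> None:
        super().__init__(columns)
        self.data: Dict[str, List[Any]] = {c: [] for c in self.columns}

    def insert(self, record):
        for name, value in zip(self.columns, record):
            self.data[name].append(value)

    def project(self, name):
        return list(self.data[name])


def total(table: Table, name: str) -> float:
    # A "query" written only against the logical interface.
    return sum(table.project(name))


for t in (RowTable(["id", "price"]), ColumnTable(["id", "price"])):
    t.insert((1, 2.5))
    t.insert((2, 4.0))
    print(type(t).__name__, total(t, "price"))  # both print 6.5
```

The `total` function is unchanged across both layouts; only the cost of answering it differs.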


Row Store: Why?

OLTP (On-Line Transaction Processing): ATMs, POS terminals in supermarkets

Characteristics of OLTP applications:
• Transactions that involve small numbers of records (or tuples)
• Frequent updates (including queries)
• Many users
• Fast response times

OLTP needs a write-optimized row store: a record can be inserted or deleted in one physical write.
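The "one physical write" argument can be shown with a simplified cost model. This sketch assumes a hypothetical disk in which a row store keeps a whole record on one page while a column store keeps each column in its own file, and each write call counts as one physical write; buffering and logging are ignored, and `WriteCounter`, `insert_row_store`, and `insert_column_store` are illustrative names only.

```python
class WriteCounter:
    """Hypothetical disk model: each write call counts as one physical write."""

    def __init__(self) -> None:
        self.writes = 0

    def write(self, location: str, payload) -> None:
        self.writes += 1


def insert_row_store(disk: WriteCounter, record: tuple) -> None:
    # All fields of a record are stored together: one write to one page.
    disk.write("table_page", record)


def insert_column_store(disk: WriteCounter, columns: list, record: tuple) -> None:
    # Each column is stored separately: one write per column.
    for name, value in zip(columns, record):
        disk.write(name + "_column", value)


columns = ["account", "amount", "atm_id", "timestamp"]
withdrawal = ("ACC-42", 100, "ATM-7", "2009-02-27 10:00")

row_disk, col_disk = WriteCounter(), WriteCounter()
insert_row_store(row_disk, withdrawal)
insert_column_store(col_disk, columns, withdrawal)
print(row_disk.writes, col_disk.writes)  # 1 4
```

Under this model an OLTP insert costs one write in a row store but one write per column in a column store, which is why OLTP favors the row layout.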


Row Store: Columns Stored Together

• Record id = <page id, slot #>

[Figure: slotted page i. Records with Rid = (i,1), (i,2), ..., (i,N) are stored in the data area; the slot directory holds the slot array, the number of slots (N), and a pointer to the start of free space.]
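A slotted page like the one in the figure can be sketched in a few lines. This is a toy model under stated assumptions: the page size and the byte costs of the header and slot entries are made up, and the slot directory is kept in a Python list (with its space reserved at the end of the page) instead of being serialized into the page. `SlottedPage` and its methods are hypothetical names.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 4096   # assumed page size in bytes
HEADER_BYTES = 8   # assumed space for "# slots" and the free-space pointer
SLOT_BYTES = 8     # assumed space for one (offset, length) slot entry

Rid = Tuple[int, int]  # (page id, slot #)


class SlottedPage:
    """Toy slotted page. Record bytes grow from the front of the page; space for
    the slot directory is reserved at the end, but for brevity the directory
    itself is kept in a Python list rather than serialized into the page."""

    def __init__(self, page_id: int, size: int = PAGE_SIZE) -> None:
        self.page_id = page_id
        self.data = bytearray(size)
        self.free_start = 0  # pointer to start of free space
        self.slots: List[Optional[Tuple[int, int]]] = []  # slot array: (offset, length)

    def _directory_bytes(self, n_slots: int) -> int:
        return HEADER_BYTES + SLOT_BYTES * n_slots

    def insert(self, record: bytes) -> Rid:
        """Copy the record into free space and return its Rid."""
        end = self.free_start + len(record)
        if end > len(self.data) - self._directory_bytes(len(self.slots) + 1):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:end] = record
        self.free_start = end
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots))  # slots numbered from 1, as in Rid = (i,1)

    def get(self, rid: Rid) -> bytes:
        page_id, slot_no = rid
        if page_id != self.page_id or self.slots[slot_no - 1] is None:
            raise KeyError(rid)
        offset, length = self.slots[slot_no - 1]
        return bytes(self.data[offset:offset + length])

    def delete(self, rid: Rid) -> None:
        # Leave an empty slot so the Rids of the other records stay valid.
        _, slot_no = rid
        self.slots[slot_no - 1] = None


page = SlottedPage(page_id=7)
r1 = page.insert(b"alice,30")
r2 = page.insert(b"bob,25")
page.delete(r1)
print(r1, r2, page.get(r2))  # (7, 1) (7, 2) b'bob,25'
```

Because a Rid names a slot rather than a byte offset, records can later be moved within the page (for example during compaction) by updating only the slot entry.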


Current DBMS Gold Standard

• Store the columns of one record contiguously on disk
• Use B-tree indexing
• Use small (e.g. 4K) disk blocks
• Align fields on byte or word boundaries
• Conventional (row-oriented) query optimizer and executor (technology from 1979)
• ARIES-style transactions
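Two of these choices, word-aligned fields and small 4K blocks, interact: alignment padding makes each record larger, so fewer records fit in a block. This sketch uses Python's `struct` module on a hypothetical fixed-width record (`id` int32, `flag` 1-byte char, `amount` int64); the `"@"` prefix requests native size and alignment, while `"="` uses standard sizes with no padding.

```python
import struct

# Hypothetical fixed-width record: (id: int32, flag: 1-byte char, amount: int64).
# "@" uses native size AND native alignment, so padding is inserted before
# "amount" to place it on its natural boundary.
# "=" uses native byte order with standard sizes and no padding.
ALIGNED = struct.Struct("@icq")
PACKED = struct.Struct("=icq")

BLOCK_SIZE = 4096  # a "small (e.g. 4K)" disk block

record = (42, b"Y", 100)
# Both layouts store the same values; only the byte layout differs.
assert ALIGNED.unpack(ALIGNED.pack(*record)) == PACKED.unpack(PACKED.pack(*record))

for name, layout in (("aligned", ALIGNED), ("packed", PACKED)):
    print(name, layout.size, "bytes/record,", BLOCK_SIZE // layout.size, "records/block")
```

On x86-64 Linux the aligned record is 16 bytes (256 per block) versus 13 bytes packed (315 per block): alignment trades some space for simpler, faster field access.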


From OLTP to OLAP and Data Warehouse

OLAP (On-Line Analytical Processing, Codd, 1993): flexible reporting for business intelligence

Characteristics of OLAP applications:
• Transactions that involve large numbers of records
• Frequent ad-hoc queries and infrequent updates
• A few decision-making users
• Fast response times

Data warehouses are designed to facilitate reporting and analysis: they are read-mostly.


A Flavor of OLAP: Data Cube (Jim Gray, 1996)
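The data cube generalizes GROUP BY: it computes the aggregate for every subset of the dimensions, marking aggregated-away dimensions with a special ALL value. A minimal sketch over a hypothetical fact table of car sales; `data_cube` is an illustrative name, and a real system would compute the 2^d group-bys far more cleverly than this nested loop.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (model, year, color, sales)
facts = [
    ("Chevy", 1994, "black", 50),
    ("Chevy", 1995, "white", 40),
    ("Ford", 1994, "black", 85),
    ("Ford", 1995, "black", 60),
]
dims = ("model", "year", "color")


def data_cube(rows, dims):
    """Compute SUM(sales) for every subset of the dimensions (2^d group-bys).
    Dimensions that are aggregated away are reported as "ALL"."""
    cube = defaultdict(int)
    for k in range(len(dims) + 1):
        for group in combinations(range(len(dims)), k):
            for row in rows:
                key = tuple(row[i] if i in group else "ALL" for i in range(len(dims)))
                cube[key] += row[-1]
    return dict(cube)


cube = data_cube(facts, dims)
print(cube[("ALL", "ALL", "ALL")])   # 235 (grand total)
print(cube[("Ford", "ALL", "ALL")])  # 145
print(cube[("ALL", 1994, "black")])  # 135
```

Every such query reads many records but updates none, which is the read-mostly OLAP pattern described above.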


Data Cube vs. Star Schema
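A star schema stores the cube's facts relationally: a central fact table whose rows hold foreign keys into dimension tables plus numeric measures. The sketch below assumes hypothetical `product`, `store`, and `sales` tables held as Python dicts and lists; it answers one cube-style question by joining each fact to its dimensions (a "star join") and grouping.

```python
from collections import defaultdict

# Dimension tables (hypothetical): key -> attributes
product = {1: {"name": "Chevy", "category": "car"}, 2: {"name": "F-150", "category": "truck"}}
store = {10: {"city": "Boston"}, 20: {"city": "Guangzhou"}}

# Fact table: foreign keys into each dimension plus a numeric measure
sales = [
    {"product_id": 1, "store_id": 10, "amount": 50},
    {"product_id": 2, "store_id": 10, "amount": 70},
    {"product_id": 1, "store_id": 20, "amount": 30},
]


def total_by_category_and_city(facts):
    totals = defaultdict(int)
    for f in facts:
        # Star join: look up each foreign key in its dimension table.
        category = product[f["product_id"]]["category"]
        city = store[f["store_id"]]["city"]
        totals[(category, city)] += f["amount"]
    return dict(totals)


print(total_by_category_and_city(sales))
# {('car', 'Boston'): 50, ('truck', 'Boston'): 70, ('car', 'Guangzhou'): 30}
```

Each result corresponds to one cell of a (category, city) face of the cube.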


Data Warehouse Architecture


Other Read-Mostly Applications
• CRM (Customer Relationship Management): Siebel (Oracle)
• Catalog search in electronic commerce: Amazon.com, Shopping.com


Column Store: Why?

The intuition: only read relevant columns
• Say, ad-hoc queries read 2 columns out of 20

Column store is not a new idea
• Sybase IQ (early ’90s, bitmap index)
• Addamark (i.e., SenSage, for event log data warehouse)
• MonetDB (Hyper-Pipelining Query Execution, CIDR’05)
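The "2 columns out of 20" intuition reduces to simple arithmetic. This back-of-the-envelope sketch assumes a hypothetical table of 1,000,000 rows and 20 fixed-width 8-byte columns, and ignores compression, seeks, and the cost of stitching columns back into tuples.

```python
# Hypothetical table: 20 fixed-width columns of 8 bytes each, 1,000,000 rows.
NUM_ROWS = 1_000_000
NUM_COLS = 20
COL_WIDTH = 8  # bytes per value (assumption)


def bytes_scanned_row_store(cols_needed: int) -> int:
    # A row store reads whole records, i.e. every column, however few are needed.
    return NUM_ROWS * NUM_COLS * COL_WIDTH


def bytes_scanned_column_store(cols_needed: int) -> int:
    # A column store reads only the columns the query references.
    return NUM_ROWS * cols_needed * COL_WIDTH


row = bytes_scanned_row_store(2)
col = bytes_scanned_column_store(2)
print(row, col, col / row)  # 160000000 16000000 0.1
```

Under these assumptions the column store scans one tenth of the bytes, and the gap widens as tables get wider.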


C-Store Technical Ideas

• Logical data model: relational model
• Column store only
• Materialized views on each relation (perhaps many)
• Active data compression
• Column-oriented query executor and optimizer
• Shared-nothing architecture
• Replication-based concurrency control and recovery
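Two of these ideas reinforce each other: a materialized view stored sorted on one attribute leaves long runs of equal values in that column, which compress well. Run-length encoding is one scheme a column store can use on such a column. This sketch assumes a hypothetical `sales` relation and illustrative `projection`, `rle_encode`, and `rle_decode` helpers; it is not C-Store's actual storage or encoding code.

```python
from itertools import groupby
from typing import List, Tuple

SCHEMA = ("sale_id", "state", "amount")
# Hypothetical relation
sales = [
    (1, "MA", 30), (2, "CA", 12), (3, "MA", 7),
    (4, "NY", 55), (5, "CA", 20), (6, "MA", 9),
]


def projection(rows, keep, sort_on):
    """A toy materialized view: selected columns, stored sorted on sort_on,
    with one list per column."""
    idx = {name: i for i, name in enumerate(SCHEMA)}
    ordered = sorted(rows, key=lambda r: r[idx[sort_on]])
    return {name: [r[idx[name]] for r in ordered] for name in keep}


def rle_encode(values) -> List[Tuple[object, int]]:
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]


def rle_decode(runs) -> list:
    return [v for v, n in runs for _ in range(n)]


proj = projection(sales, keep=["state", "amount"], sort_on="state")
state_runs = rle_encode(proj["state"])
print(state_runs)      # [('CA', 2), ('MA', 3), ('NY', 1)]
print(proj["amount"])  # [12, 20, 30, 7, 9, 55]
```

Six `state` values shrink to three runs here; on a large sorted column with few distinct values, the savings grow accordingly.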


How to Evaluate the C-Store Paper

• None of the ideas in isolation merits publication
• Judge the complete system by its (hopefully intelligent) choice of a small collection of inter-related, powerful ideas that together put performance in a new sandbox


Architecture of C-Store (Vertica) on a Single Node


C-Store code base version 0.2
http://db.lcs.mit.edu/projects/cstore/cstore0.2.tar.gz
• Runs on Linux x86 computers
• Tested on RedHat Linux
• This code compiles on old versions of Berkeley DB and gcc
• BerkeleyDB.4.2
• LZO version 1 (http://www.oberhumer.com/opensource/lzo/)


References

Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. C-Store: A Column Oriented DBMS. VLDB, pages 553-564, 2005.

Vertica Database Technical Overview White Paper. http://www.vertica.com/php/pdfgateway?file=VerticaArchitectureWhitePaper.pdf

SenSage Event Data Warehouse. http://www.sensage.com/English/Products/Event_Data_Warehouse.html