The Endless Challenge of Databases - Towards Self-Tuning Databases -


Upload: amal

Post on 25-Jan-2016


DESCRIPTION

Seminar commemorating the 20th anniversary of the monthly magazine Microsoftware (월간 마이크로소프트웨어). Database (working title). The Endless Challenge of Databases - Towards Self-Tuning Databases -. Sang-Won Lee (이상원), [email protected], http://vldb.skku.ac.kr, Sungkyunkwan University & EXEM Co., Ltd. Contents: A Quick Overview of Database History; Why Self-Tuning Databases?; Database Tuning in the 21st century.

TRANSCRIPT

Page 1

The Endless Challenge of Databases - Towards Self-Tuning Databases -

Sang-Won Lee (이상원), [email protected], http://vldb.skku.ac.kr
Sungkyunkwan University & EXEM Co., Ltd.

Seminar commemorating the 20th anniversary of the monthly magazine Microsoftware

Database (working title)

Page 2

Contents

• A Quick Overview of Database History

– Why Self-Tuning Databases?

• Database Tuning in the 21st century

• Towards Self-Tuning Databases

Page 3

A Quick Overview of Database History

• File systems (1950s)

• Hierarchical/network databases (1960s)

• Relational databases (1970s)

• Object databases (1980s)

• Object-relational (OR) databases (1990s)

• XOR databases (2000s)

Page 4

A Quick Overview of Database History(2)

• State-of-the-art functionality of RDBMSs

– A gigantic black hole of application intelligence

– Developer

[Figure: DB2, Oracle, and SQL Server surrounded by the functionality they have absorbed: Active DB (triggers), Spatial DB, Data Mining, Excel, MOLAP, ROLAP (cube), Deductive DB (recursion), procedures in C/C++, Java, .., Temporal DB, queues, e-mail, Multimedia, File System, and XML/Stream/Bioinformatics/Appl. Server/Web Service ..]

Page 5

A Quick Overview of Database History(3)

• State-of-the-art functionality of RDBMSs

– A rapidly growing number of internal optimization/tuning knobs

– DBA

[Figure: DB2, Oracle, and SQL Server surrounded by their internal features: MVs, partitioning, buffer management, bitmap indexes, extensibility, cost-based optimization, automatic disk management, ..]

Page 6

A Quick Overview of Database History(4)

• Achievements of RDBMSs

1. The relational model and SQL, the “Esperanto” of databases: “productivity”

2. Query optimization: “can beat all but the best programmers”

3. Transaction management: “ACID”

• E. F. Codd is gone, but the relational model lives forever!!

• A vision for database community

"My prediction is that eventually all storage systems will evolve to be database systems."(Jim Gray, 2002)

Page 7

A Quick Overview of Database History(5)

DB = IT Core Platform in the 21st Century???

• ERP/CRM/SEM/BSC

• DW/OLAP/Data Mining

• Web Log Analysis

• GIS/XML/Mobile

• Bio-informatics

• ....

• Ubiquitous Databases

Page 8

A Quick Overview of Database History(6)

• One big challenge for RDBMSs

– Simplicity is a big issue. (Jim Gray, 2003)

• Commercial DBMSs are becoming exponentially more complicated

– Even Oracle DBAs with 10 years of experience do not know 50% of the new features in Oracle’s latest release

– Worse, it can take them more than a week to understand the performance behavior of a single new feature

– The gap between the human learning curve and the pace of new DB features keeps widening

“Crossing this chasm” seems to be impossible!!

• It’s time for self-tuning intelligence!

Page 9

What is DB Tuning?

• Car tuning

– Goals?

– What?

– How?

Page 10

Why DB Tuning in the 21st Century?

• DB performance influences all aspects of your business

– DB performance management is one of the most important success factors in almost every IT project

– “Future web competitiveness lies in data-processing capability” (Scott McNealy, Sun CEO, in the Electronic Times (전자신문))

• DB tuning is not a technical issue, but a business issue!

– Improve your productivity

– Increase your revenue/profit

– Increase your ROI(Return on Investment)

Page 11

Why DB Tuning in the 21st Century?(2)

1. More data - doubling every 9 months

2. More users - from the CEO to sales representatives

3. More complex queries - e.g. OLAP/data mining queries

4. Faster responses

– e.g. web response time impacts revenue!

– e.g. real-time personalization

[Chart: Disk Sales vs. Moore's Law, 1988-2000; y-axis: petabytes (0 to 3,500)]

• Data growth vs. computer speedup

* Moore’s Law: the number of transistors per chip doubles every 18 months (1965)

Page 12

Why DB Tuning in the 21st Century?(4)

• Disk sales doubling every 9 months – Greg Papadopoulos, Sun CTO

– Also reported by the Winter Corporation VLDB survey (http://www.wintercorp.com)

• The time to process all your data doubles every 18 months!

– Without DB tuning, you need to double your investment in DB servers and software every 6 months!

• Maximize your DB performance!! (Let’s DBMax)

– Maximize your PRODUCTIVITY

– Maximize your REVENUE/PROFIT

– Maximize your ROI
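The claim that the time to process all your data doubles every 18 months follows from the two growth rates on the previous slide. This sketch assumes, as the slides do, that data volume doubles every 9 months and processing speed (Moore's law) every 18 months. The function name is hypothetical.

```python
def processing_time_ratio(months: float,
                          data_doubling: float = 9.0,
                          speed_doubling: float = 18.0) -> float:
    """Factor by which the time to scan all data grows after `months`.

    Data grows as 2**(t/9) and speed as 2**(t/18), so
    time = data / speed = 2**(t/9 - t/18) = 2**(t/18).
    """
    data_growth = 2 ** (months / data_doubling)
    speed_growth = 2 ** (months / speed_doubling)
    return data_growth / speed_growth


for t in (0, 18, 36, 54):
    print(f"after {t:2d} months: x{processing_time_ratio(t):.1f}")
```

The loop prints factors of 1.0, 2.0, 4.0 and 8.0. Hardware gains alone therefore cover only half of the data growth.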

Page 13

Why Self-Tuning Databases?

• The TCO (Total Cost of Ownership) war in the 21st century

– Major costs of IT infrastructure: H/W, S/W, human cost

• Moore’s law: H/W, S/W cost ↓

• BUT! Human cost ↑

– cf. 1950s: telephone industry

• Manual switching by operators → automatic switching systems

Page 14

Why Self-Tuning Databases?(2)

Blackbox Database Tuning

Page 15

Self-Tuning: Targets

[Figure: Big Picture of a Simplified DBMS. User processes issue SQL to the Oracle instance, which comprises server processes, the SGA (shared pool, database buffer cache, redo log buffer), the SQL optimizer, the SQL execution engine, and SQL execution memory. The instance manages the Oracle database: parameter file, datafiles, control files, and redo log files.]

Page 16

Self-Tuning: Targets(2)

• SQLs

• Memory

• Index/Materialized Views

• Emerging fields

Page 17

SQL Self-Tuning: Optimizer

• Best execution plan for a SQL statement

– access method + join order + join method

– Rule-based optimizer vs. Cost-based optimizer

• The Selinger-style optimizer

– P. Selinger et al., “Access Path Selection in a Relational Database Management System,” SIGMOD 1979

– System R

– A “MUST” read for everyone who works with SQL

• cost estimation

• dynamic programming
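The two ingredients named above, cost estimation and dynamic programming, can be combined in a few lines. This is a toy Selinger-style enumerator that considers left-deep join orders only. Several parts are assumptions and are not System R's actual formulas: the cost model (the sum of all intermediate result sizes), the function name, and the example relations and selectivities.

```python
from itertools import combinations


def best_left_deep_plan(card, sel):
    """Selinger-style DP over subsets of relations (left-deep plans only).

    card: {relation: row count}
    sel:  {frozenset({r1, r2}): join selectivity}; a missing pair means
          a cross product (selectivity 1.0).
    Assumed cost model: sum of the sizes of all intermediate results.
    Returns (cost, output_cardinality, join_order).
    """
    rels = sorted(card)
    # best[subset] = cheapest known (cost, rows, order) for joining `subset`
    best = {frozenset([r]): (0.0, float(card[r]), (r,)) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, size)):
            for r in subset:                      # r is joined last
                left = subset - {r}
                cost, rows, order = best[left]    # optimal sub-plan (DP)
                out = rows * card[r]
                for l in left:                    # apply join predicates
                    out *= sel.get(frozenset((l, r)), 1.0)
                total = cost + out
                if subset not in best or total < best[subset][0]:
                    best[subset] = (total, out, order + (r,))
    return best[frozenset(rels)]


card = {"X": 1000, "Y": 100, "Z": 10}
sel = {frozenset(("X", "Y")): 0.01, frozenset(("Y", "Z")): 0.1}
print(best_left_deep_plan(card, sel))
```

The DP keeps only the cheapest plan for each subset of relations, which avoids enumerating every permutation, though the work still grows exponentially with the number of relations. Here the best plan joins Y and Z first and X last, at a total cost of about 1,100.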

Page 18

SQL Self-Tuning(2): Cost-Based Optimization

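[Figure: Cost-based optimization in the query optimizer.
1. A SQL statement enters.
2. The plan generator produces candidate plans P1, P2, .., Pn.
3. The cost estimator computes an estimated cost for each plan Pi. Single and combined selectivities, derived from data dictionary statistics on the real data, feed the selectivity & cardinality estimation formula. The resulting estimated cardinality, together with the RDBMS instance's runtime memory environment (buffer cache size, sort/hash memory, etc.), feeds the cost model.
4. The output is the (estimated) best execution plan for the SQL.]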
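The "selectivity & cardinality estimation formula" step can be made concrete with the textbook formulas an optimizer uses when it has only per-column statistics:

- equality selectivity is 1/NDV (number of distinct values);
- range selectivity is a linear interpolation between the column's minimum and maximum;
- the combined selectivity is the product of the single selectivities.

These formulas are assumptions built into this sketch, and they are exactly the fallacies listed on the next slide. The table, the statistics and the function names are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Sequence


@dataclass
class ColumnStats:
    ndv: int      # number of distinct values (from the data dictionary)
    low: float    # minimum value
    high: float   # maximum value


def eq_selectivity(stats: ColumnStats) -> float:
    # col = constant; uniform distribution: every distinct value equally frequent.
    return 1.0 / stats.ndv


def ge_selectivity(stats: ColumnStats, bound: float) -> float:
    # col >= bound; values assumed spread uniformly over [low, high].
    if bound <= stats.low:
        return 1.0
    if bound > stats.high:
        return 0.0
    return (stats.high - bound) / (stats.high - stats.low)


def estimated_cardinality(table_rows: int, selectivities: Sequence[float]) -> float:
    # Attribute independence: combined selectivity = product of single ones.
    return table_rows * prod(selectivities)


# Hypothetical 100,000-row table with predicates PRICE >= 100 AND CITY = 'Denver'.
price = ColumnStats(ndv=500, low=0, high=500)
city = ColumnStats(ndv=50, low=0, high=0)   # low/high unused for equality
sels = [ge_selectivity(price, 100), eq_selectivity(city)]
print(estimated_cardinality(100_000, sels))
```

The estimate is 100,000 × 0.8 × 0.02, about 1,600 rows. The cost model would then turn this number into an I/O and CPU cost for each candidate plan.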

Page 19

SQL Self-Tuning(3): Cost-Based Optimization

• Cost-based optimization is itself a self-tuning process, but it falls short for about 5% of SQL statements

– 80% of SQL: best plan

– 15%: one of the top 3 plans

– 5%: not so good

• Fallacies of CBO

1. Uniform distribution

2. Attribute Independence

3. Join Independence
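The attribute-independence fallacy is easy to reproduce. In this sketch the data is made up. Each city has the same number of rows, so the uniform-distribution assumption holds exactly. However, every city belongs to exactly one state, a functional dependency the optimizer does not know about, so multiplying the two equality selectivities gives the wrong answer.

```python
from collections import Counter

# Hypothetical table of (city, state) rows; city functionally determines state.
rows = ([("Denver", "CO")] * 100 + [("Boulder", "CO")] * 100
        + [("Austin", "TX")] * 100 + [("Dallas", "TX")] * 100)


def independence_estimate(rows, city, state):
    # Uniform distribution + attribute independence, as a CBO would assume:
    # the specific values do not matter, only the distinct-value counts.
    n = len(rows)
    ndv_city = len({r[0] for r in rows})
    ndv_state = len({r[1] for r in rows})
    return n * (1 / ndv_city) * (1 / ndv_state)


def actual_count(rows, city, state):
    return Counter(rows)[(city, state)]


print(independence_estimate(rows, "Denver", "CO"), actual_count(rows, "Denver", "CO"))
print(independence_estimate(rows, "Denver", "TX"), actual_count(rows, "Denver", "TX"))
```

For CITY = 'Denver' AND STATE = 'CO' the estimate is 50 rows but the true count is 100. For the impossible combination 'Denver'/'TX' the estimate is still 50, but no rows exist. Errors of this kind multiply through a join tree, which is why join order and join method choices can go badly wrong.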

Page 20

SQL Self-Tuning(4): Cost-Based Optimization

• How to fix the fallacies of CBO?

– Various approaches: who? when? based on what?

1. Manual tuning

2. LEO

3. SOS [13]

4. Dynamic reoptimization

5. Parametric query optimization

6. ….

Page 21

SQL Self-Tuning(5): IBM DB2 LEO

• LEO (LEarning Optimizer)

– The DB part of IBM SMART (Self-Managing And Resource Tuning)

(See IBM’s Autonomic Computing initiative)

Page 22

SQL Self-Tuning(6): IBM DB2 LEO


SELECT * FROM X, Y, Z
WHERE X.PRICE >= 100 AND Z.CITY = 'Denver'
  AND Y.MONTH = 'Dec' AND X.ID = Y.ID
  AND Y.NR = Z.NR
GROUP BY A

Negative feedback!!!
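The idea behind LEO [13] is a feedback loop:

1. monitor the actual cardinalities at run time;
2. compare them with the optimizer's estimates;
3. feed corrections back into later estimates.

This is a minimal sketch of that loop. It assumes a single multiplicative adjustment factor per predicate; LEO's actual bookkeeping in DB2 is more elaborate. The class name, predicate strings and numbers are hypothetical.

```python
class FeedbackStore:
    """Toy learning-optimizer feedback: per-predicate correction factors."""

    def __init__(self):
        self.adjustment = {}  # predicate text -> actual/estimated ratio

    def record(self, predicate: str, estimated: float, actual: float) -> None:
        # Negative feedback: remember how far off the estimate was.
        if estimated > 0:
            self.adjustment[predicate] = actual / estimated

    def adjusted_estimate(self, predicate: str, estimated: float) -> float:
        # Later optimizations scale the raw estimate by the learned factor.
        return estimated * self.adjustment.get(predicate, 1.0)


store = FeedbackStore()
# Made-up run: optimizer guessed 50 rows for Y.MONTH = 'Dec', execution saw 400.
store.record("Y.MONTH = 'Dec'", estimated=50, actual=400)
print(store.adjusted_estimate("Y.MONTH = 'Dec'", 50))   # 400.0
print(store.adjusted_estimate("X.PRICE >= 100", 1000))  # 1000.0 (no feedback yet)
```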

Page 23

SQL Self-Tuning(7): Oracle’s v$sql_plan_statistics

• Runtime statistics monitor (since Oracle9i R2)

– You can compare the optimizer’s estimates with the actual runtime statistics

– Then you can find out where the errors come from

** Someday, Oracle and MS will come out with LEO-like features

• SKKU and XM are developing the SOS tool

– “Save Our SQL”

– How to tune SQL (semi-)automatically?

1. Find the negative feedback

2. Drill down to the causes

3. Analyze the effect

4. Hint the SQL
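Steps 1 and 2 above amount to ranking plan operations by how far the optimizer's row estimate is from the actual row count. This sketch makes several assumptions; none of them describes the SOS tool's actual logic:

- the per-step estimates and actual counts have already been fetched (for example, v$sql_plan_statistics alongside the plan's estimated cardinalities) into plain tuples;
- the q-error metric (the larger of estimate/actual and actual/estimate) is an illustrative choice;
- the threshold of 10 is an illustrative choice;
- the plan rows are made up.

```python
from typing import NamedTuple


class PlanStep(NamedTuple):
    step_id: int
    operation: str
    estimated_rows: float
    actual_rows: float


def q_error(step: PlanStep) -> float:
    # Symmetric ratio error; max(.., 1) avoids division by zero for empty results.
    est = max(step.estimated_rows, 1.0)
    act = max(step.actual_rows, 1.0)
    return max(est / act, act / est)


def find_negative_feedback(steps, threshold=10.0):
    """Step 1: keep badly estimated steps; step 2: worst first for drill-down."""
    bad = [s for s in steps if q_error(s) >= threshold]
    return sorted(bad, key=q_error, reverse=True)


plan = [
    PlanStep(1, "HASH JOIN", 120, 9_000),
    PlanStep(2, "TABLE ACCESS FULL X", 40_000, 38_500),
    PlanStep(3, "INDEX RANGE SCAN Y_MONTH", 50, 4_000),
]
for s in find_negative_feedback(plan):
    print(s.step_id, s.operation, round(q_error(s), 1))
```

The index range scan (q-error 80) ranks ahead of the hash join above it (75), while the full scan is estimated well and drops out. Drill-down would start at the lowest badly estimated step, because its error propagates upward.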

Page 24

Self-Tuning: Targets

• SQLs

• Memory

• Index/Materialized Views

• Emerging fields

Page 25

Memory Self-Tuning

• Complex queries use memory-intensive operators

– Sorting, hashing, bitmap index manipulation, etc.

• The amount of available memory greatly affects their performance

• However, there is only a finite amount of memory, shared by all concurrent users

[Chart: sort response time vs. memory, showing the M-pass, 1-pass, and optimal regions]

$M_{\text{1-pass}} = \sqrt{I \times S \times 2}$

$M_{\text{optimal}} = I$
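The two formulas give the memory thresholds for a sort. This sketch uses them to classify a memory grant as optimal (in-memory), one-pass, or multi-pass. The slide does not define the symbols. The sketch reads I as the sort input size and S as the size of an I/O slot used when merging runs, which is our reading of [8]. The function names and sizes are hypothetical.

```python
from math import sqrt


def one_pass_memory(input_size: float, slot_size: float) -> float:
    # M_1-pass = sqrt(I x S x 2), as on the slide
    return sqrt(input_size * slot_size * 2)


def optimal_memory(input_size: float) -> float:
    # M_optimal = I: the whole input fits in memory
    return input_size


def sort_regime(memory: float, input_size: float, slot_size: float) -> str:
    if memory >= optimal_memory(input_size):
        return "optimal"
    if memory >= one_pass_memory(input_size, slot_size):
        return "one-pass"
    return "multi-pass"


# 1 GB input with 128 KB slots, sizes in MB: sqrt(1024 * 0.125 * 2) = 16 MB
print(one_pass_memory(1024, 0.125))      # 16.0
print(sort_regime(20, 1024, 0.125))      # one-pass
print(sort_regime(8, 1024, 0.125))       # multi-pass
```

Because of the square root, quadrupling the input only doubles the one-pass requirement. A modest memory grant can therefore keep even very large sorts out of the multi-pass region.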

Page 26

Memory Self-Tuning(2): Oracle SQL Memory Mgmt.

[Figure: Oracle memory model: sort area, hash area, bitmap index]

• Pre-Oracle9i

– Fixed PGA size

– Sort_Area_Size / Hash_Area_Size

• Oracle9i [8]

– PGA_Aggregate_Target

– Work areas are allocated dynamically according to need

Page 27

Memory Self-Tuning(3): Oracle SQL Memory Mgmt.

[Chart: response time (min, 0 to 600) vs. number of users (1 to 20) for Manual - 5MB, Manual - 15MB, and Auto - 1.5G]

• Manual vs. Automatic: Response time

Page 28

Memory Self-Tuning(4): Oracle SQL Memory Mgmt.

[Chart: total memory usage (0 to 6,000) vs. number of users (1 to 20) for Manual - 5M, Manual - 15M, and Auto]

• Manual vs. Automatic: Total Memory Usage

Page 29

Memory Self-Tuning(5): Oracle SQL Memory Mgmt.

• The DBA specifies PGA_AGGREGATE_TARGET

• Oracle distributes PGA memory among work areas to maximize performance while keeping memory usage optimal

[Figure: The SMM daemon takes pga_aggregate_target and the memory usage statistics from the active work-area profiles. It performs memory target computation (SQL memory target) and memory bound computation (memory bound). The local memory manager in each work process (WP1, WP2, WP3, WP4, .., WPn) running SQL statements sizes its work areas within that bound.]
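The global part of this design can be approximated as a search for the largest per-work-area bound whose total stays within the target. In Oracle, this part is the SMM daemon deriving a memory bound from pga_aggregate_target and the active work-area profiles. The sketch simplifies heavily under our own assumptions. Each work area has only a minimum and an optimal size, and it receives the bound clamped to that range. The actual algorithm in [8] considers more factors, such as one-pass sizes. All names and sizes are hypothetical.

```python
from typing import List, NamedTuple


class WorkArea(NamedTuple):
    min_size: int      # smallest size the operator can run with (MB)
    optimal_size: int  # size needed for a fully in-memory run (MB)


def granted(wa: WorkArea, bound: int) -> int:
    # A work area gets the bound, but never less than its minimum
    # and never more than it can use.
    return min(max(bound, wa.min_size), wa.optimal_size)


def memory_bound(target: int, work_areas: List[WorkArea]) -> int:
    """Largest bound whose total grant fits in `target` (binary search).

    If even bound 0 (every work area at its minimum) exceeds the target,
    0 is returned as well.
    """
    lo, hi = 0, max(wa.optimal_size for wa in work_areas)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if sum(granted(wa, mid) for wa in work_areas) <= target:
            lo = mid      # mid fits: try a larger bound
        else:
            hi = mid - 1  # mid is too generous
    return lo


areas = [WorkArea(1, 10), WorkArea(1, 100), WorkArea(2, 50)]
bound = memory_bound(60, areas)
print(bound, [granted(wa, bound) for wa in areas])  # 25 [10, 25, 25]
```

The small work area is capped at its optimal size. The two larger ones share the remaining memory evenly, so no single query can starve the others, which is the point of a global bound.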

Page 30

Self-Tuning: Targets

• SQLs

• Memory

• Index/Materialized View(MV)s

• Emerging fields

Page 31

Index/MV Self-Tuning

• Selection of indexes/MVs:

“SQL workload + available disk space → best index/MV set”

– Workload: queries + updates

– Constraints: upper bound on storage space for indexes

– Search: pick a configuration that is of “lowest” cost for the given database and workload

• Industry cases

– MS AutoAdmin

– IBM Design Advisor

– Oracle OEM Index/Summary Advisor
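The "search" step can be illustrated with a greedy heuristic under the storage constraint: take candidates in order of estimated benefit per unit of space until the budget runs out. This is a deliberately simplified stand-in, not the algorithm of any of the advisors listed. It assumes that each candidate's benefit is known and additive. Here, benefit means the reduction in workload cost, net of index-maintenance cost for updates. Real advisors get these costs from the optimizer and must handle interactions between candidates. The names and numbers are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str
    size: float      # storage needed
    benefit: float   # estimated reduction in workload cost


def greedy_select(cands: List[Candidate], budget: float) -> Tuple[List[str], float]:
    chosen, used, gain = [], 0.0, 0.0
    # Best benefit per unit of storage first.
    for c in sorted(cands, key=lambda c: c.benefit / c.size, reverse=True):
        if used + c.size <= budget:
            chosen.append(c.name)
            used += c.size
            gain += c.benefit
    return chosen, gain


cands = [Candidate("idx_a", 4, 40), Candidate("idx_b", 3, 21),
         Candidate("idx_c", 2, 18), Candidate("idx_d", 5, 30)]
print(greedy_select(cands, budget=7))  # (['idx_a', 'idx_c'], 58.0)
```

Greedy is only a heuristic. In this example {idx_a, idx_b} also fits the budget of 7 and yields a benefit of 61. This gap is one reason real tools search the configuration space more carefully.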

Page 32

Index/MV Self-Tuning(2): MS SQL Server

• AutoAdmin: Self-Tuning and Self-Administering Databases [12]

“We achieve this by enabling databases to track the usage of their systems and to gracefully adapt to application requirements. Thus, instead of applications having to track and tune databases, databases actively auto-tunes itself to be responsive to application needs.”

– Workload-driven physical database design

– To reduce TCO(Total Cost of Ownership)

Page 33

Index/MV Self-Tuning(3): MS SQL Server

Architecture of Index/MV Selection Tools

Issue - search space

• Large search space for indexes

– Many columns to choose from

– Kinds of indexes

• Explosive search space for MVs

• Query optimizers use physical design in novel ways

• Physical design choices interact

[Figure: Workload → syntactic structure selection → candidate index selection and candidate materialized view selection → configuration enumeration → final recommendation. The candidate selection and enumeration steps rely on a configuration simulation and cost estimation module backed by Microsoft SQL Server.]
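To see why the search space for indexes is large, count only single-table B-tree candidates. With n columns and composite keys of up to k columns, where column order matters, there are n + n(n-1) + ... possible indexes. Every subset of those is a possible configuration. This sketch uses Python's `math.perm` (Python 3.8+); the function name, table width and key length are arbitrary examples.

```python
from math import perm


def candidate_indexes(n_columns: int, max_key_len: int) -> int:
    # Ordered keys of length 1..max_key_len drawn from distinct columns.
    return sum(perm(n_columns, k) for k in range(1, max_key_len + 1))


n = candidate_indexes(10, 3)
print(n)                   # 820 = 10 + 90 + 720
print(len(str(2 ** n)))    # 247 digits in the number of possible configurations
```

This count ignores other kinds of indexes, multiple tables, and materialized views. Each of those enlarges the space further, so tools prune candidates syntactically and enumerate configurations heuristically rather than exhaustively.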

Page 34

Self-Tuning: Targets

• SQLs

• Memory

• Index/Materialized View(MV)s

• Emerging areas for self-tuning

Page 35

Emerging Areas for Self-Tuning

[Figure: Big Picture of a Simplified DBMS, repeated from Page 15]

Page 36

Emerging Areas for Self-Tuning(2)

• Automatic configuration of initialization parameters [5]

– e.g. IBM DB2 V7.2 “Configuration Advisor”

– Covers 36 parameters, achieving 98.4% of the performance obtained by DB2 performance experts

• Automating physical database layout [9]

– “How database objects such as tables and indexes are assigned to disk drives can impact the I/O performance of the system” (MS SQL Server)

– Automatic partitioning of tables under parallel databases (DB2)

Page 37

Emerging Areas for Self-Tuning(3)

• Accurate storage device access cost estimation [11]

– CPU speed, disk I/O time, network, RAID rebuild

– Inaccurate estimates can cause (seriously) suboptimal query plans

– Why are the estimates wrong?

• storage or network may be under heavy load or failing

• the hardware may be older or newer than what the optimizer assumes

– “We can achieve noticeable performance improvements by providing query optimizer with accurate and timely information about the current status of their storage devices”
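The sensitivity described above can be shown with a toy two-plan cost model. A full table scan is charged per sequentially read page, and an index access is charged one random I/O per matching row. The cost formulas, the function name and all numbers are assumptions for illustration. The point is that the same query flips to a different "best" plan once random I/O gets slower, for example on an array degraded by a RAID rebuild. An optimizer still using the old cost parameter would keep choosing the index scan.

```python
def choose_plan(table_pages: int, matching_rows: int,
                seq_page_cost: float, random_io_cost: float) -> str:
    full_scan = table_pages * seq_page_cost          # read every page sequentially
    index_scan = matching_rows * random_io_cost      # one random I/O per matching row
    return "INDEX SCAN" if index_scan < full_scan else "FULL SCAN"


# Healthy storage: a random I/O costs 4x a sequential page read.
print(choose_plan(10_000, 2_000, 1.0, 4.0))  # INDEX SCAN (8,000 < 10,000)
# Same query while the array is degraded: a random I/O now costs 6x.
print(choose_plan(10_000, 2_000, 1.0, 6.0))  # FULL SCAN (12,000 > 10,000)
```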

Page 38

Emerging Areas for Self-Tuning(4)

• System tuning [1]

– The most urgent area, yet only ad-hoc or rule-of-thumb approaches exist, aided by monitoring tools: Bizmax/MaxGauge, Orange/Chakra, Spotlight, Patrol, OEM

– Only black-box consulting

– Highly dependent on consultant or DBA experience

– Not yet in the scientific/engineering realm!!

• We need to develop mathematical models of the interactions among the various system tuning knobs, or construct and share a rule/knowledge base of system tuning experts

[Figure: Monitoring → Drill-down to Bottleneck → Root Cause Analysis → Tuning]
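The second option, an explicit and shareable rule base, could start as simply as the sketch below. Monitored metrics go in. Each rule that fires names a bottleneck and a next step for root-cause analysis. The metric names, thresholds and advice are hypothetical placeholders, not recommendations from these slides or from any of the monitoring tools listed above.

```python
from typing import Callable, Dict, List, NamedTuple

Metrics = Dict[str, float]


class Rule(NamedTuple):
    name: str
    condition: Callable[[Metrics], bool]
    advice: str


# Hypothetical rules: metric names and thresholds are placeholders.
RULES: List[Rule] = [
    Rule("io_bound", lambda m: m["io_wait_pct"] > 50,
         "Drill down to the SQL statements with the most physical reads."),
    Rule("cpu_bound", lambda m: m["cpu_pct"] > 90,
         "Look for inefficient plans or excessive parsing."),
    Rule("lock_contention", lambda m: m["lock_wait_pct"] > 20,
         "Identify blocking sessions and their transactions."),
]


def diagnose(metrics: Metrics, rules: List[Rule] = RULES) -> List[Rule]:
    """Monitoring snapshot in, fired rules (bottleneck + next step) out."""
    return [rule for rule in rules if rule.condition(metrics)]


snapshot = {"io_wait_pct": 65.0, "cpu_pct": 40.0, "lock_wait_pct": 25.0}
for rule in diagnose(snapshot):
    print(f"{rule.name}: {rule.advice}")
```

Because the rules are plain data, experts could review, share and refine them. Doing the same with the undocumented judgment behind black-box consulting is much harder.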

Page 39

Conclusion

• The next decade will be the era of self-tuning intelligence

– Better than the average DBA

• BUT you can still be the MVP for IT ROI, though only by

– “Crossing the chasm between you and your DBMS technology”

• Know your data

• Know your application workloads

• Know your DBMS intelligence

Let’s DBMax!

Page 40

References

1. Gerhard Weikum et al., “Self-Tuning Database Technology and Information Services: from Wishful Thinking to Viable Engineering,” VLDB 2002, Ten-Year Award

2. Gerhard Weikum et al., “The COMFORT Automatic Tuning Project,” Information Systems 19(5), 1994

3. Gerhard Weikum et al., “Special Issue on Self-Tuning Databases and Applications,” Data Engineering Bulletin, Jun. 1999

4. IBM’s Autonomic Computing Perspectives: http://www.research.ibm.com/autonomic/manifesto/autonomic_computing.pdf

5. Guy Lohman et al., “Toward Autonomic Computing with DB2 Universal Database,” SIGMOD Record, Sep. 2002

6. MS AutoAdmin Homepage: http://www.research.microsoft.com/

7. Oracle Manageability: http://technet.oracle.com/products/manageability/content.html

Page 41

References(2)

8. Benoit Dageville and Mohamed Zait, “SQL Memory Management in Oracle9i,” VLDB 2002

9. Jun Rao, Chun Zhang, Guy Lohman et al., “Automating Physical Database Design in a Parallel Database,” SIGMOD 2002

10. S. Agrawal, S. Chaudhuri, et al., “Automating Layout of Relational Databases,” ICDE 2003

11. F. R. Reiss and T. Kanungo, “A Characterization of the Sensitivity of Query Optimization to Storage Access Cost Parameters,” SIGMOD 2003

12. S. Agrawal, S. Chaudhuri, et al., “Automated Selection of Materialized Views and Indexes for SQL Databases,” VLDB 2000

13. Michael Stillger, Guy M. Lohman et al., “LEO - DB2's LEarning Optimizer,” VLDB 2001