PMIT-6102 Advanced Database Systems

Post on 22-Mar-2016

Page 1: PMIT-6102 Advanced Database Systems

PMIT-6102 Advanced Database Systems

By Jesmin Akhter

Assistant Professor, IIT, Jahangirnagar University

Page 2: PMIT-6102 Advanced Database Systems

Distributed DBMS Slide 2

Lecture 14: Parallel Database Systems

Page 3: PMIT-6102 Advanced Database Systems

Outline

Parallel Database Systems
  Fundamental
  Functional Architecture
  Parallel DBMS Architectures: shared-memory, shared-disk and shared-nothing

Page 4: PMIT-6102 Advanced Database Systems

Parallel Database Systems

A parallel computer, or multiprocessor, is a special kind of distributed system made of a number of nodes (processors, memories and disks) connected by a very fast network within one or more cabinets in the same room.

Data distribution can be exploited to increase performance (through parallelism) and availability (through replication).

Parallel database systems can support very large databases with very high loads.

Implementation of parallel database systems naturally relies on distributed database techniques.

Page 5: PMIT-6102 Advanced Database Systems

Advantages

A parallel database system should provide the following advantages.

High performance

Parallelism can:

  increase throughput, using inter-query parallelism. Inter-query parallelism is a form of parallelism in the evaluation of database queries, in which several different queries execute concurrently on multiple processors to improve the overall throughput of the system.

  decrease transaction response times, using intra-query parallelism. Intra-query parallelism is a form of parallelism in the evaluation of database queries, in which a single query is decomposed into smaller tasks that execute concurrently on multiple processors.
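The two forms of parallelism can be contrasted with a minimal Python sketch. Everything here is an illustrative assumption rather than part of the lecture: the toy table ROWS, the function names, and the use of a thread pool, which shows only the structure of the two approaches (Python threads do not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "table": integers standing in for stored rows (hypothetical data).
ROWS = list(range(1, 1001))

def run_query(predicate):
    """Evaluate one whole query sequentially: sum the rows matching predicate."""
    return sum(r for r in ROWS if predicate(r))

def inter_query(predicates, workers=4):
    """Inter-query parallelism: several different queries run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_query, predicates))

def intra_query(predicate, workers=4):
    """Intra-query parallelism: one query is split into per-partition tasks."""
    chunk = (len(ROWS) + workers - 1) // workers
    partitions = [ROWS[i:i + chunk] for i in range(0, len(ROWS), chunk)]

    def partial_sum(part):
        # Each task evaluates the same query on its own partition.
        return sum(r for r in part if predicate(r))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Partial results are merged into the final answer.
        return sum(pool.map(partial_sum, partitions))
```

In the first function the unit of work is a whole query, so throughput improves; in the second, the unit of work is a fragment of one query, so that query's response time is what improves.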

Page 6: PMIT-6102 Advanced Database Systems

Advantages

High availability

Because a parallel database system consists of many redundant components, it can increase data availability and fault tolerance.

Replicating data at several nodes is useful to support failover, a fault-tolerance technique that enables automatic redirection of transactions from a failed node to another node that stores a copy of the data. This provides uninterrupted service to users.
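Failover as described above can be sketched in a few lines of Python. The Node class, the alive flag and run_transaction are hypothetical names chosen for illustration; a real system would also have to detect failures and keep the replicas synchronized, which this sketch leaves out.

```python
class Node:
    """A node holding a full copy of the data (hypothetical model)."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]

def run_transaction(replicas, key):
    """Try each replica in order; redirect to the next one when a node has failed."""
    for node in replicas:
        try:
            return node.read(key), node.name
        except ConnectionError:
            continue  # automatic redirection to another copy
    raise RuntimeError("no live replica holds the data")
```

If the first node in the list has failed, the caller still gets its answer from the second node and does not need to know that a redirection took place.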

Page 7: PMIT-6102 Advanced Database Systems

Advantages

Extensibility

Extensibility is the ability to expand the system smoothly by adding processing and storage power to the system.

Ideally, the parallel database system should demonstrate two extensibility advantages: linear speedup and linear scale-up.

Linear speedup refers to a linear increase in performance for a constant database size while the number of nodes (i.e., processing and storage power) is increased linearly.

Linear scale-up refers to sustained performance for a linear increase in both database size and number of nodes.
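The two metrics can be made concrete with a small Python sketch. The ratio formulas are a common way of quantifying speedup and scale-up and are an assumption here, as are the function names and the 5% tolerance; the lecture only gives the verbal definitions.

```python
def speedup(time_one_node, time_n_nodes):
    """Same database size; elapsed time on 1 node divided by elapsed time on n nodes."""
    return time_one_node / time_n_nodes

def scaleup(time_small, time_large):
    """time_small: 1 node on a database of size D.
    time_large: n nodes on a database of size n * D."""
    return time_small / time_large

def is_linear_speedup(time_one_node, time_n_nodes, n, tol=0.05):
    """Linear speedup: n nodes make the same workload about n times faster."""
    return abs(speedup(time_one_node, time_n_nodes) - n) <= tol * n

def is_linear_scaleup(time_small, time_large, tol=0.05):
    """Linear scale-up: performance is sustained, so the ratio stays near 1."""
    return abs(scaleup(time_small, time_large) - 1.0) <= tol
```

For example, a query that takes 100 s on one node and 25 s on four nodes has a speedup of 4, which is linear; if it takes 40 s on four nodes, the speedup is 2.5 and falls short of linear.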

Page 8: PMIT-6102 Advanced Database Systems

Advantages

Extensibility

[Fig. 14.1 Extensibility Metrics]

Page 9: PMIT-6102 Advanced Database Systems

Functional Architecture

The functions supported by a parallel database system can be divided into three subsystems, much like in a typical DBMS:

  Session Manager
  Transaction Manager
  Data Manager

Page 10: PMIT-6102 Advanced Database Systems

Functional Architecture

Session Manager

It plays the role of a transaction monitor, providing support for client interactions with the server.

In particular, it performs the connections and disconnections between the client processes and the two other subsystems.

Therefore, it initiates and closes user sessions (which may contain multiple transactions).

In the case of OLTP sessions, the session manager can trigger the execution of pre-loaded transaction code within data manager modules.

Page 11: PMIT-6102 Advanced Database Systems

Functional Architecture

Transaction Manager

It receives client transactions related to query compilation and execution.

It can access the database directory that holds all meta-information about data and programs.

Depending on the transaction, it activates the various compilation phases, triggers query execution, and returns the results as well as error codes to the client application.

Because it supervises transaction execution and commit, it may trigger the recovery procedure in case of transaction failure.

To speed up query execution, it may optimize and parallelize the query at compile-time.

Page 12: PMIT-6102 Advanced Database Systems

Functional Architecture

Data Manager

It provides all the low-level functions needed to run compiled queries in parallel, i.e., database operator execution, parallel transaction support, cache management, etc.

If the transaction manager is able to compile dataflow control, then synchronization and communication among data manager modules are possible. Otherwise, transaction control and synchronization must be done by a transaction manager module.

Page 13: PMIT-6102 Advanced Database Systems

Parallel DBMS Architectures

There are three basic parallel computer architectures depending on how main memory or disk is shared: shared-memory, shared-disk and shared-nothing.

Hybrid architectures such as NUMA or cluster try to combine the benefits of the basic architectures.

Page 14: PMIT-6102 Advanced Database Systems

Parallel DBMS Architectures

Shared-Memory

In the shared-memory approach, any processor has access to any memory module or disk unit through a fast interconnect (e.g., a high-speed bus or a cross-bar switch).

All the processors are under the control of a single operating system.

All shared-memory parallel database products today can exploit inter-query parallelism to provide high transaction throughput and intra-query parallelism to reduce response time of decision-support queries.

Page 15: PMIT-6102 Advanced Database Systems

Parallel DBMS Architectures

Shared-Memory

Fig. 14.3 Shared-Memory Architecture

Page 16: PMIT-6102 Advanced Database Systems

Parallel DBMS Architectures

Shared-Memory

Shared-memory has two strong advantages:

Simplicity
  Since meta-information (directory) and control information (e.g., lock tables) can be shared by all processors, writing database software is not very different from writing it for single-processor computers.
  Intra-query parallelism requires some parallelization but remains rather simple.

Load balancing
  Load balancing is easy to achieve since it can be done at run-time using the shared memory, by allocating each new task to the least busy processor.
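Allocating each new task to the least busy processor can be sketched in Python with a priority queue. The LeastBusyScheduler class and the idea of measuring load as a sum of task costs are assumptions made for illustration; in a shared-memory machine the load table would simply live in the memory that all processors can see.

```python
import heapq

class LeastBusyScheduler:
    """Run-time load balancing over a shared load table (hypothetical model)."""

    def __init__(self, num_processors):
        # Heap of (current_load, processor_id); the least busy processor is on top.
        self._heap = [(0, p) for p in range(num_processors)]
        heapq.heapify(self._heap)

    def assign(self, task_cost):
        """Give the task to the least busy processor and return that processor's id."""
        load, proc = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + task_cost, proc))
        return proc

    def loads(self):
        """Current load per processor."""
        return {proc: load for load, proc in self._heap}
```

With three processors and task costs 5, 3, 2, 4, 1 arriving in that order, the tasks go to processors 0, 1, 2, 2, 1, leaving loads of 5, 4 and 6.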

Page 17: PMIT-6102 Advanced Database Systems

Parallel DBMS Architectures

Shared-Memory

Shared-memory has three problems:

High cost
  High cost is incurred by the interconnect, which requires fairly complex hardware because of the need to link each processor to each memory module or disk.

Limited extensibility
  With faster processors (even with larger caches), conflicting accesses to the shared memory increase rapidly and degrade performance.
  Therefore, extensibility is limited to a few tens of processors, typically up to 16 for the best cost/performance using 4-processor boards.

Low availability
  Since the memory space is shared by all processors, a memory fault may affect most processors, thereby hurting availability. The solution is to use duplex memory with a redundant interconnect.

Page 18: PMIT-6102 Advanced Database Systems

Parallel DBMS Architectures

Shared-Disk

In the shared-disk approach any processor has access to any disk unit through the interconnect but exclusive (non-shared) access to its main memory.

Each processor-memory node is under the control of its own copy of the operating system. Then, each processor can access database pages on the shared disk and cache them into its own memory.

Since different processors can access the same page in conflicting update modes, global cache consistency is needed.

The first parallel DBMS that used shared-disk was Oracle, with an efficient implementation of a distributed lock manager for cache consistency.

Other major DBMS vendors, such as IBM, provide shared-disk implementations.
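The need for global cache consistency can be illustrated with a deliberately simple invalidate-on-write scheme in Python. This is only a sketch under stated assumptions: SharedDisk, CacheNode and the cached_by directory are hypothetical names, the code is single-threaded with no real locking, and it does not represent how Oracle's distributed lock manager or any other product works.

```python
class SharedDisk:
    """The disk shared by all nodes, plus a directory of who caches which page."""

    def __init__(self, pages):
        self.pages = dict(pages)   # page_id -> value stored on disk
        self.cached_by = {}        # page_id -> set of nodes caching the page

class CacheNode:
    """A processor-memory node with a private cache of disk pages."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cache = {}

    def read(self, page_id):
        if page_id not in self.cache:
            # Cache miss: fetch from the shared disk and register in the directory.
            self.cache[page_id] = self.disk.pages[page_id]
            self.disk.cached_by.setdefault(page_id, set()).add(self)
        return self.cache[page_id]

    def write(self, page_id, value):
        # Invalidate every other node's cached copy before updating the page.
        for node in self.disk.cached_by.get(page_id, set()) - {self}:
            node.cache.pop(page_id, None)
        self.disk.pages[page_id] = value   # write through to the shared disk
        self.cache[page_id] = value
        self.disk.cached_by[page_id] = {self}
```

Without the invalidation step in write, a second node that had cached the page earlier would keep returning the old value, which is exactly the inconsistency the slide warns about.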

Page 19: PMIT-6102 Advanced Database Systems

Shared-Disk

Page 20: PMIT-6102 Advanced Database Systems

Parallel DBMS Architectures

Shared-disk has a number of advantages:

Lower cost
  The cost of the interconnect is significantly less than with shared-memory since standard bus technology may be used.

High extensibility
  Given that each processor has enough main memory, interference on the shared disk can be minimized. Thus, extensibility can be better, typically up to a hundred processors.

Load balancing

Easy migration from centralized systems

Availability
  Since memory faults can be isolated from other nodes, availability can be higher.

Page 21: PMIT-6102 Advanced Database Systems

Shared-Nothing

In the shared-nothing approach, each processor has exclusive access to its main memory and disk unit(s).

Similar to shared-disk, each processor-memory-disk node is under the control of its own copy of the operating system.

Each node can be viewed as a local site (with its own database and software) in a distributed database system.

Therefore, most solutions designed for distributed databases such as database fragmentation, distributed transaction management and distributed query processing may be reused.

Using a fast interconnect, it is possible to accommodate large numbers of nodes. This architecture is often called Massively Parallel Processor (MPP).
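One way to picture database fragmentation across shared-nothing nodes is hash partitioning, sketched below in Python. Hash partitioning is chosen here only as an example of fragmentation; the SharedNothingCluster class, node_for function and the use of CRC32 as the hash are assumptions and are not taken from the lecture.

```python
import zlib

def node_for(key, num_nodes):
    """Map a key to a node. CRC32 is used because it is deterministic across runs."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

class SharedNothingCluster:
    """Each node owns a private fragment; no memory or disk is shared."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def insert(self, key, row):
        # The row is stored only on the node that owns its key.
        self.nodes[node_for(key, len(self.nodes))][key] = row

    def lookup(self, key):
        # A point query touches exactly one node.
        return self.nodes[node_for(key, len(self.nodes))].get(key)

    def count_all(self):
        # A global query: each node counts its own fragment, then results are merged.
        return sum(len(fragment) for fragment in self.nodes)
```

A lookup by key is routed to a single node, while a query over the whole table runs on every fragment and merges the partial results, which is the same pattern used in distributed query processing.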

Page 22: PMIT-6102 Advanced Database Systems

Shared-Nothing

The first major parallel DBMS product was Teradata’s Database Computer, which could accommodate a thousand processors in its early version.

Other major DBMS vendors, such as IBM and Microsoft, provide shared-nothing implementations.

Page 23: PMIT-6102 Advanced Database Systems

Shared-Nothing

Page 24: PMIT-6102 Advanced Database Systems

Shared-Nothing

As demonstrated by the existing products, shared-nothing has three main virtues:

Lower cost
  Shared-nothing costs less than shared-disk, which requires a special interconnect for the disks.

High extensibility
  By implementing a distributed database design that favors the smooth incremental growth of the system by the addition of new nodes, extensibility can be better (in the thousands of nodes).

High availability
  By replicating data on multiple nodes, high availability can also be achieved.

Page 25: PMIT-6102 Advanced Database Systems

Thank You