PMIT-6102 Advanced Database Systems

By Jesmin Akhter

Assistant Professor, IIT, Jahangirnagar University

Lecture 14: Parallel Database Systems


Outline

Parallel Database Systems

Fundamental Functional Architecture

Parallel DBMS Architectures: shared-memory, shared-disk and shared-nothing


Parallel Database Systems

A parallel computer, or multiprocessor, is a special kind of distributed system made of a number of nodes (processors, memories and disks) connected by a very fast network within one or more cabinets in the same room.

Data distribution can be exploited to increase performance (through parallelism) and availability (through replication).

They can support very large databases with very high loads.

Implementation of parallel database systems naturally relies on distributed database techniques.


Advantages

A parallel database system should provide the following advantages.

High performance

Parallelism can increase throughput, using inter-query parallelism, and decrease transaction response times, using intra-query parallelism.

Inter-query parallelism is a form of parallelism in the evaluation of database queries in which several different queries execute concurrently on multiple processors to improve the overall throughput of the system.

Intra-query parallelism is a form of parallelism in the evaluation of database queries in which a single query is decomposed into smaller tasks that execute concurrently on multiple processors.
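The two forms of parallelism can be contrasted in a small sketch. This is only an illustration: the toy table, the partition layout and the function names are assumptions, and threads stand in for processors, so the sketch shows how the work is decomposed rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "table": one integer column, split into four partitions (hypothetical layout).
ROWS = list(range(1, 101))
PARTITIONS = [ROWS[i::4] for i in range(4)]


def partial_sum(partition):
    """Sub-task of a single query: aggregate one partition."""
    return sum(partition)


def intra_query_sum(pool):
    """Intra-query parallelism: one SUM query decomposed into per-partition tasks."""
    return sum(pool.map(partial_sum, PARTITIONS))


def inter_query(pool, queries):
    """Inter-query parallelism: several independent queries submitted concurrently."""
    futures = [pool.submit(query, ROWS) for query in queries]
    return [future.result() for future in futures]


with ThreadPoolExecutor(max_workers=4) as pool:
    total = intra_query_sum(pool)                 # one query, many tasks
    results = inter_query(pool, [min, max, len])  # many queries, one task each
```

Here `total` is the answer to a single query whose response time benefits from the split, while `results` holds the answers to three different queries whose concurrent execution raises throughput.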


Advantages

High availability

Because a parallel database system consists of many redundant components, it can increase data availability and fault tolerance.

Replicating data at several nodes is useful to support failover, a fault-tolerance technique that enables automatic redirection of transactions from a failed node to another node that stores a copy of the data. This provides uninterrupted service to users.
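Failover can be sketched as a read that is redirected to the next replica when a node is down. The `Node` class, the `NodeDown` exception and the key name are hypothetical, introduced only for this illustration; a real system would also have to keep the replicas consistent on writes.

```python
class NodeDown(Exception):
    """Raised when a request reaches a failed node (hypothetical error type)."""


class Node:
    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # this node's own copy (replica) of the data
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise NodeDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in order; redirect to the next one when a node is down."""
    for node in replicas:
        try:
            return node.name, node.read(key)
        except NodeDown:
            continue
    raise RuntimeError("no live replica holds the data")


replica_data = {"account:42": 100}
primary = Node("node-1", replica_data)
backup = Node("node-2", replica_data)

before = read_with_failover([primary, backup], "account:42")
primary.alive = False  # simulate a node failure
after = read_with_failover([primary, backup], "account:42")
```

The caller receives the same value before and after the failure; only the node that served the request changes.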


Advantages

Extensibility

Extensibility is the ability to expand the system smoothly by adding processing and storage power to the system.

Ideally, the parallel database system should demonstrate linear speedup and linear scale-up.

Linear speedup refers to a linear increase in performance for a constant database size while the number of nodes (i.e., processing and storage power) is increased linearly.

Linear scale-up refers to sustained performance for a linear increase in both database size and number of nodes.
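One common way to quantify these two metrics is as ratios of elapsed times. The sketch below assumes that definition; the timings are made-up numbers for illustration, not measurements.

```python
def speedup(time_small_system, time_large_system):
    """Same database size on both systems.

    Linear speedup means the ratio equals the factor by which nodes were added.
    """
    return time_small_system / time_large_system


def scaleup(time_small_problem, time_large_problem):
    """N times more data on N times more nodes.

    Linear scale-up means the ratio stays at 1.0 (performance is sustained).
    """
    return time_small_problem / time_large_problem


# Illustrative numbers only: a query takes 100 s on 1 node and 25 s on 4 nodes.
s = speedup(100.0, 25.0)
is_linear_speedup = (s == 4.0)

# Illustrative numbers only: 4x the data on 4 nodes still takes 100 s.
u = scaleup(100.0, 100.0)
is_linear_scaleup = (u == 1.0)
```

A speedup below the node ratio, or a scale-up below 1.0, indicates that the added nodes are not being fully exploited.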


Fig. 14.1 Extensibility Metrics


Functional Architecture

The functions supported by a parallel database system can be divided into three subsystems, much like in a typical DBMS: the Session Manager, the Transaction Manager and the Data Manager.


Functional Architecture

Session Manager

It plays the role of a transaction monitor, providing support for client interactions with the server.

In particular, it performs the connections and disconnections between the client processes and the two other subsystems.

Therefore, it initiates and closes user sessions (which may contain multiple transactions).

In the case of OLTP sessions, the session manager is able to trigger the execution of pre-loaded transaction code within data manager modules.


Functional Architecture

Transaction Manager

It receives client transactions related to query compilation and execution.

It can access the database directory that holds all meta-information about data and programs.

Depending on the transaction, it activates the various compilation phases, triggers query execution, and returns the results as well as error codes to the client application.

Because it supervises transaction execution and commit, it may trigger the recovery procedure in case of transaction failure.

To speed up query execution, it may optimize and parallelize the query at compile-time.


Functional Architecture

Data Manager

It provides all the low-level functions needed to run compiled queries in parallel, i.e., database operator execution, parallel transaction support, cache management, etc.

If the transaction manager is able to compile dataflow control, then synchronization and communication among data manager modules are possible. Otherwise, transaction control and synchronization must be done by a transaction manager module.


Parallel DBMS Architectures

There are three basic parallel computer architectures depending on how main memory or disk is shared: shared-memory, shared-disk and shared-nothing.

Hybrid architectures such as NUMA or cluster try to combine the benefits of the basic architectures.


Parallel DBMS Architectures

Shared-Memory

In the shared-memory approach, any processor has access to any memory module or disk unit through a fast interconnect (e.g., a high-speed bus or a cross-bar switch).

All the processors are under the control of a single operating system.

All shared-memory parallel database products today can exploit inter-query parallelism to provide high transaction throughput and intra-query parallelism to reduce response time of decision-support queries.



Fig. 14.3 Shared-Memory Architecture



Shared-Memory

Shared-memory has two strong advantages:

Simplicity. Since meta-information (directory) and control information (e.g., lock tables) can be shared by all processors, writing database software is not very different from writing it for single-processor computers. Intra-query parallelism requires some parallelization but remains rather simple.

Load balancing. Load balancing is easy to achieve since it can be done at run-time using the shared memory by allocating each new task to the least busy processor.
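The run-time rule of giving each new task to the least busy processor can be sketched with a priority queue of processor loads. The task costs and the function name `assign_tasks` are assumptions made for this illustration; in a shared-memory machine the load table would simply live in the memory visible to all processors.

```python
import heapq


def assign_tasks(task_costs, num_processors):
    """Give each incoming task to the processor with the smallest current load."""
    # Min-heap of (current load, processor id): the least busy processor is on top.
    heap = [(0, p) for p in range(num_processors)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_processors)}

    for task_id, cost in enumerate(task_costs):
        load, p = heapq.heappop(heap)           # least busy processor
        assignment[p].append(task_id)
        heapq.heappush(heap, (load + cost, p))  # record its new load

    loads = {p: sum(task_costs[t] for t in tasks) for p, tasks in assignment.items()}
    return assignment, loads


# Hypothetical task costs, in arbitrary work units.
assignment, loads = assign_tasks([5, 3, 8, 2, 4], 2)
```

With these costs both processors end up with the same total load, even though the tasks arrive one at a time and their costs differ.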


Parallel DBMS Architectures

Shared-Memory

Shared-memory has three problems:

High cost. High cost is incurred by the interconnect, which requires fairly complex hardware because of the need to link each processor to each memory module or disk.

Limited extensibility. With faster processors (even with larger caches), conflicting accesses to the shared memory increase rapidly and degrade performance. Therefore, extensibility is limited to a few tens of processors, typically up to 16 for the best cost/performance using 4-processor boards.

Low availability. Since the memory space is shared by all processors, a memory fault may affect most processors, thereby hurting availability. The solution is to use duplex memory with a redundant interconnect.


Parallel DBMS Architectures

Shared-Disk

In the shared-disk approach, any processor has access to any disk unit through the interconnect but exclusive (non-shared) access to its main memory.

Each processor-memory node is under the control of its own copy of the operating system. Then, each processor can access database pages on the shared disk and cache them into its own memory.

Since different processors can access the same page in conflicting update modes, global cache consistency is needed.

The first parallel DBMS that used shared-disk was Oracle, with an efficient implementation of a distributed lock manager for cache consistency.

Other major DBMS vendors, such as IBM, also provide shared-disk implementations.
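Why global cache consistency is needed can be seen in a small single-threaded sketch. It uses a simple invalidate-on-write rule as a stand-in; this is an assumption made for illustration and not how Oracle's distributed lock manager works. The `Cluster` and `Node` classes and the page identifier are hypothetical.

```python
class Cluster:
    """Hypothetical coordinator: holds the shared disk and knows every node."""

    def __init__(self):
        self.disk = {}   # shared disk: page id -> page contents
        self.nodes = []

    def invalidate(self, page_id, writer):
        """Drop stale cached copies of a page on every node except the writer."""
        for node in self.nodes:
            if node is not writer:
                node.cache.pop(page_id, None)


class Node:
    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = {}  # private main memory: not shared with other nodes
        cluster.nodes.append(self)

    def read(self, page_id):
        if page_id not in self.cache:  # cache miss: fetch from the shared disk
            self.cache[page_id] = self.cluster.disk[page_id]
        return self.cache[page_id]

    def write(self, page_id, contents):
        self.cluster.invalidate(page_id, self)  # keep other caches consistent
        self.cluster.disk[page_id] = contents   # write through to the shared disk
        self.cache[page_id] = contents


cluster = Cluster()
cluster.disk["p1"] = "v1"
a, b = Node(cluster), Node(cluster)

first = b.read("p1")   # b now caches the page
a.write("p1", "v2")    # a updates the page; b's cached copy is invalidated
second = b.read("p1")  # b misses in its cache and refetches from disk
```

Without the `invalidate` call, the second read on `b` would return the stale cached contents, which is the inconsistency the lock manager exists to prevent.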


Shared-Disk


Shared-disk has a number of advantages:

Lower cost. The cost of the interconnect is significantly less than with shared-memory since standard bus technology may be used.

High extensibility. Given that each processor has enough main memory, interference on the shared disk can be minimized. Thus, extensibility can be better, typically up to a hundred processors.

Load balancing.

Easy migration from centralized systems.

Availability. Since memory faults can be isolated from other nodes, availability can be higher.



Shared-Nothing

In the shared-nothing approach, each processor has exclusive access to its main memory and disk unit(s).

Similar to shared-disk, each processor-memory-disk node is under the control of its own copy of the operating system.

Each node can be viewed as a local site (with its own database and software) in a distributed database system.

Therefore, most solutions designed for distributed databases, such as database fragmentation, distributed transaction management and distributed query processing, may be reused.

Using a fast interconnect, it is possible to accommodate large numbers of nodes. This architecture is often called Massively Parallel Processor (MPP).
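As an illustration of reusing database fragmentation in a shared-nothing system, the sketch below hash-partitions a toy table across nodes and evaluates a selection on each fragment independently. The table, the column names and the modulo-based hash are assumptions; the per-node scans are run one after another here, whereas in a real system each node would scan its own disk in parallel.

```python
def partition(rows, num_nodes, key):
    """Horizontal fragmentation: hash each row's key to pick its owning node."""
    fragments = [[] for _ in range(num_nodes)]
    for row in rows:
        fragments[row[key] % num_nodes].append(row)
    return fragments


def local_select(fragment, predicate):
    """Work done by one node, using only its own fragment."""
    return [row for row in fragment if predicate(row)]


def parallel_select(fragments, predicate):
    """Coordinator: send the selection to every node and merge the partial results."""
    result = []
    for fragment in fragments:
        result.extend(local_select(fragment, predicate))
    return result


# Hypothetical table with eight rows, spread over four nodes.
rows = [{"id": i, "amount": 10 * i} for i in range(1, 9)]
fragments = partition(rows, 4, "id")

big = parallel_select(fragments, lambda r: r["amount"] >= 60)
ids = sorted(r["id"] for r in big)
```

Because no node touches another node's fragment, adding nodes only requires repartitioning the data, which is what makes this design easy to extend.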


The first major parallel DBMS product was Teradata’s Database Computer that could accommodate a thousand processors in its early version.

Other major DBMS vendors, such as IBM and Microsoft, also provide shared-nothing implementations.


Shared-Nothing


As demonstrated by the existing products, shared-nothing has three main virtues:

Lower cost. Shared-nothing costs less than shared-disk, which requires a special interconnect for the disks.

High extensibility. By implementing a distributed database design that favors the smooth incremental growth of the system by the addition of new nodes, extensibility can be better (in the thousands of nodes).

High availability. By replicating data on multiple nodes, high availability can also be achieved.


Thank You
