PMIT-6102 Advanced Database Systems
By Jesmin Akhter
Assistant Professor, IIT, Jahangirnagar University
Distributed DBMS Slide 2
Lecture 14: Parallel Database Systems
Outline
Parallel Database Systems
Fundamental Functional Architecture
Parallel DBMS Architectures: shared-memory, shared-disk and shared-nothing
Parallel Database Systems
A parallel computer, or multiprocessor, is a special kind of distributed system made of a number of nodes (processors, memories and disks) connected by a very fast network within one or more cabinets in the same room.
Data distribution can be exploited to increase performance (through parallelism) and availability (through replication).
Parallel database systems can support very large databases with very high loads.
Implementation of parallel database systems naturally relies on distributed database techniques.
Advantages
A parallel database system should provide the following advantages.
High performance
Parallelism can increase throughput, using inter-query parallelism. Inter-query parallelism is a form of parallelism in the evaluation of database queries in which several different queries execute concurrently on multiple processors to improve the overall throughput of the system.
Parallelism can also decrease transaction response times, using intra-query parallelism. Intra-query parallelism is a form of parallelism in the evaluation of database queries in which a single query is decomposed into smaller tasks that execute concurrently on multiple processors.
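The two forms of parallelism can be contrasted with a minimal sketch. It assumes a toy in-memory "table" and uses a thread pool in place of processors; the names ORDERS, total, count_large, inter_query and intra_query are hypothetical and do not come from the lecture. In the standard CPython build, threads show the structure of the computation rather than a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "table": a list of order amounts (hypothetical data for illustration).
ORDERS = list(range(1, 1001))

def total(rows):
    """Stand-in for one query operator: sum the given rows."""
    return sum(rows)

def count_large(rows, threshold=500):
    """A second, unrelated query: count the rows above a threshold."""
    return sum(1 for r in rows if r > threshold)

def inter_query(pool):
    """Inter-query parallelism: two different queries run concurrently."""
    f1 = pool.submit(total, ORDERS)
    f2 = pool.submit(count_large, ORDERS)
    return f1.result(), f2.result()

def intra_query(pool, parts=4):
    """Intra-query parallelism: one query is split into sub-tasks,
    one per partition, and the partial results are combined."""
    chunks = [ORDERS[i::parts] for i in range(parts)]
    return sum(pool.map(total, chunks))

with ThreadPoolExecutor(max_workers=4) as pool:
    inter_result = inter_query(pool)
    intra_result = intra_query(pool)
```

Inter-query parallelism leaves each query untouched and improves throughput, whereas intra-query parallelism needs a way to partition the work and to merge the partial results.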
Advantages
High availability
Because a parallel database system consists of many redundant components, it can increase data availability and fault tolerance.
Replicating data at several nodes is useful to support failover, a fault-tolerance technique that enables automatic redirection of transactions from a failed node to another node that stores a copy of the data. This provides uninterrupted service to users.
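Failover as described above can be sketched as follows. This is a simplified illustration under stated assumptions: the Node class, the NodeDown exception and read_with_failover are hypothetical names, node failure is simulated with a flag, and only reads are handled.

```python
class NodeDown(Exception):
    """Raised when a node cannot serve a request."""

class Node:
    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # this node's copy of the data
        self.alive = True       # simulated health flag

    def read(self, key):
        if not self.alive:
            raise NodeDown(self.name)
        return self.data[key]

def read_with_failover(replicas, key):
    """Try each replica in order; on failure, redirect to the next copy."""
    for node in replicas:
        try:
            return node.name, node.read(key)
        except NodeDown:
            continue  # automatic redirection to another node
    raise NodeDown("no live replica holds the data")

primary = Node("n1", {"acct42": 100})
backup = Node("n2", {"acct42": 100})
primary.alive = False  # simulate a failure of the first node
served_by, value = read_with_failover([primary, backup], "acct42")
```

The caller never sees the failure of n1: the request is served by n2, which is the uninterrupted service the slide refers to. A real system would also have to keep the copies consistent under updates, which this sketch leaves out.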
Advantages
Extensibility
Extensibility is the ability to expand the system smoothly by adding processing and storage power to the system.
Ideally, the parallel database system should demonstrate linear speedup and linear scale-up.
Linear speedup refers to a linear increase in performance for a constant database size while the number of nodes (i.e., processing and storage power) is increased linearly.
Linear scale-up refers to a sustained performance for a linear increase in both database size and number of nodes.
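The two metrics can be made concrete with a short calculation. The ratio definitions below are the commonly used ones and are an assumption here, since the slides state the metrics only in words; the timings are invented purely for illustration.

```python
import math

def speedup(time_small_system, time_large_system):
    """Same database size on both systems: how many times faster
    the larger system completes the workload."""
    return time_small_system / time_large_system

def scaleup(time_small, time_large):
    """Database size and node count grow by the same factor:
    a ratio of 1.0 means performance is sustained."""
    return time_small / time_large

# Hypothetical measurements, in seconds.
nodes_small, nodes_large = 4, 16
growth = nodes_large / nodes_small        # nodes grow 4x

observed_speedup = speedup(120.0, 30.0)   # same database, 4 vs 16 nodes
observed_scaleup = scaleup(120.0, 120.0)  # 4x data on 4x nodes

linear_speedup = math.isclose(observed_speedup, growth)
linear_scaleup = math.isclose(observed_scaleup, 1.0)
```

With these numbers the speedup equals the growth in nodes and the scale-up ratio stays at 1.0, so both are linear. Measured systems usually fall short of these ideals.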
Advantages
Extensibility
Fig. 14.1 Extensibility Metrics
Functional Architecture
The functions supported by a parallel database system can be divided into three subsystems, much like in a typical DBMS:
Session Manager
Transaction Manager
Data Manager
Functional Architecture
Session Manager
It plays the role of a transaction monitor, providing support for client interactions with the server.
In particular, it performs the connections and disconnections between the client processes and the two other subsystems.
Therefore, it initiates and closes user sessions (which may contain multiple transactions).
In the case of OLTP sessions, the session manager is able to trigger the execution of pre-loaded transaction code within data manager modules.
Functional Architecture
Transaction Manager
It receives client transactions related to query compilation and execution.
It can access the database directory that holds all meta-information about data and programs.
Depending on the transaction, it activates the various compilation phases, triggers query execution, and returns the results as well as error codes to the client application.
Because it supervises transaction execution and commit, it may trigger the recovery procedure in case of transaction failure.
To speed up query execution, it may optimize and parallelize the query at compile-time.
Functional Architecture
Data Manager
It provides all the low-level functions needed to run compiled queries in parallel, i.e., database operator execution, parallel transaction support, cache management, etc.
If the transaction manager is able to compile dataflow control, then synchronization and communication among data manager modules are possible. Otherwise, transaction control and synchronization must be done by a transaction manager module.
Parallel DBMS Architectures
There are three basic parallel computer architectures depending on how main memory or disk is shared: shared-memory, shared-disk and shared-nothing.
Hybrid architectures such as NUMA or cluster try to combine the benefits of the basic architectures.
Parallel DBMS Architectures
Shared-Memory
In the shared-memory approach, any processor has access to any memory module or disk unit through a fast interconnect (e.g., a high-speed bus or a cross-bar switch).
All the processors are under the control of a single operating system.
All shared-memory parallel database products today can exploit inter-query parallelism to provide high transaction throughput and intra-query parallelism to reduce response time of decision-support queries.
Parallel DBMS Architectures
Shared-Memory
Fig. 14.3 Shared-Memory Architecture
Parallel DBMS Architectures
Shared-Memory
Shared-memory has two strong advantages:
Simplicity: Since meta-information (directory) and control information (e.g., lock tables) can be shared by all processors, writing database software is not very different than for single-processor computers. Intra-query parallelism requires some parallelization but remains rather simple.
Load balancing: Load balancing is easy to achieve since it can be done at run-time, using the shared memory, by allocating each new task to the least busy processor.
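Allocating each new task to the least busy processor can be sketched with a priority queue. This is a simplified illustration under stated assumptions: assign_tasks is a hypothetical helper, and the task costs are assumed to be known in advance, whereas a real system would track actual load at run-time in a structure held in the shared memory.

```python
import heapq

def assign_tasks(task_costs, num_processors):
    """Give each task, in arrival order, to the least loaded processor."""
    # Heap of (current_load, processor_id); the least busy is at the top.
    heap = [(0, p) for p in range(num_processors)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_processors)}
    for task_id, cost in enumerate(task_costs):
        load, p = heapq.heappop(heap)           # least busy processor
        assignment[p].append(task_id)
        heapq.heappush(heap, (load + cost, p))  # record its new load
    loads = {p: sum(task_costs[t] for t in tasks)
             for p, tasks in assignment.items()}
    return assignment, loads

# Five tasks with hypothetical costs, two processors.
assignment, loads = assign_tasks([5, 3, 8, 2, 4], 2)
```

Because every processor can read and update the same load table, no messages are needed to decide where a task goes, which is why the slide calls load balancing easy in this architecture.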
Parallel DBMS Architectures
Shared-Memory
Shared-memory has three problems:
High cost: High cost is incurred by the interconnect, which requires fairly complex hardware because of the need to link each processor to each memory module or disk.
Limited extensibility: With faster processors (even with larger caches), conflicting accesses to the shared memory increase rapidly and degrade performance. Therefore, extensibility is limited to a few tens of processors, typically up to 16 for the best cost/performance using 4-processor boards.
Low availability: Since the memory space is shared by all processors, a memory fault may affect most processors, thereby hurting availability. The solution is to use duplex memory with a redundant interconnect.
Parallel DBMS Architectures
Shared-Disk
In the shared-disk approach, any processor has access to any disk unit through the interconnect but exclusive (non-shared) access to its main memory.
Each processor-memory node is under the control of its own copy of the operating system. Then, each processor can access database pages on the shared disk and cache them into its own memory.
Since different processors can access the same page in conflicting update modes, global cache consistency is needed.
The first parallel DBMS that used shared-disk is Oracle with an efficient implementation of a distributed lock manager for cache consistency.
Other major DBMS vendors, such as IBM, provide shared-disk implementations.
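The need for global cache consistency can be illustrated with a toy model. This is a deliberately simplified sketch, not the protocol of Oracle or any other product: SharedDisk, GlobalLockManager and Node are hypothetical classes, each page carries a version number, and a node revalidates its cached copy against that version while holding the page lock.

```python
import threading

class SharedDisk:
    """Pages on the shared disk: page_id -> (version, value)."""
    def __init__(self):
        self.pages = {}

class GlobalLockManager:
    """One lock per page, shared by all nodes (simplified)."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, page_id):
        with self._guard:
            return self._locks.setdefault(page_id, threading.Lock())

class Node:
    def __init__(self, disk, locks):
        self.disk, self.locks = disk, locks
        self.cache = {}  # private memory: page_id -> (version, value)

    def read(self, page_id):
        with self.locks.lock_for(page_id):
            version, value = self.disk.pages[page_id]
            cached = self.cache.get(page_id)
            if cached is None or cached[0] != version:
                # Missing or stale copy: refresh it from the shared disk.
                self.cache[page_id] = (version, value)
            return self.cache[page_id][1]

    def write(self, page_id, value):
        with self.locks.lock_for(page_id):
            version, _ = self.disk.pages.get(page_id, (0, None))
            self.disk.pages[page_id] = (version + 1, value)
            self.cache[page_id] = (version + 1, value)

disk, locks = SharedDisk(), GlobalLockManager()
a, b = Node(disk, locks), Node(disk, locks)
a.write("p1", "v1")
first = b.read("p1")   # b caches version 1
a.write("p1", "v2")    # b's cached copy is now stale
second = b.read("p1")  # version check forces a refresh
```

Without the version check, node b would keep returning its stale cached value after node a's second update, which is exactly the inconsistency that a distributed lock manager is meant to prevent.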
Shared-Disk
Parallel DBMS Architectures
Shared-disk has a number of advantages:
Lower cost: The cost of the interconnect is significantly less than with shared-memory since standard bus technology may be used.
High extensibility: Given that each processor has enough main memory, interference on the shared disk can be minimized. Thus, extensibility can be better, typically up to a hundred processors.
Load balancing and easy migration from centralized systems.
Availability: Since memory faults can be isolated from other nodes, availability can be higher.
Shared-Nothing
In the shared-nothing approach, each processor has exclusive access to its main memory and disk unit(s).
Similar to shared-disk, each processor-memory-disk node is under the control of its own copy of the operating system.
Each node can be viewed as a local site (with its own database and software) in a distributed database system.
Therefore, most solutions designed for distributed databases, such as database fragmentation, distributed transaction management and distributed query processing, may be reused.
Using a fast interconnect, it is possible to accommodate large numbers of nodes. This architecture is often called Massively Parallel Processor (MPP).
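Database fragmentation across shared-nothing nodes can be sketched with hash partitioning. The slides do not prescribe a partitioning method, so hashing on a key is an assumption here; node_for, fragment and parallel_count are hypothetical helpers, and CRC32 is used only because it gives a stable hash for the illustration.

```python
import zlib

def node_for(key, num_nodes):
    """Map a key to a node using a stable hash (CRC32, for illustration)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def fragment(rows, key_field, num_nodes):
    """Split a table into one fragment per node; nothing is shared."""
    fragments = [[] for _ in range(num_nodes)]
    for row in rows:
        fragments[node_for(row[key_field], num_nodes)].append(row)
    return fragments

def parallel_count(fragments, predicate):
    """Each node scans only its own fragment; a coordinator adds up
    the partial counts (run sequentially here for simplicity)."""
    return sum(sum(1 for row in frag if predicate(row)) for frag in fragments)

# Hypothetical table of 100 rows spread over 4 nodes.
rows = [{"id": i, "amount": i * 10} for i in range(100)]
fragments = fragment(rows, "id", 4)
matches = parallel_count(fragments, lambda r: r["amount"] >= 500)
```

Each row lives on exactly one node, so a scan touches only local memory and disk, and only the small partial results travel over the interconnect.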
Shared-Nothing
The first major parallel DBMS product was Teradata’s Database Computer, which could accommodate a thousand processors in its early version.
Other major DBMS vendors, such as IBM and Microsoft, provide shared-nothing implementations.
Shared-Nothing
Shared-Nothing
As demonstrated by the existing products, shared-nothing has three main virtues:
Lower cost: The cost advantage is better than that of shared-disk, which requires a special interconnect for the disks.
High extensibility: By implementing a distributed database design that favors the smooth incremental growth of the system by the addition of new nodes, extensibility can be better (in the thousands of nodes).
High availability: By replicating data on multiple nodes, high availability can also be achieved.
Thank You