8/13/2019 BC0050
1/3
Sikkim Manipal University Page No. 1
What are the uses of Distributed Databases?
There are several reasons why distributed databases are developed. The following is a list of the
main motivations.
Organizational and economic reasons
Usage and interconnection of existing databases
Incremental growth of an organization
Reduced communication overhead
Performance aspects
Increased reliability and availability
Organizational and economic reasons: Many organizations are decentralized, and a distributed
database approach fits the structure of such organizations more naturally. With recent
developments in computer technology, the economy-of-scale motivation for having large,
centralized computer centers is becoming questionable. The organizational and economic
motivations are probably the most important reasons for developing distributed databases.
Interconnection of existing databases: Distributed databases are the natural solution when several
databases already exist in an organization and the need to run global applications arises.
In this case, the distributed database is created bottom-up from the preexisting local databases. This
process may require a certain degree of local restructuring; however, the effort required by
this restructuring is much less than that needed to create a completely new centralized
database.
Incremental growth: If an organization grows by adding new, relatively autonomous organizational
units (new branches, new warehouses, etc.), then the distributed database approach supports
smooth incremental growth with minimal impact on the already existing units.
Reduced communication overhead: In a geographically distributed database like the database of
Example 1.1, the fact that many applications are local clearly reduces the communication overhead
with respect to a centralized database. Therefore, maximizing the locality of applications is
one of the primary objectives in distributed database design.
Performance considerations: The existence of several autonomous processors increases
performance through a high degree of parallelism. This consideration applies to
any multiprocessor system, not only to distributed databases. However, distributed databases
have the advantage that the decomposition of data reflects application-dependent criteria which
maximize application locality; in this way the mutual interference between different processors is
minimized.
Reliability and availability: The distributed database approach, especially with redundant data, can
also be used to obtain higher reliability and availability. However, reaching this goal is not
straightforward and requires techniques which are still not completely understood. The
autonomous processing capability of the different sites does not by itself guarantee a higher overall
reliability of the system, but it ensures a graceful degradation property. In other words, failures in a
distributed database can be more frequent than in a centralized one because of the greater number
of components, but the effect of each failure is confined to those applications which use the data of
the failed site, and a complete system crash is rare.
Explain any three characteristics of a query processor.
Characterization of Query Processors
It is difficult to give characteristics that cleanly differentiate centralized and distributed query
processors, but some of them are listed here. The first four are common to
both, and the next four are particular to distributed query processors.
1. Languages: The input language to the query processor can be based on relational calculus or
relational algebra. In a distributed context, the output language is generally some form of
relational algebra augmented with communication primitives.
2. Types of Optimization: Conceptually, query optimization chooses the point in the
solution space that leads to the minimum cost. A popular approach is exhaustive search,
which evaluates every candidate strategy and selects the cheapest. Because the solution space
can be very large, heuristic techniques are often used instead to restrict the search. In both
centralized and distributed systems a common heuristic is to minimize the size of intermediate
relations. This can be done by performing unary operations first and ordering the binary
operations by the increasing size of their intermediate relations.
3. Optimization Timing: A query may be optimized at different times relative to the actual time
of query execution. Optimization can be done statically before executing the query or
dynamically as the query is executed. The main advantage of the latter method is that the
actual sizes of the intermediate relations are available to the query processor, thereby
minimizing the probability of a bad choice.
4. Statistics: The effectiveness of query optimization depends on statistics about the database.
Dynamic query optimization requires statistics in order to choose the operation that has to
be done first. Static query optimization requires statistics to estimate the size of
intermediate relations. The accuracy of the statistics can be improved by periodic
updating.
5. Decision Sites: Most systems use a centralized decision approach, in which a single site
generates the strategy. However, the decision process could be distributed among the various
sites participating in the elaboration of the best strategy. The centralized approach is simpler
but requires knowledge of the complete distributed database, whereas the distributed
approach requires only local information.
6. Exploitation of the Network Topology: The distributed query processor exploits the network
topology. This simplifies distributed query optimization, which can be
dealt with as two separate problems: selection of the global execution strategy, based on
inter-site communication, and selection of each local execution strategy, based on centralized
query processing algorithms. With local area networks, communication costs are comparable to
I/O costs.
7. Exploitation of Replicated Fragments: For reliability purposes it is useful to have fragments
replicated at different sites. Query processors have to exploit this information either
statically or dynamically to process the query efficiently.
8. Use of Semi-Joins: The semi-join operation reduces the size of the data that are exchanged
between the sites so that the communication cost can be reduced.
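The effect of a semi-join on communication cost can be illustrated with a small sketch. It is only an illustration under stated assumptions: the two relations, their attribute names, and the helper functions project, semi_join and join are hypothetical, and the "sites" are simulated with ordinary Python lists rather than a real network. The idea shown is that only the join-attribute values travel first, so the relation shipped afterwards contains just the rows that can actually take part in the join.

```python
# Hypothetical relations held at two different sites (illustrative data only).
EMP_AT_SITE_1 = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10},
    {"emp_id": 2, "name": "Ravi", "dept_id": 20},
    {"emp_id": 3, "name": "Meena", "dept_id": 30},
    {"emp_id": 4, "name": "Tara", "dept_id": 10},
]
DEPT_AT_SITE_2 = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 40, "dept_name": "Audit"},
]


def project(relation, attr):
    """Return the set of distinct values of one attribute."""
    return {row[attr] for row in relation}


def semi_join(relation, attr, values):
    """Keep only the rows of `relation` whose `attr` value appears in `values`."""
    return [row for row in relation if row[attr] in values]


def join(left, right, attr):
    """Plain nested-loop equijoin on a shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def distributed_join_with_semi_join(emp, dept, attr="dept_id"):
    # Step 1 (site 2 -> site 1): ship only the join-attribute values.
    shipped_keys = project(dept, attr)
    # Step 2 (at site 1): reduce EMP to the rows that can actually join.
    reduced_emp = semi_join(emp, attr, shipped_keys)
    # Step 3 (site 1 -> site 2): ship the reduced relation and join there.
    result = join(reduced_emp, dept, attr)
    return result, len(reduced_emp)


result, rows_shipped = distributed_join_with_semi_join(EMP_AT_SITE_1, DEPT_AT_SITE_2)
print(rows_shipped, "of", len(EMP_AT_SITE_1), "EMP rows shipped")
```

In this toy data set only the employees of department 10 are shipped, instead of the whole EMP relation. Whether a semi-join pays off in practice depends on how selective the join is, since the extra transfer of join-attribute values has its own cost.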
Explain the properties of a transaction.
A transaction is an application, or part of an application, that is characterized by the following
properties.
1. Atomicity: Either all or none of the transaction's operations are performed. This requires that if
a transaction is interrupted by a failure, its partial results are discarded
and the whole operation has to be repeated. Two types of problems can
prevent a transaction from completing:
Transaction aborts: An abort may be requested by the transaction itself because some of its inputs are
wrong or because it has been estimated that the results produced may become useless. It may also
be forced by the system for its own reasons. The activity of ensuring atomicity in the
presence of transaction aborts is called transaction recovery.
System crashes: These are caused by catastrophic events that crash the system without any
prior warning. The activity of ensuring atomicity in the presence of system crashes is
called crash recovery.
The completion of a transaction is called commit. The primitives that can be used for carrying out a
transaction are Begin_Transaction, Commit and Abort. A transaction can end in one of three ways:
Begin_Transaction ... Commit
Begin_Transaction ... Abort
Begin_Transaction ... X (system forces abort)
2. Durability: Once a transaction is committed, the system must guarantee that the results of its
operations will never be lost, independent of subsequent failures. The activity of providing
durability of the transaction is called database recovery.
3. Serializability: If many transactions execute concurrently, the result must be the same as if they
had been executed serially in some order. The activity of providing serializability of
transactions is called concurrency control.
4. Isolation: This property states that an incomplete transaction cannot disclose its results to
other transactions until it is committed. This property has to be strictly followed to avoid a
problem called cascading aborts (domino effect): if a transaction aborts, all the transactions that
have observed its partial results have to be aborted as well.
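The commit and abort primitives, and the atomicity they provide, can be tried out with a minimal sketch. It assumes Python's standard sqlite3 module with an in-memory database; the account table, the transfer function and the amounts are hypothetical and chosen only for illustration. The SQL statements BEGIN, COMMIT and ROLLBACK stand in for Begin_Transaction, Commit and Abort.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as a single transaction."""
    try:
        conn.execute("BEGIN")  # Begin_Transaction
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # The transaction itself requests an abort because its input is wrong.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute("COMMIT")  # Commit: results become permanent
        return True
    except ValueError:
        conn.execute("ROLLBACK")  # Abort: partial results are discarded
        return False


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])

ok_first = transfer(conn, 1, 2, 30)    # succeeds and commits
ok_second = transfer(conn, 1, 2, 500)  # aborts; the debit is rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok_first, ok_second, balances)
```

The second transfer debits the source account before it detects the problem, yet after the rollback neither account shows any trace of it, which is the all-or-nothing behaviour described under atomicity. An in-memory database is used only for convenience, so this sketch does not demonstrate durability.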