8/12/2019 NDIM
1. Explain the Entity Relationship model and all three levels of an E-R diagram.

An Entity Relationship model (ER model) is an abstract way to describe a database.
It is a visual representation of different data using conventions that describe how these
data are related to each other.
There are three basic elements in ER models:
Entities are the things about which we seek information.
Attributes are the data we collect about the entities.
Relationships provide the structure needed to draw information from multiple entities.
Symbols used in E-R Diagram:
Entity: rectangle
Attribute: oval
Relationship: diamond
Link: line
Entities and Attributes
Entity Type: a set of similar objects, or a category of entities that is well defined.
A rectangle represents an entity set. Ex: students, courses. We often just say entity and mean entity type.
Attribute: describes one aspect of an entity type; usually (and best when) single valued and indivisible (atomic).
Represented by an oval on the E-R diagram. Ex: name, maximum enrollment.
Types of Attribute:
Simple and Composite Attribute
A simple attribute consists of a single atomic value; it cannot be subdivided. For example, the attributes age and sex are simple attributes.
A composite attribute is an attribute that can be further subdivided. For example, the attribute ADDRESS can be subdivided into street, city, state, and zip code.
Simple Attribute: Attribute that consist of a single atomic value.
Example: Salary, age etc
Composite Attribute: attribute whose value is not atomic.
Example: Address: House_no, City, State
Name: First Name, Middle Name, Last Name
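The simple/composite distinction maps naturally onto nested record types. A minimal Python sketch (the entity and field names are illustrative, not from the notes):

```python
from dataclasses import dataclass

# A hypothetical Employee entity: `age` is a simple (atomic) attribute,
# while `address` is a composite attribute with its own sub-parts.
@dataclass
class Address:
    house_no: str
    city: str
    state: str

@dataclass
class Employee:
    age: int            # simple attribute: cannot be subdivided
    address: Address    # composite attribute: house_no, city, state

e = Employee(age=30, address=Address("12A", "Mumbai", "Maharashtra"))
print(e.address.city)   # components of a composite attribute stay accessible
```

The key point is that a composite attribute keeps its parts individually addressable, whereas a simple attribute is treated as one indivisible value.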
Single Valued and Multi Valued Attributes
A single valued attribute can have only a single value. For example, a person can have only one date of birth, one age, etc. A single valued attribute may still be simple or composite: date of birth is a composite attribute and age is a simple attribute, but both are single valued.
Multivalued attributes can have multiple values. For instance, a person may have multiple phone numbers, multiple degrees, etc. Multivalued attributes are shown by a double line connecting to the entity in the ER diagram.
Single Valued Attribute: attribute that holds a single value.
Example 1: Age
Example 2: City
Example 3: Customer id
Multi Valued Attribute: attribute that holds multiple values.
Example 1: A customer can have multiple phone numbers, email ids, etc.
Example 2: A person may have several college degrees.
Stored and Derived Attributes
The value of a derived attribute is derived from a stored attribute. For example, the date of birth of a person is a stored attribute. The value of the attribute AGE can be derived by subtracting the Date of Birth (DOB) from the current date. The stored attribute supplies a value to the related derived attribute.
Stored Attribute: An attribute that supplies a value to the related attribute.
Example: Date of Birth
Derived Attribute: an attribute whose value is derived from a stored attribute.
Example: age, whose value is derived from the stored attribute Date of Birth.
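The stored-to-derived computation described above can be sketched directly (the function name is invented for illustration):

```python
from datetime import date

def derive_age(dob: date, today: date) -> int:
    """AGE (derived attribute) computed from Date of Birth (stored attribute)."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

print(derive_age(date(1990, 6, 15), date(2019, 8, 12)))  # 29
```

Because age can always be recomputed this way, it need not be stored at all; only the DOB is kept in the database.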
Keys
Super key: an attribute or set of attributes that uniquely identifies an entity; there can be many of these.
Composite key: a key requiring more than one attribute.
Candidate key: a superkey such that no proper subset of its attributes is also a superkey (a minimal superkey has no unnecessary attributes).
Primary key: the candidate key chosen for identifying entities and accessing records. Unless otherwise noted, "key" means primary key.
Alternate key: a candidate key not used as the primary key.
Secondary key: an attribute or set of attributes commonly used for accessing records, but not necessarily unique.
Foreign key: an attribute that is the primary key of another table and is used to establish a relationship with that table, where it also appears as an attribute.
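The superkey/candidate key definitions above can be checked mechanically over sample data. A small sketch (the `students` rows are invented for illustration):

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """attrs uniquely identify every row -> superkey."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """A candidate key is a minimal superkey: no proper subset is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(is_superkey(rows, sub)
                   for k in range(1, len(attrs))
                   for sub in combinations(attrs, k))

students = [
    {"roll": 1, "name": "Asha", "dept": "CS"},
    {"roll": 2, "name": "Ravi", "dept": "CS"},
    {"roll": 3, "name": "Asha", "dept": "EE"},
]
print(is_superkey(students, ["roll", "name"]))       # True, but not minimal
print(is_candidate_key(students, ["roll", "name"]))  # False: roll alone suffices
print(is_candidate_key(students, ["roll"]))          # True
```

This makes the "no unnecessary attributes" phrase concrete: {roll, name} identifies every student, but since {roll} already does, only {roll} is a candidate key.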
Graphical Representation in E-R diagram
Rectangle: entity
Ellipse: attribute (underlined attributes are [part of] the primary key)
Double ellipse: multi-valued attribute
Dashed ellipse: derived attribute, e.g. age is derivable from birthdate and the current date.
Relationships
Relationship: connects two or more entities into an association. Ex: "John majors in Computer Science."
Relationship Type: a set of similar relationships.
Student (entity type) is related to Department (entity type) by MajorsIn (relationship type).
Relationship Types may also have attributes in the E-R model. When they are mapped to
the relational model, the attributes become part of the relation. Represented by a diamond
on E-R diagram.
Cardinality of Relationships
Cardinality is the number of entity instances to which another entity set can map under the
relationship. This does not reflect a requirement that an entity has to participate in a
relationship. Participation is another concept.
One-to-one: X-Y is 1:1 when each entity in X is associated with at most one entity in Y,
and each entity in Y is associated with at most one entity in X.
One-to-many: X-Y is 1:M when each entity in X can be associated with many entities in
Y, but each entity in Y is associated with at most one entity in X.
Many-to-many: X-Y is M:M when each entity in X can be associated with many entities in Y, and each entity in Y can be associated with many entities in X ("many" means one or more, and sometimes zero).
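Given a set of actual (X, Y) pairs, the cardinality class can be determined by checking how many partners each side has. A sketch (the department/faculty pairs are illustrative):

```python
from collections import defaultdict

def cardinality(pairs):
    """Classify a set of X-Y pairs as 1:1, 1:M, M:1 or M:M."""
    x_to_y, y_to_x = defaultdict(set), defaultdict(set)
    for x, y in pairs:
        x_to_y[x].add(y)
        y_to_x[y].add(x)
    many_y = any(len(v) > 1 for v in x_to_y.values())  # some x maps to many y
    many_x = any(len(v) > 1 for v in y_to_x.values())  # some y maps to many x
    return {(False, False): "1:1", (True, False): "1:M",
            (False, True): "M:1", (True, True): "M:M"}[(many_y, many_x)]

# One department employs many faculty; each faculty has one department.
print(cardinality([("CS", "john"), ("CS", "mary"), ("EE", "amit")]))  # 1:M
```

Note this classifies the data at hand; the E-R diagram states the cardinality as a constraint on all possible data.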
Relationship Participation Constraints

Total participation
Every member of the entity set must participate in the relationship. Represented by a double line from the entity rectangle to the relationship diamond. E.g., a Class entity cannot exist unless related to a Faculty member entity (in this example; not necessarily at Juniata). You can set this double line in Dia. In the relational model we will use the REFERENCES clause.
Key constraint
If every entity participates in exactly one relationship, both a total participation and a key constraint hold. E.g., if a class is taught by only one faculty member.

Partial participation
Not every entity instance must participate. Represented by a single line from the entity rectangle to the relationship diamond. E.g., a Textbook entity can exist without being related to a Class, or vice versa.
Strong and Weak Entities
Strong Entity Vs Weak Entity
An entity set that does not have sufficient attributes to form a primary key is termed a weak entity set. An entity set that has a primary key is termed a strong entity set.
A weak entity is existence dependent: the existence of a weak entity depends on the existence of an identifying entity set. The discriminator (or partial key) is used to distinguish among the entities of a weak entity set. The primary key of a weak entity set is formed by the primary key of the identifying entity set together with the discriminator of the weak entity set. A weak entity is indicated by a double rectangle in the ER diagram, and its discriminator is underlined with a dashed line.
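When this maps to the relational model, the weak entity's key becomes a composite of the owner's primary key and the discriminator. A sketch using SQLite (the Employee/Dependent tables are a standard textbook illustration, not from these notes):

```python
import sqlite3

# Dependent is a weak entity identified by Employee: its primary key is
# the identifying entity's key (emp_id) plus the discriminator (dep_name).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT
    );
    CREATE TABLE Dependent (
        emp_id   INTEGER NOT NULL REFERENCES Employee(emp_id),
        dep_name TEXT NOT NULL,             -- discriminator (partial key)
        dob      TEXT,
        PRIMARY KEY (emp_id, dep_name)      -- owner's PK + discriminator
    );
""")
conn.execute("INSERT INTO Employee VALUES (1, 'Asha')")
conn.execute("INSERT INTO Dependent VALUES (1, 'Ravi', '2010-01-01')")
conn.execute("INSERT INTO Dependent VALUES (1, 'Meena', '2012-05-05')")
```

Two dependents of the same employee must therefore have different discriminator values, but dependents of different employees may share a name.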
2. Make an E-R diagram of a Library Management System (all three levels).

A Library Management System (LMS) provides a simple GUI (graphical user interface) for the library staff to manage the functions of the library effectively. Usually, when a book is returned or issued, it is noted down in a register, after which data entry is done to update the status of the books. This process takes some time, and proper updating cannot be guaranteed. Such anomalies in the updating process can cause the loss of books, so a more user-friendly interface that can update the database instantly is in great demand in libraries.
E-R Diagram for LMS:
3. Explain a decision table and its parts. Make a decision table for a report card.
A decision table is an excellent tool to use in both testing and requirements management.
Essentially it is a structured exercise to formulate requirements when dealing with
complex business rules. Decision tables are used to model complicated logic. They can
make it easy to see that all possible combinations of conditions have been considered and
when conditions are missed, it is easy to see this.
A decision table is a good way to deal with combinations of things (e.g. inputs). This technique is sometimes also referred to as a cause-effect table, because there is an associated logic diagramming technique called cause-effect graphing that was sometimes used to help derive the decision table (Myers describes this as a combinatorial logic network). However, most people find it more useful just to use the table itself. Decision tables provide a systematic way of stating complex business rules, which is useful for developers as well as for testers.
Decision tables can be used in test design whether or not they are used in specifications, as they help testers explore the effects of combinations of different inputs and other software states that must correctly implement business rules.
Helping the developers do a better job can also lead to better relationships with them. Testing combinations can be a challenge, as the number of combinations can often be huge. Testing all combinations may be impractical, if not impossible, so we have to be satisfied with testing just a small subset of combinations; but choosing which combinations to test and which to leave out is important. If you do not have a systematic way of selecting combinations, an arbitrary subset will be used, and this may well result in an ineffective test effort.
The four quadrants:
Top left: Conditions. Top right: Condition alternatives.
Bottom left: Actions. Bottom right: Action entries.
Each decision corresponds to a variable, relation or predicate whose possible values are
listed among the condition alternatives. Each action is a procedure or operation to perform,
and the entries specify whether (or in what order) the action is to be performed for the set
of condition alternatives the entry corresponds to. Many decision tables include in their
condition alternatives the don't care symbol, a hyphen. Using don't cares can simplify
decision tables, especially when a given condition has little influence on the actions to be
performed. In some cases, entire conditions thought to be important initially are found to
be irrelevant when none of the conditions influence which actions are performed.
Aside from the basic four quadrant structure, decision tables vary widely in the way the
condition alternatives and action entries are represented. Some decision tables use simple
true/false values to represent the alternatives to a condition (akin to if-then-else), other
tables may use numbered alternatives (akin to switch-case), and some tables even use
fuzzy logic or probabilistic representations for condition alternatives. In a similar way, action entries can simply represent whether an action is to be performed (check the actions to perform) or, in more advanced decision tables, the sequencing of actions to perform (number the actions to perform).
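The report-card table asked for in the question can be encoded directly, with each rule column pairing condition alternatives with an action entry. A sketch (the grade boundaries and grade names are illustrative assumptions, not taken from the notes):

```python
# A limited-entry decision table for a report card. Conditions take
# True/False alternatives; None plays the role of the "don't care" hyphen.
CONDITIONS = [
    lambda m: m >= 40,
    lambda m: m >= 60,
    lambda m: m >= 75,
]
TABLE = [
    # (>=40,  >=60,  >=75) -> action entry
    ((False, None,  None),  "Fail"),
    ((True,  False, None),  "Pass"),
    ((True,  True,  False), "First Division"),
    ((True,  True,  True),  "Distinction"),
]

def report_card(marks):
    actual = tuple(c(marks) for c in CONDITIONS)
    for alternatives, action in TABLE:
        if all(a is None or a == v for a, v in zip(alternatives, actual)):
            return action

print(report_card(82), report_card(35))  # Distinction Fail
```

Reading the columns confirms the completeness property the text mentions: every combination of the three conditions matches exactly one rule, and the don't-care entries keep the table compact.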
4. Explain the various types of cohesion and coupling, along with diagrams.
In software engineering, coupling or dependency is the degree to which each program
module relies on each one of the other modules.
Coupling is usually contrasted with cohesion. Low coupling often correlates with high
cohesion, and vice versa. The software quality metrics of coupling and cohesion were
invented by Larry Constantine, an original developer of Structured Design who was also
an early proponent of these concepts (see also SSADM). Low coupling is often a sign of a
well-structured computer system and a good design, and when combined with high
cohesion, supports the general goals of high readability and maintainability.
In computer programming, cohesion refers to the degree to which the elements of a
module belong together. Thus, it is a measure of how strongly related each piece of
functionality expressed by the source code of a software module is.
Cohesion is an ordinal type of measurement and is usually expressed as high cohesion
or low cohesion when being discussed. Modules with high cohesion tend to be
preferable because high cohesion is associated with several desirable traits of software
including robustness, reliability, reusability, and understandability whereas low cohesion
is associated with undesirable traits such as being difficult to maintain, difficult to test,
difficult to reuse, and even difficult to understand.
Cohesion is often contrasted with coupling, a different concept. High cohesion often
correlates with loose coupling, and vice versa. The software quality metrics of coupling
and cohesion were invented by Larry Constantine based on characteristics of good
programming practices that reduced maintenance and modification costs.
Types of coupling
Conceptual model of coupling
Coupling can be "low" (also "loose" and "weak") or "high" (also "tight" and "strong").
Some types of coupling, in order of highest to lowest coupling, are as follows:
Procedural programming
A module here refers to a subroutine of any kind, i.e. a set of one or more statements
having a name and preferably its own set of variable names.
Content coupling (high)
Content coupling (also known as Pathological coupling) occurs when one module
modifies or relies on the internal workings of another module (e.g., accessing local
data of another module).
Therefore, changing the way the second module produces data (location, type, timing) will lead to changing the dependent module.
Common coupling
Common coupling (also known as Global coupling) occurs when two modules
share the same global data (e.g., a global variable).
Changing the shared resource implies changing all the modules using it.
External coupling
External coupling occurs when two modules share an externally imposed data
format, communication protocol, or device interface. This is basically related to the
communication to external tools and devices.
Control coupling
Control coupling is one module controlling the flow of another, by passing it
information on what to do (e.g., passing a what-to-do flag).
Stamp coupling (Data-structured coupling)
Stamp coupling occurs when modules share a composite data structure and use
only a part of it, possibly a different part (e.g., passing a whole record to a function
that only needs one field of it).
This may lead to changing the way a module reads a record because a field that the
module does not need has been modified.
Data coupling
Data coupling occurs when modules share data through, for example, parameters.
Each datum is an elementary piece, and these are the only data shared (e.g.,
passing an integer to a function that computes a square root).
Message coupling (low)
This is the loosest type of coupling. It can be achieved by state decentralization (as
in objects) and component communication is done via parameters or message
passing (see Message passing).
No coupling
Modules do not communicate at all with one another.
Object-oriented programming
Subclass Coupling
Describes the relationship between a child and its parent. The child is connected to
its parent, but the parent is not connected to the child.
Temporal coupling
When two actions are bundled together into one module just because they happen to occur at the same time.
In recent work various other coupling concepts have been investigated and used as
indicators for different modularization principles used in practice.
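The difference between the tighter and looser forms above can be shown in a small sketch (all function and field names are invented for illustration):

```python
# Control coupling: the caller passes a what-to-do flag that steers the
# callee's internal logic.
def print_report(data, mode):              # `mode` is a control flag
    if mode == "summary":
        return f"{len(data)} records"
    return ", ".join(map(str, data))

# Stamp coupling: a whole record is passed although only one field is used.
def discount_stamp(customer_record):
    return customer_record["balance"] * 0.1

# Data coupling: only the elementary item actually needed is passed.
def discount_data(balance):
    return balance * 0.1

record = {"id": 7, "name": "Asha", "balance": 200.0}
print(discount_stamp(record))            # 20.0
print(discount_data(record["balance"]))  # 20.0, with looser coupling
```

The two discount functions compute the same result, but `discount_data` must change only if the meaning of `balance` changes, while `discount_stamp` is exposed to any change in the record's layout.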
Disadvantages
Tightly coupled systems tend to exhibit the following developmental characteristics,
which are often seen as disadvantages:
1. A change in one module usually forces a ripple effect of changes in other modules.
2. Assembly of modules might require more effort and/or time due to the increased inter-module dependency.
3. A particular module might be harder to reuse and/or test because dependent modules must be included.
Performance issues
Whether loosely or tightly coupled, a system's performance is often reduced by message and parameter creation, transmission, translation (e.g. marshalling) and message interpretation. A message could be implemented as a reference to a string, array or data structure, which requires less overhead than creating a complicated message such as a SOAP message. Longer messages require more CPU and memory to produce. To optimize runtime performance, message length must be minimized and message meaning must be maximized.
Message Transmission Overhead and Performance
Since a message must be transmitted in full to retain its complete meaning,
message transmission must be optimized. Longer messages require more CPU and
memory to transmit and receive. Also, when necessary, receivers must reassemble
a message into its original state to completely receive it. Hence, to optimize
runtime performance, message length must be minimized and message meaning
must be maximized.
Message Translation Overhead and Performance
Message protocols and messages themselves often contain extra information (i.e.,
packet, structure, definition and language information). Hence, the receiver often
needs to translate a message into a more refined form by removing extra characters and structure information and/or by converting values from one type to another. Any sort of translation increases CPU and/or memory overhead. To optimize
runtime performance, message form and content must be reduced and refined to
maximize its meaning and reduce translation.
Message Interpretation Overhead and Performance
All messages must be interpreted by the receiver. Simple messages such as integers
might not require additional processing to be interpreted. However, complex
messages such as SOAP messages require a parser and a string transformer for
them to exhibit intended meanings. To optimize runtime performance, messages
must be refined and reduced to minimize interpretation overhead.
Solutions
One approach to decreasing coupling is functional design, which seeks to limit the responsibilities of modules along functionality. Coupling increases between two classes A and B if:

A has an attribute that refers to (is of type) B.
A calls on services of an object of class B.
A has a method that references B (via return type or parameter).
A is a subclass of (or implements) class B.
Low coupling refers to a relationship in which one module interacts with another module
through a simple and stable interface and does not need to be concerned with the other
module's internal implementation (see Information Hiding).
Systems such as CORBA or COM allow objects to communicate with each other without
having to know anything about the other object's implementation. Both of these systems
even allow for objects to communicate with objects written in other languages.
Coupling versus Cohesion
Coupling and cohesion are terms which occur together very frequently. Coupling refers to the interdependencies between modules, while cohesion describes how related the functions within a single module are. Low cohesion implies that a given module performs tasks which are not very related to each other, and hence can create problems as the module becomes large.
Module coupling
Coupling in software engineering can be quantified with a metric based on the following parameters.
For data and control flow coupling:
di: number of input data parameters
ci: number of input control parameters
do: number of output data parameters
co: number of output control parameters
For global coupling:
gd: number of global variables used as data
gc: number of global variables used as control
For environmental coupling:
w: number of modules called (fan-out)
r: number of modules calling the module under consideration (fan-in)
The coupling metric C gives a larger value the more coupled the module is. This number ranges from approximately 0.67 (low coupling) to 1.0 (highly coupled). For example, a module with only a single input and a single output data parameter sits at the low end of the range, whereas a module with 5 input and 5 output data parameters, an equal number of control parameters, and access to 10 items of global data, with a fan-in of 3 and a fan-out of 4, comes out close to 1.
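The notes omit the formula itself. One common formulation (often attributed to Dhama) is C = 1 - 1/m with m = di + 2*ci + do + 2*co + gd + 2*gc + w + r; treat the exact weights, and hence the quoted 0.67 floor, as an assumption here rather than the notes' own definition:

```python
def coupling(di, ci, do, co, gd, gc, w, r):
    """Module coupling C = 1 - 1/m; control and global-control items are
    weighted twice as heavily as plain data items."""
    m = di + 2 * ci + do + 2 * co + gd + 2 * gc + w + r
    return 1 - 1 / m

# A module with only a single input and a single output data parameter:
low = coupling(di=1, ci=0, do=1, co=0, gd=0, gc=0, w=0, r=0)

# 5 input and 5 output data parameters, an equal number of control
# parameters, 10 global data items, fan-in 3 and fan-out 4:
high = coupling(di=5, ci=5, do=5, co=5, gd=10, gc=0, w=4, r=3)
print(round(low, 2), round(high, 2))
```

With these weights the first module scores 0.5 and the second about 0.98, matching the intuition that the second is far more tightly coupled; presentations that add a minimum fan-in/fan-out term quote roughly 0.67 as the floor instead.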
COUPLING
An indication of the strength of interconnections between program units.
Highly coupled have program units dependent on each other. Loosely coupled are made
up of units that are independent or almost independent.
Modules are independent if they can function completely without the presence of the other. Obviously, modules cannot be completely independent of each other; they must interact so that they can produce the desired outputs. The more connections between modules, the more dependent they are, in the sense that more information about one module is required to understand the other.
Three factors: the number of interfaces, the complexity of interfaces, and the type of information flow along interfaces.
Want to minimize number of interfaces between modules, minimize the complexity of
each interface, and control the type of info flow. An interface of a module is used to pass
information to and from other modules.
In general, modules tightly coupled if they use shared variables or if they exchange control
info.
Loose coupling results if information is held within a unit and interfaces with other units are via parameter lists. Tight coupling results if global data is shared. If only one field of a record is needed, don't pass the entire record; keep the interface as simple and small as possible.
Two types of info flow: data or control.
Passing or receiving back control information means that the action of the module will depend on this control information, which makes the module more difficult to understand.
Interfaces with only data communication result in the lowest degree of coupling, followed by interfaces that only transfer control data. Coupling is highest if the data is hybrid.
Ranked highest to lowest:
1. Content coupling: one module directly references the contents of the other. Occurs when one module modifies local data values or instructions in another module (which can happen in assembly language), when one refers to local data in another module, or when one branches to a local label of another.
2. Common coupling: access to global data; modules bound together by global data structures.
3. Control coupling: passing control flags (as parameters or globals) so that one module controls the sequence of processing steps in another module.
4. Stamp coupling: similar to common coupling, except that global variables are shared selectively among the routines that require the data (e.g., packages in Ada). More desirable than common coupling because fewer modules will have to be modified if a shared data structure is modified. The entire data structure is passed, but only parts of it are needed.
5. Data coupling: use of parameter lists to pass data items between routines.

COHESION
A measure of how well the parts of a module fit together. A component should implement a single logical function or a single logical entity, and all the parts should contribute to the implementation.
Many levels of cohesion:
1. Coincidental cohesion: the parts of a component are not related but are simply bundled into a single component. Harder to understand and not reusable.
2. Logical association: similar functions, such as input, error handling, etc., are put together. The functions fall in the same logical class, and a flag may be passed to determine which ones are executed. The interface is difficult to understand, and code for more than one function may be intertwined, leading to severe maintenance problems. Difficult to reuse.
3. Temporal cohesion: all statements activated at a single time, such as start up or shut down, are brought together (initialization, clean up). The functions are weakly related to one another but more strongly related to functions in other modules, so many modules may need to change during maintenance.
4. Procedural cohesion: a single control sequence, e.g., a loop or a sequence of decision statements. Often cuts across functional lines; may contain only part of a complete function, or parts of several functions. The functions are still weakly connected, and again unlikely to be reusable in another product.
5. Communicational cohesion: the parts operate on the same input data or produce the same output data, and may be performing more than one function. Generally acceptable if alternative structures with higher cohesion cannot be easily identified, but there are still problems with reusability.
6. Sequential cohesion: the output from one part serves as input for another part. The module may contain several functions or parts of different functions.
7. Informational cohesion: the module performs a number of functions, each with its own entry point and independent code, all performed on the same data structure. Different from logical cohesion because the functions are not intertwined.
8. Functional cohesion: each part is necessary for the execution of a single function, e.g., compute a square root or sort an array. Usually reusable in other contexts; maintenance is easier.
9. Type cohesion: modules that support a data abstraction.

This is not strictly a linear scale: functional cohesion is much stronger than the rest, while the first two are much weaker than the others. Often several levels may be applicable when considering two elements of a module. The cohesion of a module is taken as the highest level of cohesion that is applicable to all elements in the module.
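The two ends of the scale can be contrasted in a short sketch (both functions are invented for illustration):

```python
import math

# Functional cohesion: every statement contributes to one task.
def std_dev(xs):
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

# Coincidental cohesion: unrelated jobs bundled together only by location;
# harder to understand, and none of it is reusable on its own.
def misc(xs):
    print("starting")             # logging
    total = sum(xs)               # arithmetic
    return str(total).upper()     # formatting; nothing ties these together

print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

`std_dev` can be reused anywhere a standard deviation is needed; `misc` cannot be reused at all without dragging its unrelated parts along.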
5. Explain project selection techniques and the data dictionary with the help of examples.
One of the biggest decisions that any organization would have to make is related to the
projects they would undertake. Once a proposal has been received, there are numerous
factors that need to be considered before an organization decides to take it up.
The most viable option needs to be chosen, keeping in mind the goals and requirements of
the organization. How is it then that you decide whether a project is viable? How do you
decide if the project at hand is worth approving? This is where project selection methods
come in use.
Choosing a project using the right method is therefore of utmost importance. This is what
will ultimately define the way the project is to be carried out.
But the question then arises as to how you would go about finding the right methodology for your particular organization. At this point you need careful guidance on the project selection criteria, as a small mistake could be detrimental to your project as a whole and, in the long run, to the organization as well.
Selection Methods
There are various project selection methods practised by the modern business
organizations. These methods have different features and characteristics. Therefore, each
selection method is best for different organizations.
Although there are many differences between these project selection methods, usually the
underlying concepts and principles are the same.
Following is an illustration of two of such methods (Benefit Measurement and
Constrained Optimization methods):
As the value of one project would need to be compared against the other projects, you
could use the benefit measurement methods. This could include various techniques, of
which the following are the most common:
You and your team could come up with certain criteria that you want your ideal project objectives to meet. You could then give each project a score based on how it rates against each of these criteria, and choose the project with the highest score.
When it comes to the discounted cash flow method, the future value of a project is ascertained by considering the present value and the interest earned on the money. The higher the present value of the project, the better it would be for your organization.
The rate of return received from the money is what is known as the IRR. Here again, you need to be looking for a high rate of return from the project.
The mathematical approach is commonly used for larger projects. The constrained
optimization methods require several calculations in order to decide on whether or not a
project should be rejected.
Cost-benefit analysis is used by several organizations to assist them to make their
selections. Going by this method, you would have to consider all the positive aspects of
the project which are the benefits and then deduct the negative aspects (or the costs) from
the benefits. Based on the results you receive for different projects, you could choose
which option would be the most viable and financially rewarding.
These benefits and costs need to be carefully considered and quantified in order to arrive
at a proper conclusion. Questions that you may want to consider asking in the selection
process are:
Would this decision help me to increase organizational value in the long run?
How long will the equipment last?
Would I be able to cut down on costs as I go along?
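The discounted cash flow and cost-benefit comparisons described above reduce to a net present value calculation. A sketch (the cash flows and the 10% rate are illustrative assumptions):

```python
def npv(rate, cash_flows):
    """Net present value: cash_flows[0] is the up-front cost (negative),
    later entries are the net benefits at the end of each year."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Two hypothetical projects: an initial cost now, then yearly benefits.
project_a = [-1000, 450, 450, 450]
project_b = [-1000, 0, 0, 1300]
print(round(npv(0.10, project_a), 2))
print(round(npv(0.10, project_b), 2))
# The project with the higher (positive) NPV is the more viable choice.
```

Note that project B returns more money in total, yet discounting shows project A is worth more today, which is exactly the point of the method.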
In addition to these methods, you could also consider choosing based on opportunity cost. When choosing any project, you would need to keep in mind the profits that you would make if you decide to go ahead with the project.
Profit optimization is therefore the ultimate goal. You need to consider the difference
between the profits of the project you are primarily interested in and the next best
alternative.
Implementation of the Chosen Method:
The methods mentioned above can be carried out in various combinations. It is best that
you try out different methods, as in this way you would be able to make the best decision
for your organization considering a wide range of factors rather than concentrating on just
a few. Careful consideration would therefore need to be given to each project.
Conclusion:
In conclusion, you would need to remember that these methods are time-consuming, but
are absolutely essential for efficient business planning.
It is always best to have a good plan from the inception, with a list of criteria to be
considered and goals to be achieved. This will guide you through the entire selection
process and will also ensure that you do make the right choice.
A data dictionary is a collection of data about data. It maintains information about the definition, structure, and use of each data element that an organization uses.
There are many attributes that may be stored about a data element. Typical attributes used
in CASE tools (Computer Assisted Software Engineering) are:
Name
Aliases or synonyms
Default label
Description
Source(s)
Date of origin
Users
Programs in which used
Change authorizations
Access authorization
Data type
Length
Units (cm, degrees C, etc.)
Range of values
Frequency of use
Input/output/local
Conditional values
Parent structure
Subsidiary structures
Repetitive structures
Physical location: record, file, database
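A single data-dictionary entry built from the attribute list above might look like the following sketch (the element name and all values are illustrative, not from the notes):

```python
# One dictionary entry for a single data element; attribute names follow
# the list above, and all values are invented for illustration.
customer_id = {
    "name": "CUSTOMER_ID",
    "aliases": ["CUST_NO"],
    "description": "Unique identifier assigned to each customer",
    "data_type": "integer",
    "length": 8,
    "range_of_values": (1, 99_999_999),
    "programs_in_which_used": ["billing", "crm_sync"],
}

def validate(entry, value):
    """Check a value against the metadata recorded for its data element."""
    lo, hi = entry["range_of_values"]
    return isinstance(value, int) and lo <= value <= hi

print(validate(customer_id, 1042))   # True
print(validate(customer_id, 0))      # False: below the recorded range
```

This is what the consistency checks mentioned below amount to: the dictionary's metadata, not the application code, is the authority on what values a data element may take.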
A data dictionary is invaluable for documentation purposes, for keeping control
information on corporate data, for ensuring consistency of elements between
organizational systems, and for use in developing databases.
Data dictionary software packages are commercially available, often as part of a CASE
package or DBMS. DD software allows for consistency checks and code generation. It is
also used in DBMSs to generate reports.
The terms data dictionary and data repository indicate a more general software utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the information stored in it to the user and the DBA, but it is mainly accessed by the various software modules of the DBMS itself, such as the DDL and DML compilers, the query optimiser, the transaction processor, report generators, and the constraint enforcer. A data dictionary, on the other hand, is a data structure that stores metadata, i.e., (structured) data about data. The software package for a stand-alone data dictionary or
data repository may interact with the software modules of the DBMS, but it is mainly used
by the designers, users and administrators of a computer system for information resource
management. These systems are used to maintain information on system hardware and
software configuration, documentation, application and users as well as other information
relevant to system administration.
If a data dictionary system is used only by the designers, users, and administrators and not by the DBMS software, it is called a passive data dictionary. Otherwise, it is called an active data dictionary. When a passive data dictionary is updated, it is done so manually and independently from any changes to the DBMS (database) structure. With an active data dictionary, the dictionary is updated first and changes occur in the DBMS automatically as a result.
Database users and application developers can benefit from an authoritative data
dictionary document that catalogs the organization, contents, and conventions of one or
more databases. This typically includes the names and descriptions of various tables
(records or Entities) and their contents (fields) plus additional details, like the type and
length of each data element. Another important piece of information that a data dictionary
can provide is the relationship between tables. This is sometimes referred to in Entity-Relationship diagrams, or, if using Set descriptors, by identifying in which Sets database tables participate.
In an active data dictionary, constraints may be placed upon the underlying data. For instance, a range may be imposed on the value of numeric data in a data element (field), or a record in a table may be forced to participate in a set relationship with another record type. Additionally, a distributed DBMS may have certain location specifics described within its active data dictionary (e.g. where tables are physically located).
The data dictionary consists of record types (tables) created in the database by system-generated command files, tailored for each supported back-end DBMS. Command files contain SQL statements for CREATE TABLE, CREATE UNIQUE INDEX, ALTER TABLE (for referential integrity), etc., using the specific statement required by that type of database. There is no universal standard as to the level of detail in such a document.
Middleware
In the construction of database applications, it can be useful to introduce an additional
layer of data dictionary software, i.e. middleware, which communicates with the
underlying DBMS data dictionary. Such a "high-level" data dictionary may offer
additional features and a degree of flexibility that goes beyond the limitations of the native
"low-level" data dictionary, whose primary purpose is to support the basic functions of the
DBMS, not the requirements of a typical application. For example, a high-level data
dictionary can provide alternative entity-relationship models tailored to suit different
applications that share a common database. Extensions to the data dictionary also can
assist in query optimization against distributed databases. Additionally, DBA functions are
often automated using restructuring tools that are tightly coupled to an active data
dictionary.
Software frameworks aimed at rapid application development sometimes include high-
level data dictionary facilities, which can substantially reduce the amount of programming
required to build menus, forms, reports, and other components of a database application,
including the database itself. For example, PHPLens includes a PHP class library to
automate the creation of tables, indexes, and foreign key constraints portably for multiple
databases. Another PHP-based data dictionary, part of the RADICORE toolkit,
automatically generates program objects, scripts, and SQL code for menus and forms with
data validation and complex joins. For the ASP.NET environment, Base One's data
dictionary provides cross-DBMS facilities for automated database creation, data
validation, performance enhancement (caching and index utilization), application security,
and extended data types. Visual DataFlex provides the ability to use DataDictionaries as class files to form a middle layer between the user interface and the underlying database. The intent is to create standardized rules to maintain data integrity and enforce business rules throughout one or more related applications.
Platform-specific examples
Data description specifications (DDS) allow the developer, in the context of an IBM System i, to describe data attributes in file descriptions that are external to the application program that processes the data.
The table below is an example of a typical data dictionary entry. The IT staff uses this to
develop and maintain the database.
Field Name      Data Type    Other information
CustomerID      Autonumber   Primary key field
Title           Text         Lookup: Mr, Mrs, Miss, Ms; Field size 4
Surname         Text         Field size 15; Indexed
FirstName       Text         Field size 15
DateOfBirth     Date/Time    Format: Medium Date; Range check: >=01/01/1930
HomeTelephone   Text         Field size: 12; Presence check
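As a sketch of how entries like these can drive code generation, the following Python maps the example fields above to a CREATE TABLE statement. The TYPE_MAP and generate_ddl names, and the type mappings themselves, are hypothetical and not tied to any particular product:

```python
# Illustrative data-dictionary entries mirroring the example table above.
# Each entry is (field name, dictionary type, options).
FIELDS = [
    ("CustomerID", "Autonumber", {"primary_key": True}),
    ("Title", "Text", {"size": 4}),
    ("Surname", "Text", {"size": 15, "indexed": True}),
    ("FirstName", "Text", {"size": 15}),
    ("DateOfBirth", "Date/Time", {}),
    ("HomeTelephone", "Text", {"size": 12}),
]

# Hypothetical mapping from dictionary types to SQL types.
TYPE_MAP = {"Autonumber": "INTEGER", "Text": "VARCHAR", "Date/Time": "DATE"}

def generate_ddl(table, fields):
    """Build a CREATE TABLE statement from data-dictionary entries."""
    cols = []
    for name, dtype, opts in fields:
        sql_type = TYPE_MAP[dtype]
        if "size" in opts:
            sql_type += f"({opts['size']})"
        col = f"{name} {sql_type}"
        if opts.get("primary_key"):
            col += " PRIMARY KEY"
        cols.append(col)
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n)"

ddl = generate_ddl("Customer", FIELDS)
```

This is the spirit of the command files described above: the dictionary holds the metadata, and the DDL for each back-end is generated from it.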
6. Explain data flow diagram and pseudo codes with the difference between physical
DFD and logical DFD any five points?
To understand the differences between a physical and a logical DFD, we need to know what a DFD is. DFD stands for data flow diagram, and it helps in representing graphically the flow of data in an organization, particularly its information system. A DFD enables a user to know where information comes in, where it goes inside the organization, and how it finally leaves the organization. A DFD does not give information about whether the processing of information takes place sequentially or in parallel. There are two types of DFDs, known as physical and logical DFDs. Though both serve the same purpose of representing data flow, there are some differences between the two that will be discussed in this article.
Any DFD begins with an overview DFD that describes in a nutshell the system to be designed. A logical data flow diagram, as the name indicates, concentrates on the business and tells about the events that take place in a business and the data generated from each such event. A physical DFD, on the other hand, is more concerned with how the flow of information is to be represented. It is a usual practice to use DFDs for representation of logical data flow and processing of data. However, it is prudent to evolve a logical DFD after first developing a physical DFD that reflects all the persons in the organization performing various operations and how data flows between all these persons.
What is the difference between Physical DFD and Logical DFD?
While there is no constraint on the developer to depict how the system is constructed in the case of a logical DFD, a physical DFD must show how the system has been constructed. There are certain features of a logical DFD that make it popular among organizations. A logical DFD makes it easier for the employees of an organization to communicate, leads to more stable systems, allows for better understanding of the system by analysts, is flexible and easy to maintain, and allows the user to remove redundancies easily. On the other hand, a physical DFD is clear on the division between manual and automated processes, gives a detailed description of processes, identifies temporary data stores, and adds more controls to make the system more efficient and simple.
Data Flow Diagrams (DFDs) are used to show the flow of data through a system in terms of the inputs, processes, and outputs.
External Entities
Data either comes from or goes to external entities. They are either the source or destination (sometimes called a source or sink) of data, which is considered to be external to the system. They could be people or groups that provide or input data to the system or who receive data from the system. An external entity is drawn as an oval (see below) and identified by a noun. External entities are not part of the system but are needed to provide sources of data used by the system. Fig 1 below shows an example of an external entity.
Fig 1 External Entity
Processes and Data Flows
Data passed to, or from, an external entity must be processed in some way. The passing of data (flow of data) is shown on the DFD as an arrow. The direction of the arrow defines the direction of the flow of data. All data flows between external entities and processes need to be named. Fig 2 below shows an example of a data flow:
Fig 2 Data Flow
A process processes data that emanates from external entities or data stores. The process could be manual, mechanised, or automated/computerised. A process will use or alter the data in some way. It is identified from a scenario by a verb or action. Each process is given a unique number and a name. An example of a process is shown in Fig 3 below:
Fig 3 - Process
[Figs 1-3 show, respectively, the external entity "Customer", the data flow "Customer details", and process 1, "Add New Customer".]
Data Stores
A data store is a point where data is held, and it receives or provides data through data flows. Examples of data stores are transaction records, data files, reports, and documents. A data store could be a filing cabinet or magnetic media. Data stores are named in the singular and numbered. A manual store such as a filing cabinet is numbered with an M prefix. A D is used as a prefix for an electronic store such as a relational table. An example of an electronic data store is shown in Fig 4 below.
Fig 4 Data Store
Rules
There are certain rules that must be applied when drawing DFDs. These are explained
below:
An external entity cannot be connected to another external entity by a data flow
An external entity cannot be connected directly to a data store
An external entity must pass data to, or receive data from, a process using a data flow
A data store cannot be directly connected to another data store
A data store cannot be directly connected to an external entity
A data store can pass data to, or receive data from, a process
A process can pass data to and receive data from another process
Data must flow from an external entity to a process and then be passed on to another process or a data store
A matrix for the above rules is shown in Fig 5 below

Fig 5 DFD Rules

         Entity   Process   Store
Entity    No       Yes       No
Process   Yes      Yes       Yes
Store     No       Yes       No
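The connection rules above amount to a small lookup table. As an illustrative sketch (the valid_flow helper is hypothetical), a Python validator might look like this:

```python
# Allowed source -> destination connections, from the rules matrix above.
ALLOWED = {
    ("entity", "entity"): False,  ("entity", "process"): True,
    ("entity", "store"): False,   ("process", "entity"): True,
    ("process", "process"): True, ("process", "store"): True,
    ("store", "entity"): False,   ("store", "process"): True,
    ("store", "store"): False,
}

def valid_flow(src, dst):
    """Return True if a data flow from src to dst obeys the DFD rules."""
    return ALLOWED[(src, dst)]
```

A CASE tool could run such a check over every arrow in a diagram before accepting it.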
There are different levels of DFDs depending on the level of detail shown
Level 0 or context diagram
The context diagram shows the top-level process, the whole system, as a single process
rectangle. It shows all external entities and all data flows to and from the system.
Analysts draw the context diagram first to show the high-level processing in a system. An
example of a Context Diagram is shown in Fig 6 below:
Fig 6 Context Diagram for a Car Sales System
Level 1 DFD
This level of DFD shows all external entities that are on the context diagram, all the high-
level processes and all data stores used in the system. Each high-level process may
contain sub-processes. These are shown on lower level DFDs.
[Fig 6: the external entities Customer and Management exchange data flows (customer details, new car details, monthly report details, invoice details, updated customer details, Customer Order Details, staff details) with the single process "Bilbos Car Sales".]
A Level 1 DFD for the Car Sales scenario is shown in Fig 7 below:
Fig 7 Level 1 DFD for a Car Sales System
[Fig 7: external entities Customer and Management; processes 1 Add New Customer, 2 Create Monthly Sales Report, 3 Add New Sale, 4 Add New Car Details, 5 Update Customer, 6 Create Customer Invoice, 7 Add Staff Details; data stores D1 Customer, D2 Car, D3 Sales, D4 Staff; data flows include customer details, car details, sales details, staff details, invoice details, monthly report details, updated customer details, and Customer Order Details.]
Level 2 DFDs
Each Level 1 DFD process may contain further internal processes. These are shown on the Level 2 DFD. The numbering system used in the Level 1 DFD is continued, and each process in the Level 2 DFD is prefixed by the Level 1 DFD number followed by a unique number for each process, i.e. for process 1, sub-processes 1.1, 1.2, 1.3 etc. See Fig 8 below.
Fig 8 Level 2 DFD for Level 1 Process Add New Sale [processes 3.1 Validate Order, 3.2 Generate New Sale, 3.3 Add staff to order; stores D1 Customer, D2 Car, D3 Sales, D4 Staff]
Each of the Level 2 DFDs could also have sub-processes and could be decomposed further into lower level DFDs, i.e. 1.1.1, 1.1.2, 1.1.3 etc.
More than 3 levels for a DFD would become unmanageable.
Lowest Level DFDs and Process Specification
Once the DFD has been decomposed into its lowest level, each of the lower level DFDs
can be described using pseudo-code (structured English), flow chart or similar process
specification method that can be used by a programmer to code each process or function.
For example, the Level 2 DFD for the Add New Sale process could be described as being
a process that contains 3 sub-processes, Validate Order, Add Staff to Order and Generate
New Sale. The structured English could be written thus:
Open Customer File
If existing customer
Check Customer Details
Else
Add customer details
End If
Open Car File
If car available then
Open Sale File
Add customer to sale
Set car to unavailable
Add car to sale
Add staff details
Calculate price
Generate Invoice
Close Sale File
Close Customer File
Close Car File
Inform User of successful sale and exit process
Else
Inform User of problem and exit process
Close Customer File
Close Car File
End If
The above example is not carved in stone as the analyst may decide to write separate
functions to validate customer and car details and that the Generate New Sale process
could include other sub-processes.
All that matters is that the underlying processing logic solves the problem.
For example, if you look at Figure 8 there is a process named Validate Order, which has a dual purpose of checking both the customer details (is the customer a current customer? if not, add to the customer file) and the car details (is the car available? if not, stop the sale process). A separate process called Validate Order could be created, but I have written the structured English to show a logical sequence in which, only if the car is available, do we begin the transaction of creating the sale.
I have also assumed that the staff dealing with the sale will know their own details so there
would not be a need for the process named Add Staff to Order.
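As a rough illustration, the structured English for Add New Sale maps to runnable code along these lines. The files are simplified to in-memory structures, and every name here (add_new_sale, the dictionary fields) is a hypothetical stand-in, not part of any real system:

```python
# In-memory sketch of the Add New Sale logic; the "files" are simplified
# to a list of customers, a dict of cars, and a list of sales records.
def add_new_sale(customers, cars, sales, customer, car_id, staff):
    if customer not in customers:            # Else branch: add customer details
        customers.append(customer)
    car = cars.get(car_id)
    if car is None or not car["available"]:  # car file check failed
        return "problem: car not available"
    car["available"] = False                 # set car to unavailable
    sales.append({"customer": customer, "car": car_id,       # add sale record,
                  "staff": staff, "price": car["price"]})    # calculate price
    return "successful sale"                 # inform user; invoice would follow
```

Like the structured English, this is only one possible decomposition of the processing logic.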
Like all analysis and design processes, the process of producing DFDs and writing structured English is an iterative one.
7. Explain coding techniques and types of codes?
Information must be encoded into signals before it can be transported across communication media. More precisely, the waveform pattern of voltage or current used to represent the 1s and 0s of a digital signal on a transmission link is called digital-to-digital line encoding. There are different encoding schemes available:
Digital-to-Digital Encoding
It is the representation of digital information by a digital signal.
There are basically the following types of digital-to-digital encoding: Unipolar, Polar, and Bipolar.
Unipolar
Unipolar encoding uses only one voltage level: value 1 is a positive value while 0 remains idle. Since unipolar line encoding has one of its states at 0 volts, it is also called Return to Zero (RTZ), as shown in Figure. A common example of unipolar line encoding is the TTL logic levels used in computers and digital logic.
Unipolar encoding has a DC (Direct Current) component and therefore cannot travel through media such as microwaves or transformers. It has a low noise margin and needs extra hardware for synchronization purposes. It is well suited where the signal path is short. For long distances, it produces stray capacitance in the transmission medium and therefore it never returns to zero, as shown in Figure.
Polar
Polar encoding uses two levels of voltages say positive and negative. For example, the
RS:232D interface uses Polar line encoding. The signal does not return to zero; it is either
a positive voltage or a negative voltage. Polar encoding may be classified as nonreturn to
zero (NRZ), return to zero (RZ) and biphase. NRZ may be further divided into NRZL and
NRZI. Biphase has also two different categories as Manchester and Differential
Manchester encoding. Polar line encoding is the simplest pattern that eliminates most of
the residua! DC problem. Figure shows the Polar line encoding. It has the same problem of
synchronization as that of unipolar encoding. The added benefit of polar encoding is that it
reduces the power required to transmit the signal by one-half.
Non-Return to Zero (NRZ)
In NRZ-L, the level of the signal is 1 if the amplitude is positive and 0 in case of negative amplitude.
In NRZ-I, whenever a positive amplitude or bit 1 appears in the signal, the signal gets inverted.
Figure explains the concepts of NRZ-L and NRZ-I more precisely.
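A minimal sketch of the two NRZ variants, representing the line as a list of +1/-1 levels; the function names and the choice of a -1 starting level for NRZ-I are assumptions for illustration:

```python
def nrz_l(bits):
    """NRZ-L: a positive level (+1) encodes 1, a negative level (-1) encodes 0."""
    return [1 if b else -1 for b in bits]

def nrz_i(bits, start=-1):
    """NRZ-I: invert the current level whenever a 1 appears; hold it on 0."""
    level, out = start, []
    for b in bits:
        if b:
            level = -level
        out.append(level)
    return out
```

Note that in NRZ-I the information is carried by the presence or absence of an inversion, not by the level itself.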
Return to Zero (RZ)
RZ uses three values to represent the signal: positive, negative, and zero. Bit 1 is represented when the signal changes from positive to zero. Bit 0 is represented when the signal changes from negative to zero. Figure explains the RZ concept.
Biphase
Biphase is implemented in two different ways as Manchester and Differential Manchester
encoding.
In Manchester encoding, a transition happens at the middle of each bit period. A low-to-high transition represents a 1 and a high-to-low transition represents a 0. In Differential Manchester encoding, a transition at the beginning of a bit time represents a zero.
These encodings can detect errors during transmission because of the transition during every bit period. Therefore, the absence of an expected transition would indicate an error condition.
They have no DC component and there is always a transition available for synchronizing the receive and transmit clocks.
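Using the convention stated above (low-to-high encodes 1, high-to-low encodes 0), Manchester encoding can be sketched as a pair of half-bit levels per bit; the +1/-1 list representation is an illustrative assumption:

```python
def manchester(bits):
    """Each bit becomes two half-bit levels with a mid-bit transition:
    low-to-high (-1, +1) encodes 1; high-to-low (+1, -1) encodes 0."""
    out = []
    for b in bits:
        out.extend([-1, 1] if b else [1, -1])
    return out
```

Because every bit period contains a transition, the receiver can recover the clock from the signal itself.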
Bipolar
Bipolar uses three voltage levels: positive, negative, and zero. Bit 0 occurs at the zero level of amplitude. Bit 1 occurs alternately at the positive and negative voltage levels, and the scheme is therefore also called Alternate Mark Inversion (AMI). There is no DC component because of the alternate polarity of the pulses for 1s. Figure describes bipolar encoding.
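The alternating-pulse rule for 1s can be sketched as follows; the list-of-levels representation is an illustrative assumption:

```python
def ami(bits):
    """Bipolar AMI: 0 -> zero level; 1 -> pulses of alternating polarity."""
    out, last = [], -1
    for b in bits:
        if b:
            last = -last          # alternate mark inversion
            out.append(last)
        else:
            out.append(0)
    return out
```

The alternation is what cancels the DC component: successive 1 pulses average to zero.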
Analog to Digital
Analog to digital encoding is the representation of analog information by a digital signal.
These include PAM (Pulse Amplitude Modulation), and PCM (Pulse Code Modulation).
Digital to Analog
These include ASK (Amplitude Shift Keying), FSK (Frequency Shift Keying), PSK (Phase Shift Keying), QPSK (Quadrature Phase Shift Keying), and QAM (Quadrature Amplitude Modulation).
Analog to Analog
These are Amplitude modulation, Frequency modulation and Phase modulation
techniques,
Codecs (Coders and Decoders)
Codec stands for coder/decoder in data communication. The conversion of analog to digital is necessary in situations where it is advantageous to send analog information across a digital circuit. Certainly, this is often the case in carrier networks, where huge volumes of analog voice are digitized and sent across high-capacity digital circuits. The device that accomplishes the analog-to-digital conversion is known as a
codec. Codecs code an analog input into a digital format on the transmitting side of the
connection, reversing the process, or decoding the information on the receiving side, in
order to reconstitute the analog signal. Codecs are widely used to convert analog voice
and video to digital format, and to reverse the process on the receiving end.
8. Explain algorithm with detect error module (eleven code) and module n code with
the help of algorithm and examples
In information theory and coding theory, with applications in computer science and telecommunication, error detection and correction (or error control) are techniques that enable reliable delivery of digital data over unreliable communication channels. Many communication channels are subject to channel noise, and thus errors may be introduced during transmission from the source to a receiver. Error detection techniques allow detecting such errors, while error correction enables reconstruction of the original data.
Error correction may generally be realized in two different ways:
Automatic repeat request (ARQ) (sometimes also referred to as backward error correction): This is an error control technique whereby an error detection scheme is combined with requests for retransmission of erroneous data. Every block of data received is checked using the error detection code used, and if the check fails, retransmission of the data is requested; this may be done repeatedly, until the data can be verified.
Forward error correction (FEC): The sender encodes the data using an error-correcting code (ECC) prior to transmission. The additional information (redundancy) added by the code is used by the receiver to recover the original data. In general, the reconstructed data is what is deemed the "most likely" original data.
ARQ and FEC may be combined, such that minor errors are corrected without
retransmission, and major errors are corrected via a request for retransmission: this is
called hybrid automatic repeat-request (HARQ).
Error detection is most commonly realized using a suitable hash function (or checksum
algorithm). A hash function adds a fixed-length tag to a message, which enables receivers
to verify the delivered message by recomputing the tag and comparing it with the one
provided.
There exists a vast variety of different hash function designs. However, some are of
particularly widespread use because of either their simplicity or their suitability for
detecting certain kinds of errors (e.g., the cyclic redundancy check's performance in
detecting burst errors).
Random-error-correcting codes based on minimum distance coding can provide a suitable
alternative to hash functions when a strict guarantee on the minimum number of errors to
be detected is desired. Repetition codes, described below, are special cases of error-
correcting codes: although rather inefficient, they find applications for both error
correction and detection due to their simplicity.
Repetition codes
A repetition code is a coding scheme that repeats the bits across a channel to achieve error-free communication. Given a stream of data to be transmitted, the data is divided into blocks of bits. Each block is transmitted some predetermined number of times. For example, to send the bit pattern "1011", the four-bit block can be repeated three times, thus producing "1011 1011 1011". However, if this twelve-bit pattern was received as "1010 1011 1011", where the first block is unlike the other two, it can be determined that an error has occurred.
Repetition codes are very inefficient, and can be susceptible to problems if the error occurs in exactly the same place for each group (e.g., "1010 1010 1010" in the previous example would be detected as correct). The advantage of repetition codes is that they are extremely simple, and they are in fact used in some transmissions of numbers stations.
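The "1011" example above can be sketched directly; detection compares the received copies, and a per-position majority vote (as used later in the repetition-m code) corrects isolated errors. Function names are illustrative:

```python
def encode(block, n=3):
    """Transmit the block n times in a row."""
    return block * n

def copies_agree(received, block_len, n=3):
    """Error detection: do all n received copies match?"""
    copies = [received[i * block_len:(i + 1) * block_len] for i in range(n)]
    return all(c == copies[0] for c in copies)

def majority_decode(received, block_len, n=3):
    """Error correction: per-position majority vote across the copies."""
    copies = [received[i * block_len:(i + 1) * block_len] for i in range(n)]
    return "".join(max("01", key=lambda v: sum(c[i] == v for c in copies))
                   for i in range(block_len))
```

With n = 3 the vote is never tied, so any single corrupted copy is outvoted by the other two.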
Parity bits
A parity bit is a bit that is added to a group of source bits to ensure that the number of set
bits (i.e., bits with value 1) in the outcome is even or odd. It is a very simple scheme that
can be used to detect single or any other odd number (i.e., three, five, etc.) of errors in the
output. An even number of flipped bits will make the parity bit appear correct even though
the data is erroneous.
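A minimal sketch of the scheme, including its stated limitation that an even number of flipped bits goes undetected (function names are illustrative):

```python
def add_parity(bits, even=True):
    """Append a parity bit so the total number of 1s is even (or odd)."""
    ones = bits.count("1")
    parity = ones % 2 if even else 1 - ones % 2
    return bits + str(parity)

def parity_ok(codeword, even=True):
    """Check a received codeword against the chosen parity convention."""
    ones = codeword.count("1")
    return ones % 2 == 0 if even else ones % 2 == 1
```

Any single flipped bit changes the count of 1s by one and is caught; two flips cancel out.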
Extensions and variations on the parity bit mechanism are horizontal redundancy checks,
vertical redundancy checks, and "double," "dual," or "diagonal" parity (used in RAID-DP).
Checksums
A checksum of a message is a modular arithmetic sum of message code words of a fixed
word length (e.g., byte values). The sum may be negated by means of a ones'-complement
operation prior to transmission to detect errors resulting in all-zero messages.
Checksum schemes include parity bits, check digits, and longitudinal redundancy checks.
Some checksum schemes, such as the Damm algorithm, the Luhn algorithm, and the
Verhoeff algorithm, are specifically designed to detect errors commonly introduced by
humans in writing down or remembering identification numbers.
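A minimal sketch of a ones'-complement checksum over fixed-width words, with the negation before transmission described above; the 8-bit word size and function names are assumptions for illustration:

```python
def checksum(words, bits=8):
    """Ones'-complement sum of fixed-width words, negated before sending,
    so that data plus checksum sums to all ones at the receiver."""
    mask = (1 << bits) - 1
    total = 0
    for w in words:
        total += w
        total = (total & mask) + (total >> bits)  # fold in the end-around carry
    return mask - total  # ones' complement (negation) of the sum

def verify(words, cksum, bits=8):
    """Receiver side: data plus checksum should sum to the all-ones word."""
    mask = (1 << bits) - 1
    total = cksum
    for w in words:
        total += w
        total = (total & mask) + (total >> bits)
    return total == mask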
Cyclic redundancy checks (CRCs)
A cyclic redundancy check (CRC) is a single-burst-error-detecting cyclic code and non-
secure hash function designed to detect accidental changes to digital data in computer
networks. It is not suitable for detecting maliciously introduced errors. It is characterized
by specification of a so-called generator polynomial, which is used as the divisor in a
polynomial long division over a finite field, taking the input data as the dividend, and
where the remainder becomes the result.
Cyclic codes have favorable properties in that they are well suited for detecting burst
errors. CRCs are particularly easy to implement in hardware, and are therefore commonly
used in digital networks and storage devices such as hard disk drives.
Even parity is a special case of a cyclic redundancy check, where the single-bit CRC is
generated by the divisor x + 1.
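The polynomial long division over GF(2) described above can be sketched bit by bit; the binary-string representation and function name are illustrative choices:

```python
def crc_remainder(data_bits, generator_bits):
    """Remainder of GF(2) polynomial long division: the data, shifted left
    by deg(generator), is divided by the generator. Both arguments are
    binary strings; the generator must have a leading 1."""
    gen = list(map(int, generator_bits))
    # Append deg(generator) zero bits to form the dividend.
    dividend = list(map(int, data_bits)) + [0] * (len(gen) - 1)
    for i in range(len(data_bits)):
        if dividend[i]:                  # leading bit is 1: XOR in the generator
            for j, g in enumerate(gen):
                dividend[i + j] ^= g
    return "".join(map(str, dividend[-(len(gen) - 1):]))
```

Note that with the divisor "11" (i.e. x + 1), the one-bit remainder reproduces the even-parity bit, matching the observation above.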
Cryptographic hash functions
The output of a cryptographic hash function, also known as a message digest, can provide
strong assurances about data integrity, whether changes of the data are accidental (e.g., due to transmission errors) or maliciously introduced. Any modification to the data will
likely be detected through a mismatching hash value. Furthermore, given some hash value,
it is infeasible to find some input data (other than the one given) that will yield the same
hash value. If an attacker can change not only the message but also the hash value, then a
keyed hash or message authentication code (MAC) can be used for additional security.
Without knowing the key, it is infeasible for the attacker to calculate the correct keyed
hash value for a modified message.
Error-correcting codes
Any error-correcting code can be used for error detection. A code with minimum Hamming distance d can detect up to d - 1 errors in a code word. Using minimum-distance-based error-correcting codes for error detection can be suitable if a strict limit on the minimum number of errors to be detected is desired.
Codes with minimum Hamming distance d = 2 are degenerate cases of error-correcting
codes, and can be used to detect single errors. The parity bit is an example of a single-
error-detecting code.
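The d - 1 detection bound can be illustrated with a small sketch (function names are hypothetical):

```python
def hamming(a, b):
    """Number of positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def detectable_errors(codewords):
    """A code detects up to d - 1 errors, where d is the minimum
    pairwise Hamming distance between its codewords."""
    d = min(hamming(a, b) for a in codewords for b in codewords if a != b)
    return d - 1
```

The even-parity code over two data bits has minimum distance 2 (detects 1 error), while a triple-repetition code has minimum distance 3 (detects 2).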
In digital data transmission, error occurs due to noise. The probability of error or bit error
rate depends on the signal to noise ratio, the modulation type and the method of
demodulation.
The bit error rate p may be expressed as

p = (number of errors in N bits) / N,  for large N

For example, if p = 0.1 we would expect on average 1 error in every 10 bits. A p = 0.1 actually states that every bit has a 1/10 probability of being in error.

Depending on the type of system and many factors, error rates typically range from 10⁻¹ to 10⁻⁵ or better.
Information transfer via a digital system is usually packaged into a structure (a block of bits) called a message block or frame. A typical message block contains the following:

Synchronization pattern to mark the start of the message block
Destination and sometimes source addresses
System control/commands
Information
Error control coding check bits

The total number of bits in the block may vary widely (from say 32 bits to several hundred bits) depending on the requirement.
Clearly, if the bits are subjected to an error rate p, there is some probability that a message block will be received with 1 or more bits in error. In order to counteract the effects of errors, error control coding techniques are used to either:

a) detect errors (error detection)
b) correct errors (error detection and correction)
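The probability that a block is hit by at least one error follows directly, assuming independent bit errors at rate p (the function name is illustrative):

```python
def p_block_error(p, n):
    """Probability of one or more bit errors in an n-bit block,
    assuming independent bit errors with bit error rate p."""
    return 1 - (1 - p) ** n
```

For example, at p = 0.01 a 100-bit block is corrupted roughly 63% of the time, which is why error control coding is needed at all.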
Broadly, there are two types of error control codes:

a) Block Codes: parity codes, array codes, repetition codes, cyclic codes etc.
b) Convolutional Codes
BLOCK CODES
A block code is a coding technique which generates C check bits for M message bits to give a stand-alone block of M + C = N bits.
The sync bits are usually not included in the error control coding because message
synchronization must be achieved before the message and check bits can be processed.
The code rate is given by

Rate = M / (M + C) = M / N

where M = number of message bits, C = number of check bits, and N = M + C = total number of bits.

The code rate is the measure of the proportion of user message bits (M) to the total bits in the block (N).
For example,

i) A single parity bit (C = 1) applied to a block of 7 message bits gives a code rate
R = 7 / (7 + 1) = 7/8

ii) A (7,4) cyclic code has N = 7, M = 4, so its code rate is
R = 4/7

iii) A repetition-m code, in which each message bit is transmitted m times and the receiver carries out a majority vote on each bit, has a code rate
Rate = M / (mM) = 1/m
DETECTION AND CORRECTION
Consider message transferred from a Source to a Destination, and assume that the
Destination is able to check the received messages and detect errors.
If no errors are detected, the Destination will accept the messages.
If errors are detected, there are two forms of error correction.
a) Automatic Repeat Request (ARQ)
In an ARQ system, the destination sends an acknowledgement (ACK) message back to the source if errors are not detected, and a negative acknowledgement (NAK) message back to the source if errors are detected.
If the source receives an ACK to a message, it will send the next message. If the source receives a NAK, it repeats the same message. This process repeats until all the messages are accepted by the destination.
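The ACK/NAK loop can be sketched as a small simulation; the per-attempt detected-error probability and all names here are illustrative assumptions, with error detection treated as perfect:

```python
import random

def arq_transfer(messages, p_detect, rng):
    """Stop-and-wait ARQ sketch: the destination NAKs any block with a
    detected error (probability p_detect per attempt) and the source
    retransmits; on an ACK the source sends the next message."""
    attempts, delivered = 0, []
    for msg in messages:
        while True:
            attempts += 1
            if rng.random() >= p_detect:   # block arrives with no detected error
                delivered.append(msg)      # destination replies ACK
                break                      # source moves on to the next message
            # detected error: destination replies NAK, source retransmits
    return delivered, attempts
```

The total attempt count shows the cost of ARQ: a noisier channel means more retransmissions for the same delivered data.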
b) Forward Error Correction (FEC)
The error control code may be powerful enough to allow the destination to attempt to correct the errors by further processing. This is called Forward Error Correction; no ACKs or NAKs are required.
Many systems are hybrid in that they use both ARQ (ACK/NAK) and FEC strategies for
error correction.
Successful, False & Lost Message Transfer
The process of checking the received messages for errors gives two possible outcomes:

a) Errors not detected: messages accepted
b) Errors detected: messages rejected

An error not being detected does not mean that errors are not present. Error control codes cannot detect every possible error or combination of errors. However, if errors are not detected the destination has no alternative but to accept the message, true or false. That is, we may conclude that if errors are not detected, either
a) there were no errors, i.e. the messages accepted are true; in other words, a successful message transfer.
b) there were undetected errors, i.e. a message accepted was false; in other words, a false message transfer.

If errors are detected, the destination does not accept the message and may either request a re-transmission (ARQ system) or process the block further in an attempt to correct the error (FEC).
In processing the block for error correction, again there are two possible outcomes:

a) the processor may get it right, i.e. correct the errors and give a successful message transfer.
b) the processor may get it wrong, i.e. not correct the errors, in which case there is a false message transfer.

Some codes have a range of ability to detect and correct errors. For example, a code may be able to detect and correct 1 error (a single-bit error) and detect 2, 3 and 4 bits in error, but not correct them. Thus even with FEC, some messages may still be rejected, and we think of these as lost messages. These ideas are illustrated below:
MESSAGE TRANSFERS
Consider message transfer between two computers e.g. it is required to transfer the
contents of Computer A to Computer B.
COMPUTER A → COMPUTER B
As discussed, of the messages transferred to Computer B, some may be rejected (lost)
and some will be accepted; those accepted will be either true (successful transfer) or false.
Obviously the requirement is for a high probability of successful transfer (ideally = 1), low
probability of false transfer (ideally = 0) and a low probability of lost messages. In
particular the false rate should be kept low, even at the expense of an increased lost
message rate.
Note that in some messages there may be in-built redundancy, for example in the text
message
REPAUT FOR WEDLESDAY (REPORT FOR WEDNESDAY)
However if this is followed by
10 JUNE we would ?? 10
Other examples where there is little or no redundancy are car registration numbers,
accounts etc., generally numeric or unstructured alpha-numeric information.
There is thus a need for a low false rate appropriate to the function of the system and it is
important for the information in Computer B to be correct even if it takes a long time to
transfer.
Error control coding may be considered further in two main ways.
In terms of System Performance, i.e. the probabilities of successful, false and lost
message transfer. In this case we only need to know what the code for error detection/
correction can do in terms of its ability to detect and correct errors (this depends on the
Hamming distance).
In terms of the Error Control Code itself, i.e. the structure, operation, characteristics and implementation of various types of codes.
SYSTEM PERFORMANCE
In order to determine system performance in terms of successful, false and lost message
transfers it is necessary to know:
1) the probability of error, or b.e.r., p
2) the number of bits in the message block, N
3) the ability of the code to detect/correct errors, usually expressed as a minimum
Hamming distance, dmin, for the code.
Given the b.e.r. p and the number of bits in the block N, we can apply the binomial equation below

P(R) = [N! / (R!(N-R)!)] p^R (1-p)^(N-R)        (note: 0! = 1, 1! = 1)

This gives the probability of R errors in an N-bit block subject to a bit error rate p.

Hence, for an N-bit block we can determine the probability of no errors in the block (R = 0), i.e. an error-free block:

P(0) = [N! / (0!(N-0)!)] p^0 (1-p)^(N-0) = (1-p)^N

the probability of 1 error in the block (R = 1):

P(1) = [N! / (1!(N-1)!)] p^1 (1-p)^(N-1) = N p (1-p)^(N-1)

the probability of 2 errors in the block (R = 2):

P(2) = [N! / (2!(N-2)!)] p^2 (1-p)^(N-2)

and similarly for R = 3, 4, etc.: P(3), P(4), P(5), ..., P(N).
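The binomial expression above can be evaluated directly. A minimal sketch, with an illustrative block size and b.e.r. (the function name and example values are assumptions):

```python
from math import comb

def p_errors(R, N, p):
    """Probability of exactly R bit errors in an N-bit block with b.e.r. p."""
    return comb(N, R) * p**R * (1 - p) ** (N - R)

# Example: an 8-bit block with b.e.r. p = 0.01
N, p = 8, 0.01
print(p_errors(0, N, p))  # probability of an error-free block, (1-p)^N
```

Summing p_errors over R = 0..N gives 1, which is a quick sanity check on the formula.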
MINIMUM HAMMING DISTANCE
The minimum Hamming distance of an error control code is a parameter which indicates
the worst case ability of the code to detect/correct errors. In general, codes will perform
better than indicated by the minimum Hamming distance.
Let dmin= minimum Hamming distance
l = number of bit errors detected
t = number of bit errors corrected
It may be shown that
dmin = l + t + 1, with t ≤ l
For a given dmin, there is a range of (worst-case) options, from error detection only to combined error
detection and correction.
For example, suppose a code has a dmin= 6.
Since, dmin = l + t + 1
We have as options
1) 6 = 5 + 0 + 1 {detect up to 5 errors, no correction}
2) 6 = 4 + 1 + 1 {detect up to 4 errors, correct 1 error}
3) 6 = 3 + 2 + 1 {detect up to 3 errors, correct 2 errors}
Beyond this, t > l, i.e. we cannot go further, since we cannot correct more errors than can be
detected.
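The options for a given dmin can be enumerated mechanically from dmin = l + t + 1 with t ≤ l. A small sketch (the helper name is hypothetical):

```python
def decoding_options(d_min):
    """All (l, t) options satisfying d_min = l + t + 1 with t <= l,
    where l = errors detected and t = errors corrected."""
    return [(l, t) for t in range(d_min) for l in [d_min - 1 - t] if t <= l]

print(decoding_options(6))  # [(5, 0), (4, 1), (3, 2)]
```

For dmin = 6 this reproduces exactly the three options listed above.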
In option 1), up to 5 errors can be detected i.e. 1,2,3,4 or 5 errors detected, but there is no
error correction.
In option 2), up to 4 errors can be detected i.e. 1,2,3,4 errors detected, and 1 error can be
corrected.
In option 3), up to 3 errors can be detected, i.e. 1, 2 or 3 errors detected, and 1 or 2 errors can
be corrected.
Hence a given code can offer several decoding (error detection/correction) options at the
receiver. In an ARQ system with no FEC, we would implement option 1, i.e. detect as
many errors as possible.
If FEC were to be used, we might choose option 3, which allows 1 or 2 errors in a block
to be detected and corrected; 3 errors can be detected but not corrected, and these messages
could be rejected and recovered by ARQ.
With option 3, for example, if 4 or more errors occurred they would not be detected, and
these messages would be accepted but would be false messages.
Fortunately, the higher the number of errors, the lower the probability that they will occur, for
reasonable values of p.
From the above, we may conclude that:
Message transfers are successful if no errors occur, or if up to t errors occur and are corrected, i.e.

Probability of success = P(0) + P(1) + ... + P(t)

Message transfers are lost if more than t but no more than l errors are detected, which are not corrected, i.e.

Probability of lost = P(t+1) + P(t+2) + ... + P(l)

Message transfers are false if l+1 or more errors occur, i.e.

Probability of false = P(l+1) + P(l+2) + ... + P(N)
Example
Using dmin = 6, option 2 (t = 1, l = 4):

Probability of successful transfer = P(0) + P(1)

Probability of lost messages = P(2) + P(3) + P(4)

Probability of false messages = P(5) + P(6) + ... + P(N)
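The example above can be checked numerically. This sketch assumes an illustrative b.e.r. of p = 0.01 and block size N = 16; neither value comes from the notes:

```python
from math import comb

def P(r, N, p):
    """Probability of exactly r bit errors in an N-bit block (binomial)."""
    return comb(N, r) * p**r * (1 - p) ** (N - r)

N, p = 16, 0.01          # illustrative values, not from the notes
t, l = 1, 4              # correct 1 error, detect up to 4 (dmin = 6)

p_success = sum(P(r, N, p) for r in range(0, t + 1))       # P(0) + P(1)
p_lost    = sum(P(r, N, p) for r in range(t + 1, l + 1))   # P(2) + P(3) + P(4)
p_false   = sum(P(r, N, p) for r in range(l + 1, N + 1))   # P(5) + ... + P(N)

print(p_success, p_lost, p_false)
```

As expected for a reasonable b.e.r., the success probability dominates, the lost probability is small, and the false probability is smaller still; the three always sum to 1.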
9. Explain back-up plans
In information technology, a backup, or the process of backing up, refers to the copying
and archiving of computer data so it may be used to restore the original after a data loss
event. The verb form is to back up, in two words, whereas the noun is backup.
Backups have two distinct purposes. The primary purpose is to recover data after its loss,
be it by data deletion or corruption. Data loss can be a common experience of computer
users. A 2008 survey found that 66% of respondents had lost files on their home PC. The
secondary purpose of backups is to recover data from an earlier time, according to a user-
defined data retention policy, typically configured within a backup application for how
long copies of data are required. Though backups popularly represent a simple form of
disaster recovery, and should be part of a disaster recovery plan, by themselves, backups
should not alone be considered disaster recovery. One reason for this is that not all backup
systems or backup applications are able to reconstitute a computer system or other
complex configurations, such as a computer cluster, Active Directory servers, or a database
server, by restoring only data from a backup.
Since a backup system contains at least one copy of all data worth saving, the data storage
requirements can be significant. Organizing this storage space and managing the backup
process can be a complicated undertaking. A data repository model can be used to provide
structure to the storage. Nowadays, there are many different types of data storage devices
that are useful for making backups. There are also many different ways in which these
devices can be arranged to provide geographic redundancy, data security, and portability.
Before data is sent to its storage location, it is selected, extracted, and manipulated. Many
different techniques have been developed to optimize the backup procedure. These include
optimizations for dealing with open files and live data sources as well as compression,
encryption, and de-duplication, among others. Every backup scheme should include dry
runs that validate the reliability of the data being backed up. It is important to recognize
the limitations and human factors involved in any backup scheme.
Because data is the heart of the enterprise, it's crucial for you to protect it. And to protect
your organization's data, you need to implement a data backup and recovery plan. Backing
up files can protect against accidental loss of user data, database corruption, hardware
failures, and even natural disasters. It's your job as an administrator to make sure that
backups are performed and that backup tapes are stored in a secure location.
Creating a Backup and Recovery Plan
Data backup is an insurance plan. Important files are accidentally deleted all the time.
Mission-critical data can become corrupt. Natural disasters can leave your office in ruin.
With a solid backup and recovery plan, you can recover from any of these. Without one,
you're left with nothing to fall back on.
Figuring Out a Backup Plan
It takes time to create and implement a backup and recovery plan. You'll need to figure out
what data needs to be backed up, how often the data should be backed up, and more. To
help you create a plan, consider the following:
How important is the data on your systems? The importance of data can go a long way in helping you determine if you need to back it up, as well as when and how it should be backed up. For critical data, such as a database, you'll want to have redundant backup sets that extend back for several backup periods. For less important data, such as daily user files, you won't need such an elaborate backup plan, but you'll need to back up the data regularly and ensure that the data can be recovered easily.

What type of information does the data contain? Data that doesn't seem important to you may be very important to someone else. Thus, the type of information the data contains can help you determine if you need to back up the data, as well as when and how the data should be backed up.

How often does the data change? The frequency of change can affect your decision on how often the data should be backed up. For example, data that changes daily should be backed up daily.

How quickly do you need to recover the data? Time is an important factor in creating a backup plan. For critical systems, you may need to get back online swiftly. To do this, you may need to alter your backup plan.

Do you have the equipment to perform backups? You must have backup hardware to perform backups. To perform timely backups, you may need several backup devices and several sets of backup media. Backup hardware includes tape drives, optical drives, and removable disk drives. Generally, tape drives are less expensive but slower than other types of drives.

Who will be responsible for the backup and recovery plan? Ideally, someone should be a primary contact for the organization's backup and recovery plan. This
person may also be responsible for performing the actual backup and recovery of
data.
What is the best time to schedule backups? Scheduling backups when system use is as low as possible will speed the backup process. However, you can't always schedule backups for off-peak hours. So you'll need to carefully plan when key system data is backed up.

Do you need to store backups off-site? Storing copies of backup tapes off-site is essential to recovering your systems in the case of a natural disaster. In your off-site storage location, you should also include copies of the software you may need to install to reestablish operational systems.
The Basic Types of Backup
There are many techniques for backing up files. The techniques you use will depend on the type of data you're backing up, how convenient you want the recovery process to be,
and more.
If you view the properties of a file or directory in Windows Explorer, you'll note an
attribute called Archive. This attribute often is used to determine whether a file or
directory should be backed up. If the attribute is on, the file or directory may need to be
backed up. The basic types of backups you can perform include
Normal/full backups: All files that have been selected are backed up, regardless of the setting of the archive attribute. When a file is backed up, the archive attribute is cleared. If the file is later modified, this attribute is set, which indicates that the file needs to be backed up.

Copy backups: All files that have been selected are backed up, regardless of the setting of the archive attribute. Unlike a normal backup, the archive attribute on files isn't modified. This allows you to perform other types of backups on the files at a later date.

Differential backups: Designed to create backup copies of files that have changed since the last normal backup. The presence of the archive attribute indicates that the file has been modified, and only files with this attribute are backed up. However, the archive attribute on files isn't modified. This allows you to perform other types of backups on the files at a later date.

Incremental backups: Designed to create backups of files that have changed since the most recent normal or incremental backup. The presence of the archive
attribute indicates that the file has been modified, and only files with this attribute are backed up. When a file is backed up, the archive attribute is cleared. If the file is later modified, this attribute is set, which indicates that the file needs to be backed up.

Daily backups: Designed to back up files using the modification date on the file itself. If a file has been modified on the same day as the backup, the file will be backed up. This technique doesn't change the archive attributes of files.
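The archive-attribute rules above can be modelled with a short simulation. The function, file names and flag representation are illustrative assumptions (daily backups, which use modification dates rather than the attribute, are not modelled):

```python
def run_backup(files, kind):
    """Select files for a backup according to the archive-attribute rules.
    `files` maps file name -> archive bit (True = modified since last clearing)."""
    if kind in ("normal", "copy"):
        selected = list(files)                 # everything, regardless of attribute
    else:  # "differential" or "incremental"
        selected = [f for f, changed in files.items() if changed]
    # Normal and incremental backups clear the attribute; copy and differential don't.
    if kind in ("normal", "incremental"):
        for f in selected:
            files[f] = False
    return selected

files = {"a.doc": True, "b.xls": False}
print(run_backup(files, "differential"))  # ['a.doc']
print(files["a.doc"])                     # True -- attribute left set
```

Because the differential backup leaves the attribute set, the same file is picked up again by the next differential run, whereas an incremental run would clear it.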
In your backup plan you'll probably want to perform full backups on a weekly basis and
supplement this with daily, differential, or incremental backups. You may also want to
create an extended backup set for monthly and quarterly backups that includes additional
files that aren't being backed up regularly.
Tip: You'll often find that weeks or months can go by before anyone notices that a file or data source is missing. This doesn't mean the file isn't important. Although some types of
data aren't used often, they're still needed. So don't forget that you may also want to create
extra sets of backups for monthly or quarterly periods, or both, to ensure that you can
recover historical data over time.
Differential and Incremental Backups
The difference between differential and incremental backups is extremely important. To
understand the distinction between them, examine Table 1. As it shows, with differential
backups you back up all the files that have changed since the last full backup (which
means that the size of the differential backup grows over time). With incremental backups,
you only back up files that have changed since the most recent full or incremental backup
(which means the size of the incremental backup is usually much smaller than a full
backup).
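The pattern in Table 1 can be reproduced by tracking which files each daily backup would contain after Sunday's full backup. The weekday file changes here are hypothetical:

```python
def weekly_backups(changes):
    """Return {day: (differential contents, incremental contents)} for each
    day after a full backup, given the files modified on each day."""
    since_full, out = set(), {}
    for day, modified in changes.items():
        since_full = since_full | modified     # archive bits still set since the full backup
        # Differential: everything changed since the full backup (grows over time).
        # Incremental: only the changes since the previous backup (stays small).
        out[day] = (sorted(since_full), sorted(modified))
    return out

week = weekly_backups({"Mon": {"a"}, "Tue": {"b"}, "Wed": {"c"}})
print(week["Wed"])  # (['a', 'b', 'c'], ['c'])
```

By Wednesday the differential backup already contains all three changed files, while the incremental backup still holds only that day's change, which is exactly the trade-off Table 1 illustrates.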
Table 1: Incremental and Differential Backup Techniques

Day of Week | Weekly Full Backup with Daily Differential Backup | Weekly Full Backup with Daily Incremental Backup
Sunday | A full backup is performed. | A full backup is performed.
Monday | A differential backup contains all changes since Sunday. | An incremental backup contains changes since Sunday.
Tuesday | A differential backup contains all changes since Sunday. | An incremental backup contains changes since Monday.
Wednesday | A differential backup contains all changes since Sunday. | An incremental backup contains changes since Tuesday.
Thursday | A differential backup contains all changes since Sunday. | An incremental backup contains changes since Wednesday.
Friday | A differential backup contains all changes since Sunday. | An incremental backup contains changes since Thursday.
Saturday | A differential backup contains all changes since Sunday. | An incremental backup contains changes since Friday.
Once you determine what data you're going to back up and how often, you can select
backup devices and media that support these choices. These are covered in the next
section.
Selecting Backup Devices and Media
Many tools are available for backing up data. Some are fast and expensive. Others are
slow but very reliable. The backup solution that's right for your organization depends on
many factors, including
Capacity: The amount of data that you need to back up on a routine basis. Can the backup hardware support the required load given your time and resource constraints?

Reliability: The reliability of the backup hardware and media. Can you afford to sacrifice reliability to meet budget or time needs?

Extensibility: The extensibility of the backup solution. Will this solution meet your needs as the organization grows?

Speed: The speed with which data can be backed up and recovered. Can you afford to sacrifice speed to reduce costs?

Cost: The cost of the backup solution. Does it fit into your budget?

Common Backup Solutions
Capacity, reliability, extensibility, speed, and cost are the issues driving your backup plan.
If you understand how these issues affect your organization, you'll be on track to select an
appropriate backup solution. Some of the most commonly used backup solutions include
Tape drives: Tape drives are the most common backup devices. Tape drives use magnetic tape cartridges to store data. Magnetic tapes are relatively inexpensive
bu