object database system-part2

Object Database System
By Sudarshan

MCA Sem V

Objects and OIDs
Data objects can be given an object identifier (OID), a value that is unique in the database across time. An OID is a persistent handle or name for a particular object. The DBMS is responsible for generating OIDs and ensuring that an OID identifies an object uniquely over its entire lifetime. Generally, OIDs are 32-bit or 64-bit integers managed by the DBMS. Some systems treat all tuples stored in a table as objects and automatically assign each one a unique OID. Other systems let the user specify the tables whose tuples are to be assigned OIDs.

Objects and OIDs
An object's OID can be used to refer to it from elsewhere in the data. An OID has a type, similar to the type of a pointer in a programming language. In SQL:1999, every tuple can be given an OID by defining the table in terms of a structured type and declaring that a REF type is associated with it, e.g. the Theaters table.

REF types have values that are unique identifiers or OIDs. SQL:1999 requires that a given REF type must be associated with a table.

Notions of Equality
Two objects having the same type are defined to be deep equal if and only if one of the following holds:
- The objects are of atomic type and have the same value.
- The objects are of reference type and the deep equals operator is true for the two referenced objects.
- The objects are of structured type and the deep equals operator is true for all the corresponding subparts of the two objects.

Two objects having the same reference type are defined to be shallow equal if both refer to the same object, that is, both references hold the same OID.

Equality Example
ROW(538, t89, 6-3-97, 8-7-97)
ROW(538, t33, 6-3-97, 8-7-97)
Shallow equals => false, because t89 and t33 are different OIDs.
Deep equals => true if t89 and t33 refer to objects of type theater_t that have the same value.
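
The two notions of equality can be sketched in Python. This is only an illustration under stated assumptions: the `Ref` class, the `STORE` dictionary, and the theater values are made up for the example and are not part of SQL:1999 or of the slides.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    """A reference-type value: it holds only the oid of its target."""
    oid: str


# Hypothetical object store: oid -> theater_t value (tno, name, address, phone).
# The values are invented; both oids happen to store equal values.
STORE = {
    "t89": (1, "Majestic", "115 King St", "555-0100"),
    "t33": (1, "Majestic", "115 King St", "555-0100"),
}


def shallow_equal(a: Ref, b: Ref) -> bool:
    # Shallow equality: both references hold the same oid.
    return a.oid == b.oid


def deep_equal(a, b) -> bool:
    # Reference type: compare the referenced objects.
    if isinstance(a, Ref) and isinstance(b, Ref):
        return deep_equal(STORE[a.oid], STORE[b.oid])
    # Structured type: compare all corresponding subparts.
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(deep_equal(x, y) for x, y in zip(a, b))
    # Atomic type: compare values.
    return a == b


row1 = (538, Ref("t89"), "6-3-97", "8-7-97")
row2 = (538, Ref("t33"), "6-3-97", "8-7-97")
```

Here the two references are not shallow equal (different oids), while the two rows are deep equal because the referenced theater values match.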

Dereferencing Reference Types
To access the referenced base-type item, a built-in deref() method is provided along with the REF type constructor:
Nowshowing.deref(theater).name

SQL:1999 uses a Java-style arrow operator, combined with the dot operator, to access the referenced item:
Nowshowing.theater->name

URL and OID
- An OID uniquely identifies a single object over all time. The Web resource pointed at by a URL can change over time.
- OIDs are simple identifiers and carry no physical information about the objects they identify. URLs include network addresses and often file-system names.
- OIDs are automatically generated by the system. URLs are specified by users.
- Deletion of an object that is still referenced can be checked by including REFERENCES ARE CHECKED as part of the SCOPE clause and choosing one of the actions

Database Design for ORDBMS: Example
Several space probes each continuously record a video. A single video stream is associated with each probe, and while this stream was collected over a certain time period, we assume that it is now a complete object associated with the probe. During the time period over which the video was collected, the probe's location was periodically recorded. The information associated with a probe has three parts - a probe ID

RDBMS Design
Probes(pid: integer, time: timestamp, lat: real, long: real, camera: string, video: BLOB)
Tuples for the same probe have different time, lat, and long values, but the same pid, camera, and video values.
Functional dependencies: PTLN → CV and P → CV (P = pid, T = time, L = lat, N = long, C = camera, V = video).
The table needs to be decomposed:
Probes_Loc(pid: integer, time: timestamp, lat: real, long: real)
Probes_Video(pid: integer, camera: string, video: BLOB)
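
As a rough illustration of the decomposition, the Python sketch below projects a few made-up Probes rows onto the two schemas and joins them back on pid. The sample rows are invented for the example. Because pid determines camera and video, the camera and video values are stored once per probe and the join recovers the original rows.

```python
# Hypothetical sample rows: (pid, time, lat, long, camera, video).
probes = [
    (10, "13:00", 12.5, 77.1, "cam-A", b"<video-10>"),
    (10, "13:05", 12.6, 77.2, "cam-A", b"<video-10>"),
    (11, "13:00", 40.0, -3.7, "cam-B", b"<video-11>"),
]

# Projection onto Probes_Loc(pid, time, lat, long).
probes_loc = sorted({(pid, t, lat, lon) for pid, t, lat, lon, _, _ in probes})

# Projection onto Probes_Video(pid, camera, video); duplicates collapse,
# so each probe's camera and video are stored only once.
probes_video = sorted({(pid, cam, vid) for pid, _, _, _, cam, vid in probes})

# Natural join on pid reconstructs the original relation.
rejoined = sorted(
    (pid, t, lat, lon, cam, vid)
    for pid, t, lat, lon in probes_loc
    for vpid, cam, vid in probes_video
    if pid == vpid
)
```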

Drawbacks of RDBMS Design
Representing video as a BLOB means that application code must be written in an external language to manipulate a video object in the database. Consider the query: for probe 10, display the video recorded between 1:00 pm and 1:10 pm on Sept 22 2006. The entire video object associated with probe 10, recorded over several hours, needs to be retrieved to display a segment recorded over 10 minutes.

The fact that each probe has an associated sequence of location readings is hidden: the sequence information is spread across several tuples. Some queries will require a join.

ORDBMS Design
Store video as an ADT object and write methods to manipulate it. A structured type can be used to store the location sequence.
Probes_AllInfo(pid: integer, locseq: location_seq, camera: string, video: mpeg_stream)
ADT: mpeg_stream, with a method display() that takes a start time and end time and displays the portion of the video recorded at

Queries

SELECT display(P.video, 1:00 PM Sept 22 2006, 1:10 PM Sept 22 2006)
FROM Probes_AllInfo P
WHERE P.pid = 10

Structured Type
Structured type: location_seq, defined as a list type containing a list of ROW type objects.
CREATE TYPE location_seq listof (ROW(time: timestamp, lat: real, long: real))
Query to find the earliest time at which the given probe was recorded:
SELECT P.pid, MIN(P.locseq.time)
FROM Probes_AllInfo P
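
The effect of keeping the location sequence inside each probe tuple can be mimicked in Python. This is a minimal sketch with invented data: each probe is a dictionary whose `locseq` field is a list of rows, and `earliest_times` (a hypothetical helper) plays the role of MIN over `locseq.time`. No join is needed because the sequence travels with the probe.

```python
from datetime import datetime

# Hypothetical contents of Probes_AllInfo; locseq is a list of rows.
probes_allinfo = [
    {
        "pid": 10,
        "camera": "cam-A",
        "locseq": [
            {"time": datetime(2006, 9, 22, 13, 5), "lat": 12.6, "long": 77.2},
            {"time": datetime(2006, 9, 22, 13, 0), "lat": 12.5, "long": 77.1},
        ],
    },
    {
        "pid": 11,
        "camera": "cam-B",
        "locseq": [
            {"time": datetime(2006, 9, 22, 9, 30), "lat": 40.0, "long": -3.7},
        ],
    },
]


def earliest_times(table):
    """Return (pid, earliest recorded time) for every probe in the table."""
    return [(p["pid"], min(row["time"] for row in p["locseq"])) for p in table]
```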

Difference between structured and reference type
my_theater tuple(tno integer, name text, address text, phone text)
theater REF(theater_t)

Deletion
Objects with references can be affected by the deletion of objects that they reference, e.g. deletion of the Theaters table.

Reference-free structured objects are not affected by the deletion of other objects.

Difference between structured and reference type
Update
Objects of reference types change value if the referenced object is updated. Objects of reference-free structured types change value only if updated directly.

Sharing and copying
An identified object can be referenced by multiple reference-type items, so each update is reflected in many places. With reference-free structured types, updating a value requires updating all the copies of the object.
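
The update and sharing behaviour described above resembles the difference between shared references and deep copies in Python. The sketch below is only an analogy; the theater record and its values are invented.

```python
import copy

theater = {"tno": 5, "name": "Majestic", "phone": "555-0100"}

# Reference-style rows: both rows point at the same theater object.
ref_rows = [
    {"film": 538, "theater": theater},
    {"film": 539, "theater": theater},
]

# Structured-style rows: each row embeds its own copy of the theater value.
copy_rows = [
    {"film": 538, "theater": copy.deepcopy(theater)},
    {"film": 539, "theater": copy.deepcopy(theater)},
]

# A single update to the identified object...
theater["phone"] = "555-0199"

# ...is visible through every reference, but not in the embedded copies.
phones_via_refs = [row["theater"]["phone"] for row in ref_rows]
phones_in_copies = [row["theater"]["phone"] for row in copy_rows]
```

After the update, `phones_via_refs` holds the new number in both rows, while `phones_in_copies` still holds the old one until each copy is updated separately.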

Difference between structured and reference type
Storage Overhead
Multiple copies of large values in structured type objects require much more storage. This affects disk usage and buffer management (if multiple copies are accessed at once).

Clustering
The subparts of a structured object are typically stored together on disk. Objects with references may point to other objects that are far away on the disk, thus requiring significant movement of the disk arm.

OID vs Foreign Key
An OID can point to an object that is stored anywhere in the database, even in a field. A foreign key reference is constrained to point to an object in a particular referenced relation. Referential integrity can be a problem for OIDs: an object may be deleted while there are still OID pointers to it.
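
The dangling-pointer problem can be illustrated with a toy object store in Python. `ObjectStore` and `dangling` are hypothetical names made up for this sketch; a real DBMS manages OIDs internally. The store hands out integer oids that are never reused, and nothing stops an object from being deleted while other data still holds its oid.

```python
class ObjectStore:
    """Toy store that assigns each object a unique, never-reused integer oid."""

    def __init__(self):
        self._objects = {}
        self._next_oid = 1

    def insert(self, value):
        oid = self._next_oid
        self._next_oid += 1  # oids are never recycled, even after deletion
        self._objects[oid] = value
        return oid

    def delete(self, oid):
        del self._objects[oid]

    def deref(self, oid):
        # Returns None when the oid no longer identifies a stored object.
        return self._objects.get(oid)


def dangling(store, oids):
    """Return the oids in `oids` whose target object has been deleted."""
    return [oid for oid in oids if store.deref(oid) is None]


store = ObjectStore()
t1 = store.insert({"name": "Majestic"})
t2 = store.insert({"name": "Roxy"})
held_refs = [t1, t2]      # oids stored elsewhere in the data
store.delete(t1)          # no check is made for outstanding references
t3 = store.insert({"name": "Odeon"})
broken = dangling(store, held_refs)
```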

[EER diagram: entity Probes with attributes pid, camera, video, and listof(row(time, lat, long)); method display(start, end)]

Using Nested Collections
Can_Teach1(cid: integer, teachers: setof(ssn: string), sal: integer)
Course cid can be taught by the team of teachers in the teachers field, at a combined cost of sal.
Can_Teach2(cid: integer, teacher_ssn: string, sal: integer)
Course cid can be taught by any of the teachers, each listed in its own tuple, at a cost of sal.

ORDBMS Implementation Challenges - 1
Storage and Access Methods
Storing Large ADT and Structured Type Objects
ADT
- Large in size (larger than a single disk page).
- Stored in a different location on disk from the tuple, with disk-based pointers maintained to the object.

Structured Type
Structured type objects often vary in size. They can grow arbitrarily and hence require flexible disk layout mechanisms. Array type items need to be stored in sequence, but queries may request subarrays that are not stored contiguously, resulting in a high number of I/O requests. To reduce I/O, arrays are often broken into chunks that are then stored in some order on disk.
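
The benefit of chunking can be seen with a small calculation. The sketch below is an assumption-laden illustration, not a description of any particular system: it treats each chunk as one unit of I/O and counts how many chunks a rectangular subarray query touches. The function name `chunks_touched` and the array and chunk sizes are made up.

```python
def chunks_touched(row_range, col_range, chunk_rows, chunk_cols):
    """Return the set of (chunk_row, chunk_col) ids overlapped by a subarray.

    row_range and col_range are half-open (start, stop) index ranges.
    """
    r0, r1 = row_range
    c0, c1 = col_range
    row_chunks = range(r0 // chunk_rows, (r1 - 1) // chunk_rows + 1)
    col_chunks = range(c0 // chunk_cols, (c1 - 1) // chunk_cols + 1)
    return {(i, j) for i in row_chunks for j in col_chunks}


# Query: the 64 x 64 subarray in the top-left corner of a 1024 x 1024 array.
query_rows, query_cols = (0, 64), (0, 64)

# Layout A: row-major storage, one full row of 1024 elements per chunk.
row_major = chunks_touched(query_rows, query_cols, chunk_rows=1, chunk_cols=1024)

# Layout B: square 32 x 32 tiles, also 1024 elements per chunk.
tiled = chunks_touched(query_rows, query_cols, chunk_rows=32, chunk_cols=32)
```

With equal-sized chunks, the row-major layout touches 64 chunks for this query while the tiled layout touches 4.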

ORDBMS Implementation Challenges - 1
Indexing New Types
Efficient access can be provided by using indexes. RDBMS indexes support only equality and range conditions. An ORDBMS requires efficient indexes for ADT methods and for operators on structured types. One way to make the set of index structures extensible is to publish an access method interface that lets users implement an index structure outside the DBMS.

ORDBMS Implementation Challenges - 1
Indexing New Types (contd.)
An alternative is to provide a generic template index structure. The Generalized Search Tree (GiST) is such a structure. It is a template index structure based on B+ trees, which allows most tree index structures to be implemented with only a few lines of user-defined ADT code.

ORDBMS Implementation Challenges - 2
Query Processing
User-defined aggregation functions
An ORDBMS allows new aggregate functions to be defined. To register a new aggregate function, a user must implement three methods:
- Initialize
- Iterate
- Terminate
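
The three-method protocol can be sketched in Python. The class and the `run_aggregate` driver below are hypothetical; an actual ORDBMS defines its own registration interface. The driver stands in for the query executor, which would call the methods while scanning the input.

```python
class Average:
    """A user-defined aggregate written against the three-method protocol."""

    def initialize(self):
        # Set up the internal state before any tuple is seen.
        self.total = 0.0
        self.count = 0

    def iterate(self, value):
        # Fold one input value into the state.
        self.total += value
        self.count += 1

    def terminate(self):
        # Compute the final result from the accumulated state.
        return self.total / self.count if self.count else None


def run_aggregate(aggregate, values):
    """Drive an aggregate the way a query executor might."""
    aggregate.initialize()
    for value in values:
        aggregate.iterate(value)
    return aggregate.terminate()
```

For example, `run_aggregate(Average(), [2, 4, 9])` returns `5.0`, and an empty input yields `None`.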

ORDBMS Implementation Challenges - 2
Method Security
ADTs give users the power to add code to the DBMS, so the DBMS must have mechanisms to prevent buggy or malicious code from causing problems.
- User methods can be interpreted rather than compiled.
- Alternatively, compiled methods can be allowed but run in a different address space than the DBMS, communicating through IPC.

ORDBMS Implementation Challenges - 2
Method caching
ADT methods can be expensive to execute and can account for the bulk of the time spent on query execution. During query processing it may make sense to cache the results, in case they can be used again. Within the scope of a single query, one can avoid calling a method twice on duplicate values in a column by either sorting the table or using hashing techniques. An alternative is to maintain a cache of
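
The hashing approach to avoiding repeated calls within a single query can be sketched as follows. `apply_cached` and `expensive_method` are hypothetical names; the sketch assumes the method is deterministic and that column values are hashable.

```python
def apply_cached(method, column):
    """Apply `method` to each value, calling it once per distinct value."""
    cache = {}  # hash table: input value -> method result
    results = []
    for value in column:
        if value not in cache:
            cache[value] = method(value)
        results.append(cache[value])
    return results


calls = []  # records every real invocation, to show the savings


def expensive_method(x):
    # Stand-in for a costly ADT method.
    calls.append(x)
    return x * x


out = apply_cached(expensive_method, [3, 7, 3, 3, 7])
```

The five input values produce five results, but the method itself runs only twice, once for each distinct value.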

ORDBMS Implementation Challenges - 2
Pointer Swizzling
In some applications, objects are retrieved into memory and accessed frequently through their OIDs. Some systems maintain a table of the OIDs of objects that are currently in memory. When an object O is brought into memory, they check each OID contained in O and replace OIDs of in-memory objects by in-memory pointers to those objects. This technique is called pointer swizzling and makes references faster. Caution: if an object is paged out, in-memory references to it must be invalidated and replaced with its OID.
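
A minimal Python sketch of the idea follows. Everything in it is an assumption made for illustration: `DISK` stands in for persistent storage, `resident` is the table of in-memory objects, and an object's `refs` list holds either a plain oid (unswizzled) or a direct Python reference (swizzled).

```python
class Obj:
    def __init__(self, oid, refs):
        self.oid = oid
        self.refs = list(refs)  # each entry: an oid (int) or a swizzled Obj


# Hypothetical on-disk data: oid -> oids referenced by that object.
DISK = {1: [2, 3], 2: [3], 3: []}

# Table of objects currently in memory: oid -> Obj.
resident = {}


def load(oid):
    """Bring an object into memory and swizzle its references."""
    obj = Obj(oid, DISK[oid])
    resident[oid] = obj
    # Replace oids of in-memory objects by direct in-memory pointers.
    obj.refs = [resident.get(r, r) for r in obj.refs]
    return obj


def page_out(oid):
    """Evict an object, unswizzling every in-memory pointer to it."""
    victim = resident.pop(oid)
    for obj in resident.values():
        obj.refs = [victim.oid if r is victim else r for r in obj.refs]


o3 = load(3)
o1 = load(1)                       # object 3 is resident, object 2 is not
swizzled = o1.refs[1] is o3        # reference to 3 is now a direct pointer
unswizzled = o1.refs[0] == 2       # reference to 2 remains a plain oid
page_out(3)                        # pointer to 3 must revert to its oid
```

Note that scanning all resident objects on eviction is a simplification; it is used here only to keep the sketch short.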

ORDBMS Implementation Challenges - 3
Query optimization
Registering indexes with the optimizer
The optimizer must be informed about the new index structures. The optimizer must know:
- What WHERE-clause conditions are matched by that index
- What the cost of fetching a tuple is for that index

Given this information, the optimizer can use any index structure in constructing a query plan.

Reduction factor and cost estimation for ADT methods
For user-defined conditions such as is_sunrise(), the optimizer needs to estimate the reduction factor. Users who register a method can also specify the method's cost as a number, typically in units of the

ORDBMS Implementation Challenges - 3
Expensive selection optimization
An RDBMS optimizer treats selection as a zero-time operation. Large objects need access time, and processing them in memory is complicated, so ORDBMS optimizers must consider carefully how to order selection conditions.
Frames.frameno
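
One commonly cited heuristic for ordering selections, which the slides do not spell out, is to apply conditions in increasing order of rank = (reduction factor - 1) / cost per tuple. The Python sketch below illustrates that heuristic on made-up numbers; the predicate names, costs, and reduction factors are all hypothetical, and conditions are assumed to be independent. The brute-force search over all orderings is there only to check the heuristic on this small example.

```python
from itertools import permutations

# (name, cost per tuple, reduction factor); all figures are invented.
predicates = [
    ("expensive_method_check", 100.0, 0.01),
    ("medium_check", 10.0, 0.5),
    ("cheap_comparison", 0.001, 0.1),
]


def rank(pred):
    _, cost, reduction = pred
    return (reduction - 1.0) / cost


def expected_cost(order):
    """Expected per-tuple cost of applying the conditions in this order."""
    total, surviving = 0.0, 1.0
    for _, cost, reduction in order:
        total += surviving * cost   # only surviving tuples pay this cost
        surviving *= reduction      # fraction that passes the condition
    return total


by_rank = sorted(predicates, key=rank)
best = min(permutations(predicates), key=expected_cost)
```

On these numbers the rank ordering applies the cheap comparison first and the expensive method last, and it coincides with the cheapest ordering found by exhaustive search.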
