physical and logical structure

53
Physical and Logical Structure SNU IDB Lab.

Upload: gayora

Post on 23-Feb-2016


TRANSCRIPT

Page 1: Physical and Logical Structure

Physical and Logical Structure

SNU IDB Lab.

Page 2: Physical and Logical Structure

2

XML Documents 1: Structure
– Peeping into an XML document
– Physical view: entities
– Logical view: DTD

Page 3: Physical and Logical Structure

3

Peeping into XML document(1/5)

<?xml version="1.0" standalone="yes"?>

<GREETING> Hello, XML!! <!--this is greeting--></GREETING>

Markup and character data: the declaration, the tags, and the comment are markup; the text Hello, XML!! is character data.

Page 4: Physical and Logical Structure

4

Peeping into XML document(2/5)

<?xml version="1.0" standalone="yes"?>

<!DOCTYPE DATE [ <!ELEMENT DATE (#PCDATA)> ]>

<DATE> 001224</DATE>

XML document: date.xml

XML declaration: declares that this is an XML document. It begins with <? and ends with ?>.

DTD (Document Type Definition): defines the tags the user will use. Here it defines the DATE tag.

Content

<!--This is date --> Comment: ignored by the parser.
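The parts of date.xml can be checked with a parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the comment sits after the root element, because the slide shows it on its own. The parsed tree contains only the DATE element and its character data. The declaration, the DTD, and the comment do not become nodes.

```python
import xml.etree.ElementTree as ET

# date.xml as corrected above; placing the comment after the root
# element is an assumption (the slide lists it separately).
DATE_XML = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DATE [ <!ELEMENT DATE (#PCDATA)> ]>
<DATE> 001224</DATE>
<!--This is date -->"""

root = ET.fromstring(DATE_XML)
print(root.tag)           # DATE
print(repr(root.text))    # ' 001224' (character data, whitespace kept)
print(len(root))          # 0: no child elements
print(ET.tostring(root))  # the comment and DOCTYPE are not in the tree
```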

Page 5: Physical and Logical Structure

5

Peeping into XML document(3/5)

Structure of an XML document
– Physical structure: the document is made up of storage units called entities
– Logical structure: the document is divided into named units and sub-units called elements

Page 6: Physical and Logical Structure

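6

Peeping into XML document(4/5)

[Figure: logical structure, where the document is divided into units and sub-units (elements), beside physical structure, where the document is stored as entities that are either internal or kept in separate files]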

Page 7: Physical and Logical Structure

7

Peeping into XML document(5/5)

<person><name> kim </name> <ID>771224</ID>

<office>301-453</office><phone>1830</phone>

<photo source="k.jpg"/>

</person>

[Figure: the same markup shown twice. In the element view, the tags nest inside person. In the entity view, the photo file "k.jpg" is split out as a separate entity.]

Page 8: Physical and Logical Structure

8

XML Documents 1: Structure
– Peeping into an XML document
– Physical view: entities
– Logical view: DTD

Page 9: Physical and Logical Structure

9

Content of Physical Structure
– Entity
– Figures of Document Entity
– Defining an Entity
– Grammar in Declaring Entity
– Examples of Entity Declaration
– URL Format

Page 10: Physical and Logical Structure

Entity (1/3)
– A unit for physically separating and storing any part of a document (a unit of information storage)
– Each such unit of information is called an entity

[Figure: physical structure of the person document; the markup is stored in one entity and the photo file "k.jpg" in a separate entity]

SNUOOPSLA Lab.

Page 11: Physical and Logical Structure

11

Entity (2/3) Purpose of Entity

– Entities hold all of the document's information (well-formed XML data, other text files, binary data, ...)

[Figure: the person markup is the document entity; the photo file "k.jpg" is an image entity]

Page 12: Physical and Logical Structure

12

Entity (3/3) Internal Entity

– An entity defined completely within the document itself

External Entity
– An entity that obtains its content from an external source identified by a URL
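An internal entity's replacement text is written in the document's own DTD, so a parser can expand it without fetching anything. The following is a minimal sketch with Python's `xml.etree.ElementTree`. The NOTE element is hypothetical, and the entity comes from the later internal-entity example. For contrast, the Python documentation states that ElementTree does not expand external entities. Those would require resolving and fetching the URL.

```python
import xml.etree.ElementTree as ET

# Internal entity: the replacement text lives inside this document's DTD.
# The NOTE element name is a hypothetical example.
DOC = """<!DOCTYPE NOTE [
  <!ENTITY XML "eXtensible Markup Language">
]>
<NOTE>&XML; is a meta-markup language.</NOTE>"""

note = ET.fromstring(DOC)
# The parser has already replaced &XML; with its text.
print(note.text)  # eXtensible Markup Language is a meta-markup language.
```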

Page 13: Physical and Logical Structure

13

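Figures of Document Entity

[Figure: three layouts. (1) A document entity with no other entities. (2) A document entity holding the main content, plus entity A. (3) A document entity acting as a framework file that pulls in entities A, B, C, and D.]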

Page 14: Physical and Logical Structure

14

Defining an entity
– An entity must be declared before its first reference in the data stream
– Entities are declared in the DTD (Document Type Definition)

<!DOCTYPE DOCUMENT [

<!ENTITY EMAIL "[email protected]">
<!ENTITY TEXT "(#PCDATA)">

]>

Entity definition in DTD
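The declare-before-use rule can be seen with a non-validating parser. In a document with no DTD, a reference to an undeclared entity is a well-formedness error. The following is a minimal sketch with `xml.etree.ElementTree`. The `parses` helper is hypothetical, and the e-mail address is a placeholder.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# EMAIL is declared in the DTD before it is referenced (placeholder address).
declared = """<!DOCTYPE DOCUMENT [
  <!ENTITY EMAIL "someone@example.com">
]>
<DOCUMENT>Contact: &EMAIL;</DOCUMENT>"""

# The same reference with no declaration at all.
undeclared = "<DOCUMENT>Contact: &EMAIL;</DOCUMENT>"

print(parses(declared))    # True
print(parses(undeclared))  # False: undefined entity
```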

Page 15: Physical and Logical Structure

15

Example: Entity Declaration (1/3)

Internal text entities
– <!ENTITY XML "eXtensible Markup Language">
– <!ENTITY DemoEntity 'The rule is 6" long.'>

Built-in (predefined) entities
– <!ENTITY sample "Use &quot; and ' as delimiters.">
– &lt; for the less-than sign (<)
– &gt; for the greater-than sign (>)
– &amp; for the ampersand (&)
– &apos; for the apostrophe (')
– &quot; for the double quote (")
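The five predefined entities work without any declaration. The following is a minimal sketch with Python's standard library that goes in both directions. ElementTree decodes the references. `xml.sax.saxutils.escape` and `quoteattr` produce them, and `quoteattr` picks single quotes when the value contains a double quote, as in DemoEntity above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Decoding: the five predefined entities need no DTD.
p = ET.fromstring("<P>&lt;tag&gt; &amp; &apos;single&apos; &quot;double&quot;</P>")
print(p.text)  # <tag> & 'single' "double"

# Encoding: escape() replaces &, < and > in character data.
print(escape("a < b & c"))                # a &lt; b &amp; c
# quoteattr() also chooses a safe delimiter for an attribute value.
print(quoteattr('The rule is 6" long.'))  # 'The rule is 6" long.'
```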

Page 16: Physical and Logical Structure

16

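Example: Entity Declaration (2/3)

External text entities
– <!ENTITY myent SYSTEM "/EMTS/MYENT.XML">
– <!ENTITY myent PUBLIC "-//MyCorp//ENTITY Superscript Chars//EN" ….>

Binary (unparsed) entities
– <!ENTITY Jsphoto SYSTEM "/ENTS/Jsphoto.tif" NDATA TIFF>
  NDATA is followed by an unquoted notation name, and the notation TIFF must itself be declared with <!NOTATION>.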

Page 17: Physical and Logical Structure

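Example: Entity Declaration (3/3) URL format

A relative system identifier is resolved against the location of the file that contains the declaration:
– <!ENTITY ent9 SYSTEM "entities/entity9.xml">
  document: /xml/document.xml, entity: /xml/entities/entity9.xml
– <!ENTITY ent9 SYSTEM "../entities/entity9.xml">
  document: /xml/docs/document.xml, entity: /xml/entities/entity9.xml

[Figure: directory trees. Case 1: xml contains document.xml and the folder entities (entity9.xml). Case 2: xml contains the folder docs (document.xml) and the folder entities (entity9.xml).]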
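Resolving a relative system identifier is ordinary relative-URI resolution. The following is a minimal sketch with `urllib.parse.urljoin`. It assumes the two documents are addressed as `file:` URLs. Both cases land on the same entity file.

```python
from urllib.parse import urljoin

# (document location, system identifier) for the two cases above;
# expressing the paths as file: URLs is an assumption.
cases = [
    ("file:///xml/document.xml", "entities/entity9.xml"),
    ("file:///xml/docs/document.xml", "../entities/entity9.xml"),
]

resolved = [urljoin(doc_url, system_id) for doc_url, system_id in cases]
for url in resolved:
    print(url)  # file:///xml/entities/entity9.xml (both times)
```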

Page 18: Physical and Logical Structure

18

XML Documents 1: Structure
– Peeping into an XML document
– Physical view: entities
– Logical view: DTD

Page 19: Physical and Logical Structure

19

Content of Logical Structure
– Concepts
– DTD Structure
– Element Declaration
– Attribute Declarations
– Parameter Entities
– Conditional Sections
– Notation Declarations
– DTD Processing Issues

Page 20: Physical and Logical Structure

20

Concepts of DTD(1/3) DTD (Document Type Definition)

– An optional but powerful feature of XML
– Comprises a set of declarations that define the structure tree of a document
– Validating XML processors read the DTD, check whether the document is valid, and use it to build the document model in memory
– Describes the user's own tag set, which is what makes XML a meta-markup language

Page 21: Physical and Logical Structure

21

Concepts of DTD(2/3) A DTD describes...

– Elements, attributes, notations, and the relationships between elements

It establishes formal rules for document structure.

Page 22: Physical and Logical Structure

22

Concepts of DTD(3/3) Declare vs. Define

– Declare: "This document is a concert poster"
– Define: "A concert poster must have the following features"

A DTD defines
– Element types + attributes + entities

Valid vs. Invalid
– Valid: conforms to the DTD
– Invalid: fails to conform to the DTD

[Figure: valid XML documents are a subset of well-formed XML documents]

Page 23: Physical and Logical Structure

23

Valid & Invalid Documents

Example DTD:

<!DOCTYPE GREETING [<!ELEMENT GREETING (#PCDATA)>]>

– Valid: a GREETING element containing text but no markup
  <GREETING>various random text but no markup</GREETING>
– Invalid: anything else, including
  <GREETING><sometag>various random text</sometag><someEmptyTag/></GREETING>

Page 24: Physical and Logical Structure

24

DTD structure

A DTD is composed of a number of declarations:
– ELEMENT (tag definition)
– ATTLIST (attribute definitions)
– ENTITY (entity definition)
– NOTATION (data type notation definition)

A DTD can be stored in an external subset, an internal subset, or both.

Page 25: Physical and Logical Structure

25

Internal and External Subset(1/3) Internal subset

– Form:
  <!DOCTYPE … [
  <!-- Internal Subset -->
  …
  ]>
– Pros
  Easy to write XML; no need to move between two files while editing
– Cons
  Other documents cannot reuse the declarations without copying the internal subset

Page 26: Physical and Logical Structure

26

Internal and External Subset(2/3) External subset

– Using an external DTD is usually better. Why?
– Many benefits: document management, updating, editing
– A few reasons:
  If you use an external DTD, you gain the capability to use public DTDs
  External DTDs provide better document management
  External DTDs make it easier to validate your documents

Page 27: Physical and Logical Structure

27

Internal and External Subset(3/3)

[Figure: the internal subset and the external subset; the full parsing path reads both]

Page 28: Physical and Logical Structure

28

Element Declarations
– Define a new element type by giving its name and its content model (the content it may contain)
– In a valid document, each element type must be declared in an <!ELEMENT> declaration
– The content model uses a simple, regular-expression-like grammar to specify precisely what is and isn't allowed in an element

Element type declaration: '<!ELEMENT' S Name S contentspec S? '>' (S is required whitespace, S? optional whitespace)

Page 29: Physical and Logical Structure

29

Content Specifications
– ANY
– #PCDATA
– Sequences
– Choices
– Mixed Content
– Modifiers
– Empty

Page 30: Physical and Logical Structure

30

ANY
With the declaration below, a SEASON element can contain any declared child elements and/or raw text (parsed character data).

Rarely used in practice, because it places no constraint on structure.

<!ELEMENT SEASON ANY>

Page 31: Physical and Logical Structure

31

#PCDATA
Parsed character data, i.e. raw text with no markup. The keyword represents normal text and is prefixed with the hash symbol '#' so that, within a model group, it cannot be confused with an element of the same name (for example, '(#PCDATA | PCDATA)*').

<!ELEMENT YEAR (#PCDATA)>

Page 32: Physical and Logical Structure

32

Use of #PCDATA in XML

Valid:
<YEAR>1999</YEAR>
<YEAR>99</YEAR>
<YEAR>1999 C.E.</YEAR>
<YEAR> The year of our Lord one thousand, nine hundred, and ninety-nine</YEAR>

Invalid (child elements are not allowed):
<YEAR><MONTH>January</MONTH><MONTH>February</MONTH><MONTH>March</MONTH><MONTH>April</MONTH><MONTH>May</MONTH><MONTH>June</MONTH><MONTH>July</MONTH><MONTH>August</MONTH><MONTH>September</MONTH><MONTH>October</MONTH><MONTH>November</MONTH><MONTH>December</MONTH></YEAR>

Page 33: Physical and Logical Structure

33

Child Elements
To declare that a LEAGUE element must have a LEAGUE_NAME child:

<!ELEMENT LEAGUE (LEAGUE_NAME)>
<!ELEMENT LEAGUE_NAME (#PCDATA)>

Page 34: Physical and Logical Structure

34

Sequences(1/2)
Separate multiple required child elements with commas, e.g.:

<!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION, DIVISION, DIVISION)>

One or more children: +

<!ELEMENT DIVISION_NAME (#PCDATA)>
<!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>

Page 35: Physical and Logical Structure

35

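Sequences(2/2)
Zero or more children: *

<!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>
<!ELEMENT TEAM_CITY (#PCDATA)>
<!ELEMENT TEAM_NAME (#PCDATA)>

Choices: separate the alternatives with |; exactly one of them must appear. Two alternative declarations of PAYMENT (a DTD may contain only one of them):

<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>
<!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>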

Page 36: Physical and Logical Structure

36

Grouping With Parentheses
– Parentheses combine several elements into a single unit
– A parenthesized group can be nested inside other parentheses in place of a single element
– A parenthesized group can be suffixed with a plus sign (+), an asterisk (*), or a question mark (?)

<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
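Sequences, choices, grouping, and the ?, *, + modifiers together behave like a regular expression over the list of child element names. The following is a minimal sketch of that correspondence. The `content_model_to_regex` and `children_match` helpers are hypothetical and are not part of any library. They handle only children content models, not #PCDATA, ANY, or EMPTY. They also do not check the determinism rule that real DTDs impose.

```python
import re

# Tokens of a children content model: names, parentheses, ',', '|', modifiers.
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|?*+]")

def content_model_to_regex(model: str) -> re.Pattern:
    """Hypothetical helper: translate a DTD children content model into a
    regex over a space-terminated sequence of child element names."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")          # group without capturing
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            continue                     # a sequence is plain concatenation
        elif tok in ("|", "?", "*", "+"):
            parts.append(tok)            # same meaning in regex syntax
        else:
            # One child element; the trailing space keeps P from matching PHOTO.
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def children_match(model: str, children: list[str]) -> bool:
    """True if the sequence of child element names fits the content model."""
    seq = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(seq) is not None

ARTICLE = "(TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)"
print(children_match(ARTICLE, ["TITLE", "P", "PHOTO", "P", "BYLINE"]))  # True
print(children_match(ARTICLE, ["TITLE"]))                               # True
print(children_match(ARTICLE, ["P", "TITLE"]))                          # False
print(children_match("(dt, dd)*", ["dt", "dd", "dt", "dd"]))            # True
print(children_match("(dt, dd)*", ["dt", "dd", "dt"]))                  # False
```

Ending every name with a space makes each name a distinct token in the joined string. Because of this, the regex cannot confuse a name with a longer name that starts with the same letters.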

Page 37: Physical and Logical Structure

37

Mixed Content
– Both #PCDATA and child elements appear in a single choice
– #PCDATA must come first
– #PCDATA cannot be used in a sequence
– When child elements are listed, the group must be followed by *

<!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>

Empty elements
<!ELEMENT BR EMPTY>
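The following sketch shows how mixed content looks to a program, using `xml.etree.ElementTree`. The TEAM instance and its text are hypothetical sample data that fit the declaration above. Character data before the first child is stored in `text`. Character data after each child is stored in that child's `tail`. An empty element such as BR has neither text nor children.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of
# <!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>
team = ET.fromstring(
    "<TEAM>The <TEAM_CITY>Chicago</TEAM_CITY> "
    "<TEAM_NAME>Cubs</TEAM_NAME> play here.</TEAM>"
)

print(repr(team.text))                # 'The '
print([child.tag for child in team])  # ['TEAM_CITY', 'TEAM_NAME']
print([child.tail for child in team]) # [' ', ' play here.']
print("".join(team.itertext()))       # The Chicago Cubs play here.

# <!ELEMENT BR EMPTY>: no text and no children.
br = ET.fromstring("<BR/>")
print(br.text, len(br))               # None 0
```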

Page 38: Physical and Logical Structure

38

Attribute Declarations
Consider this element:

<GREETING LANGUAGE="Spanish"> Hola!</GREETING>

It is declared like this:

<!ELEMENT GREETING (#PCDATA)>
<!ATTLIST GREETING LANGUAGE CDATA "English">

General form:

<!ATTLIST Element_name Attribute_name Type Default_value>
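A literal default value takes effect when a document omits the attribute. The following is a minimal sketch with `xml.etree.ElementTree`. It relies on the expat parser underneath, which reports attributes defaulted from ATTLIST declarations in the internal subset. The second greeting's text is hypothetical.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
  <!ATTLIST GREETING LANGUAGE CDATA "English">
]>
"""

# The attribute is given explicitly: the document's value wins.
spanish = ET.fromstring(DTD + '<GREETING LANGUAGE="Spanish"> Hola!</GREETING>')
# The attribute is omitted: the parser fills in the DTD default.
english = ET.fromstring(DTD + "<GREETING>Hello!</GREETING>")

print(spanish.get("LANGUAGE"))  # Spanish
print(english.get("LANGUAGE"))  # English
```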

Page 39: Physical and Logical Structure

39

Multiple Attribute Declarations

Consider this element:

<RECTANGLE LENGTH="70px" WIDTH="85px"/>

With two attribute declarations:

<!ELEMENT RECTANGLE EMPTY>
<!ATTLIST RECTANGLE LENGTH CDATA "0px">
<!ATTLIST RECTANGLE WIDTH CDATA "0px">

With one attribute declaration (indentation is a convention, not a requirement):

<!ATTLIST RECTANGLE
          LENGTH CDATA "0px"
          WIDTH  CDATA "0px">

Page 40: Physical and Logical Structure

40

Attribute Types

CDATA, ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS, Enumerated

Page 41: Physical and Logical Structure

41

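CDATA
The most general attribute type.

The value can be any string of text, except that it may not contain a literal less-than sign (<), a bare ampersand (&), or the quotation mark used to delimit the value; write &lt;, &amp;, and &quot; or &apos; instead.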

Page 42: Physical and Logical Structure

42

ID
– The value must be an XML name
  May include letters, digits, underscores, hyphens, and periods, but must begin with a letter or underscore
  May not include whitespace
  May contain colons only if used for namespaces
– Each value must be unique among all ID-type attribute values in the document
– The default must be #REQUIRED or #IMPLIED; #REQUIRED is the usual choice

Page 43: Physical and Logical Structure

43

IDREF
Value matches the ID of an element in the same document. Used for links and the like.

IDREFS
A list of ID values in the same document, separated by white space
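A non-validating parser such as ElementTree does not know which attributes the DTD typed as ID or IDREF(S). As a result, it cannot enforce the uniqueness and matching rules above. The following is a minimal sketch of those two checks. It assumes the caller names the ID attribute and the IDREF/IDREFS attributes. The `check_ids` helper, the `id` and `boss` attribute names, and the sample data are all hypothetical.

```python
import xml.etree.ElementTree as ET

def check_ids(root, id_attr="id", idref_attrs=("boss",)):
    """Hypothetical helper: return (duplicate IDs, dangling IDREF values).

    The caller must say which attributes are IDs and which are IDREF or
    IDREFS, because ElementTree does not read attribute types from a DTD."""
    seen, duplicates = set(), []
    for el in root.iter():
        value = el.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)  # IDs must be unique in the document
        seen.add(value)

    dangling = []
    for el in root.iter():
        for attr in idref_attrs:
            # IDREFS is whitespace-separated, so split() handles IDREF too.
            for ref in (el.get(attr) or "").split():
                if ref not in seen:
                    dangling.append(ref)  # must match some element's ID
    return duplicates, dangling

staff = ET.fromstring(
    "<staff>"
    '<person id="p1"/>'
    '<person id="p2" boss="p1"/>'
    '<person id="p3" boss="p1 p9"/>'
    '<person id="p2"/>'
    "</staff>"
)
print(check_ids(staff))  # (['p2'], ['p9'])
```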

Page 44: Physical and Logical Structure

44

ENTITY
Value is the name of an unparsed general entity declared in the DTD

ENTITIES
Value is a list of names of unparsed general entities declared in the DTD, separated by white space

Page 45: Physical and Logical Structure

45

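NOTATION
Value is the name of a notation declared in the DTD

<!NOTATION Tex SYSTEM "..\TEXVIEW.EXE">

<!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex>

[Figure: the notation Tex associates the unparsed entity file LOGO.TEX with the helper application TEXVIEW.EXE, shown in numbered steps 1 to 4]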
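A parser does not open LOGO.TEX or run TEXVIEW.EXE. It only reports the declarations, and the application decides what to do with them. The following is a minimal sketch using the standard `xml.parsers.expat` handlers `NotationDeclHandler` and `EntityDeclHandler` to collect the two declarations above. The DOC root element is hypothetical.

```python
import xml.parsers.expat

DOC = r"""<!DOCTYPE DOC [
  <!NOTATION Tex SYSTEM "..\TEXVIEW.EXE">
  <!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex>
]>
<DOC/>"""

notations, unparsed = {}, {}

def on_notation(name, base, system_id, public_id):
    # Record which external helper the notation name points to.
    notations[name] = system_id

def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
    # A notation name is present only for unparsed (NDATA) entities.
    if notation is not None:
        unparsed[name] = (system_id, notation)

parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.Parse(DOC, True)

print(notations)  # {'Tex': '..\\TEXVIEW.EXE'}
print(unparsed)   # {'Logo': ('LOGO.TEX', 'Tex')}
```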

Page 46: Physical and Logical Structure

46

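NMTOKEN
Value is any XML name token: like an XML name, except that it may begin with any name character, including a digit, hyphen, or period

NMTOKENS
Value is a list of name tokens, separated by white space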

Page 47: Physical and Logical Structure

47

Enumerated
– Not a keyword
– Refers to a list of possible values, from which one must be chosen
– A default value is generally given explicitly

<!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">

Page 48: Physical and Logical Structure

48

Attribute Default Values
– A literal string value, or
– One of these three keywords:
  #REQUIRED
  #IMPLIED
  #FIXED (followed by a literal value)

Page 49: Physical and Logical Structure

49

#REQUIRED
– No default value is provided in the DTD
– Document authors must provide the attribute value for each element

<!ELEMENT IMG EMPTY>
<!ATTLIST IMG ALT CDATA #REQUIRED>
<!ATTLIST IMG WIDTH CDATA #REQUIRED>
<!ATTLIST IMG HEIGHT CDATA #REQUIRED>

Page 50: Physical and Logical Structure

50

#IMPLIED
– No default value in the DTD
– The author may (but does not have to) provide a value with each element

Page 51: Physical and Logical Structure

51

#FIXED
– The value is the same for all elements
– The default value must be provided in the DTD
– The document author may not change the default value; if the attribute is written explicitly, its value must match the default

<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
<!ATTLIST AUTHOR COMPANY CDATA #FIXED "TIC">
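The three keywords behave differently once a document is parsed. The following is a minimal sketch with `xml.etree.ElementTree`, using the AUTHOR declarations above. It relies on expat reporting defaulted attributes. The name and e-mail values are placeholders. The #FIXED value is filled in, and the #IMPLIED attribute is simply absent. Because ElementTree does not validate, leaving out a #REQUIRED attribute raises no error; only a validating parser would reject it.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE AUTHOR [
  <!ELEMENT AUTHOR EMPTY>
  <!ATTLIST AUTHOR NAME CDATA #REQUIRED>
  <!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
  <!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
  <!ATTLIST AUTHOR COMPANY CDATA #FIXED "TIC">
]>
<AUTHOR NAME="Kim" EMAIL="kim@example.com"/>"""

author = ET.fromstring(DOC)
# COMPANY comes from the #FIXED default; EXTENSION (#IMPLIED) is absent.
print(sorted(author.attrib.items()))
# [('COMPANY', 'TIC'), ('EMAIL', 'kim@example.com'), ('NAME', 'Kim')]

# A non-validating parser does not enforce #REQUIRED:
missing_email = ET.fromstring(DOC.replace(' EMAIL="kim@example.com"', ""))
print("EMAIL" in missing_email.attrib)  # False, yet no error was raised
```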

Page 52: Physical and Logical Structure

52

Example of Internal DTDs

<?xml version="1.0"?>
<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>

Page 53: Physical and Logical Structure

53

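Internal DTD Subsets
The internal subset is read before the external subset. As a result, its entity and attribute-list declarations take precedence, because the first declaration encountered is binding. An element type, however, may be declared only once, so in this example greeting.dtd must not declare GREETING again.

<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>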
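The "first declaration is binding" rule is what lets the internal subset override an external one. ElementTree does not load greeting.dtd, so the following sketch cannot reproduce the internal-versus-external case directly. Instead, it shows the same rule within a single internal subset using `xml.etree.ElementTree`. The WHO entity and its values are hypothetical. When an entity is declared twice, the parser keeps the first declaration.

```python
import xml.etree.ElementTree as ET

# Two declarations of the same entity: only the first one counts.
DOC = """<!DOCTYPE GREETING [
  <!ENTITY WHO "internal subset">
  <!ENTITY WHO "a later declaration">
]>
<GREETING>Hello from the &WHO;!</GREETING>"""

print(ET.fromstring(DOC).text)  # Hello from the internal subset!
```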