
XPath 2.0 http://www.w3.org/TR/xpath20/

http://www.w3.org/TR/xquery-operators/

Roger L. Costello, 6 March 2010

In Oxygen, set the XPath version to XPath 2.0

Using Namespaces in Oxygen

• Suppose in the Oxygen XPath expression evaluator tool you would like to write expressions such as this: current-dateTime() - xs:dateTime('2008-01-14T00:00:00')

• How do you tell Oxygen what namespace the "xs" prefix maps to? Here's how:

– Go to: Options ► Preferences ► XML ► XSLT-FO-XQuery ► XPath

– In the Default prefix-namespace mappings table, add a new entry mapping xs to the XML Schema namespace http://www.w3.org/2001/XMLSchema

XML Document (planets.xml)

<?xml version="1.0" encoding="UTF-8"?>
<planets>
  <planet>
    <name>Mercury</name>
    <mass units="(Earth = 1)">.0553</mass>
    <day units="days">58.65</day>
    <radius units="miles">1516</radius>
    <density units="(Earth = 1)">.983</density>
    <distance units="millions miles">43.4</distance>
  </planet>
  <planet>
    <name>Venus</name>
    <mass units="(Earth = 1)">.815</mass>
    <day units="days">116.75</day>
    <radius units="miles">3716</radius>
    <density units="(Earth = 1)">.943</density>
    <distance units="millions miles">66.8</distance>
  </planet>
  <planet>
    <name>Earth</name>
    <mass units="(Earth = 1)">1</mass>
    <day units="days">1</day>
    <radius units="miles">2107</radius>
    <density units="(Earth = 1)">1</density>
    <distance units="millions miles">128.4</distance>
  </planet>
</planets>

We will use this XML document throughout this tutorial, so spend a minute or two familiarizing yourself with it.

It is planets.xml in the example01 folder. Please load it into Oxygen XML.

Sequences

• Sequences are central to XPath 2.0

• XPath 2.0 operates on sequences, and generates sequences.

• A sequence is an ordered collection of nodes and/or atomic values.

Example Sequences

• This sequence is composed of three atomic values:

(1, 2, 3)

• This sequence is also composed of three atomic values:

('red', 'white', 'blue')

• This XPath expression will generate a sequence composed of three <name> nodes:

(//planet/name)

See example01

http://www.w3.org/TR/xpath20/#id-sequence-expressions

More Sequence Examples

• With the following XPath, a sequence of six nodes is generated; the first three are <mass> nodes, the next three are <name> nodes:

(//planet/mass, //planet/name)

• This sequence contains nodes followed by atomic values:

(//planet/name, 1, 2, 3)

See example02

Definition of Sequence

• A sequence is an ordered collection of zero or more items.

• An item is either an atomic value or a node.

• An atomic value is a single, non-variable piece of data, e.g. 10, true, 2007, "hello world". (An atomic value is an XML Schema simpleType value.)

• There are seven kinds of nodes:

– element, text, attribute, document, processing instruction (PI), comment, namespace

• A sequence containing exactly one item is called a singleton sequence.

• A sequence containing zero items is called an empty sequence.

http://www.w3.org/TR/xpath20/#dt-item

Sequence Constructor

• A sequence is constructed by enclosing an expression in parentheses.

• Each item is separated by a comma.

– The comma is called the sequence constructor operator.

No Nested Sequences

• If you have a sequence (1, 2) and nest it in another sequence

((1, 2), 3)

the resulting sequence is flattened to simply

(1, 2, 3)

• A nested empty sequence is removed. Given

(1, (2, 3), (), 4, 5, 6)

the resulting sequence is flattened to simply:

(1, 2, 3, 4, 5, 6)

See example03
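The flattening rule can be mimicked outside XPath. The following is only a Python sketch, assuming nested tuples stand in for nested sequences; the helper name flatten is made up for this illustration and is not an XPath function.

```python
def flatten(items):
    """Flatten nested tuples into one flat list; empty tuples vanish."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            # A nested sequence contributes its items, not itself.
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


print(flatten(((1, 2), 3)))               # [1, 2, 3]
print(flatten((1, (2, 3), (), 4, 5, 6)))  # [1, 2, 3, 4, 5, 6]
```

An XPath processor does this flattening automatically whenever a sequence is constructed; the sketch only makes the rule explicit.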

Extract Items from a Sequence

• You can extract items from a sequence using the […] operator (predicate):

(4, 5, 6)[2]

returns the singleton sequence:

(5)

• This XPath expression:

//planet[2]

returns the second planet

See example04

The index must be an integer

• The predicate value must be an integer (more specifically, it must be an XML Schema integer datatype).

(sequence)[index]


Initializing

• Example: suppose an element may or may not have an attribute, discount. If the element has the discount attribute then return its value; otherwise, return 0.

(@discount, 0)[1]
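The idiom works because a missing attribute contributes nothing to the sequence, so the default moves up to position 1. Here is a rough Python sketch of the same idea, assuming an element's attributes are modelled as a plain dict; discount_or_zero is a hypothetical helper name.

```python
def discount_or_zero(attributes):
    """Model of (@discount, 0)[1] for one element's attributes (a dict)."""
    # A missing attribute yields an empty sequence, which disappears
    # when the two parts are concatenated.
    found = [attributes["discount"]] if "discount" in attributes else []
    sequence = found + [0]
    return sequence[0]  # XPath position 1 is Python index 0


print(discount_or_zero({"discount": "15"}))  # 15
print(discount_or_zero({}))                  # 0
```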

Context Item

• Dot "." stands for the current context item.

• The context item can be a node, e.g.

//planet[.]

or it can be an atomic value, e.g.

(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)[. mod 2 = 0]

See example05

count(sequence)

• This function returns an integer, representing the number of items in the sequence.

See example03.b

http://www.w3.org/TR/xquery-operators/#func-count

Why Nested Parentheses?

Compare these two:

count((1, 2, 3))

count(1, 2, 3)

Notice the nested parentheses in the first one.

Why is one correct and the other incorrect?

Answer

• The count function has only one argument.

• This form:

count(1, 2, 3)

provides three arguments to count, which is incorrect.

• This form:

count((1, 2, 3))

provides one argument to count (the argument is a sequence with three items).

Sequence of Sequences?

• There is no such thing as a sequence of sequences!

• There's only one sequence; all subsequences get flattened into a single sequence.

count((//planet, (1, 2, 3), ('red', 'white', 'blue')))

sequence of sequences?

The value of a non-existent node is the empty sequence, ()

/planets/planet[999]

There is no 999th planet, so the result of evaluating this XPath expression is the empty sequence, denoted by ()

() is not equal to ''

• An empty sequence is not equal to a string of length zero.

('a', 'b', (), 'c') is not equal to ('a', 'b', '', 'c')

The first sequence has count = 3; the second has count = 4.

See example03.a

This predicate [.] eliminates empty strings

The value of ('a', '')[.] is just ('a')

The value of ('a', 'b', '', 'c')[.] is just ('a', 'b', 'c')

Two built-in functions

true()

false()

http://www.w3.org/TR/xquery-operators/#func-true

http://www.w3.org/TR/xquery-operators/#func-false

index-of(sequence, value)

• The index-of() function allows you to obtain the position of value in sequence.

index-of((1,3,5,7,9,11), 7)

The first argument is the sequence; the second is the value.

Output: (4)

7 is at the 4th index position.

http://www.w3.org/TR/xquery-operators/#func-index-of

Suppose the value occurs at multiple locations in the sequence

• index-of returns a sequence of index locations. In the last example the result was a sequence of length 1.

index-of((1,3,5,7,9,11,7,7), 7)

There are multiple 7's in the sequence.

Output: (4, 7, 8)

See example05.1

remove(sequence, position)

• The remove function enables you to remove a value at a specified position from a sequence.

remove((1,3,5,7,9,11), 4)

The first argument is the sequence; the second is the position.

Output: (1, 3, 5, 9, 11)

The item at position 4 (the value 7) is removed.

See example05.2

http://www.w3.org/TR/xquery-operators/#func-remove

The "to" Range Operator

• The range operator, "to", can be used to generate a sequence of consecutive integers:

(1 to 10)

returns the sequence:

(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

• This expression:

(1 to 100)[(. mod 10) = 0]

returns the sequence:

(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

• This expression:

(1, 2, 10 to 14, 34, 99)

returns this disjointed sequence:

(1, 2, 10, 11, 12, 13, 14, 34, 99)

See example06

The operands of "to" must be integers

This is not valid:

('a' to 'z')

Error message you will get:

"Error: Required type of first operand of 'to' is integer; supplied value has type string"

insert-before(sequence, position, value)

insert-before((1,3,4,5,6,7,8,9), 2, 2)

The first argument is the sequence (note: '2' is missing from it), the second is the position, and the third is the value. This inserts the value 2 before position 2.

Output: (1, 2, 3, 4, 5, 6, 7, 8, 9)

http://www.w3.org/TR/xquery-operators/#func-insert-before

Appending a value to the end

insert-before(1 to 10, count(1 to 10) + 1, 2)

Output: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2)

Specify a position greater than the length of the sequence

The inserted value can be a sequence

insert-before((1,3,4,5,6,7,8,9),2,(2,3))

The third argument is a sequence of values.

Output: (1, 2, 3, 3, 4, 5, 6, 7, 8, 9)

See example05.3

Sequence Functions

• index-of() returns the index (position) of a value

• [idx] returns the value at idx

• remove() returns the sequence minus the item whose index (position) is specified

• insert-before() returns the sequence plus a new value
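To make the 1-based positions concrete, the three functions can be modelled in a few lines of Python. This is a sketch under the assumption that Python lists stand in for sequences; the names index_of, remove and insert_before are chosen to mirror the XPath functions and are not a real library.

```python
def index_of(seq, value):
    """All 1-based positions at which value occurs in seq."""
    return [pos for pos, item in enumerate(seq, start=1) if item == value]


def remove(seq, position):
    """seq without the item at the given 1-based position."""
    return [item for pos, item in enumerate(seq, start=1) if pos != position]


def insert_before(seq, position, inserts):
    """seq with inserts placed before the given 1-based position."""
    if not isinstance(inserts, (list, tuple)):
        inserts = [inserts]  # a single value is a singleton sequence
    # A position past the end appends; a position below 1 prepends.
    cut = min(max(position, 1), len(seq) + 1) - 1
    return list(seq[:cut]) + list(inserts) + list(seq[cut:])


print(index_of([1, 3, 5, 7, 9, 11, 7, 7], 7))            # [4, 7, 8]
print(remove([1, 3, 5, 7, 9, 11], 4))                    # [1, 3, 5, 9, 11]
print(insert_before([1, 3, 4, 5, 6, 7, 8, 9], 2, 2))     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(insert_before(list(range(1, 11)), 11, 2))          # [1, ..., 10, 2]
```

None of the functions changes its input; like their XPath counterparts, each returns a new sequence.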

Do Lab8

Sequences are Ordered

• Order matters.

• This generates a sequence composed of the <mass> elements followed by the <name> elements:

(//planet/mass, //planet/name)

See example07

reverse(sequence)

• This function reverses the items in sequence.

Notice in the first example the items are wrapped in parentheses (thus creating a sequence).

See example07.1

http://www.w3.org/TR/xquery-operators/#func-reverse

The for Expression

• Use the for expression to loop (iterate) over all items in a sequence. This is its general form:

for variable in sequence return expression

• Here's an example which iterates over the integers 1-10, multiplying each integer by two:

for $i in (1 to 10) return $i * 2

returns

(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

See example08

http://www.w3.org/TR/xpath20/#id-for-expressions

for Expression Examples

• This iterates over each <planet> element, and returns its <radius> element:

for $p in /planets/planet return $p/radius

• This iterates over each <radius> element, and returns itself (the sequence generated is identical to above):

for $r in /planets/planet/radius return $r

• This iterates over each letter of the alphabet:

for $i in ('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z') return $i

See example09

More for Examples

• This returns the radius converted to kilometers (it returns numbers, not nodes):

for $r in /planets/planet/radius return $r * 1.61

• This applies the avg() function to the sequence of nodes returned by the for expression:

avg(for $r in /planets/planet/radius return $r)

See example10

Terminology

for variable in sequence return expression

– variable is the range variable

– sequence is the input sequence

– expression is the return expression

The return expression is evaluated once for each item in the input sequence.

Multiple Variables

Multiple variables can be used:

for variable in sequence, variable in sequence, ... return expression

Example of Multiple Variables

for $x in (1, 2), $y in (3, 4) return ($x * $y)

returns (3, 4, 6, 8)

See example11
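A for expression with two range variables behaves like two nested loops, with the last variable varying fastest. As a Python sketch of the example above (the list comprehension is only an analogy, not XPath syntax):

```python
# for $x in (1, 2), $y in (3, 4) return ($x * $y)
result = [x * y for x in (1, 2) for y in (3, 4)]

print(result)  # [3, 4, 6, 8]
```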

Do Lab9

The if Expression

• The form of the if expression is:

if (boolean expression) then expression1 else expression2

• If the boolean expression evaluates to true then the result is expression1, else the result is expression2

• This if expression finds the minimum of two numbers:

if (10 < 20) then 10 else 20

• This for loop returns all the positive numbers in the sequence:

for $i in (0, -3, 5, 7, -1, 2) return if ($i > 0) then $i else ()

See example12

http://www.w3.org/TR/xpath20/#id-conditionals

Nested if-then-else

if (boolean expr) then expr1 else expr2

Both expr1 and expr2 can themselves be an if-then-else expression.

Notes about the if Expression

1. You must wrap the boolean expression in parentheses.

2. You must have an "else" part. There is no if-then expression, only an if-then-else

Do Lab10

The some Expression

• The form of the some expression is:

some variable in sequence satisfies boolean expression

• The result of the expression is either true or false.

• Using the some expression means that at least one item in the sequence satisfies the boolean expression.

http://www.w3.org/TR/xpath20/#id-quantified-expressions

Examples of the some Expression

• This example determines if there are some (one or more) negative values in the sequence:

some $i in (2, 6, -1, 3, 9) satisfies $i < 0

• Note that this produces the same boolean result: (2, 6, -1, 3, 9) < 0 because "<" is a general comparison operator, i.e. it compares each item in the sequence until a match is found.

See example13

More Examples of "some"

• Is there some planet that has a radius greater than 2000?

some $i in /planets/planet satisfies $i/radius > 2000

• Note that this produces the same boolean result: /planets/planet/radius > 2000

See example14

The every Expression

• The form of the every expression is:

every variable in sequence satisfies boolean expression

• The result of the expression is either true or false.

• Using the every expression means that every item in the sequence satisfies the boolean expression.

http://www.w3.org/TR/xpath20/#id-quantified-expressions

Examples of the every Expression

• This example determines if every item in the sequence is positive:

every $i in (2, 6, -1, 3, 9) satisfies $i > 0

• Note that this produces the same boolean result: not((2, 6, -1, 3, 9) <= 0)
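The some and every quantifiers correspond closely to "any" and "all" tests. The following Python sketch evaluates the examples above over the same values; it is an analogy for the semantics, not XPath itself.

```python
values = (2, 6, -1, 3, 9)

# some $i in values satisfies $i < 0
some_negative = any(i < 0 for i in values)

# every $i in values satisfies $i > 0
every_positive = all(i > 0 for i in values)

# not(values <= 0): the general comparison is true if ANY item is <= 0
not_any_le_zero = not any(i <= 0 for i in values)

print(some_negative, every_positive, not_any_le_zero)  # True False False
```

The last two results agree, which is why the every expression and the negated general comparison produce the same boolean.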

Multiple Universal Quantifiers

• An XPath expression can have multiple universal quantifiers.

every variable in sequence, variable in sequence, ... satisfies condition

See example15

Union Operator

• The union operator is used to combine two node sequences (cannot union atomic sequences).

• Example:

/planets/planet/mass union /planets/planet/radius

produces the sequence:

<mass units="(Earth = 1)">.0553</mass>
<radius units="miles">1516</radius>
<mass units="(Earth = 1)">.815</mass>
<radius units="miles">3716</radius>
<mass units="(Earth = 1)">1</mass>
<radius units="miles">2107</radius>

http://www.w3.org/TR/xpath20/#combining_seq

Equivalent

/planets/planet/mass union /planets/planet/radius

/planets/planet/mass | /planets/planet/radius

The union and | operators are equivalent.

Duplicates are Eliminated

• When you union two node sets, any duplicates are eliminated.

• This yields 3 nodes, not 6:

/planets/planet/mass union /planets/planet/mass

See example16

Intersect Operator

• The intersect operator returns the intersection of two node sequences.

• Example: find all planets with mass over .8 and radius over 2000:

/planets/planet[mass > .8] intersect /planets/planet[radius > 2000]

<planet> <name>Venus</name> <mass units="(Earth = 1)">.815</mass> <day units="days">116.75</day> <radius units="miles">3716</radius> <density units="(Earth = 1)">.943</density> <distance units="millions miles">66.8</distance> </planet>
<planet> <name>Earth</name> <mass units="(Earth = 1)">1</mass> <day units="days">1</day> <radius units="miles">2107</radius> <density units="(Earth = 1)">1</density> <distance units="millions miles">128.4</distance> </planet>

http://www.w3.org/TR/xpath20/#combining_seq

Equivalent

/planets/planet[mass > .8] intersect /planets/planet[radius > 2000]

/planets/planet[(mass > .8) and (radius > 2000)]

Duplicates are Eliminated

• When you intersect two node sets, any duplicates are eliminated.

• This yields 2 nodes, not 4:

/planets/planet[mass > .8] intersect /planets/planet[mass > .8]

See example17

Except Operator

• The except operator returns the difference between two node sequences.

• Example: get all planets except Earth:

/planets/planet except /planets/planet[name='Earth']

<planet> <name>Mercury</name> <mass units="(Earth = 1)">.0553</mass> <day units="days">58.65</day> <radius units="miles">1516</radius> <density units="(Earth = 1)">.983</density> <distance units="millions miles">43.4</distance> </planet>
<planet> <name>Venus</name> <mass units="(Earth = 1)">.815</mass> <day units="days">116.75</day> <radius units="miles">3716</radius> <density units="(Earth = 1)">.943</density> <distance units="millions miles">66.8</distance> </planet>

http://www.w3.org/TR/xpath20/#combining_seq

Equivalent

/planets/planet except /planets/planet[name='Earth']

/planets/planet[name!='Earth']

See example18

I posed a challenge to the xml-dev list, challenging them to simplify an XPath expression. Their answer is awesome.

Problem: create an XPath expression for this:

There must be one child Title element and there must be zero or more child Author elements and there must be one child Date element and nothing else.

Here's the XPath 2.0 expression I created:

count(Title) eq 1 and count(Author) ge 0 and count(Date) eq 1 and count(*[not(name() = ('Title','Author','Date'))]) eq 0

Here is the solution created by the XPath masters on xml-dev:

Title and Date and empty(* except (Title[1], Date[1], Author))

Incredible, don't you think?

No Duplicates, Document Order

• The union, intersect, and except operators return their results as sequences in document order, without any duplicate items in the result sequence.

"Duplicate" is Based on Identity, Not Value

• Two nodes are duplicates iff they are the exact same node.

• These two elements have the same value, but different identities

Do Lab11

Multiple Node Tests

• Recall that in XPath 1.0 an XPath expression is composed of steps separated by slashes: node-test slash node-test slash …

• At each step you can only specify one node test.

• In XPath 2.0 you can specify multiple node tests on each step.

Example of Multiple Node Tests

• Example: select the mass and radius for each planet: /planets/planet/(mass|radius)

Equivalent

/planets/planet/(mass|radius)

/planets/planet/(mass union radius)

/planets/planet/mass | /planets/planet/radius

/planets/planet/*[(self::mass) or (self::radius)]

See example19

Examples of Multiple Node Tests using Union and Intersect Operators

/test/(a, b) union /test/(c, d, e)

Output:

/test/(a, b, c) intersect /test/(b, c, d)

Output:

<b>B</b><c>C</c>


See example20

Feed Nodes into a Function

• In XPath 1.0 an expression following a slash identifies node(s).

• In XPath 2.0 an expression following a slash can be a function. Each value preceding the slash is fed into the function.

/planets/planet/name/substring(.,1,1)

The name of each planet is fed into the substring function.

Output: ("M", "V", "E")

See example21

Feed Nodes into a for loop

/planets/planet/day/(for $i in . return $i * 2)

Output: (117.3, 233.5, 2)

Note: be sure you wrap the for-loop in parentheses.

See example22

Can't Feed Atomic Values

• The previous slides showed feeding nodes into a function and for-loop.

• You cannot feed atomic values, e.g., this is illegal:

(1 to 10)/(for $i in . return $i)

Here's the error message you get:

Error: Required item type of first operand of / is node(); supplied value has item type xs:integer

See example22.a

Do Lab12

Comments

• XPath 2.0 expressions may be commented using this syntax:

(: comment :)

(: multiply each day by two :) /planets/planet/day/(for $i in . return $i * 2)

General Comparison Operators

• Here are the general comparison operators: =, !=, <, <=, >, >=

• These operators are used to compare sequences.

• Each item in one sequence is compared against each item in the other sequence; the comparison evaluates to true if one or more item-item comparisons evaluate to true.

http://www.w3.org/TR/xpath20/#id-general-comparisons

How General Comparison Works

(item1, item2) op (item3, item4)

is evaluated as:

(item1 op item3) or (item1 op item4) or (item2 op item3) or (item2 op item4)

(1, 2) = (2, 3)

is evaluated as:

(1 = 2) or (1 = 3) or (2 = 2) or (2 = 3)

thus it returns true

(1, 2) = (3, 4)

returns false because there are no equal values between the sequences

See example23
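The pairwise evaluation can be written out directly. Below is a Python sketch, assuming tuples of plain numbers stand in for sequences of atomic values; general_compare is a hypothetical helper name.

```python
import operator
from itertools import product


def general_compare(left, right, op):
    """True if op holds for at least one (left item, right item) pair."""
    return any(op(a, b) for a, b in product(left, right))


print(general_compare((1, 2), (2, 3), operator.eq))  # True  (2 = 2)
print(general_compare((1, 2), (3, 4), operator.eq))  # False (no equal pair)
```

Note that an empty sequence on either side produces no pairs at all, so the sketch returns False in that case.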

Example

• The left side returns a sequence of two planets (Venus, Earth), and the right side returns a sequence of three planets (Mercury, Venus, Earth).

• The result is true.

/planets/planet[mass > .8] = /planets/planet[density > .9]

See example24

Definition of Equal

• Two nodes are equivalent if:

– their node values are the same

– the order of the values is the same

– the number of values is the same

• The tag names can be different. Comparison is based on data, not markup.

Example

• The document below has two <planet> elements. They use different tag names.

/planets/planet[1] = /planets/planet[2]

returns true.

<planets>
  <planet>
    <name>Mercury</name>
    <mass units="(Earth = 1)">.0553</mass>
    <day units="days">58.65</day>
    <radius units="miles">1516</radius>
    <density units="(Earth = 1)">.983</density>
    <distance units="millions miles">43.4</distance>
  </planet>
  <planet>
    <n>Mercury</n>
    <m units="(Earth = 1)">.0553</m>
    <d units="days">58.65</d>
    <r units="miles">1516</r>
    <d units="(Earth = 1)">.983</d>
    <d units="millions miles">43.4</d>
  </planet>
</planets>

See example25

Equivalent?

• Problem: find all planets whose name is not in this sequence ('Earth', 'Mars')

• Are these equivalent?

/planets/planet[not(name = ('Earth', 'Mars'))]

/planets/planet[name != ('Earth', 'Mars')]

Not Equivalent!

/planets/planet[not(name = ('Earth', 'Mars'))]

returns:

<planet> <name>Mercury</name> <mass units="(Earth = 1)">.0553</mass> <day units="days">58.65</day> <radius units="miles">1516</radius> <density units="(Earth = 1)">.983</density> <distance units="millions miles">43.4</distance> </planet>
<planet> <name>Venus</name> <mass units="(Earth = 1)">.815</mass> <day units="days">116.75</day> <radius units="miles">3716</radius> <density units="(Earth = 1)">.943</density> <distance units="millions miles">66.8</distance> </planet>

/planets/planet[name != ('Earth', 'Mars')]

returns:

<planet> <name>Mercury</name> <mass units="(Earth = 1)">.0553</mass> <day units="days">58.65</day> <radius units="miles">1516</radius> <density units="(Earth = 1)">.983</density> <distance units="millions miles">43.4</distance> </planet>
<planet> <name>Venus</name> <mass units="(Earth = 1)">.815</mass> <day units="days">116.75</day> <radius units="miles">3716</radius> <density units="(Earth = 1)">.943</density> <distance units="millions miles">66.8</distance> </planet>
<planet> <name>Earth</name> <mass units="(Earth = 1)">1</mass> <day units="days">1</day> <radius units="miles">2107</radius> <density units="(Earth = 1)">1</density> <distance units="millions miles">128.4</distance> </planet>

Explanation

/planets/planet[not(name = ('Earth', 'Mars'))]

For each planet: is its name 'Earth' or 'Mars'? If so, don't return it; otherwise return it.

/planets/planet[name != ('Earth', 'Mars')]

For each planet: is its name not 'Earth' or not 'Mars'? If so, return it; otherwise don't return it.

Consider the planet whose name is Earth, using the first expression (equal?):

not((Earth equal Earth) or (Earth equal Mars))

not(true or false)

not(true)

false

Consider the planet whose name is Earth, using the second expression (not equal?):

(Earth not equal Earth) or (Earth not equal Mars)

false or true

true

Every planet's name is unequal to 'Earth' or unequal to 'Mars', so every planet is returned.

See example26
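The difference between the two predicates comes down to where the negation sits relative to the "any pair" test. A small Python sketch over the three planet names (plain strings here, an assumption that stands in for the <name> nodes) shows it:

```python
names = ["Mercury", "Venus", "Earth"]
excluded = ("Earth", "Mars")

# not(name = ('Earth', 'Mars')): no pair is equal
not_equal_to_any = [n for n in names if not any(n == e for e in excluded)]

# name != ('Earth', 'Mars'): at least one pair is unequal
unequal_to_some = [n for n in names if any(n != e for e in excluded)]

print(not_equal_to_any)  # ['Mercury', 'Venus']
print(unequal_to_some)   # ['Mercury', 'Venus', 'Earth']
```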

Value Comparison Operators

• Here are the value comparison operators: eq, ne, lt, le, gt, ge

• These operators are used to compare atomic values.

• Example:

10 lt 30 returns true

• Example:

/planets/planet[1]/name eq 'Mercury' returns true

See example27

http://www.w3.org/TR/xpath20/#id-value-comparisons

No Sequences Allowed!

• Suppose the third planet contains two <name> elements:

<planet>
  <name>Earth</name>
  <name>Mother Earth</name>
</planet>

then

/planets/planet[3]/name eq 'Earth'

raises an error:

"Error! A sequence of more than one item is not allowed as the first operand of 'eq'."

See example28

However, this works

Note that:

/planets/planet[3]/name = 'Earth'

returns true because the "=" operator is used with sequences.

See example29

is Operator

• You can compare two nodes to see if they are the same nodes by using the "is" operator:

expr1 is expr2

returns true only if expr1 and expr2 identify the same node. expr1 and expr2 must be singleton sequences.

This expression

//planet[mass = .815] is //planet[day = 116.75]

returns true because both expressions identify the same <planet> element

See example30

http://www.w3.org/TR/xpath20/#id-node-comparisons

<< Operator

• This expression

expr1 << expr2

returns true if the node identified by expr1 comes before the node identified by expr2 in the document.

This expression

//planet[mass = .0553] << //planet[mass = .815]

returns true because the left expression identifies Mercury, the right expression identifies Venus, and Mercury comes before Venus in the document

See example31

http://www.w3.org/TR/xpath20/#id-comparisons

>> Operator

• This expression

expr1 >> expr2

returns true if the node identified by expr1 comes after the node identified by expr2 in the document.

This expression

//planet[mass = .815] >> //planet[mass = .0553]

returns true because the left expression identifies Venus, the right expression identifies Mercury, and Venus comes after Mercury in the document

See example32

http://www.w3.org/TR/xpath20/#id-comparisons

Do Lab13

Arithmetic Operators

• Here are the arithmetic operators: +, -, *, div, mod, idiv

• The idiv operator operates on integers and returns an integer rounded toward zero, e.g.

3 idiv 2 returns 1

-5 idiv 2 returns -2

See example33

http://www.w3.org/TR/xpath20/#id-arithmetic

Equivalent

n idiv m

floor(n div m) if n and m have the same sign

ceiling(n div m) if n and m have opposite signs
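Rounding toward zero is easy to confuse with flooring when one operand is negative. The Python sketch below models idiv with integer arithmetic and contrasts it with floor division; the function name idiv is chosen only to mirror the XPath operator.

```python
def idiv(n, m):
    """Integer division that rounds toward zero, like XPath's idiv."""
    quotient = abs(n) // abs(m)
    # The result is negative only when the operands have opposite signs.
    return quotient if (n < 0) == (m < 0) else -quotient


print(idiv(3, 2))    # 1
print(idiv(-5, 2))   # -2
print(idiv(-5, -2))  # 2
print(-5 // 2)       # -3: floor division rounds down, not toward zero
```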

current-dateTime Function

• current-dateTime() is an XPath 2.0 function that returns the current date and time, e.g.

2008-01-19T14:19:26.406-05:00

• The value returned by this function is of type xs:dateTime (the XML Schema dateTime datatype).

See example34

http://www.w3.org/TR/xquery-operators/#func-current-dateTime

The matches() Function

• The form of the matches function is:

matches(input string, regex)

• It is a boolean function. It returns true if the input string matches the regular expression, false otherwise.

if (matches(/planets/planet[2]/name, 'Venus')) then 'Success' else 'Failure'

The matches() function evaluates to true; the result is 'Success'

http://www.w3.org/TR/xpath-functions/#func-matches

The matches() Function

if (matches(/planets/planet[2]/name, 'V[a-z]+s')) then 'Success' else 'Failure'

This regex says: a 'V', followed by one or more lowercase letters of the alphabet, followed by an 's'.

See example44

Regular Expressions

• The following 4 slides show examples of regular expressions:

Regular Expression    Examples
Chapter \d            Chapter 1
a*b                   b, ab, aab, aaab, …
[xyz]b                xb, yb, zb
a?b                   b, ab
a+b                   ab, aab, aaab, …
[a-c]x                ax, bx, cx

Regular Expressions (cont.)

Regular Expression    Examples
[a-c]x                ax, bx, cx
[-ac]x                -x, ax, cx
[ac-]x                ax, cx, -x
[^0-9]x               any non-digit char followed by x
Chapter\s\d           Chapter followed by a blank followed by a digit
(ho){2} there         hoho there
(ho\s){2} there       ho ho there
.abc                  any (one) char followed by abc
(a|b)+x               ax, bx, aax, bbx, abx, bax, ...
a{1,3}x               ax, aax, aaax
a{2,}x                aax, aaax, aaaax, …
\w\s\w                word character (any character except punctuation, separators, and control characters) followed by a whitespace character followed by a word character
[a-zA-Z-[Ol]]*        A string composed of any lower and upper case letters, except "O" and "l"
\.                    The period "." (Without the backward slash the period means "any character")

^Hello                Hello (and it must be at the beginning)
Hello$                Hello (and it must be at the end)
^Hello$               Hello (and it must be the only value)

\n                    linefeed
\r                    carriage return
\\                    The backward slash \
\|                    The vertical bar |
\-                    The hyphen -
\^                    The caret ^
\?                    The question mark ?
\*                    The asterisk *
\+                    The plus sign +
\{                    The open curly brace {
\}                    The close curly brace }
\(                    The open paren (
\)                    The close paren )
\[                    The open square bracket [
\]                    The close square bracket ]

Regular Expressions (concluded)

\p{L}                 A letter, from any language
\p{Lu}                An uppercase letter, from any language
\p{Ll}                A lowercase letter, from any language
\p{N}                 A number - Roman, fractions, etc
\p{Nd}                A digit from any language
\p{P}                 A punctuation symbol
\p{Sc}                A currency sign, from any language

\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?

"currency sign from any language, followed by one or more digits from any language, optionally followed by a period and two digits from any language"

Different from the Regex in the XML Schema Pattern Facet

Consider an XML Schema declaration of the Free-text element whose pattern facet contains the regex Hello.

And suppose this is the input:

<Free-text>Hello</Free-text>

The input validates against the schema. That is, the string "Hello" matches the regex in the pattern facet.

Likewise, using the same input and regex, the matches function succeeds:

if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'

Different from the Regex in the XML Schema Pattern Facet

Next, consider this input:

<Free-text>He said Hello World</Free-text>

The input does not validate against the schema. That is, the string "He said Hello World" does not match the regex in the pattern facet.

In contrast, the matches function, given the same regex, does succeed.

http://www.w3.org/TR/xpath-functions/#regex-syntax

XSD Regex's are Implicitly Anchored

• When you give a regex in a pattern facet, there are "implicit anchors" in the regex.

• The regex "Hello" is actually this:

^Hello$

The ^ matches the start of the input

The $ matches the end of the input

Thus "Hello" matches only input that starts with H, ends with o, and in between is ello.

No Implicit Anchors in XPath Regex's

• The regex "Hello" in XPath has no implicit anchors. Any anchors must be explicitly specified.

• Thus, the regex "Hello" matches any input that contains the string Hello

if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'

is equivalent to:

if (contains(//Free-text, 'Hello')) then 'Success' else 'Failure'

See example45
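Python's re module offers a convenient way to see the two behaviours side by side. This is a sketch by analogy only: re.fullmatch plays the role of the implicitly anchored pattern facet and re.search plays the role of matches(); Python's regex dialect differs from the XML Schema and XPath dialects in other details.

```python
import re

text = "He said Hello World"

# Pattern facet behaviour: the whole value must match (implicit anchors).
xsd_style = re.fullmatch(r"Hello", text) is not None

# matches() behaviour: a match anywhere in the input is enough.
xpath_style = re.search(r"Hello", text) is not None

# Adding explicit anchors makes the unanchored search behave like the facet.
anchored = re.search(r"^Hello$", text) is not None

print(xsd_style, xpath_style, anchored)  # False True False
```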

Case-Insensitivity Mode

• The matches function has an optional third argument:

matches(input, regex, flags)

• The "i" flag is used to perform a case-insensitive comparison of the input and the regex.

Example: suppose this is the input:

<Free-text>He said HELLO WORLD</Free-text>

Consider this XPath:

if (matches(//Free-text, 'Hello', 'i')) then 'Success' else 'Failure'

The result is 'Success' because the input is checked to see if it contains 'Hello', 'hello', 'HELLO', 'HeLLO', etc.

The Default is Case-Sensitive

• If the "i" flag is not used in the matches function, it defaults to a case-sensitive comparison.

if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'

With the input above, the result is 'Failure' because the input is checked to see if it contains exactly 'Hello'

See example46

Multiline Mode

• The "m" flag is used to indicate that the input should be treated as composed of one or more lines. Each line has a start and an end, and the regex is compared against each line.

Example: suppose this is the input (note the line break after "said"):

<Free-text>He said
Hello World</Free-text>

if (matches(//Free-text, '^Hello', 'm')) then 'Success' else 'Failure'

The result is 'Success'. The regex says: does the input start with the string 'Hello'? The 'm' flag says: check each line. Thus, the result is 'Success' since the second line starts with 'Hello'.

The Default is One Long String

• If the "m" flag is not used in the matches function, it defaults to treating the input as one long string, with one start and one end.

if (matches(//Free-text, '^Hello')) then 'Success' else 'Failure'

The result is 'Failure' because the input is treated as one long string and 'Hello' does not start the string.

See example47

Dot-all Mode

• The "s" flag is used to indicate that the dot (.) character matches every character, including the newline (x0A) character.

• If the "s" flag is not used, the default behavior is for the dot character to match every character except the newline character.

In these examples the input string contains a newline between 'Hello' and 'World'.

if (matches('Hello
World', 'H.*World')) then 'Success' else 'Failure'

The result is 'Failure'

if (matches('Hello
World', 'H.*World', 's')) then 'Success' else 'Failure'

The result is 'Success'

See example48

Ignore Whitespace Mode

• The "x" flag is used to indicate that whitespace in a regex should be ignored.

• If the "x" flag is not used then any whitespace in the regex is treated as part of the regex.

if (matches('abcabc', '(a b c)+')) then 'Success' else 'Failure'

The result is 'Failure.' The regex only matches this input: a b c, a b c a b c, etc.

if (matches('abcabc', '(a b c)+', 'x')) then 'Success' else 'Failure'

The result is 'Success.' The regex only matches this input: abc, abcabc, etc.

See example49

Multiple Flags

• Zero or more flags can be specified.

• The default value is used for modes not specified.

if (matches('Hello
World', '^WORLD$', 'im')) then 'Success' else 'Failure'

The result is 'Success'. The input contains a newline between 'Hello' and 'World'. The regex says: the input must begin and end with the literal string 'WORLD'. The flags say: ignore case, treat the input as 2 lines, and compare each line.

See example50
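The four flags have close counterparts in Python's re module, which makes the slides' examples easy to try. The sketch below is an approximation under stated assumptions: the wrapper function matches and its flag mapping are written for this illustration, re.search supplies the unanchored behaviour, and re.VERBOSE does slightly more than the XPath "x" flag (it also treats # as starting a comment).

```python
import re

# Assumed mapping from the XPath flag letters to Python's re flags.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def matches(text, pattern, flags=""):
    """Rough Python stand-in for XPath's matches(input, regex, flags)."""
    combined = 0
    for letter in flags:
        combined |= FLAG_MAP[letter]
    return re.search(pattern, text, combined) is not None


print(matches("He said HELLO WORLD", "Hello", "i"))     # True
print(matches("He said\nHello World", "^Hello", "m"))   # True
print(matches("Hello\nWorld", "H.*World"))              # False
print(matches("Hello\nWorld", "H.*World", "s"))         # True
print(matches("abcabc", "(a b c)+", "x"))               # True
print(matches("Hello\nWorld", "^WORLD$", "im"))         # True
```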

Do Lab14

The tokenize() Function

• Use to split up a string into pieces (tokens).

• A regex specifies the characters that separate the tokens.

for $i in tokenize('12, 16, 3, 99', ',\s*') return $i

The result is: 12 16 3 99

http://www.w3.org/TR/xpath-functions/#func-tokenize

Use Flags with tokenize()

• The flags (i, m, s, x) we saw with the matches() function are also available with tokenize()

for $i in tokenize('12xx16XX3xX99', 'xx', 'i') return $i

The result is: 12 16 3 99

See example51

Separators are Discarded

• The separators are specified using a regex.

• The input string is processed from left to right, looking for substrings that match the regex.

• The separators are discarded; the remaining substrings are collected and form the output sequence.
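The split-and-discard behavior can be tried outside XPath. This is a small sketch in Python, where the helper name tokenize is our own and re.split is assumed to be a close enough stand-in for the XPath function on these inputs (it behaves differently when the regex contains capturing groups).

```python
import re

def tokenize(text, pattern, flags=0):
    """Split text wherever pattern matches; the separators are discarded.

    Hypothetical helper for illustration, not the XPath implementation.
    """
    return re.split(pattern, text, flags=flags)

# Comma followed by optional whitespace is the separator.
numbers = tokenize("12, 16, 3, 99", r",\s*")
# Case-insensitive separator, like the "i" flag.
mixed_case = tokenize("12xx16XX3xX99", "xx", re.IGNORECASE)

print(numbers)
print(mixed_case)
```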

Example: Footnote References as Separators

• Tokenize the input using [n] as the separators.

• For example, tokenize this: XPath[1] XSLT[2]

into these tokens: XPath XSLT

Will this work?

tokenize('XPath[1] XSLT[2]', '\[.+\]')

+ is a Greedy Quantifier

• The regex on the previous slide does not produce the desired result.

• Here's why: the + operator searches for the longest string that matches. It is called a greedy operator.

\[.+\] Read as: find the longest string that starts with '[' and ends with ']'. For the input above, that single match is '[1] XSLT[2]'.

See example52

Why Does This Work?

tokenize('XPath[1] XSLT[2]', '\[\d+\]')

The regex is for [digit(s)]: only digits are permitted in the brackets, so a match cannot run past the first ']'.

See example53

+? is a non-Greedy Operator

• If you want to match the shortest possible substring, add a '?' after the quantifier to make it non-greedy.

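\[.+?\] Read as: find the shortest string that starts with '[' and ends with ']'

tokenize('XPath[1] XSLT[2]', '\[.+?\]') yields the desired tokens 'XPath' and ' XSLT' (note the leading space), followed by a zero-length string because the input ends with a separator.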

See example54

* and + are Greedy

• Above we saw that + is greedy

• * is also greedy

• To make them non-greedy, append a '?': *? and +?
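Greedy versus non-greedy matching works the same way in Python's re module, so the three separators can be compared side by side. This is a sketch by analogy, not output from an XPath processor.

```python
import re

text = "XPath[1] XSLT[2]"

# Greedy: ".+" runs to the LAST "]", swallowing " XSLT" as part of the separator.
greedy = re.split(r"\[.+\]", text)
# Digits only: the match cannot extend past the first "]".
digits_only = re.split(r"\[\d+\]", text)
# Non-greedy: ".+?" stops at the FIRST "]".
non_greedy = re.split(r"\[.+?\]", text)

print(greedy)
print(digits_only)
print(non_greedy)
```

The trailing empty string appears because the input ends with a separator.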

Regex with 2 Alternatives, and Both Match

• Consider this XPath: tokenize('bab', 'a|ab')

• What tokens will be generated: {b, b} or {b, ''}?

First Alternative Wins!

• If multiple alternatives match, the first one is used.

• Thus, the result is: {b, b}

• Suppose that's not what we want. We want the longest alternative ('ab') used whenever possible.

See example55

Solution

• Both of these regexes give the desired result: ab|a or ab?

See example56
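Python's re module also tries alternatives from left to right, so it can illustrate the rule. The sketch below is an analogy in Python, not the XPath engine.

```python
import re

# First alternative wins: "a" matches before "ab" is even tried.
first_wins = re.split("a|ab", "bab")
# Listing the longer alternative first makes it the preferred match.
longest_first = re.split("ab|a", "bab")
# Same effect with an optional "b".
optional_b = re.split("ab?", "bab")

print(first_wins, longest_first, optional_b)
```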

Separator Matches Beginning and Ending

• Consider this XPath: tokenize('aba', 'a')

• The input string starts with the separator and ends with the separator

• What will be the result?

Zero-length Strings

• The output is a zero-length string, 'b', zero-length string:

{'', 'b', ''}

See example57

Regex Doesn't Match Input

• If the regex doesn't match the input string then the result is the input string:

tokenize('bbb', 'a') produces {'bbb'}

See example58
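Both edge cases (a separator at the start and end of the input, and a separator that never matches) can be reproduced with Python's re.split, used here as an assumed stand-in for tokenize().

```python
import re

# Separator at the beginning and the end: zero-length strings frame the token.
framed = re.split("a", "aba")
# Separator never matches: the whole input comes back as the only token.
unmatched = re.split("a", "bbb")

print(framed, unmatched)
```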

Do Lab15

What Separator?

• Suppose you want to split (tokenize) this string W151TBH into

{'W', '151', 'TBH'}

• That is, separate the numeric from the alphabetic.

• What regex would you use?

Need More Knowledge

• The problem can't be solved given what we currently know.

• However, it can be solved by using the tokenize() function with the replace() function, so let's learn about replace().

The replace() Function

• The replace() function replaces any string that matches the regex with a replacement string:

replace(input, regex, replacement)

• Example: this removes all vowels: replace('Hello World', '[aeiou]', '')

returns: 'Hll Wrld'

See example59

http://www.w3.org/TR/xpath-functions/#func-replace

Example

• What is the result of this replace: replace('banana', '(an)*a', '#')

See example60

* is a Greedy Operator

• The result of replace('banana', '(an)*a', '#') is b#

• (an)* looks for the longest string of 'anan…'

• The * is a greedy operator

• To make it non-greedy, append ? to the *: replace('banana', '(an)*?a', '#')

• The result is: b#n#n#

See example61
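The two results can be reproduced with Python's re.sub, which treats * and *? the same way. Note the different argument order: pattern, replacement, input. This is a sketch by analogy.

```python
import re

# Greedy: (an)* takes "anan", then "a" completes the match "anana".
greedy = re.sub(r"(an)*a", "#", "banana")
# Non-greedy: (an)*? takes as little as possible, so each "a" matches alone.
non_greedy = re.sub(r"(an)*?a", "#", "banana")

print(greedy, non_greedy)
```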

Two Matching Alternatives

• Suppose the regex contains two alternatives, and both match:

replace('banana', 'a|an', '#')

• What will be the result?

Leftmost Alternative Wins

• The rule is that the first (leftmost) alternative wins:

replace('banana', 'a|an', '#') results in: b#n#n#

• Switching the alternatives:

replace('banana', 'an|a', '#') results in: b###

See example62
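The effect of the order of alternatives can be checked in Python, whose re module also picks the leftmost alternative that matches. This is an analogy, not XPath output.

```python
import re

# "a" is tried first, so "an" never gets a chance.
a_first = re.sub("a|an", "#", "banana")
# "an" is tried first; the final "a" falls back to the second alternative.
an_first = re.sub("an|a", "#", "banana")

print(a_first, an_first)
```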

Using Variables in the Replacement String

• Consider a regex composed of a sequence of parenthesized expressions:

( … )( … )( … )

$1 $2 $3

$1 stands for the characters matched by the first parenthesized expression

$2 stands for the characters matched by the second parenthesized expression

$9 stands for the characters matched by the ninth parenthesized expression

Example: Insert Hyphens into a Date

replace('12March2008', '([0-9]+)([a-zA-Z]+)([0-9]+)', '$1-$2-$3')

The result is: 12-March-2008

See example63
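For comparison, Python's re.sub has the same idea of group references, but writes them as \1, \2, \3 instead of $1, $2, $3. This is a minimal sketch under that assumption.

```python
import re

# Three parenthesized groups: day digits, month letters, year digits.
pattern = r"([0-9]+)([a-zA-Z]+)([0-9]+)"
# \1, \2, \3 are Python's spelling of the XPath variables $1, $2, $3.
hyphenated = re.sub(pattern, r"\1-\2-\3", "12March2008")

print(hyphenated)
```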

Regex Doesn't Match Input

• If the regex doesn't match the input then the result will be unchanged:

replace('aaaa', 'b', '#')

The result is: aaaa

See example64

Use Flags with replace()

• replace() uses the same flags as matches() and tokenize(): i, m, s, x

• Example: replace('Haha', 'h', 'b', 'i') returns: baba

See example65

Do Lab16

Tokenize this String

• How would you separate the numeric parts from the character parts:

W151TBH

{'W', '151', 'TBH'}

Step 1

• Use replace() to append a hash mark (#) onto the end of each part:

W151TBH

W#151#TBH#

This is accomplished using replace: replace('W151TBH', '([0-9]+|[a-zA-Z]+)', '$1#')

See example66

Step 2

• Tokenize using # as the separator:

W#151#TBH#

{'W', '151', 'TBH', ''}

This is accomplished by this: tokenize('W#151#TBH#', '#')

See example67

Step 3

• Remove the zero-length string

('W', '151', 'TBH', '')[.]

The predicate [.] keeps only the items whose value, tested as a boolean, is true. A zero-length string tests as false, so it is dropped. Recall that the value of ('a', '')[.] is just ('a')

See example68

Putting it all Together

tokenize(replace('W151TBH', '([0-9]+|[a-zA-Z]+)', '$1#'), '#')[.]

This produces: ('W', '151', 'TBH')

See example69
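The three steps can be mirrored one for one in Python. The function name split_alpha_numeric is our own, and \1 is Python's spelling of $1. This is a sketch of the technique, not the XPath evaluation itself.

```python
import re

def split_alpha_numeric(code):
    """Separate runs of digits from runs of letters (hypothetical helper)."""
    # Step 1: append "#" to each run of digits or letters.
    marked = re.sub(r"([0-9]+|[a-zA-Z]+)", r"\1#", code)
    # Step 2: tokenize on "#"; a trailing empty string remains.
    tokens = marked.split("#")
    # Step 3: drop zero-length strings, like the [.] predicate.
    return [token for token in tokens if token]

parts = split_alpha_numeric("W151TBH")
print(parts)
```

In Python the same result could be obtained directly with re.findall; the three-step form is kept only to follow the slides.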

What does the predicate apply to?

• What is the result of these statements?

//name[1]

(//name)[1]

Answer

• //name[1] returns the first <name> element in each <planet> element.

– Number of elements returned: 3

• (//name)[1] returns the first <name> element among all the <name> elements in all the <planet> elements.

– Number of elements returned: 1

See example70
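The difference can be observed with Python's xml.etree.ElementTree, whose limited XPath 1.0 subset supports a position predicate after a tag name. The trimmed planets document below is an assumption made to keep the sketch short, and slicing the result list stands in for (//name)[1].

```python
import xml.etree.ElementTree as ET

# Trimmed version of planets.xml (assumption: only the <name> children kept).
PLANETS = """<planets>
  <planet><name>Mercury</name></planet>
  <planet><name>Venus</name></planet>
  <planet><name>Earth</name></planet>
</planets>"""

root = ET.fromstring(PLANETS)

# Like //name[1]: the predicate applies per parent, one hit per <planet>.
first_in_each_planet = [e.text for e in root.findall(".//name[1]")]
# Like (//name)[1]: collect every <name> first, then take the first item.
first_overall = [e.text for e in root.findall(".//name")[:1]]

print(first_in_each_planet, first_overall)
```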

Select the first Book by each Author

<BookStore>
  <Book>
    <Title>Illusions The Adventures of a Reluctant Messiah</Title>
    <Author>Richard Bach</Author>
    <Date>1977</Date>
    <ISBN>0-440-34319-4</ISBN>
    <Publisher>Dell Publishing Co.</Publisher>
  </Book>
  <Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
  <Book>
    <Title>Jonathan Livingston Seagull</Title>
    <Author>Richard Bach</Author>
    <Date>1970</Date>
    <ISBN>0-684-84684-5</ISBN>
    <Publisher>Simon &amp; Schuster</Publisher>
  </Book>
</BookStore>

Select these two

Select the first Book by each Author

//Book[not(Author = preceding::Book/Author)]

The predicate evaluates to true if the Author of the Book is not the same as the Author of a preceding Book

See example71
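The predicate amounts to "keep a Book only if no earlier Book has the same Author". A procedural sketch of that idea in Python, with a trimmed BookStore document and a helper name of our own, might look like this:

```python
import xml.etree.ElementTree as ET

# Trimmed BookStore (assumption: only Title and Author kept).
BOOKSTORE = """<BookStore>
  <Book><Title>Illusions The Adventures of a Reluctant Messiah</Title><Author>Richard Bach</Author></Book>
  <Book><Title>The First and Last Freedom</Title><Author>J. Krishnamurti</Author></Book>
  <Book><Title>Jonathan Livingston Seagull</Title><Author>Richard Bach</Author></Book>
</BookStore>"""

def first_book_per_author(root):
    """Return titles of books whose author has not appeared earlier."""
    seen_authors = set()
    titles = []
    for book in root.iter("Book"):  # document order
        author = book.findtext("Author")
        if author not in seen_authors:  # no preceding Book by this author
            seen_authors.add(author)
            titles.append(book.findtext("Title"))
    return titles

titles = first_book_per_author(ET.fromstring(BOOKSTORE))
print(titles)
```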

Do Lab17

XPath Functions

• http://www.w3schools.com/Xpath/xpath_functions.asp

• http://www.w3.org/TR/xquery-operators/#contents

XPath 2.0 Functions

distinct-values(values)

• This XPath function will return a sequence composed of unique values.

distinct-values((2, 2, 3, 4, 1, 4, 2, 6, 3, 9))

Output: 2 3 4 1 6 9

Note that the sequence of integers is wrapped within a pair of parentheses. Why? Because the function takes the whole sequence as one argument.

See example72
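For readers who know Python, the closest everyday idiom is sketched below. The helper name is our own. Python keeps first-occurrence order here, whereas an XPath processor is free to return the distinct values in any order.

```python
def distinct_values(values):
    """Remove duplicates, keeping the first occurrence of each value."""
    # dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(values))

unique = distinct_values([2, 2, 3, 4, 1, 4, 2, 6, 3, 9])
print(unique)
```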

http://www.w3.org/TR/xquery-operators/#func-distinct-values

Another Example

<?xml version="1.0"?>
<FitnessCenter>
  <Member id="1" level="platinum">
    <Name>Jeff</Name>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
  <Member id="2" level="gold">
    <Name>David</Name>
    <FavoriteColor>lightblue</FavoriteColor>
  </Member>
  <Member id="3" level="platinum">
    <Name>Roger</Name>
    <FavoriteColor>lightyellow</FavoriteColor>
  </Member>
  <Member id="4" level="platinum">
    <Name>Sally</Name>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
  <Member id="5" level="platinum">
    <Name>Linda</Name>
    <FavoriteColor>purple</FavoriteColor>
  </Member>
</FitnessCenter>

distinct-values(/FitnessCenter/Member/FavoriteColor)

Output: lightgrey lightblue lightyellow purple

See example73

Do Lab18

doc(url)

• The doc(url) function is used to retrieve data from another XML document.

doc('FitnessCenter2.xml')

See example74

You must put quotes around the file name. Actually, the argument to doc() is a URL.

http://www.w3.org/TR/xquery-operators/#func-doc

data(item)

• This function returns the (atomic) value of a node, i.e., it "atomizes" the node.

• This function is similar to the string(item) function, except that the string function always returns the value of the item as a string, whereas the data(item) function returns the value of the item with its type intact.

http://www.w3.org/TR/xquery-operators/#func-data

data(item)

Expression                                            Result
string(/FitnessCenter/Member[1]/MembershipFee) + 1    error
data(/FitnessCenter/Member[1]/MembershipFee) + 1      341
data(340) + 1                                         341

See example75

error(QName?, description)

• You can raise an error in your XPath using the error() function.

for $i in /FitnessCenter/Member return if (number($i/MembershipFee) lt 0) then error((), 'Invalid value for MembershipFee') else true()

http://www.w3.org/TR/xquery-operators/#func-error See example76

trace(value, message)

• This is used for debugging, to monitor the execution.

• The trace() function does two things:

– it returns (outputs) value

– it displays message and information about value

for $i in /FitnessCenter/Member return trace($i/MembershipFee, 'The membership fee is:')

Output: <MembershipFee>340</MembershipFee> <MembershipFee>-500</MembershipFee>

Screen:

The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[1]/MembershipFee[1]
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[2]/MembershipFee[1]
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[3]/MembershipFee[1]

http://www.w3.org/TR/xquery-operators/#func-trace See example77

compare(string1, string2)

• This function performs a string comparison of string1 against string2.

• If string1 is less than string2 then it returns -1

• If string1 is equal to string2 then it returns 0

• If string1 is greater than string2 then it returns 1

compare('ab','abc')    Output: -1
compare('ab','ab')     Output: 0
compare('abc','ab')    Output: 1

http://www.w3.org/TR/xquery-operators/#func-compare See example78

string-join(sequence, separator)

• The first argument identifies any number of values.

• The function will concatenate all the values, placing separator between each value.

string-join(('a','b','c'),' ')

Output: a b c

string-join(/FitnessCenter/Member/Name,'/')

Output: Jeff/David/Roger

http://www.w3.org/TR/xquery-operators/#func-string-join See example79

An elegant way of creating the XPath to any node

string-join(for $i in ancestor-or-self::* return name($i),'/')

This returns the name of the current node (self) plus all its ancestors.

Example: Suppose that the current node is FavoriteColor. Then the for expression will return: FitnessCenter Member FavoriteColor

The string-join function will concatenate these values together, separating each value with /

Thus, the output is: FitnessCenter/Member/FavoriteColor

See example80
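The same "walk up the ancestors, then join with /" idea can be sketched in Python. ElementTree has no ancestor axis, so the sketch builds a child-to-parent map first. The one-member document and the helper name path_to are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed one-member document, just enough to have a FavoriteColor node.
DOC = """<FitnessCenter>
  <Member><Name>Jeff</Name><FavoriteColor>lightgrey</FavoriteColor></Member>
</FitnessCenter>"""

def path_to(root, target):
    """Join the names of target and all its ancestors with '/'."""
    # ElementTree elements do not know their parent, so record it up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    names = []
    node = target
    while node is not None:
        names.append(node.tag)  # self first, then each ancestor
        node = parent_of.get(node)  # None once the root is passed
    return "/".join(reversed(names))  # outermost ancestor first

root = ET.fromstring(DOC)
color_path = path_to(root, root.find("./Member/FavoriteColor"))
print(color_path)
```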

Do Lab19

starts-with(string-to-test, string)

• This function returns true if string-to-test starts with string, false otherwise.

starts-with('abc', 'a')

Output: true

starts-with(/FitnessCenter/Member[1]/FavoriteColor, 'light')

Output: true

Note: this XPath function is also present in version 1.0

See example81

http://www.w3.org/TR/xquery-operators/#func-starts-with

ends-with(string-to-test, string)

• This function returns true if string-to-test ends with string, false otherwise.

ends-with('xyz', 'yz')

Output: true

ends-with(/FitnessCenter/Member[1]/FavoriteColor, 'grey')

Output: true

Note: this XPath function is not present in version 1.0

See example82

http://www.w3.org/TR/xquery-operators/#func-ends-with

String Functions You Already Know

• contains(string-to-test, string)

• substring(string, starting-loc, length?)

• substring-before(string, match-string)

• substring-after(string, match-string)

• translate(string, from-pattern, to-pattern)

See example83

http://www.w3.org/TR/xquery-operators/#contents

normalize-space(string)

• This function strips leading and trailing whitespace (space, carriage return, tab), and replaces multiple whitespaces within the data by a single space.

normalize-space(' A cat ate the mouse ')

Output: A cat ate the mouse

normalize-space('There are two lines'), where the input has a line break between 'There are' and 'two lines'

Output: There are two lines

See example84

http://www.w3.org/TR/xquery-operators/#func-normalize-space
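A common way to get the same effect in Python is to split on whitespace and re-join with single spaces. The helper name is our own, and Python's notion of whitespace is somewhat broader than XML's, so this is an approximation.

```python
def normalize_space(text):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    # split() with no argument splits on any run of whitespace
    # and discards empty leading/trailing pieces.
    return " ".join(text.split())

padded = normalize_space("   A cat   ate the mouse  ")
two_lines = normalize_space("There are\ntwo lines")
print(padded)
print(two_lines)
```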

upper-case(string), lower-case(string)

upper-case('hello world')

Output: HELLO WORLD

lower-case('BLUE SKY')

Output: blue sky

See example85

http://www.w3.org/TR/xquery-operators/#func-upper-case

http://www.w3.org/TR/xquery-operators/#func-lower-case

escape-html-uri(uri)

• This function makes a URI usable by browsers, by escaping non-ASCII characters.

escape-html-uri('http://www.example.com?value=Π')

Output: http://www.example.com?value=%CE%A0

See example86

http://www.w3.org/TR/xquery-operators/#func-escape-html-uri
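The escaping rule (leave printable ASCII alone, percent-encode the UTF-8 bytes of everything else) is simple enough to sketch by hand. The Python function below is an illustration of that rule with a name of our own, not the processor's implementation.

```python
def escape_html_uri(uri):
    """Percent-encode every character outside printable ASCII (32..126)."""
    pieces = []
    for ch in uri:
        if 32 <= ord(ch) <= 126:
            pieces.append(ch)  # printable ASCII is left untouched
        else:
            # Each UTF-8 byte becomes %HH with upper-case hex digits.
            pieces.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
    return "".join(pieces)

escaped = escape_html_uri("http://www.example.com?value=\u03a0")  # \u03a0 is Π
print(escaped)
```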

year-from-date(xs:date)

• The argument of this function is a date as defined in XML Schemas.

• Recall that the format of a date is: CCYY-MM-DD

year-from-date(xs:date('2009-09-19'))

Output: 2009

See example87

http://www.w3.org/TR/xquery-operators/#func-year-from-date

Many Date, Time Functions!

year-from-dateTime(xs:dateTime)
month-from-dateTime(xs:dateTime)
day-from-dateTime(xs:dateTime)
hours-from-dateTime(xs:dateTime)
minutes-from-dateTime(xs:dateTime)
seconds-from-dateTime(xs:dateTime)
timezone-from-dateTime(xs:dateTime)
year-from-date(xs:date)
month-from-date(xs:date)
day-from-date(xs:date)
timezone-from-date(xs:date)
hours-from-time(xs:time)
minutes-from-time(xs:time)
seconds-from-time(xs:time)
timezone-from-time(xs:time)

http://www.w3.org/TR/xquery-operators/#component-extraction-functions See example88

root(node?)

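[Figure: node tree for the FitnessCenter document. The Document node (/) is at the top. Beneath it are the <?xml version="1.0"?> line (labeled PI in the figure) and the FitnessCenter element, which contains Member elements. Each Member has a Name element and a FavoriteColor element, each with a text node: Jeff/lightgrey, David/lightblue, Roger/lightyellow.]

The root() function returns the document node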

Useful if working with multiple documents

• The root() function can be very useful if you are working with multiple documents.

• The following XPath expression outputs the name of every element in the document, regardless of which document is currently being processed.

for $i in root()//* return name($i)

See example89

http://www.w3.org/TR/xquery-operators/#func-root

subsequence(sequence, start-loc, length?)

• This function returns a portion of sequence. Namely, it returns the items in sequence starting at index position start-loc. If length is not specified then it returns all the following items in the sequence. Otherwise, it returns length items.

subsequence((1 to 10), 2, 5)

Output: 2,3,4,5,6

subsequence(//Name, 2)

Output: <Name>David</Name> <Name>Roger</Name>

See example90

http://www.w3.org/TR/xquery-operators/#func-subsequence
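The main thing to keep in mind is that positions are 1-based. A small Python sketch of the rule follows, restricted to whole-number arguments (the XPath function also accepts and rounds doubles, which is not modeled here). The helper name is our own.

```python
def subsequence(items, start, length=None):
    """Return items at 1-based positions p with start <= p < start + length."""
    items = list(items)
    first = max(start, 1)  # positions below 1 do not exist
    if length is None:
        return items[first - 1:]  # everything from start onward
    end = start + length  # exclusive upper position
    if end <= first:
        return []
    return items[first - 1:end - 1]

middle = subsequence(range(1, 11), 2, 5)
tail = subsequence(["Jeff", "David", "Roger"], 2)
print(middle, tail)
```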

Do Lab20

zero-or-one(sequence), one-or-more(sequence), exactly-one(sequence)

• These functions are used to assert that a sequence contains the number of occurrences that you expect.

• Each function will generate an error if the sequence does not contain the expected number of occurrences. If the sequence does contain the expected number of occurrences then it simply returns the sequence.

zero-or-one(/FitnessCenter/Member[1]/Name)

one-or-more(/FitnessCenter/Member[1]/Phone)

exactly-one(/FitnessCenter/Member[1]/FavoriteColor)

See example91

http://www.w3.org/TR/xquery-operators/#func-zero-or-one

avg(sequence)

avg((1 to 100))

Output: 50.5

avg(//MembershipFee)

Output: 393.3333333333

Note that the avg() function has only one argument. Consequently, in the first XPath expression it was necessary to wrap the items in parentheses.

See example92

http://www.w3.org/TR/xquery-operators/#func-avg

max(sequence)

• The max() function enables you to obtain the maximum value among a sequence of values.

http://www.w3.org/TR/xpath-functions/#func-max

max((5, 3, 19, 2, -7))

Output: 19

max(//MembershipFee)

Output: 500

See example93

min(sequence)

• The min() function enables you to obtain the minimum value among a sequence of values.

http://www.w3.org/TR/xpath-functions/#func-min

min((5, 3, 19, 2, -7))

Output: -7

min(//MembershipFee)

Output: 340

See example94

Why 2 sets of parentheses?

• Did you notice that I used two sets of parentheses in the min and max functions?

– min((2,1,3)) and max((2,1,3))

• In fact, if you omit the inner parentheses you get an error message.

– min(2,1,3) and max(2,1,3): Error!

Reason for 2 parentheses

• Both the min and max functions have an optional second argument, collation:

min(sequence, collation?) max(sequence, collation?)

• The collation argument enables you to specify the collating sequence that should be used to determine the min/max value. We will typically just use the default collating sequence. Consequently, we will not use the second argument.

• Do you now understand the need for the 2 sets of parentheses?

min(2,1)

Is the second argument (1) a member of the sequence, or is it a collation? The processor reads it as the collation argument. Instead, you must do this: min((2,1))

number(value), string(value)

number(value) … "Hey, treat value as a number".

string(value) … "Hey, treat value as a string".

09 represents the number 9, which has a string value of '9'

See example95

http://www.w3.org/TR/xquery-operators/#func-number

http://www.w3.org/TR/xquery-operators/#func-string

Lesson Learned

• When you are comparing two values it is very good practice to wrap your values within either number() or string(). That way you are explicitly telling the processor how you want the values compared: as numeric values or as string values.

exists() function

• This function returns either true or false.

• This function is used to determine if an element exists.

if (exists(/FitnessCenter/Member[3])) then 'There is a 3rd Member' else 'Error! No 3rd Member'

Output: There is a 3rd Member

if (exists(/FitnessCenter/Member[99])) then 'There is a 99th Member' else 'Error! No 99th Member'

Output: Error! No 99th Member

http://www.w3.org/TR/xquery-operators/#func-exists

exists(()) = false

exists(())

Output: false

"The empty sequence does not exist"

See example96

empty() function

• This function returns either true or false.

• This function is used to determine if an element does not exist.

if (empty(/FitnessCenter/Member[3])) then 'No 3rd Member' else 'Error! There is a 3rd Member'

Output: Error! There is a 3rd Member

if (empty(/FitnessCenter/Member[99])) then 'No 99th Member' else 'Error! There is a 99th Member'

Output: No 99th Member

http://www.w3.org/TR/xquery-operators/#func-empty See example97

empty(()) = true

empty(())

Output: true

"The empty sequence is empty"

See example97

empty() = not(exists())

empty(/FitnessCenter/Member[3]) eq not(exists(/FitnessCenter/Member[3]))

Output: true

empty(/FitnessCenter/Member[99]) eq not(exists(/FitnessCenter/Member[99]))

Output: true

See example98

deep-equal(sequence1, sequence2)

• This function returns true if the two sequences are identical in value and position.

See example99

http://www.w3.org/TR/xquery-operators/#func-deep-equal

operand instance of datatype

• You can use the XPath instance of boolean operator to determine if an operand is of a particular datatype.

• The operand must not be a node. You must first atomize the node, using data(.)

• instance of checks the datatype label on the operand. The label must match datatype. Thus 340 is an instance of xs:integer, but not xs:positiveInteger

http://www.w3.org/TR/xpath20/#id-instance-of

operand instance of datatype

http://www.w3.org/TR/xpath20/#id-instance-of See example100

operand cast as datatype

• You can use the XPath cast as operator to convert operand to a particular datatype:

equivalent

See example101

http://www.w3.org/TR/xpath20/#id-cast

operand castable as datatype

• You can use the XPath castable as boolean operator to determine if an operand can be cast to a particular datatype:

if (//Member[1]/MembershipFee castable as xs:integer) then (//Member[1]/MembershipFee cast as xs:integer) * 2 else false()

See example102

http://www.w3.org/TR/xpath20/#id-castable

name, local-name, namespace-uri

• name() returns whatever is inside <…>

• local-name() returns the name that's after the colon <…:…>

• namespace-uri() returns the namespace

See example103

string(node)

• This extracts the data of a node and returns it as a string.

http://www.w3.org/TR/xquery-operators/#func-string See example104

base-uri(node?), document-uri(node)

• base-uri() returns the base URI of a node and document-uri() returns the URI of a document node. Both tell you the file path/URL that the XML came from.

http://www.w3.org/TR/xquery-operators/#func-base-uri

http://www.w3.org/TR/xquery-operators/#func-document-uri

See example105

Kind Tests

• Here are different ways to select a kind of item:

node(): selects any kind of node (element, attribute, text, comment, PI, namespace)
text(): selects a text node
element(): selects an element node
element(Member): selects Member element nodes
attribute(): selects attribute nodes
attribute(id): selects id attribute nodes
document-node(): selects the document node
comment(): selects a comment node
processing-instruction(): selects a PI node

Occurrence Indicators

• Use + to indicate one or more

• Use * to indicate zero or more

• Use ? to indicate zero or one

See example107

Please look at these examples; they illustrate the kind test and occurrence indicators

XPath 2.0 is a Strongly Typed Language

• Each XPath 2.0 function returns a value of a specific datatype. The argument(s) that are passed to the function must be of the required datatype.

• Also, the XPath 2.0 operators require the operands be of a required datatype. For example, you cannot perform arithmetic operations on strings without explicitly telling the processor to treat your strings like numbers.

XPath 2.0 is a Strongly Typed Language

• Consider this expression: '3' + 2

Here's the error message that you will get: Arithmetic operator is not defined for arguments of types (xs:string, xs:integer)

• Conversely, in XPath 1.0 the processor automatically coerces the string into a number.

See example35

Advantages of a Strongly Typed System

• Early and reliable identification of errors.

– Example: '3' + 2 will generate an error because the type of the first operand is not appropriate for the operator.

• Implementations (XPath processors) can optimize performance if they know about the types of the data.

– Example: Consider this comparison: //planet/* = 'mars'. If the processor knows the datatypes of each child of <planet> then it can just compare the string children against 'mars'

Disadvantages of a Strongly Typed System

• XPath authoring is complicated because more attention must be paid to types.

– Example: if you want to compare a number against a number that is represented as a string then you have to explicitly cast the number to a string and then do the comparison.

• Supporting an extensive type system puts a burden on implementers of XPath. This is why schema awareness is optional for implementers.

XML Schema Datatypes

• XPath 2.0 uses the datatypes defined in the XML Schema Datatypes Specification

XPath Functions are Strongly Typed

• Each XPath function requires arguments to be of a certain datatype.

• Each XPath function returns a result as a certain datatype.

• Example: here is the signature of the current-dateTime function:

current-dateTime() as xs:dateTime Read as: "The current-dateTime function is invoked without any arguments; it returns a value that has the datatype: XML Schema dateTime."

XPath Operators are Strongly Typed

• Each XPath operator requires the operands to be of a certain datatype.

• Each XPath operator returns a result as a certain datatype.

• Example: you can subtract two dateTime values and the result is of type xs:dayTimeDuration (a subtype of xs:duration):

current-dateTime() - xs:dateTime('1970-01-01T00:00:00Z')

returns P14275DT15H49M28.796S

Read as: "The duration between now (Jan. 31, 2009, 10:49am) and Jan. 01, 1970 is 14,275 days, 15 hours, 49 minutes, 28.796 seconds."

See example36
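The arithmetic behind that duration can be double-checked in Python, where subtracting two datetime values yields a timedelta. The fixed "now" below is an assumption reconstructed from the duration shown on the slide, expressed in UTC.

```python
from datetime import datetime, timezone

# Assumed "now": the UTC instant implied by P14275DT15H49M28.796S.
now = datetime(2009, 1, 31, 15, 49, 28, 796000, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

elapsed = now - epoch  # a timedelta: days, seconds, microseconds

# Break the seconds-within-the-day into hours, minutes, seconds.
hours, remainder = divmod(elapsed.seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(elapsed.days, hours, minutes, seconds, elapsed.microseconds)
```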

Constructor Functions

• Constructor functions are used to construct atomic values with the specified types.

• Example: the constructor xs:dateTime('1970-01-01T00:00:00Z') constructs an atomic value whose type is xs:dateTime.

• The signature of the xs:dateTime constructor is: xs:dateTime($arg as xs:anyAtomicType?) as xs:dateTime?

• There is a constructor function for each of the W3C built-in atomic types.

• If the argument is a node, the atomic value is extracted and that value is cast to the type.

• If the argument is an empty sequence, the result is an empty sequence.

• The complete list of constructor functions:

xs:string($arg as xs:anyAtomicType?) as xs:string?
xs:boolean($arg as xs:anyAtomicType?) as xs:boolean?
xs:decimal($arg as xs:anyAtomicType?) as xs:decimal?
xs:float($arg as xs:anyAtomicType?) as xs:float?
    Implementations ·may· return negative zero for xs:float("-0.0E0").
xs:duration($arg as xs:anyAtomicType?) as xs:duration?
xs:dateTime($arg as xs:anyAtomicType?) as xs:dateTime?
xs:time($arg as xs:anyAtomicType?) as xs:time?
xs:date($arg as xs:anyAtomicType?) as xs:date?
xs:gYearMonth($arg as xs:anyAtomicType?) as xs:gYearMonth?
xs:gYear($arg as xs:anyAtomicType?) as xs:gYear?
xs:gMonthDay($arg as xs:anyAtomicType?) as xs:gMonthDay?
xs:gDay($arg as xs:anyAtomicType?) as xs:gDay?
xs:gMonth($arg as xs:anyAtomicType?) as xs:gMonth?
xs:hexBinary($arg as xs:anyAtomicType?) as xs:hexBinary?
xs:base64Binary($arg as xs:anyAtomicType?) as xs:base64Binary?
xs:anyURI($arg as xs:anyAtomicType?) as xs:anyURI?
xs:QName($arg as xs:anyAtomicType) as xs:QName?
xs:normalizedString($arg as xs:anyAtomicType?) as xs:normalizedString?
xs:token($arg as xs:anyAtomicType?) as xs:token?
xs:language($arg as xs:anyAtomicType?) as xs:language?
xs:NMTOKEN($arg as xs:anyAtomicType?) as xs:NMTOKEN?
xs:Name($arg as xs:anyAtomicType?) as xs:Name?
xs:NCName($arg as xs:anyAtomicType?) as xs:NCName?
xs:ID($arg as xs:anyAtomicType?) as xs:ID?
xs:IDREF($arg as xs:anyAtomicType?) as xs:IDREF?
xs:ENTITY($arg as xs:anyAtomicType?) as xs:ENTITY?
xs:integer($arg as xs:anyAtomicType?) as xs:integer?
xs:nonPositiveInteger($arg as xs:anyAtomicType?) as xs:nonPositiveInteger?
xs:negativeInteger($arg as xs:anyAtomicType?) as xs:negativeInteger?
xs:long($arg as xs:anyAtomicType?) as xs:long?
xs:int($arg as xs:anyAtomicType?) as xs:int?
xs:short($arg as xs:anyAtomicType?) as xs:short?
xs:byte($arg as xs:anyAtomicType?) as xs:byte?
xs:nonNegativeInteger($arg as xs:anyAtomicType?) as xs:nonNegativeInteger?
xs:unsignedLong($arg as xs:anyAtomicType?) as xs:unsignedLong?
xs:unsignedInt($arg as xs:anyAtomicType?) as xs:unsignedInt?
xs:unsignedShort($arg as xs:anyAtomicType?) as xs:unsignedShort?
xs:unsignedByte($arg as xs:anyAtomicType?) as xs:unsignedByte?
xs:positiveInteger($arg as xs:anyAtomicType?) as xs:positiveInteger?
xs:yearMonthDuration($arg as xs:anyAtomicType?) as xs:yearMonthDuration?
xs:dayTimeDuration($arg as xs:anyAtomicType?) as xs:dayTimeDuration?
xs:untypedAtomic($arg as xs:anyAtomicType?) as xs:untypedAtomic?

New Datatypes

• The XPath 2.0 working group decided that the XML Schema datatypes are not complete, so they created a few new ones and added them to the XML Schema datatypes.

xs:anyAtomicType

• xs:anyAtomicType is an abstract type that is the base type of all atomic values.

• All datatypes, including the original XML Schema datatypes, are subtypes of xs:anyAtomicType

• "Abstract" means that it cannot be used directly; instead, a subtype must be used.

xs:untypedAtomic

• Any value that has not been associated with a schema type has the type xs:untypedAtomic.

xs:dayTimeDuration

• This is a subtype of xs:duration. It has only day, hour, minute, and second components.

• Subtracting two xs:date values yields a result of type xs:dayTimeDuration

current-date() - xs:date('1970-01-01')

[Figure: P1Y2M3DT10H30M12.3S is an example xs:duration value; P428DT10H30M12.3S is an example xs:dayTimeDuration value. xs:dayTimeDuration is a subtype of xs:duration.]

See example37

Subtracting Two Dates

• Here's an example of subtracting two xs:date values:

current-date() - xs:date('1970-01-01')

• The resulting value is an xs:dayTimeDuration value.

• Here's how it is specified in the XQuery 1.0 and XPath 2.0 Functions and Operators specification:

op:subtract-dates($arg1 as xs:date, $arg2 as xs:date) as xs:dayTimeDuration?

http://www.w3.org/TR/xquery-operators/#func-subtract-dates

"When subtracting two values, each of type xs:date, the resulting value is of type xs:dayTimeDuration."

xs:yearMonthDuration

• This is also a subtype of xs:duration. It has only the year and month components.

[Figure: P1Y2M3DT10H30M12.3S is an example xs:duration value. xs:yearMonthDuration is a subtype of xs:duration.]

Datatype of Literals and Expressions

• datatype of current-dateTime() - xs:dateTime('1970-01-01T00:00:00Z') is xs:dayTimeDuration

• datatype of current-date() - xs:date('1970-01-01') is xs:dayTimeDuration

• datatype of 3 is xs:integer

• datatype of 3.14 is xs:decimal

• datatype of "3" is xs:string

• datatype of true is Unknown xs:untypedAtomic

• datatype of true() is xs:boolean

• datatype of 1E3 is xs:double

See example38

Datatype of Input Data Unassociated with a Schema

• datatype of //planet[1]/mass is Unknown xs:untypedAtomic

• datatype of //planet[1]/mass/text() is Unknown xs:untypedAtomic

See example39

Datatype of Arithmetic Operations

• datatype of 2 + 2 is xs:integer

• datatype of 2.0 + 2.0 is xs:decimal

• datatype of 2.0 + 2 is xs:decimal

• datatype of 6 div 2 is xs:decimal (div on two integers returns a decimal; use idiv to get an xs:integer)

• datatype of 6.0 div 2.0 is xs:decimal

• datatype of 6.0 div 2 is xs:decimal

See example40

Numeric Types

• The 4 main numeric types supported in XPath 2.0 are:

– xs:decimal

– xs:integer

– xs:float

– xs:double

• All arithmetic operators and functions that can be performed on these types can also be performed on their subtypes.

xs:decimal

• Numeric literals that contain only digits and a decimal point (no letter E or e) are considered to be decimal numbers with the type xs:decimal.

• Example: 25.5 and 25.0 are xs:decimal values.

xs:integer

• Numeric literals that contain only digits (no decimal point or the letter E or e) are considered to be integer numbers with the type xs:integer.

• Example: 25 is an integer value.

xs:float and xs:double

• Numeric literals that contain the letter E or e are considered to be double numbers with the type xs:double.

• Example: 1E3 and 1e3 are xs:double values.

See example41

How a Value becomes Numeric

• The value is a numeric literal

• The value is selected from an input document that is associated with a schema that declares it to have a numeric type

• The value is the result of a function that returns a number, e.g. count(…) returns xs:integer

• The value is the result of a numeric constructor function, e.g. xs:float("25.83") returns a xs:float value

• The value is the result of an explicit cast, e.g., //planet[1]/mass cast as xs:decimal

• The value is cast automatically when it is passed to a function

The number() Function

• The number() function is almost equivalent to the xs:double() constructor function.

• Both return a value of type xs:double.

• Differences:

– number("hi") = NaN

– xs:double("hi") = error

– number(()) = NaN

– xs:double(()) = () (the empty sequence, as for every constructor function)

See example42
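The "lenient versus strict" distinction can be sketched in Python, where float() plays the strict role and a small wrapper plays the lenient one. Both function names are our own, and Python's float() accepts some lexical forms that XPath does not, so this only illustrates the error-versus-NaN behavior.

```python
import math

def xs_double(text):
    """Strict conversion: raises ValueError on bad input, like xs:double('hi')."""
    return float(text)

def number(value):
    """Lenient conversion: returns NaN instead of raising, like number('hi')."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None, used here as a stand-in for the empty sequence.
        return math.nan

print(number("25.83"), number("hi"), number(None))
```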

Numeric Type Promotion

• If an operation, such as comparison or an arithmetic operation, is performed on values of two different primitive numeric types, one value's type is promoted to the type of the other.

Operand #1     Operand #2     Promoted to
xs:decimal     xs:float       xs:float
xs:decimal     xs:double      xs:double
xs:float       xs:double      xs:double

Example: 1.0 + 1.2E0 = 2.2E0. The xs:decimal operand 1.0 is promoted to xs:double, and the result is xs:double.

Numeric type promotion happens automatically in arithmetic expressions and comparison expressions. It also occurs in calls to functions that expect numeric values.

See example43
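The promotion table boils down to "the operand that is lower in the order decimal, float, double is promoted to the other's type". A tiny Python sketch of that lookup follows. The names RANK and promoted_type are our own, and only the three primitive types from the table are covered.

```python
# Promotion order taken from the table: decimal < float < double.
RANK = {"xs:decimal": 0, "xs:float": 1, "xs:double": 2}

def promoted_type(left, right):
    """Return the type both operands end up with after promotion."""
    return left if RANK[left] >= RANK[right] else right

print(promoted_type("xs:decimal", "xs:float"))
print(promoted_type("xs:decimal", "xs:double"))
print(promoted_type("xs:float", "xs:double"))
```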

Subtype Substitution

• Wherever a type is expected, you can substitute it with any of its derived types.

• Example: a function that expects a xs:decimal value can be invoked with an xs:integer value since integer derives from decimal.
