EXVISXML, AN EMBLEMATIC TOOL FOR DOCUMENT ANALYSIS
Daniela da Cruz, Pedro Rangel Henriques
Departamento de Informática, Universidade do Minho
6 March 2009, Universidade de Aveiro
CONTEXT
MOTIVATION
XML DOCUMENT VISUALIZATION
The role of visualization technology in program comprehension (PC) and software engineering (SE) is recognized as very fruitful.
Software visualization (SV) features allow us to capture a large amount of information more quickly.
Graphical representations have a positive impact on the learning process.
XML DOCUMENT VISUALIZATION
Retrieving information from plain documents efficiently is not an easy task.
Machine manipulation: XSL and other production systems can easily extract information and transform it.
Human manipulation: it is not as easy as desirable, because the annotation is complex or the document is too big.
XML DOCUMENT VISUALIZATION
Many tools have appeared to aid in the visualization of XML documents: XML Schema Designer (Microsoft), XPath Analyzer (Altova), …
Although these tools offer syntax highlighting and easy manipulation (collapse/expand), their view is hierarchical and textual.
TRADITIONAL XML DOCUMENT VISUALIZATION
OUR PROPOSAL FOR XML DOCUMENT VISUALIZATION
In this context, we want a visualization that makes the comprehension process easier.
However, we should take care with graphical or iconic representations, since their suitability depends on the problem domain.
Inspired by Alma, the eXVisXML interface for the visual inspection of XML documents is divided into 3 main parts:
OUR PROPOSAL FOR XML DOCUMENT VISUALIZATION
One window that displays the source document;
One window that exhibits the textual hierarchy;
One window that shows the (graphical) tree associated with the source document.
XML DOCUMENT SLICING
The concept of slicing was introduced by Weiser in 1979.
It is applied to a program with respect to a slicing criterion (a pair composed of a line number and a set of variables).
The objective is to find the statements that may affect those variables.
This technique can also be applied to XML documents. How?
XML DOCUMENT SLICING
XML document + slicing criterion (an XPath expression can be regarded as a simplified slicing criterion)
A document slice is a new XML document composed of those elements that are strictly necessary to maintain the tree structure.
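The idea can be sketched in Python with the standard library. This is only an illustration under stated assumptions, not the eXVisXML implementation: the criterion is taken to be an ElementTree-style path, a slice is assumed to keep each matched element with its subtree plus the ancestors needed to reach it, and the function name `slice_document` and the sample document are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def slice_document(root, criterion):
    """Return a pruned copy of the tree: matched elements, their subtrees,
    and the ancestors that connect them to the root (assumed slice shape)."""
    root = copy.deepcopy(root)  # never modify the caller's document
    parent_of = {child: parent for parent in root.iter() for child in parent}

    keep = {root}
    for match in root.findall(criterion):
        keep.update(match.iter())      # the match and everything below it
        node = match
        while node in parent_of:       # walk up to the root
            node = parent_of[node]
            keep.add(node)

    # Drop every child that is neither a match, a descendant, nor an ancestor.
    for parent in list(root.iter()):
        for child in list(parent):
            if child not in keep:
                parent.remove(child)
    return root


# Hypothetical miniature screenplay used only for this example.
doc = ET.fromstring(
    "<play><act><scene>"
    "<speech><speaker>Greg</speaker><line>a</line></speech>"
    "<speech><speaker>Samp</speaker><line>b</line></speech>"
    "</scene></act><personae><persona>x</persona></personae></play>"
)
sliced = slice_document(doc, ".//speech[speaker='Greg']")
print(ET.tostring(sliced, encoding="unicode"))
```

Working on a deep copy keeps the source document intact, so the original and the slice can be shown side by side.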
XML DOCUMENT SLICING
Josep Silva proves, in Slicing XML Documents, that slicing techniques applied to XML and DTD documents produce valid XML and DTD slices with respect to the slicing criterion.
XML DOCUMENT SLICING
Given the whole XML document of the Romeo and Juliet screenplay and the slicing criterion Greg, the result is:
XML DOCUMENTS SLICING
XML DOCUMENT METRICS
Effective management of any process requires quantification, measurement, and modeling.
Software metrics provide a quantitative basis for the development and validation of models of the software development process.
Metrics can be used to improve software productivity and quality.
XML DOCUMENT METRICS
In the field of XML, quality assessment is also relevant because the approach followed by engineers or end users to design the annotation schema, or even to mark up existing texts, is often improvised and naive.
Concepts like well-formedness and validity are not sufficient to appraise XML documents.
So, a set of metrics was defined to form the basis for measuring the quality of an XML document.
XML DOCUMENT METRICS
Size
Structure Complexity
Structure Depth
Fan-in / Fan-out
Instability
Tree Impurity
Attributes per Element
Non-used Components
Text Length
XML DOCUMENT METRICS
Successor Graph
Given a DTD, we say that each component (element or attribute) that appears in an element's definition is an immediate successor of the element under definition.
Then, we introduce an arrow (directed edge) from the element to the component.
Example:
<!ELEMENT Item (FileName, Artist?)>
<!ELEMENT FileName (#PCDATA)>
<!ELEMENT Artist (#PCDATA)>
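As a rough sketch of how such a successor graph (SG) could be derived from the declarations above, the Python fragment below scans `<!ELEMENT ...>` declarations with regular expressions. It is an assumption-laden illustration, not the tool's parser: it ignores `<!ATTLIST ...>` declarations, entities and comments, and the name `successor_graph` is hypothetical.

```python
import re

DTD = """
<!ELEMENT Item (FileName, Artist?)>
<!ELEMENT FileName (#PCDATA)>
<!ELEMENT Artist (#PCDATA)>
"""

# One element declaration: the element name followed by its content model.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]*)>")
NAME = re.compile(r"[A-Za-z_][\w.:-]*")
KEYWORDS = {"EMPTY", "ANY"}  # content-model keywords, not components


def successor_graph(dtd):
    """Map each declared element to the set of elements named in its
    content model (its immediate successors)."""
    graph = {}
    for element, model in ELEMENT_DECL.findall(dtd):
        names = NAME.findall(model.replace("#PCDATA", ""))
        graph[element] = {n for n in names if n not in KEYWORDS}
    return graph


for element, successors in successor_graph(DTD).items():
    for successor in sorted(successors):
        print(f"{element} -> {successor}")
```

Storing the graph as a mapping from node to successor set keeps leaf elements such as FileName visible as nodes with no outgoing edges.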
SUCCESSOR GRAPH (ROMEO AND JULIET SCREENPLAY)
XML DOCUMENT METRICS
Size
Given a DTD, its size (i.e. the value for this metric) is the total number of nodes in the SG (number of DTD components).
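A minimal sketch of this count, assuming the SG is held as a Python dictionary that maps each node to the set of its immediate successors (a representation chosen here for illustration only):

```python
def size(graph):
    """Number of distinct nodes in the successor graph, including nodes
    that only ever appear as somebody's successor."""
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    return len(nodes)


# SG of the small Item/FileName/Artist DTD used as the running example.
sg = {"Item": {"FileName", "Artist"}, "FileName": set(), "Artist": set()}
print(size(sg))  # 3
```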
XML DOCUMENT METRICS
Structure complexity
Where e is the number of edges in the SG, n is the number of nodes in the SG and n_idref is the number of IDREF attributes.
XML DOCUMENT METRICS
Structure Depth
According to Meike Klettke, in Metrics for XML Document Collections, an SG with a depth much higher than 7 is complex and reveals poor DTD design.
XML DOCUMENT METRICS
Fan-in / Fan-out
For the graph as a whole, the average and the maximum values for those parameters can be useful to spot unusual nodes, which can be inspected to detect the anomaly and fix the problem.
Elements with a high Fan-in/Fan-out value are more complex than other elements with a lower value.
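The transcript does not preserve the slide's definitions, so the sketch below assumes the usual graph reading: the fan-in of a node is the number of edges arriving at it, and its fan-out is the number of edges leaving it. The dictionary representation of the SG and the function name are illustrative choices.

```python
from collections import Counter


def fan_in_out(graph):
    """Return {node: (fan_in, fan_out)} for every node of the SG."""
    fan_out = {node: len(successors) for node, successors in graph.items()}
    fan_in = Counter()
    for successors in graph.values():
        fan_in.update(successors)  # one incoming edge per listed successor
    nodes = set(graph) | set(fan_in)
    return {n: (fan_in.get(n, 0), fan_out.get(n, 0)) for n in nodes}


sg = {"Item": {"FileName", "Artist"}, "FileName": set(), "Artist": set()}
degrees = fan_in_out(sg)

# Graph-wide summaries that help spot unusual nodes.
max_fan_out = max(out for _, out in degrees.values())
avg_fan_out = sum(out for _, out in degrees.values()) / len(degrees)
print(degrees["Item"], max_fan_out, avg_fan_out)
```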
XML DOCUMENT METRICS
Instability
A node with low instability is one that depends little on other nodes, while many nodes depend on it.
XML DOCUMENT METRICS
Tree Impurity
A tree impurity of 0% means that a graph is a tree and a tree impurity of 100% means that it is a fully connected graph.
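The slide's formula is not preserved in the transcript. As an assumption, the sketch below uses the classical tree-impurity definition from the software metrics literature, 2(e - n + 1) / ((n - 1)(n - 2)) for a connected undirected graph with n nodes and e edges, which yields exactly the two extremes described above. Whether eXVisXML computes it this way is not confirmed by the source.

```python
def tree_impurity(n_nodes, n_edges):
    """Classical tree impurity as a fraction in [0, 1] (assumed formula).

    0.0 for a tree (e = n - 1), 1.0 for a complete graph (e = n(n - 1)/2).
    """
    if n_nodes < 3:
        raise ValueError("tree impurity is undefined for fewer than 3 nodes")
    return 2 * (n_edges - n_nodes + 1) / ((n_nodes - 1) * (n_nodes - 2))


print(tree_impurity(4, 3))  # tree on 4 nodes
print(tree_impurity(4, 6))  # complete graph on 4 nodes
```

Multiplying the result by 100 gives the percentage form used on the slides.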
XML DOCUMENT METRICS
Attributes per Element
The AttrsEle(DTD) metric gives the average number of attributes defined per element in the DTD.
The AttrsEle(XML) metric, applied directly to the XML document, gives the average number of attributes actually used per element present in the XML document.
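A minimal sketch of the document-side metric, assuming it is the total number of attribute occurrences divided by the total number of element occurrences; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def attrs_per_element_xml(root):
    """Average number of attributes per element occurrence in the document."""
    elements = list(root.iter())  # always contains at least the root
    return sum(len(e.attrib) for e in elements) / len(elements)


doc = ET.fromstring(
    '<Items><Item id="1"><FileName>a.mp3</FileName></Item>'
    "<Item><FileName>b.mp3</FileName></Item></Items>"
)
print(attrs_per_element_xml(doc))  # 1 attribute over 5 elements
```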
XML DOCUMENT METRICS
Non-used Components
If Attr(DTD) represents the set of attributes defined in the DTD, and Attr(XML) represents the set of actual attributes (the attributes used in the XML document instance), then NonAttr(XML) is the set of non-used attributes, that is, the attributes in Attr(DTD) that do not occur in Attr(XML).
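The set difference can be sketched as follows. The declared set stands in for Attr(DTD) and is written by hand here as a hypothetical example; attributes are identified by (element, attribute) pairs, which is an illustrative choice rather than something the slides specify.

```python
import xml.etree.ElementTree as ET


def used_attributes(root):
    """Attr(XML): the (element, attribute) pairs that occur in the document."""
    return {(e.tag, name) for e in root.iter() for name in e.attrib}


# Hypothetical Attr(DTD): attributes declared for Item in some DTD.
declared_attrs = {("Item", "id"), ("Item", "rating")}

doc = ET.fromstring('<Items><Item id="1"/><Item id="2"/></Items>')
non_used = declared_attrs - used_attributes(doc)
print(non_used)
```

The same pattern applies to elements: collect the tags that occur in the document and subtract them from the set of declared element names.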
XML DOCUMENT METRICS
Text Length
Here length(PCDATA) is the total length of the document's text (the sum of the lengths of all text fragments, i.e., text associated with element tags or untagged text), and nPCDATA is the number of text fragments (the number of PCDATA leaves that appear in the XML document tree).
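The formula itself is missing from the transcript; the sketch below assumes the metric is the average fragment length, length(PCDATA) / nPCDATA. It also assumes that whitespace-only fragments are ignored and that surrounding whitespace is stripped before measuring, neither of which is stated in the source.

```python
import xml.etree.ElementTree as ET


def text_fragments(root):
    """Yield every non-blank text fragment, in document order."""
    for element in root.iter():
        # .text is the text right after the start tag; .tail is the
        # untagged text that follows the element's end tag.
        for chunk in (element.text, element.tail):
            if chunk and chunk.strip():
                yield chunk.strip()


def text_length(root):
    """Average fragment length: length(PCDATA) / nPCDATA (assumed ratio)."""
    fragments = list(text_fragments(root))
    if not fragments:
        return 0.0
    return sum(len(f) for f in fragments) / len(fragments)


doc = ET.fromstring(
    "<speech><speaker>Greg</speaker><line>No, sir.</line> aside</speech>"
)
print(list(text_fragments(doc)), text_length(doc))
```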
METRIC RESULTS (ROMEO AND JULIET SCREENPLAY)
Metric                        Value
Size                          27
Structure Complexity          13
Structure Depth               7
Fan-in (node scene)           3
Fan-out (node scene)          6
Instability                   3.3%
Tree Impurity                 58.9%
Attributes per Elem (DTD)     0.08
Attributes per Elem (XML)     0.027
Non-used Components (Elem)    1 (stagedir element)
Text Length (Elem)            37.46
Text Length (Attr)            1
CONCLUSION