ikc 2015
![Page 1: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/1.jpg)
Mariem Harmassi, Daniela Grigori, Khalid Belhajjame
LAMSADE, Université Paris Dauphine
Mining Workflow Repositories for Improving Fragments Reuse
![Page 2: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/2.jpg)
IKC 2015
Workflows
A business process specified using the BPMN notation
A scientific workflow system (Taverna)
A workflow consists of an orchestrated and repeatable pattern of business activity enabled by the systematic organization of resources into processes that transform materials, provide services, or process information (Workflow Coalition)
![Page 3: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/3.jpg)
Scientific Workflows
Scientific workflows are increasingly used by scientists as a means for specifying and enacting their experiments.
They tend to be data intensive.
The data sets obtained as a result of their enactment can be stored in public repositories to be queried, analyzed and used to feed the execution of other workflows.
![Page 4: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/4.jpg)
Workflows are difficult to design
The design of scientific workflows, just like that of business processes, can be a difficult task:
Deep knowledge of the domain
Awareness of the resources, e.g., programs and web services, that can enact the steps of the workflow
Publish and share workflows, and promote their reuse: myExperiment, CrowdLabs, Galaxy, and various other business process repositories
Reuse is still an aim.
There are no capabilities that support the user in identifying the workflows, or fragments thereof, that are relevant for the task at hand.
![Page 5: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/5.jpg)
Fragment look-up in the life cycle of workflow design
[Figure: life cycle with the elements Design Workflow, Search Fragments, Run Workflow, Publish Workflow, and Workflow repositories]
![Page 6: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/6.jpg)
Workflow Fragments Search
Why is it useful?
The workflow designer knows the steps of the fragment and their dependencies, but does not know the resources (programs or web services) that can be used for their implementation.
The designer may want to know how colleagues and third parties designed the fragment (best practices).
Elements of the solution
1. Filtering: Instead of searching the whole repository, we limit the workflows to be examined to those that are relevant to the user.
2. Identify the fragments that are recurrent in the workflows retrieved in (1).
![Page 7: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/7.jpg)
1 - Filtering step
[Figure: filtering pipeline. Labels: Workflow XML, Workflow graph, List of keywords, List of keywords & synonyms, WordNet, BP Repository, Filter, Else]
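The filtering idea can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, and `expand_keywords`, `filter_repository`, the `min_overlap` threshold, and the sample repository are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical synonym table standing in for a WordNet lookup (assumption).
SYNONYMS: Dict[str, Set[str]] = {
    "fetch": {"get", "retrieve"},
    "sequence": {"seq"},
}


def expand_keywords(keywords: Iterable[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the lower-cased keywords together with their known synonyms."""
    expanded: Set[str] = set()
    for word in keywords:
        word = word.lower()
        expanded.add(word)
        expanded |= synonyms.get(word, set())
    return expanded


def filter_repository(
    query_keywords: Iterable[str],
    repository: Dict[str, List[str]],
    synonyms: Dict[str, Set[str]] = SYNONYMS,
    min_overlap: int = 1,
) -> List[str]:
    """Keep the workflows whose activity labels share enough words with the query."""
    wanted = expand_keywords(query_keywords, synonyms)
    kept = []
    for name, labels in repository.items():
        # Split labels such as "fetch_sequence" into individual words.
        words = {w.lower() for label in labels for w in label.split("_")}
        if len(wanted & words) >= min_overlap:
            kept.append(name)
    return sorted(kept)


# Hypothetical repository: workflow name -> activity labels.
repository = {
    "wf_blast": ["fetch_sequence", "run_blast"],
    "wf_align": ["retrieve_seq", "align"],
    "wf_plot": ["load_table", "draw_chart"],
}
selected = filter_repository(["fetch", "sequence"], repository)
print(selected)
```

Only the workflows that pass this filter would then be handed to the mining step, which keeps the graph mining input small.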
![Page 8: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/8.jpg)
2 - Identify Recurrent Fragments
We use graph mining algorithms to identify the fragments in the repository that are recurrent.
We use the SUBDUE algorithm.
Which graph representation should be used to represent (workflow) fragments?
We examined a number of workflow representations.
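To make the notion of a recurrent fragment concrete, here is a deliberately naive sketch. It is not SUBDUE: it only counts single labeled edges that occur in at least `min_support` workflows, whereas SUBDUE searches for larger substructures. The function name, the threshold, and the sample workflows are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source activity, edge label, target activity)


def recurrent_edges(workflows: Dict[str, List[Edge]], min_support: int = 2) -> Dict[Edge, int]:
    """Return the edges found in at least min_support workflows, with their support."""
    support: Counter = Counter()
    for edges in workflows.values():
        # Count each edge at most once per workflow.
        support.update(set(edges))
    return {edge: n for edge, n in support.items() if n >= min_support}


# Hypothetical workflows encoded as lists of labeled edges.
workflows = {
    "wf1": [("att1", "And", "att2"), ("att2", "SEQ", "att4")],
    "wf2": [("att1", "And", "att2"), ("att3", "XOR", "att5")],
    "wf3": [("att1", "And", "att2"), ("att2", "SEQ", "att4")],
}
recurrent = recurrent_edges(workflows)
print(recurrent)
```

A real graph miner extends such frequent edges into larger connected patterns; the sketch only shows why the choice of labels matters, since two edges count as the same pattern only when their labels match exactly.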
![Page 9: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/9.jpg)
Representation A
[Figure: graph over att1 to att5 with the labels next, operator, operand, type, sequence, And, Xor]
Representation B
[Figure: graph over att1 to att5 with the labels next, sequence, Split-And (sp-and), Join-Xor (J-Xor)]
![Page 10: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/10.jpg)
Representation C
[Figure: two graphs. First graph: att1 to att5 with the labels S-att1-att2, S-att1-att3, seq-att2-att4, seq-att4-att5. Second graph: att1, att2, att3, att5 with the labels S-att1-att2, S-att1-att3, seq-att3-att5]
![Page 11: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/11.jpg)
Representation D and Representation D1
[Figure: two graphs over att1 to att5. One uses the labels And_att1_att3, And_att1_att2, XOR_att3_att5, SEQ_att2_att4, XOR_att4_att5; the other uses the labels And, And, XOR, SEQ, XOR]
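The difference between the two labeling schemes can be sketched as follows. The labels come from the slide, but several points are assumptions: that the labels sit on edges between activities, that the compound labels (e.g. `And_att1_att2`) belong to Representation D and the bare operator labels to D1, and the names `links`, `encode_d`, and `encode_d1`.

```python
from typing import List, Tuple

Link = Tuple[str, str, str]  # (operator, source activity, target activity)
Edge = Tuple[str, str, str]  # (source activity, target activity, edge label)

# Links read off the slide labels (And_att1_att2, SEQ_att2_att4, ...).
links: List[Link] = [
    ("And", "att1", "att2"),
    ("And", "att1", "att3"),
    ("SEQ", "att2", "att4"),
    ("XOR", "att3", "att5"),
    ("XOR", "att4", "att5"),
]


def encode_d(links: List[Link]) -> List[Edge]:
    """Assumed Representation D: the label carries the operator and both activity names."""
    return [(src, dst, f"{op}_{src}_{dst}") for op, src, dst in links]


def encode_d1(links: List[Link]) -> List[Edge]:
    """Assumed Representation D1: the label carries the operator only."""
    return [(src, dst, op) for op, src, dst in links]


for edge in encode_d(links):
    print("D ", edge)
for edge in encode_d1(links):
    print("D1", edge)
```

With operator-only labels, two workflows that use the same control structure over the same activities yield identical edges, while compound labels also tie each edge label to its endpoints.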
![Page 12: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/12.jpg)
Experiments
1st experiment: To assess the suitability of the graph representations for mining workflow graphs
Effectiveness: Precision/Recall
Memory space: Disk space, DIV
Execution time
2nd experiment: To assess the impact of the filtering step in narrowing the search to relevant workflow fragments.
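For reference, precision and recall over mined fragments can be computed as in the sketch below. It assumes that fragments can be compared by an identifier and that a set of expected fragments is known; the function name and the sample identifiers are made up for the example and do not reproduce the experiment's data.

```python
from typing import Iterable, Tuple


def precision_recall(found: Iterable[str], expected: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / found, recall = hits / expected (0.0 when the denominator is empty)."""
    found_set, expected_set = set(found), set(expected)
    hits = found_set & expected_set
    precision = len(hits) / len(found_set) if found_set else 0.0
    recall = len(hits) / len(expected_set) if expected_set else 0.0
    return precision, recall


# Hypothetical fragment identifiers.
expected_fragments = {"f1", "f2", "f3", "f4"}
mined_fragments = {"f1", "f2", "x"}
precision, recall = precision_recall(mined_fragments, expected_fragments)
print(f"precision={precision:.2f} recall={recall:.2f}")
```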
![Page 13: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/13.jpg)
Experiment 1: Dataset
We created three datasets of workflow specifications, containing respectively 30, 42, and 71 workflows.
Nine of these workflows are similar to each other and, as such, contain recurrent structures that should be detected by the mining algorithm.
Despite the small size of the collection, these datasets allowed us to distinguish, to a certain extent, between the different representations.
![Page 14: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/14.jpg)
Experiment 1: Input Data Size
![Page 15: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/15.jpg)
Experiment 1: Effectiveness (Precision/Recall)
![Page 16: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/16.jpg)
Representation A and Representation B
[Figure: Representations A and B, as on page 9]
![Page 17: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/17.jpg)
Experiment 1: Effectiveness (Precision/Recall)
![Page 18: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/18.jpg)
Experiment 1: Execution Time
[Figure: execution time comparison, annotated with the ratios ≥ 55 times, ≥ 25 times, ≈ 4 times, ≈ 5 times]
![Page 19: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/19.jpg)
Experiment 1: Summary
Control nodes: recurrent patterns, typical coding scheme related to the model rule (affects Recall)
Labeling the edges: specializations of the same abstract workflow (affects Precision)
Xor as a set of alternatives: duplication, loss of information (affects Recall and Precision)
Representation D1 therefore seems to be the one that performs best.
![Page 20: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/20.jpg)
Experiment 2
Data sets: All Taverna 1 workflows (498 workflows) from myExperiment
User query: We use a small fragment from a workflow in myExperiment.
![Page 21: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/21.jpg)
Conclusion
Methodology for improving reusability
Model of representation D + Filter
Improve the filter
Test other similarity measures
Need to assess the usefulness of the techniques presented in practice, and how they can be incorporated in the workflow design life cycle.
In the context of the Contextual and Aggregated Information Retrieval (CAIR) project
![Page 22: Ikc 2015](https://reader037.vdocuments.pub/reader037/viewer/2022101301/58ef2ee91a28abd8628b45c7/html5/thumbnails/22.jpg)