ikc 2015

Mariem Harmassi, Daniela Grigori, Khalid Belhajjame

LAMSADE, Université Paris Dauphine

Mining Workflow Repositories for Improving Fragments Reuse

IKC 2015

Workflows

[Figure: a business process specified using the BPMN notation; a scientific workflow (Taverna system)]

A workflow consists of an orchestrated and repeatable pattern of business activity enabled by the systematic organization of resources into processes that transform materials, provide services, or process information (Workflow Coalition)

Scientific Workflows

Scientific workflows are increasingly used by scientists as a means for specifying and enacting their experiments.

They tend to be data-intensive.

The data sets obtained as a result of their enactment can be stored in public repositories to be queried, analyzed and used to feed the execution of other workflows.

Workflows are difficult to design

The design of scientific workflows, just like that of business processes, can be a difficult task. It requires deep knowledge of the domain and awareness of the resources, e.g., programs and web services, that can enact the steps of the workflow.

Repositories such as myExperiment, CrowdLabs, Galaxy, and various business process repositories publish and share workflows and promote their reuse. Reuse, however, is still an aim.

There are no capabilities that support the user in identifying the workflows, or fragments thereof, that are relevant for the task at hand.

Fragment look-up in the life cycle of workflow design

[Figure: life cycle of workflow design with the stages Design Workflow, Search Fragments, Run Workflow, and Publish Workflow, and the workflow repositories]

Workflow Fragments Search

Why is it useful?

The workflow designer knows the steps of the fragment and their dependencies, but does not know the resources (programs or web services) that can be used for their implementation.

The designer may want to know how colleagues and third parties designed the fragment (best practices).

Elements of the solution:

1. Filtering: instead of searching the whole repository, we limit the workflows in the repository to be examined to those that are relevant to the user.

2. Identify the fragments that are recurrent in the workflows retrieved in (1).

1 - Filtering step

[Figure: the filtering step. Labels: Workflow (XML), Workflow graph, List of keywords, List of keywords & synonyms (WordNet), BP Repository, Filter, Else]
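The slide lists only the components of the filter, not the similarity measure it uses. A minimal sketch under stated assumptions: each workflow has already been reduced to a set of keywords, a tiny hypothetical synonym table stands in for WordNet, and relevance is a Jaccard score compared with an arbitrary threshold of 0.2. The names `SYNONYMS`, `expand`, `jaccard` and `filter_repository` are illustrative, not part of the authors' system.

```python
# Minimal sketch of a keyword-based filter. Assumptions (not from the slides):
# workflows are already reduced to keyword sets, a tiny hypothetical synonym
# table stands in for WordNet, and relevance is a Jaccard score vs. a threshold.

# Hypothetical synonym table standing in for WordNet lookups.
SYNONYMS: dict[str, set[str]] = {
    "align": {"alignment"},
    "sequence": {"seq"},
    "fetch": {"retrieve", "get"},
}


def expand(keywords: set[str]) -> set[str]:
    """Return the keywords together with their (hypothetical) synonyms."""
    expanded = set(keywords)
    for word in keywords:
        expanded |= SYNONYMS.get(word, set())
    return expanded


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_repository(query: set[str], repository: dict[str, set[str]],
                      threshold: float = 0.2) -> list[str]:
    """Ids of workflows whose expanded keywords are similar enough to the query's."""
    expanded_query = expand(query)
    return sorted(
        wf_id for wf_id, keywords in repository.items()
        if jaccard(expanded_query, expand(keywords)) >= threshold
    )


repo = {
    "wf1": {"fetch", "sequence", "align"},
    "wf2": {"plot", "render"},
    "wf3": {"retrieve", "seq"},
}
print(filter_repository({"fetch", "sequence"}, repo))  # ['wf1', 'wf3']
```

Only the workflows that pass the filter would then be handed to the mining step, which is what keeps the search focused on fragments relevant to the user.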

2 - Identify Recurrent Fragments

We use a graph mining algorithm, SUBDUE, to identify the fragments in the repository that are recurrent.
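SUBDUE looks for substructures that best compress the input graph. The sketch below is a simplified illustration, not the SUBDUE implementation (which grows substructures iteratively with a beam search and by default uses an MDL-based measure): it scores only one-edge substructures with the size-based compression value size(G) / (size(S) + size(G|S)), where size = |V| + |E| and G|S is G with each non-overlapping instance of S collapsed into one vertex. Treating the filtered workflows as one disconnected graph, and the activity labels used, are assumptions.

```python
from collections import defaultdict

# Labeled graph: vertex id -> label, and directed edges (src, dst, edge label).
# Assumes no self-loops, so every one-edge substructure has 2 vertices.
Vertices = dict[str, str]
Edges = list[tuple[str, str, str]]


def one_edge_instances(vertices: Vertices, edges: Edges) -> dict:
    """Group edges by pattern (src label, edge label, dst label), greedily keeping
    only instances that share no vertex with an earlier instance of the same pattern."""
    instances: dict = defaultdict(list)
    covered: dict = defaultdict(set)
    for src, dst, label in edges:
        pattern = (vertices[src], label, vertices[dst])
        if src in covered[pattern] or dst in covered[pattern]:
            continue  # overlapping instance: skip it
        instances[pattern].append((src, dst))
        covered[pattern] |= {src, dst}
    return instances


def compression_value(vertices: Vertices, edges: Edges, n_instances: int) -> float:
    """size(G) / (size(S) + size(G|S)) with size = |V| + |E| and S a one-edge
    substructure (2 vertices, 1 edge). Collapsing one instance into a single
    vertex removes 1 vertex and 1 edge, so size(G|S) = size(G) - 2 * n_instances."""
    size_g = len(vertices) + len(edges)
    size_s = 2 + 1
    return size_g / (size_s + size_g - 2 * n_instances)


def rank_patterns(vertices: Vertices, edges: Edges) -> list:
    """One-edge patterns sorted from best to worst compression value."""
    scored = [(compression_value(vertices, edges, len(inst)), pattern)
              for pattern, inst in one_edge_instances(vertices, edges).items()]
    return sorted(scored, reverse=True)


# Two hypothetical workflows stored as one (disconnected) graph.
V = {"a1": "fetch", "a2": "align", "a3": "plot",
     "b1": "fetch", "b2": "align", "b3": "store"}
E = [("a1", "a2", "seq"), ("a2", "a3", "seq"),
     ("b1", "b2", "seq"), ("b2", "b3", "seq")]
print(rank_patterns(V, E)[0])  # best pattern ('fetch', 'seq', 'align'), value 10/9
```

A value above 1 means that replacing the pattern's instances shrinks the graph. The fetch-then-align step occurs in both workflows and therefore ranks first, while the single-occurrence edges score below 1.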

Which graph representation should be used for (workflow) fragments? We examined a number of workflow representations.

Representation A

[Figure: example workflow over activities att1 to att5; labels And, Xor, sequence, operator, operand, type, next]

Representation B

[Figure: example workflow over activities att1 to att5; labels Split-And (sp-and), Join-Xor (J-Xor), sequence, next]
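The Split-And and Join-Xor labels suggest that Representation B turns control-flow operators into explicit vertices. A hypothetical encoding under that assumption, starting from the control-flow relations recoverable from the Representation D labels (att1 splits into att2 and att3, att2 is followed by att4, and att3 and att4 join on att5). The vertex ids such as `split_and_att1`, and the choice of "next" versus "sequence" edge labels, are assumptions.

```python
# Control-flow relations of the example workflow, recovered from the
# Representation D labels: (source activity, target activity, kind).
WORKFLOW = [
    ("att1", "att2", "And"), ("att1", "att3", "And"),
    ("att2", "att4", "SEQ"),
    ("att3", "att5", "XOR"), ("att4", "att5", "XOR"),
]


def to_representation_b(workflow):
    """Hypothetical Representation B-style encoding: each AND split and each
    XOR join becomes an explicit control vertex (Split-And / Join-Xor)."""
    vertices = {}  # vertex id -> label
    for src, dst, _ in workflow:
        vertices[src] = src
        vertices[dst] = dst
    edges = []     # (src, dst, edge label)
    split_of = {}  # activity -> id of the Split-And vertex that follows it
    join_of = {}   # activity -> id of the Join-Xor vertex that precedes it
    for src, dst, kind in workflow:
        if kind == "And":
            if src not in split_of:
                split_of[src] = f"split_and_{src}"
                vertices[split_of[src]] = "Split-And"
                edges.append((src, split_of[src], "next"))
            edges.append((split_of[src], dst, "next"))
        elif kind == "XOR":
            if dst not in join_of:
                join_of[dst] = f"join_xor_{dst}"
                vertices[join_of[dst]] = "Join-Xor"
                edges.append((join_of[dst], dst, "next"))
            edges.append((src, join_of[dst], "next"))
        else:  # plain sequence: direct activity-to-activity edge
            edges.append((src, dst, "sequence"))
    return vertices, edges


vertices, edges = to_representation_b(WORKFLOW)
print(len(vertices), len(edges))  # 7 7
```

Every AND split and XOR join now yields the same small control-vertex shape, regardless of which activities it connects.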

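Representation C

[Figure: Representation C as two graphs. Graph 1: activities att1, att2, att3, att4, att5 with labels S-att1-att2, S-att1-att3, seq-att2-att4, seq-att4-att5. Graph 2: activities att1, att2, att3, att5 with labels S-att1-att2, S-att1-att3, seq-att3-att5]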
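A small sketch of the duplication that comes from modeling XOR as a set of alternatives, using the two graphs of Representation C transcribed from the slide as sets of edge labels. Counting how many alternatives contain each label is an illustrative assumption, not the measure used in the experiments.

```python
from collections import Counter

# The two graphs of Representation C, transcribed from the slide, each as the
# set of its edge labels (<kind>-<source>-<target>).
GRAPH_1 = {"S-att1-att2", "S-att1-att3", "seq-att2-att4", "seq-att4-att5"}
GRAPH_2 = {"S-att1-att2", "S-att1-att3", "seq-att3-att5"}


def label_counts(graphs):
    """How many of the alternative graphs contain each edge label."""
    return Counter(label for graph in graphs for label in graph)


def duplicated_labels(graphs):
    """Edge labels repeated across alternatives of the same workflow."""
    return {label for label, n in label_counts(graphs).items() if n > 1}


def activities(graph):
    """Activities mentioned by a graph's edge labels."""
    return {name for label in graph for name in label.split("-")[1:]}


print(sorted(duplicated_labels([GRAPH_1, GRAPH_2])))  # ['S-att1-att2', 'S-att1-att3']
print(sorted(activities(GRAPH_2)))                    # ['att1', 'att2', 'att3', 'att5']
```

The two AND-split edges appear twice for a single workflow, and graph 2 no longer mentions att4 at all.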

Representation D and Representation D1

[Figure: the example workflow over activities att1 to att5. Representation D labels: And_att1_att2, And_att1_att3, SEQ_att2_att4, XOR_att3_att5, XOR_att4_att5. Representation D1 labels: And, And, SEQ, XOR, XOR]
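On the slide, Representations D and D1 differ only in their labels. D embeds the activity names in each label, while D1 keeps just the control-flow kind. A minimal conversion sketch on the labels transcribed from the slide. The (source, target, label) list layout and the name `to_d1` are assumptions.

```python
# Representation D, transcribed from the slide as (source, target, edge label).
REPRESENTATION_D = [
    ("att1", "att2", "And_att1_att2"),
    ("att1", "att3", "And_att1_att3"),
    ("att2", "att4", "SEQ_att2_att4"),
    ("att3", "att5", "XOR_att3_att5"),
    ("att4", "att5", "XOR_att4_att5"),
]


def to_d1(edges):
    """Keep only the control-flow kind (And, SEQ, XOR) in each edge label,
    dropping the activity names, as in the Representation D1 labels."""
    return [(src, dst, label.split("_", 1)[0]) for src, dst, label in edges]


d1 = to_d1(REPRESENTATION_D)
print(len({lbl for _, _, lbl in REPRESENTATION_D}), len({lbl for _, _, lbl in d1}))  # 5 3
```

The edge-label vocabulary shrinks from five distinct labels to three (And, SEQ, XOR), while the activity vertices are unchanged.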

Experiments

1st experiment: to assess the suitability of the graph representations for mining workflow graphs, in terms of effectiveness (precision/recall), memory space (disk space, DIV), and execution time.

2nd experiment: To assess the impact of the filtering step in narrowing the search to relevant workflow fragments.
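Effectiveness is measured with precision and recall. Assuming each mined fragment and each expected recurrent fragment can be reduced to a comparable identifier (the identifiers below are hypothetical), the two measures can be computed as:

```python
def precision_recall(found: set[str], expected: set[str]) -> tuple[float, float]:
    """Precision = |found ∩ expected| / |found|; recall = |found ∩ expected| / |expected|.
    Both are defined as 0.0 when their denominator is empty."""
    hits = len(found & expected)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical fragment identifiers.
p, r = precision_recall({"f1", "f2", "f3", "f4"}, {"f1", "f2", "f5"})
print(p, round(r, 3))  # 0.5 0.667
```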

Experiment 1: Dataset

We created three datasets of workflow specifications, containing respectively 30, 42, and 71 workflows.

Nine of these workflows are similar to each other and, as such, contain recurrent structures that should be detected by the mining algorithm.

Despite the small size of the collection, these datasets allowed us to distinguish, to a certain extent, between the different representations.

Experiment 1: Input Data Size

Experiment 1: Effectiveness (Precision/Recall)

Experiment 1: Effectiveness (Precision/Recall)

Experiment 1: Execution Time

[Chart: execution time; annotations ≥ 55 times, ≥ 25 times, ≈ 4 times, ≈ 5 times]

Experiment 1: Summary

Control nodes: the recurrent patterns found reflect a typical coding scheme related to the model rules (recall).

Labeling the edges: specializations of the same abstract workflow (precision).

XOR as a set of alternatives: duplication and loss of information (recall, precision).

Representation D1 therefore seems to be the one that performs best.

Experiment 2

Dataset: all Taverna 1 workflows (498 workflows) from myExperiment.

User query: a small fragment from a workflow in myExperiment.

Conclusion

A methodology for improving reusability: representation model D + filter.

Improve the filter: test other similarity measures.

Assess the usefulness of the presented techniques in practice, and how they can be incorporated into the workflow design life cycle, in the context of the Contextual and Aggregated Information Retrieval (CAIR) project.
