masahiro tanaka university of tsukuba - hpcs lab.tanaka/publications/open/rubyconf2010... ·...

69
Masahiro Tanaka University of Tsukuba 2010-11-14 1 RubyConf X

Upload: lamkiet

Post on 25-Mar-2018

218 views

Category:

Documents


4 download

TRANSCRIPT

Masahiro Tanaka

University of Tsukuba

2010-11-14 1RubyConf X

Masahiro Tanaka

NArray author

majored in Astronomy

Research fellow in Computer Science◦ at Center for Computational Sciences,

University of Tsukuba

◦ since 2009

2010-11-14 2RubyConf X

Research Fields

◦ Computer Science:

High Performance Computing

Computational Informatics

◦ Computational Science:

Particle Physics, Astrophysics,

Material Science, Life Science,

Biology, Environmental Science

2010-11-14 3RubyConf X

FIRST

◦ 512 cores+BladeGRAPE

◦ 36 TFLOPS

PACS-CS

◦ 2,560 cores

◦ 14.4 TFOPS

T2K Tsukuba

◦ 10,368 cores

◦ 95 TFLOPS

2010-11-14RubyConf X 4

Conference on SuperComputer

> 10,000 participants

2010-11-14RubyConf X 5

We are here

SC10 venueErnest N. Morial

Convention Centerexhibit Nov 15-18

2010-11-14 6RubyConf X

CCS booth at SC09

2010-11-14RubyConf X 7

Pwrake : a Distributed Workflow

Engine for e-Science

2010-11-14 8RubyConf X

2010-11-14 9RubyConf X

LHCParticle Accelerator

ALMARadio Observatory

2010-11-14 10RubyConf X

2010-11-14RubyConf X 11

http://www.sinet.ad.jp/case-examples/tsukuba

: Sharing QCD Simulation data

Computationally intensive science that is carried out in highly distributed network environments,

or

Science that uses immense data sets that require grid computing

◦ (Wikipedia).

2010-11-14 12RubyConf X

The term was created by John Taylor,

◦ Director General of the United Kingdom's Office of Science and Technology

◦ in 1999

2010-11-14RubyConf X 13

is a key issue for e-Science.

2010-11-14 14RubyConf X

Performance of single core does no more increase.

2010-11-14 15RubyConf X

2010-11-14 16RubyConf X

2010-11-14 17RubyConf X

Parallelize your program

Scalability is an key issue

2010-11-14RubyConf X 18

P : parallelizable

1-P : sequential

N : # of processors

Speed-up formula :

2010-11-14RubyConf X 19

Number of Processors

Speed u

p

1

1− P + P/N

P<1

1

1− P

MapReduce

MPI

OpenMP

thread

Parallel programming languages

process

2010-11-14RubyConf X 20

Independent processes can be parallelized

Without parallel programming

Workflow System is required

2010-11-14RubyConf X 21

input1

program

output1

...input2

program

output2

input3

program

output3

input4

program

output4

...

...

2010-11-14RubyConf X 22

Description of procedures

It is like building a program

2010-11-14RubyConf X 23

cc –c –o a.o a.c

a.c b.c

b.oa.o

cc –o prog …

prog

cc –c –o b.o b.c

Task: Ellipse Node

File: Rectangle Node

Dependency: Edge

DAG

◦ Directed Acyclic Graph

2010-11-14 24RubyConf X

Input File

Task

Output File

Dependency

Montage

◦ software for producing a custom mosaic image from multiple shots of images.

◦ http://montage.ipac.caltech.edu/

2010-11-14 25RubyConf X

Tasks:

◦ Projection

◦ Brightness correction

◦ Coadding

1 image : 1 process

26RubyConf X 2010-11-14

mProjectPP

mDiff

mBgModel

mBackground

mAddmFitplane

m1

= a'1x+b'

1y+c'

1

m2

= a'2x+b'2y+c'

2

a1x+b

1y+c

1=0 a

2x+b

2y+c

2=0

Final image

Input images

flow

Inputfiles

Outputfile

2010-11-14 27RubyConf X

◦ invoke task based on dependency

◦ assign a task to an available computer

◦ parallel execution for independent tasks

2010-11-14RubyConf X 28

ProcessProcess

Process

Workflow System

Workflowdefinition

for Grid Computing

2010-11-14RubyConf X 29

• DAGMan• Pegasus• Triana• ICENI• Taverna• GrADS• GridFlow• UNICORE• Globus workflow• Askalan• Karajan• Kepler

from “A Taxonomy of Scientific Workflow Systems for Grid Computing”Jia Yu and Rajkumar Buyya (2005)

Define DAG in XML

◦ Human cannot write complex XML.

◦ Need to write a program to generate XML

2010-11-14RubyConf X 30

<adag xmlns="http://www.griphyn.org/chimera/DAX"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.8.xsd"

count="1" index="0" name="test">

<filename file="2mass-atlas-981204n-j0160056.fits" link="input"/>

<job id="ID000001" name="mProject" version="3.0" level="11" dv-name="mProject1" dv-version="1.0">

<argument>

<filename file="2mass-atlas-981204n-j0160056.fits"/>

<filename file="p2mass-atlas-981204n-j0160056.fits"/>

<filename file="templateTMP_AAAaaa01.hdr"/>

</argument>

<uses file="2mass-atlas-981204n-j0160056.fits" link="input" dontRegister="false" dontTransfer="false"/>

<uses file="p2mass-atlas-981204n-j0160056.fits" link="output" dontRegister="true" dontTransfer="true" temporaryHint="tmp"/>

<uses file="p2mass-atlas-981204n-j0160056_area.fits" link="output" dontRegister="true" dontTransfer="true" temporaryHint="tmp"/>

<uses file="templateTMP_AAAaaa01.hdr" link="input" dontRegister="false" dontTransfer="false"/>

</job>

<child ref="ID003006">

<parent ref="ID000001"/>

<parent ref="ID000006"/>

</child>

DAG XML

DSL to define task dependency

Rule

◦ define multiple tasks at once

◦ avoid redundancy

Skip finished tasks

◦ based on timestamp of file

2010-11-14RubyConf X 31

Grid Explorer : Grid and Cluster shell

◦ http://www.logos.ic.i.u-tokyo.ac.jp/gxp/

◦ written in Python.

GXP Make

◦ GNU Make-based workflow system

◦ Distributed & Parallel execution

2010-11-14RubyConf X 32

Makefile

◦ same input files

◦ same tasks

◦ executed repeatedly

Scientific Workflows have different aspects.

2010-11-14 33RubyConf X

Same workflow for different files

◦ “rule” may solve, but is not enough.

2010-11-14RubyConf X 34

File set 1file file file

File set 2file file file

Workflow

result 1 result 2

Task dependencies rely on :

◦ Not only file name

◦ Parameters, e.g. Geometry

2010-11-14RubyConf X 35

A B

CD

A

task

B C D

task task task

Entire workflow is unknown at first

Result of a task affect

◦ Output files

◦ Afterward tasks

2010-11-14RubyConf X 36

Task

file3file1 file2

Task Task

create Makefile during Make execution

tricky way

MakefileMakefile.sub: prerequisite

awk –f hoge.awk $< > $@

target: Makefile.sub

make -f $<

Makefile.subtarget1: source1

target2: source2

create

invoke

2010-11-14 37RubyConf X

Scientific workflow requires powerfuland flexible definition language.

You probably know the solution.

◦ What is it?

2010-11-14 38RubyConf X

2010-11-14RubyConf X 39

Build tool

Internal DSL

Programming power of Ruby

2010-11-14RubyConf X 40

file "file2" => "file1" do

sh "program file1 > file2"

end

2010-11-14RubyConf X 41

file1

program

file2

for x in LIST

file x[1] => x[0] do |t|

sh "your_program …"

end

end

2010-11-14 42RubyConf X

How do your write it with Rake?

2010-11-14RubyConf X 43

Task

file3file1 file2

Task Task

task :A do

task :B do

puts “B”

end

end

task :default => :A

No task depends on Task B

task :A do

b = task :B do

puts “B”

end

b.invoke

end

task :default => :A

Rake::Task#invoke

Invoke Task B immediately after definition

multitask

◦ Rake built-in feature

◦ Parallelize prerequisite tasks of multitask

◦ Ruby thread

Problem

◦ No control for the number of thread.

◦ All the prerequisite tasks are invoked at the same time.

2010-11-14RubyConf X 46

http://drake.rubyforge.org/

Specify the number of threads

All the independent task are automatically parallelized.

◦ multitask is not necessary

2010-11-14RubyConf X 47

Find task

2010-11-14RubyConf X 48

D E F

C

A

B

Queue to workers

worker thread 1

worker thread 2

Queue from workers

B

Available task

...

Remote process execution

Dynamic task definition

◦ dRake does not allows “invoke” method.

Performance issue

2010-11-14 49RubyConf X

Need Powerful Scientific Workflow tool

Existing

◦ Rake : Powerful for writing workflow

◦ dRake : Parallel execution

Missing

◦ Remote Process Invocation

◦ Scalability

2010-11-14RubyConf X 50

2010-11-14RubyConf X 51

ProcessProcess

Process

Rake

Rakefile

Gfarm filesystem

file file file

Pwrakeextension

2010-11-14RubyConf X 52

Parallel + distributed

Workflow

extension for Rake

repository:

◦ http://github.com/masa16/pwrake

2010-11-14RubyConf X 53

Same syntax as Rake.

Parallelize task, file

◦ no multitask

Replace “sh” method

◦ invoke process through SSH

Scalability

2010-11-14 54RubyConf X

Why SSH

◦ Secure

◦ Probably SSH port is available

SSH class for Pwrake

◦ Original implementation

◦ Performance issues

2010-11-14RubyConf X 55

Worker thread in Ruby

Ruby thread uses single-core◦ GVL

sh process uses multi-core.

for x in LIST

file x[1] => x[0] do |t|

sh "your_program …"

end

end

2010-11-14RubyConf X 56

here uses multi-core

2010-11-14RubyConf X 57

x10faster

dRake

Pwrake

Our approach:

use Distributed Filesystem

◦ file sharing

◦ consistent file timestamp

◦ I/O performance

2010-11-14RubyConf X 58

Storage

CPU CPU CPU

file1 file2 file3

Storage

CPU CPU CPU

Storage I/O becomes

bottleneck StorageStorage

file1 file2 file3

Network File System Distributed File System

Parallel Processing

59RubyConf X 2010-11-14

2010-11-14RubyConf X 60

Wide-area distributed file system

Global namespace to federate storages

Main developer : Prof. Osamu Tatebe

Open source development◦ http://datafarm.apgrid.org/

LocalStorage

LocalStorage

LocalStorage

InternetGfarm File System

/

/dir1

file1 file2

/dir2

file3 file4

Computer nodes

LocalStorage

LocalStorage

LocalStorage

61RubyConf X 2010-11-14

use Local I/O for performance

assign task based on File locality

implement as a function of Pwrake

62RubyConf X 2010-11-14

LocalStorage

LocalStorage

LocalStorage

File System Nodes

file1 file2 file3

LocalStoragefile4

Job forFile 1

Job forFile 3

Job forFile 3

Slow

Fast

Locality-aware task assignment for Gfarm

2010-11-14RubyConf X 63

Task1

Task2

Task3

AffinityQueue

worker thread1

worker thread2

worker thread3

pushwith hostname

Queue for

host1

popwith hostname

Queue for

host2

Queue for

host3

… …

1 node4 cores

2 nodes8 cores

4 nodes16 cores

8 nodes32 cores

64RubyConf X 2010-11-14

NFSdoes not

scale

1 node4 cores

2 nodes8 cores

4 nodes16 cores

8 nodes32 cores

65RubyConf X 2010-11-14

Gfarmscales

20% Speedup

using locality

Montage workflow

2010-11-14RubyConf X 66

Geographically distributed wokrflow

Fault tolerance

2010-11-14RubyConf X 67

Rake

◦ is so powerful to be used for Scientific definition language.

Pwrake

◦ Parallel and Distributed Workflow extension for Rake

Gfarm

◦ for scalable I/O performance

2010-11-14 68RubyConf X

Pwrake site

◦ https://github.com/masa16/pwrake

Questions?

2010-11-14RubyConf X 69