Parallel Processing with IPython

January 22, 2010

Upload: enthought-inc

DESCRIPTION

In this screencast, Travis Oliphant gives an introduction to IPython, an extremely useful tool for task-based parallel processing with Python.

TRANSCRIPT

Page 1: Parallel Processing with IPython

Parallel Processing with IPython

January 22, 2010

Page 2: Parallel Processing with IPython

Enthought Python Distribution (EPD)

MORE THAN SIXTY INTEGRATED PACKAGES

• Python 2.6

• Science (NumPy, SciPy, etc.)

• Plotting (Chaco, Matplotlib)

• Visualization (VTK, Mayavi)

• Multi-language Integration (SWIG, Pyrex, f2py, weave)

• Repository access

• Data Storage (HDF, NetCDF, etc.)

• Networking (twisted)

• User Interface (wxPython, Traits UI)

• Enthought Tool Suite (Application Development Tools)

Page 3: Parallel Processing with IPython

Enthought Training Courses

Python Basics, NumPy, SciPy, Matplotlib, Chaco, Traits, TraitsUI, …

Page 4: Parallel Processing with IPython

PyCon

http://us.pycon.org/2010/tutorials/

Introduction to Traits

Introduction to Enthought Tool Suite

Fantastic deal (normally $700 -- at PyCon get the same material for $275)

Corran Webster

Page 5: Parallel Processing with IPython

Upcoming Training Classes

March 1 – 5, 2010: Python for Scientists and Engineers, Austin, Texas, USA

March 1 – 5, 2010: Python for Quants, London, UK

http://www.enthought.com/training/

Page 6: Parallel Processing with IPython

Parallel Processing with IPython

Page 7: Parallel Processing with IPython


IPython.kernel

• IPython's interactive kernel provides a simple (but powerful) interface for task-based parallel programming.

• Allows fast development and tuning of task-parallel algorithms to better utilize resources.
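The core idea -- mapping a function over inputs across a pool of workers -- can be sketched with nothing but the standard library. This is a hedged analogue only (it uses a local thread pool, not IPython engines); the IPython.kernel clients shown on the following slides expose the same map-style interface across remote engines.

```python
from multiprocessing.pool import ThreadPool

def func(x):
    # the scalar function used throughout these slides
    return x**2.5 * (3*x - 2)

# standard serial map
serial = list(map(func, range(32)))

# parallel map across 4 workers, loosely analogous to mec.map with 4 engines
with ThreadPool(4) as pool:
    parallel = pool.map(func, range(32))

assert parallel == serial
```

The point is that the call site barely changes between the serial and parallel versions, which is what makes tuning a task-parallel algorithm fast.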

Page 8: Parallel Processing with IPython

Getting started --- local cluster, manually

UNIX and OS X (and now WINDOWS):

    # run ipcluster to start up a
    # controller and a set of engines
    $ ipcluster local -n 4
    Your cluster is up and running.
    ...

You can then cleanly stop the cluster from IPython using:

    mec.kill(controller=True)

You can also hit Ctrl-C to stop it, or use from the command line:

    kill -INT 20465

Creates several key-files in ~/.ipython/security :

    ipcontroller-engine.furl
    ipcontroller-mec.furl
    ipcontroller-tc.furl

WINDOWS:

    # run ipcontroller and then
    # ipengine for each desired engine
    > start /B C:\Python25\Scripts\ipcontroller.exe
    > start /B C:\Python25\Scripts\ipengine.exe
    > start /B C:\Python25\Scripts\ipengine.exe
    > start /B C:\Python25\Scripts\ipengine.exe
    ...
    2009-02-11 23:58:26-0600 [-] Log opened.
    2009-02-11 23:58:28-0600 [-] Using furl file: C:\Documents and Settings\demo\_ipython\security\ipcontroller-engine.furl
    2009-02-11 23:58:28-0600 [-] registered engine with id: 3
    2009-02-11 23:58:28-0600 [-] distributing Tasks
    2009-02-11 23:58:28-0600 [Negotiation,client] engine registration succeeded, got id: 3

Creates several key-files in %HOME%\_ipython\security :

    ipcontroller-engine.furl
    ipcontroller-mec.furl
    ipcontroller-tc.furl

Page 9: Parallel Processing with IPython

Getting started -- distributed

• Run ipcontroller on a host to create the .furl files.

• Creates separate .furl files to be used by the different connections (engine, multiengine client, task client).

• Places .furl files by default in ~/.ipython/security (UNIX or Mac OS X) or %HOME%\_ipython\security (Windows).

• Takes --<connection>-furl-file=FILENAME options, where <connection> is engine, multiengine, or task, to place the .furl files somewhere else.

• Ensure the ipcontroller-engine.furl file is available to each host that will run an engine, and run ipengine on these hosts.

  • Either place it in the default security directory, or

  • use the --furl-file=FILENAME option to ipengine.

• Ensure the multiengine (task) .furl file is available to each host that will run a multiengine (task) client.

  • Either place it in the default security directory, or

  • pass the FILENAME as the first argument to the client constructor.

Page 10: Parallel Processing with IPython

Initialize client

    >>> from IPython.kernel import client

MULTIENGINECLIENT

    # * allows fine-grained control
    # * each engine has an id number
    # * more intuitive for beginners
    # optional argument can be the
    # location of the mec furl-file
    # created by the controller
    >>> mec = client.MultiEngineClient()
    >>> mec.get_ids()
    [0 1 2 3]

    mec.map      -- parallel map
    mec.parallel -- parallel function
    mec.execute  -- execute in parallel
    mec.push     -- push data
    mec.pull     -- pull data
    mec.scatter  -- spread out
    mec.gather   -- collect back
    mec.kill     -- kill engines and controller

TASKCLIENT

    # * does not expose individual engines
    # * presents a load-balanced,
    #   fault-tolerant queue
    # optional argument can be the
    # location of the tc furl-file
    # created by the controller
    >>> tc = client.TaskClient()

    tc.map             -- parallel map
    tc.parallel        -- function decorator
    tc.run             -- run Tasks
    tc.get_task_result -- get result

    client.MapTask    -- function-like
    client.StringTask -- code-string
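The TaskClient's run/get-result pattern -- submit independent units of work to a load-balanced queue, keep a handle, fetch the result later -- can be sketched with the standard library. This is a hedged illustration using concurrent.futures, not IPython.kernel; ex.submit and Future.result stand in for tc.run(task) and tc.get_task_result(id).

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # stand-in for a unit of work a Task would carry
    return sum(i * i for i in range(n))

with ThreadPoolExecutor(max_workers=4) as ex:
    # submit returns a handle immediately, like tc.run returning a task id
    handles = [ex.submit(task, n) for n in (10, 100, 1000, 10000)]
    # collect results by handle, like tc.get_task_result
    results = [h.result() for h in handles]
```

Because the pool hands work to whichever worker is free, slow and fast tasks balance out automatically, which is the "load-balanced queue" the slide describes.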

Page 11: Parallel Processing with IPython

MultiEngineClient

SCALAR FUNCTION:

    # Using map
    >>> def func(x):
    ...     return x**2.5 * (3*x - 2)

    # standard map
    >>> result = map(func, range(32))

    # mec.map
    >>> parallel_result = mec.map(func, range(32))

PARALLEL VECTORIZED FUNCTION:

    # mec.parallel
    >>> pfunc = mec.parallel()(func)

    # or using decorators
    @mec.parallel
    def pfunc(x):
        return x**2.5 * (3*x - 2)

    >>> parallel_result2 = pfunc(range(32))
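What a parallel-function decorator like the one above does can be sketched in a few lines of standard-library Python. This is a hedged analogue, not mec.parallel itself: the `parallel` decorator below is hypothetical and uses a local thread pool in place of IPython engines.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

def parallel(func):
    # hypothetical decorator: calling the wrapped function on a sequence
    # maps the scalar function across a pool of workers
    @wraps(func)
    def wrapper(seq):
        with ThreadPoolExecutor(max_workers=4) as ex:
            return list(ex.map(func, seq))
    return wrapper

@parallel
def pfunc(x):
    return x**2.5 * (3*x - 2)

parallel_result2 = pfunc(range(32))
```

The decorator form keeps the call site identical to a plain function call, which is why the slides present it as the "parallel vectorized" style.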

Page 12: Parallel Processing with IPython

TaskClient – Load Balancing

SCALAR FUNCTION:

    # Using map
    >>> def func(x):
    ...     return x**2.5 * (3*x - 2)

    # standard map
    >>> result = map(func, range(32))

    # tc.map
    >>> parallel_result = tc.map(func, range(32))

PARALLEL VECTORIZED FUNCTION:

    # tc.parallel
    >>> pfunc = tc.parallel()(func)

    # or using decorators
    @tc.parallel
    def pfunc(x):
        return x**2.5 * (3*x - 2)

    >>> parallel_result2 = pfunc(range(32))

Page 13: Parallel Processing with IPython

MultiEngineClient

EXECUTE CODE-STRING IN PARALLEL

    >>> from enthought.blocks.api import func2str

    # decorator that turns python-code into a string
    >>> @func2str
    ... def code():
    ...     import numpy as np
    ...     a = np.random.randn(N, N)
    ...     eigs, vals = np.linalg.eig(a)
    ...     maxeig = max(abs(eigs))

    >>> mec['N'] = 100
    >>> result = mec.execute(code)
    >>> print mec['maxeig']
    [10.471428625885835, 10.322386155553213,
     10.237638983818622, 10.614715948426941]

Page 14: Parallel Processing with IPython

TaskClient – Load Balancing Queue

EXECUTE CODE-STRING IN PARALLEL

    >>> from enthought.blocks.api import func2str

    # decorator that turns python-code into a string
    >>> @func2str
    ... def code():
    ...     import numpy as np
    ...     a = np.random.randn(N, N)
    ...     eigs, vals = np.linalg.eig(a)
    ...     maxeig = max(abs(eigs))

    >>> task = client.StringTask(str(code), push={'N': 100}, pull='maxeig')
    >>> ids = [tc.run(task) for i in range(4)]
    >>> res = [tc.get_task_result(id) for id in ids]
    >>> print [x['maxeig'] for x in res]
    [10.439989436983467, 10.250842410862729,
     10.040835983392991, 10.603885977189803]
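The push/pull model behind StringTask is simple enough to illustrate locally: seed a namespace with the pushed variables, execute the code string in it, then pull a named result back out. This sketch is a local stand-in only -- client.StringTask does the same thing on remote engines, and `run_string_task` is a hypothetical helper, not part of IPython.kernel.

```python
# a code string, as StringTask would receive it; N is expected to be pushed
code = "total = sum(range(N))"

def run_string_task(code, push, pull):
    # hypothetical local analogue of running a StringTask
    ns = dict(push)    # "push" seeds the engine's namespace
    exec(code, ns)     # the engine executes the code string in that namespace
    return ns[pull]    # "pull" retrieves a named result

result = run_string_task(code, push={'N': 100}, pull='total')
```

Keeping the data transfer explicit (push in, pull out) is what lets the controller hand the same task to any free engine.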

Page 15: Parallel Processing with IPython

Parallel FFT On Memory Mapped File

Processors   Time (seconds)   Speed Up
    1            11.75           1.0
    2             6.06           1.9
    4             3.36           3.5
    8             2.50           4.7
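The speed-up column is just T(1)/T(p); dividing again by the processor count gives parallel efficiency, which makes the diminishing returns at 8 processors explicit. A quick check of the table's arithmetic:

```python
# timings from the table above: processors -> seconds
times = {1: 11.75, 2: 6.06, 4: 3.36, 8: 2.50}

# speed-up: serial time divided by parallel time
speedup = {p: times[1] / t for p, t in times.items()}

# efficiency: speed-up divided by processor count
efficiency = {p: speedup[p] / p for p in times}
```

The efficiency falls from ~95% on 2 processors to ~59% on 8, a typical pattern when an FFT over a memory-mapped file becomes bound by disk and memory bandwidth rather than CPU.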

Page 16: Parallel Processing with IPython

EPD: http://www.enthought.com/products/epd.php

Enthought Training: http://www.enthought.com/training/

Webinars: http://www.enthought.com/training/webinars.php