the mines jtk and multicore computing · 115,000 lines of source code ... teapot dome data courtesy...

62
The Mines JTK and multicore computing Dave Hale Center for Wave Phenomena Colorado School of Mines

Upload: lebao

Post on 23-Jun-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

The Mines JTK and multicore computing

Dave HaleCenter for Wave PhenomenaColorado School of Mines

Page 2: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Mines Java Toolkit (JTK)

Page 3: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Mines Java Toolkit (JTK)

a software librarylike libcwp, VTK, ...

Page 4: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Mines Java Toolkit (JTK)

not a processing systemlike Seismic Unix, ProMAX, ...

Page 5: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Mines Java Toolkit (JTK)

not an interpretation systemlike OpendTect, SeisWorks, ...

Page 6: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Mines Java Toolkit (JTK)

used in systems forprocessing, interpretation,

teaching, finance, ...

Page 7: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Mines Java Toolkit (JTK)

13 packages280 classes

Page 8: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Mines Java Toolkit (JTK)

cross-platformWindows, Linux, Mac OS X

Page 9: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Mines Java Toolkit (JTK)

115,000 lines of source code+ 48,000 lines of comments

Page 10: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Mines Java Toolkit (JTK)

open-sourceCommon Public License

used in commercial software

Page 11: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

interactive graphics

Page 12: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

image processing

Image courtesy of WesternGeco

Page 13: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

structure-oriented smoothing

Page 14: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

structure-oriented semblance

Page 15: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

3D interactive visualization

Teapot Dome data courtesy ofRocky Mountain Oilfield Testing Center,US Department of Energy &Transform Software & Services

Page 16: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

of multiple objects

Page 17: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

of multiple objects

Page 18: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

with transparency

Page 19: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

with custom objects

Page 20: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

multicore computing

Page 21: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

CPU monitor (I wish)

Page 22: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

CPU monitor (typical)

Page 23: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Dear Students,

If you are not yet writing parallelprograms, say, because you workonly with small 2D images, thenyour half-life is roughly two years,which is about how long it takesfor the number of cores to double.

Cheers,Dave

November 29, 2010

Page 24: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

The half-life of any software library

not designed for parallel computing

is two years.

Page 25: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Second-order linear recursion

yi = b0xi + b1xi−1 + b2xi−2 − a1yi−1 − a2yi−2

for i = 0, 1, 2, . . . , n− 1

• useful

• easy to implement

• inherently serial (not parallel)

• 9 FLOPs per load and store

Page 26: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

float yim2 = 0.0f; float yim1 = 0.0f; float xim2 = 0.0f; float xim1 = 0.0f; for (int i=0; i<n; ++i) { float xi = x[i]; float yi = b0*xi+b1*xim1+b2*xim2 -a1*yim1-a2*yim2; y[i] = yi; yim2 = yim1; yim1 = yi; xim2 = xim1; xim1 = xi; }

Second-order linear recursion

Page 27: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

float yim2 = 0.0f; float yim1 = 0.0f; float xim2 = 0.0f; float xim1 = 0.0f; for (int i=0; i<n; ++i) { float xi = x[i]; float yi = b0*xi+b1*xim1+b2*xim2 -a1*yim1-a2*yim2; y[i] = yi; yim2 = yim1; yim1 = yi; xim2 = xim1; xim1 = xi; }

Second-order linear recursion

load

Page 28: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

float yim2 = 0.0f; float yim1 = 0.0f; float xim2 = 0.0f; float xim1 = 0.0f; for (int i=0; i<n; ++i) { float xi = x[i]; float yi = b0*xi+b1*xim1+b2*xim2 -a1*yim1-a2*yim2; y[i] = yi; yim2 = yim1; yim1 = yi; xim2 = xim1; xim1 = xi; }

Second-order linear recursion

compute

Page 29: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

float yim2 = 0.0f; float yim1 = 0.0f; float xim2 = 0.0f; float xim1 = 0.0f; for (int i=0; i<n; ++i) { float xi = x[i]; float yi = b0*xi+b1*xim1+b2*xim2 -a1*yim1-a2*yim2; y[i] = yi; yim2 = yim1; yim1 = yi; xim2 = xim1; xim1 = xi; }

Second-order linear recursion

store

Page 30: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

float yim2 = 0.0f; float yim1 = 0.0f; float xim2 = 0.0f; float xim1 = 0.0f; for (int i=0; i<n; ++i) { float xi = x[i]; float yi = b0*xi+b1*xim1+b2*xim2 -a1*yim1-a2*yim2; y[i] = yi; yim2 = yim1; yim1 = yi; xim2 = xim1; xim1 = xi; }

Second-order linear recursion

Page 31: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

void solr1(float a1, float a2, float b0, float b1, float b2, int n, float* x, float* y) {

... // details inside library code}

1D arrays (C/C++)

Page 32: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

void solr1(float a1, float a2, float b0, float b1, float b2, int n, float* x, float* y) {

... // details inside library code}

1D arrays (C/C++)

n

Page 33: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

2D arrays (C/C++)void solr2(float a1, float a2, float b0, float b1, float b2, int n1, int n2, float** x, float** y) {

for (int i2=0; i2<n2; ++i2)solr1(a1,a2,b0,b1,b2,n1,x[i2],y[i2]);

}

n1

n2

Page 34: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Serial (C/C++)void solr2(float a1, float a2, float b0, float b1, float b2, int n1, int n2, float** x, float** y) {

for (int i2=0; i2<n2; ++i2)solr1(a1,a2,b0,b1,b2,n1,x[i2],y[i2]);

}

n1

n2

Page 35: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Serial (C/C++)void solr2(float a1, float a2, float b0, float b1, float b2, int n1, int n2, float** x, float** y) {

for (int i2=0; i2<n2; ++i2)solr1(a1,a2,b0,b1,b2,n1,x[i2],y[i2]);

}

the problem

Page 36: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Parallel (C/C++)void solr2(float a1, float a2, float b0, float b1, float b2, int n1, int n2, float** x, float** y) {

#pragma omp parallel for schedule(dynamic)for (int i2=0; i2<n2; ++i2)

solr1(a1,a2,b0,b1,b2,n1,x[i2],y[i2]);}

OpenMP

Page 37: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Serial (C/C++)void solr2(float a1, float a2, float b0, float b1, float b2, int n1, int n2, float** x, float** y) {

for (int i2=0; i2<n2; ++i2)solr1(a1,a2,b0,b1,b2,n1,x[i2],y[i2]);

}

Page 38: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Serial (Java)void solr2(float a1, float a2, float b0, float b1, float b2, float[][] x, float[][] y) {

int n2 = x.length;for (int i2=0; i2<n2; ++i2)

solr1(a1,a2,b0,b1,b2,n1,x[i2],y[i2]);}

Page 39: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Parallel (Java)void solr2(float a1, float a2, float b0, float b1, float b2, float[][] x, float[][] y) {

int n2 = x.length;loop(n2, new LoopInt() {public void compute(int i2) {

solr1(a1,a2,b0,b1,b2,x[i2],y[i2]);}});

}Java fork-join framework

(with edu.mines.jtk.util.Parallel)

Page 40: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Serial (Scala)def solr2(a1:Float, a2:Float, b0:Float, b1:Float, b2:Float, x:Float2 x, y:Float2):Unit = {

x.indices.foreach(i2=>solr1(a1,a2,b0,b1,b2,x(i2),y(i2))

}

Page 41: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Parallel (Scala)def solr2(a1:Float, a2:Float, b0:Float, b1:Float, b2:Float, x:Float2 x, y:Float2):Unit = {

x.indices.par.foreach(i2=>solr1(a1,a2,b0,b1,b2,x(i2),y(i2))

}

Scala parallel arrays

Page 42: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s2D: 1000 x 50000

Java Scala g++ (OpenMP)

Page 43: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s2D: 1000 x 50000

Java Scala g++ (OpenMP)

speedup: 2.4

Page 44: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s2D: 1000 x 50000

Java Scala g++ (OpenMP)

speedup: 2.4

speedup: 9.4

Page 45: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s2D: 1000 x 5000

Java Scala g++ (OpenMP)

Page 46: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s2D: 1000 x 500

Java Scala g++ (OpenMP)

Page 47: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s2D: 1000 x 50

Java Scala g++ (OpenMP)

Page 48: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s2D: 1000 x 5

Java Scala g++ (OpenMP)

0.1

Page 49: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

3D arrays (C/C++)void solr3(float a1, float a2, float b0, float b1, float b2, int n1, int n2, int n3, float*** x, float*** y) {

for (int i3=0; i3<n3; ++i3)solr2(a1,a2,b0,b1,b2,n1,n2,x[i3],y[i3]);

}

Page 50: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

3D arrays (C/C++)void solr3(float a1, float a2, float b0, float b1, float b2, int n1, int n2, int n3, float*** x, float*** y) {

for (int i3=0; i3<n3; ++i3)solr2(a1,a2,b0,b1,b2,n1,n2,x[i3],y[i3]);

}

Is solr2 parallel?

Page 51: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

3D arrays (C/C++)void solr3(float a1, float a2, float b0, float b1, float b2, int n1, int n2, int n3, float*** x, float*** y) {

for (int i3=0; i3<n3; ++i3)solr2(a1,a2,b0,b1,b2,n1,n2,x[i3],y[i3]);

}

Should we make thisloop over i3 parallel?

Page 52: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

3D arrays (C/C++)void solr3(float a1, float a2, float b0, float b1, float b2, int n1, int n2, int n3, float*** x, float*** y) {

#pragma omp parallel for schedule(dynamic)for (int i3=0; i3<n3; ++i3)

solr2(a1,a2,b0,b1,b2,n1,n2,x[i3],y[i3]);}

OpenMP

Any nested loopswill now be serial,if using OpenMP.

Page 53: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s3D: 1000 x 5 x 50000Java Scala g++ (OpenMP)

Page 54: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s3D: 1000 x 50 x 5000Java Scala g++ (OpenMP)

Page 55: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s3D: 1000 x 500 x 500Java Scala g++ (OpenMP)

Page 56: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s3D: 1000 x 5000 x 50Java Scala g++ (OpenMP)

Page 57: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

0

5

10

15

20

25

1 core 2 cores 12 cores

GFL

OP

s3D: 1000 x 50000 x 5Java Scala g++ (OpenMP)

Page 58: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Multicore computingin software libraries

Libraries must usestandard frameworks

Page 59: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Multicore computingin software libraries

C: OpenMP

Page 60: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Multicore computingin software libraries

C++: Intel’s TBB(Threading Building Blocks,

which use fork-join)

Page 61: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Multicore computingin software libraries

Java: fork-join

Page 62: The Mines JTK and multicore computing · 115,000 lines of source code ... Teapot Dome data courtesy of Rocky Mountain Oilfield Testing Center, ... only with small 2D images, then

Multicore computingin software libraries

Scala: parallel arrays(which use fork-join)