automatic generation of e cient codes from mathematical … · good news of formura 1/2 1 : 184...
TRANSCRIPT
Automatic Generation of Efficient Codes fromMathematical Descriptions of Stencil Computation
Takayuki Muranushi1 Seiya Nishizawa1 Hirofumi Tomita1
Keigo Nitadori1 Masaki Iwasawa1 Yutaka Maruyama1
Hisashi Yashiro1 Yoshifumi Nakamura1 Hideyuki Hotta2
Junichiro Makino3 Natsuki Hosono4 Hikaru Inoue5
1RIKEN Advanced Institute for Computational Science2Chiba University 3Kobe University4Kyoto University 5Fujitsu Ltd.
Sep 22, 2016
for FHPC 2016 workshop / ICFP’16 Nara, Japan
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 1 / 37
Programming Language
Formura
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 2 / 37
Programming language Formura
Domain specific language for stencil computaion
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 3 / 37
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 4 / 37
Good news of Formura 1/2
1:184 Petaflops (11.62% of the peak)
on 663,552 cores
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 5 / 37
Good news of Formura 1/2
ACM Gordon Bell Prize Finalist
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 6 / 37
Good news of Formura 2/2
@�
@t= �
3X
i=1
@
@xi(�vi)
ddt_� = - � fun(i) @ i (� * v i)
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 7 / 37
Formura
is a functional programming language
is implemented in a functional programming
language (Haskell)
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 8 / 37
Backend: How we generate efficient codes
Backend: How we generateefficient codes
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 9 / 37
Backend: How we generate efficient codes
Stencil Computation
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 10 / 37
Backend: How we generate efficient codes
Byte / Flops of hardwares are decreasing
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 11 / 37
Backend: How we generate efficient codes
Naive implementation of stencil computation
The optimalB
F=
2He
Ce
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 12 / 37
Backend: How we generate efficient codes
Temporal Blocking
The optimalB
F=
2He
Ce
0@ 1
NF
+2dNs
NT
1A
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 13 / 37
Backend: How we generate efficient codes
Decompose & fuse array computations in space-time
manifest :: a[i]b[i] = a[i-1] + a[i] + a[i+1]
manifest :: c[i] = b[i-1] * b[i] * b[i+1]d[i] = c[i-1] + c[i] + c[i+1]
manifest :: e[i] = d[i-1] * d[i] * d[i+1]
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 14 / 37
Backend: How we generate efficient codes
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 15 / 37
Backend: How we generate efficient codes
In which language shall we code?
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 16 / 37
Backend: How we generate efficient codes
Paraiso : a DSL embedded inHaskell (Muranushi, 2012)
among Nikola (Mainland & Morrisett, 2010), Obsidian
(Svensson, 2011), Accelerate (Chakravarty et al., 2011),
SPOC (Bourgoin et al., 2012), NOVA (Collins et al.,
2014), and LMS series (Rompf, 2012).
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 17 / 37
Backend: How we generate efficient codes
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 18 / 37
Backend: How we generate efficient codes
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 19 / 37
Backend: How we generate efficient codes
Paraiso: a bad sell
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 20 / 37
Backend: How we generate efficient codes
Our team
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 21 / 37
Formura : a standalone DSL
Formura : a standalone DSL
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 22 / 37
Formura : a standalone DSL
Design principle of Formura
�Simple enough�Rich enough
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 23 / 37
Formura : a standalone DSL
Syntax of Formura
# dimension declarationdimension :: 3# array declarationdouble [] :: vx, vy, vz# array computationA2[i,j,k] = A[i-1] + A[i+1]# Tuplev = (vx, vy, vz)# Lambda expressiontripe = fun (x) 3 * x
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 24 / 37
Formura : a standalone DSL
Tuples are functions
(a,b) 1 = b(f,(h,p,c)) 1 2 = c
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 25 / 37
Formura : a standalone DSL
Inferred promotion to tuples and functions
x + (a,b) = (x+a,x+b)(x,y) + (a,b) = (x+a,y+b)
(x,y) + (a,b,c) = ?
(f + g) x = f x + g x(f + g + 1) x = f x + g x + 1
rk4 = fun(ddt) \fun(sys_0) let \
sys_q4 = sys_0 + dt/4 * ddt(sys_0)sys_q3 = sys_0 + dt/3 * ddt(sys_q4)sys_q2 = sys_0 + dt/2 * ddt(sys_q3)sys_next = sys_0 + dt * ddt(sys_q2)
in sys_next
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 26 / 37
Formura : a standalone DSL
Differentiation Operators
ddx = fun(a) (a[i+1/2,j,k] - a[i-1/2,j,k])/dxddy = fun(a) (a[i,j+1/2,k] - a[i,j-1/2,k])/dyddz = fun(a) (a[i,j,k+1/2] - a[i,j,k -1/2])/ dz
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 27 / 37
Formura : a standalone DSL
Nabla and Summation
@ = (ddx ,ddy ,ddz)� = fun (e) e 0 + e 1 + e 2
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 28 / 37
Formura : a standalone DSL
Evaluation of formura expression
� fun(i) @ i (� * v i)
� = fun (e) e 0 + e 1 + e 2
�! (fun(i) @ i (� * v i)) 0
+ (fun(i) @ i (� * v i)) 1
+ (fun(i) @ i (� * v i)) 2
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 29 / 37
Formura : a standalone DSL
Evaluation of formura expression
� fun(i) @ i (� * v i)� = fun (e) e 0 + e 1 + e 2
�! (fun(i) @ i (� * v i)) 0
+ (fun(i) @ i (� * v i)) 1
+ (fun(i) @ i (� * v i)) 2
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 29 / 37
Formura : a standalone DSL
Evaluation of formura expression
� fun(i) @ i (� * v i)� = fun (e) e 0 + e 1 + e 2
�! (fun(i) @ i (� * v i)) 0
+ (fun(i) @ i (� * v i)) 1
+ (fun(i) @ i (� * v i)) 2
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 29 / 37
Formura : a standalone DSL
Evaluation of formura expression
(fun(i) @ i (� * v i)) 0
�! @ 0 (� * v 0))@ = (ddx ,ddy ,ddz)v = (vx,vy ,vz)
(a,b,c) 0 = a
�! ddx (� * vx)
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 30 / 37
Formura : a standalone DSL
Evaluation of formura expression
(fun(i) @ i (� * v i)) 0�! @ 0 (� * v 0))
@ = (ddx ,ddy ,ddz)v = (vx,vy ,vz)
(a,b,c) 0 = a
�! ddx (� * vx)
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 30 / 37
Formura : a standalone DSL
Evaluation of formura expression
(fun(i) @ i (� * v i)) 0�! @ 0 (� * v 0))
@ = (ddx ,ddy ,ddz)v = (vx,vy ,vz)
(a,b,c) 0 = a
�! ddx (� * vx)
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 30 / 37
Formura : a standalone DSL
Evaluation of formura expression
(fun(i) @ i (� * v i)) 0�! @ 0 (� * v 0))
@ = (ddx ,ddy ,ddz)v = (vx,vy ,vz)
(a,b,c) 0 = a
�! ddx (� * vx)
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 30 / 37
Formura : a standalone DSL
Evaluation of formura expression
ddx (� * vx)
ddx = fun(a) (a[i+1/2,j,k] - a[i-1/2,j,k])/dx
�! ((� * vx)[i+1/2,j,k] -
(� * vx)[i-1/2,j,k])/dx
�! (�[i+1/2,j,k] * vx[i+1/2,j,k] -
�[i-1/2,j,k] * vx[i-1/2,j,k])/dx
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 31 / 37
Formura : a standalone DSL
Evaluation of formura expression
ddx (� * vx)ddx = fun(a) (a[i+1/2,j,k] - a[i-1/2,j,k])/dx
�! ((� * vx)[i+1/2,j,k] -
(� * vx)[i-1/2,j,k])/dx
�! (�[i+1/2,j,k] * vx[i+1/2,j,k] -
�[i-1/2,j,k] * vx[i-1/2,j,k])/dx
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 31 / 37
Formura : a standalone DSL
Evaluation of formura expression
ddx (� * vx)ddx = fun(a) (a[i+1/2,j,k] - a[i-1/2,j,k])/dx
�! ((� * vx)[i+1/2,j,k] -
(� * vx)[i-1/2,j,k])/dx
�! (�[i+1/2,j,k] * vx[i+1/2,j,k] -
�[i-1/2,j,k] * vx[i-1/2,j,k])/dx
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 31 / 37
Formura : a standalone DSL
Evaluation of formura expression
ddx (� * vx)ddx = fun(a) (a[i+1/2,j,k] - a[i-1/2,j,k])/dx
�! ((� * vx)[i+1/2,j,k] -
(� * vx)[i-1/2,j,k])/dx
�! (�[i+1/2,j,k] * vx[i+1/2,j,k] -
�[i-1/2,j,k] * vx[i-1/2,j,k])/dx
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 31 / 37
Formura : a standalone DSL
Evaluation of formura expression
� fun(i) @ i (� * v i)
�! (�[i+1/2,j,k] * vx[i+1/2,j,k] -
�[i-1/2,j,k] * vx[i-1/2,j,k])/dx +
(�[i,j+1/2,k] * vy[i,j+1/2,k] -
�[i,j-1/2,k] * vy[i,j-1/2,k])/dy +
(�[i,j,k+1/2] * vz[i,j,k+1/2] -
�[i,j,k-1/2] * vz[i,j,k-1/2])/dz
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 32 / 37
Formura : a standalone DSL
Evaluation of formura expression
� fun(i) @ i (� * v i)
�! (�[i+1/2,j,k] * vx[i+1/2,j,k] -
�[i-1/2,j,k] * vx[i-1/2,j,k])/dx +
(�[i,j+1/2,k] * vy[i,j+1/2,k] -
�[i,j-1/2,k] * vy[i,j-1/2,k])/dy +
(�[i,j,k+1/2] * vz[i,j,k+1/2] -
�[i,j,k-1/2] * vz[i,j,k-1/2])/dz
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 32 / 37
Formura : a standalone DSL
Evaluation of formura expression
3X
i=1
@
@xi(�vi)
� fun(i) @ i (� * v i)
�! (�[i+1/2,j,k] * vx[i+1/2,j,k] -�[i-1/2,j,k] * vx[i-1/2,j,k])/dx +(�[i,j+1/2,k] * vy[i,j+1/2,k] -
�[i,j-1/2,k] * vy[i,j-1/2,k])/dy +(�[i,j,k+1/2] * vz[i,j,k+1/2] -�[i,j,k-1/2] * vz[i,j,k-1/2])/dz
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 33 / 37
Formura : a standalone DSL
More to talk about
Modular Reifiable Matching (MRM)(Oliveira et al., 2015) +Pattern synoynm solves “expression problem”
Details of code transformation paths
Varieties of temporal blocking methods
How we have gave proof to certain types of temporal blockingmethods
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 34 / 37
Conclusion
Conclusion
Functional programming
is a good choice for user interface
�! weather scientists and astronomers can
use it
is crucial in implementing all the program
transformations
�! achieves high performance
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 35 / 37
Conclusion
Conclusion
1.184 PflopsFormura
T. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 36 / 37
Bibliography
Bibliography I
Bourgoin, M., Chailloux, E., & Lamotte, J.-L. 2012, Parallel ProcessingLetters, 22, 1240007
Chakravarty, M. M., Keller, G., Lee, S., McDonell, T. L., & Grover, V.2011, in Proceedings of the sixth workshop on Declarative aspects ofmulticore programming, ACM, 3–14
Collins, A., Grewe, D., Grover, V., Lee, S., & Susnea, A. 2014, inProceedings of ACM SIGPLAN International Workshop on Libraries,Languages, and Compilers for Array Programming, ACM, 8
Mainland, G., & Morrisett, G. 2010in , ACM, 67–78
Oliveira, B. C. d. S., Mu, S.-C., & You, S.-H. 2015, in Proceedings of the8th ACM SIGPLAN Symposium on Haskell, ACM, 82–93
Rompf, T. 2012, PhD thesis, ECOLE POLYTECHNIQUE FEDERALE DELAUSANNE
Svensson, J. 2011, PhD thesis, Chalmers University of TechnologyT. Muranushi et al. (RIKEN AICS) Formura Sep 22, 2016 37 / 37