Challenges in Fully Generating Multigrid Solvers for the Simulation of non-Newtonian Fluids
Sebastian Kuckuk
FAU Erlangen-Nürnberg
18.01.2016
HiStencils 2016, Prague, Czech Republic
Outline
● Scope and Motivation
● Project ExaStencils
● From Poisson to Fluid Dynamics
● 1st Challenge – Code Generation
● 2nd Challenge – Optimization
● 3rd Challenge – Parallelization
● Preliminary Results
● Conclusion and Outlook
Scope and Motivation
PDEs
● Goal: Solve a partial differential equation approximately by solving a
discretized form
[Figures: Optical Flow (LIC visualization); Particles in Flow]
∆u = f in Ω,  u = g on ∂Ω

A^h u^h = f^h
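The step from the continuous PDE to the linear system A^h u^h = f^h can be sketched in a few lines. This is an illustrative 1D finite-difference discretization of the Poisson problem with homogeneous Dirichlet boundaries (all sizes are illustrative; the talk targets 3D finite volumes):

```python
import numpy as np

# Discretize Delta u = f on (0, 1) with n interior points and zero
# boundary values; the result is the linear system A^h u^h = f^h.
n = 8
h = 1.0 / (n + 1)
A = (np.diag(np.full(n, -2.0))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2   # discrete Laplacian A^h
f = np.ones(n)                               # discretized right-hand side f^h
u = np.linalg.solve(A, f)                    # u^h; the talk replaces this
                                             # direct solve with multigrid
```

For realistic resolutions the direct solve becomes infeasible, which is where the multigrid methods of the next slides come in.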
Multigrid
● Smoothing of high frequency errors
● Coarsened representation of low frequency errors
Multigrid V-Cycle
● Smoothing
● Restriction
● Coarse grid solving
● Prolongation & correction
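The interplay of these four components can be shown in a textbook 1D sketch (not the generated solver, which targets 3D staggered finite volumes): a recursive V-cycle for -u'' = f with zero Dirichlet boundaries.

```python
import numpy as np

def smooth(u, f, h, steps=2, omega=2.0 / 3.0):
    for _ in range(steps):                       # damped Jacobi: removes the
        jac = u.copy()                           # high frequency error parts
        jac[1:-1] = 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
        u[:] = (1.0 - omega) * u + omega * jac

def v_cycle(u, f, h):
    if u.size <= 3:                              # coarse grid solving (exact
        u[1:-1] = 0.5 * h * h * f[1:-1]          # for one interior point)
        return u
    smooth(u, f, h)                              # pre-smoothing
    r = np.zeros_like(u)                         # residual r = f - A u
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    rc = r[::2].copy()                           # restriction (full weighting)
    rc[1:-1] = 0.25 * (r[1:-2:2] + 2 * r[2:-1:2] + r[3::2])
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)   # solve coarse error equation
    fine_x = np.arange(u.size)                   # prolongation & correction
    u += np.interp(fine_x, fine_x[::2], ec)
    smooth(u, f, h)                              # post-smoothing
    return u
```

A handful of V-cycles reduces the algebraic error below the discretization error, which is the mesh-independent convergence that motivates multigrid in the first place.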
Computational Grids
● Regular grids
● Block-Structured grids -> deferred for now
Main Focus: HPC
● Highly optimized and highly scalable geometric multigrid solvers
● ‘Traditional’ supercomputers like JUQUEEN and SuperMUC
● Alternative architectures like Piz Daint and Mont Blanc
Project ExaStencils
• Sebastian Kuckuk
• Harald Köstler
• Ulrich Rüde
• Christian Schmitt
• Frank Hannig
• Jürgen Teich
• Hannah Rittich
• Matthias Bolten
• Alexander Grebhahn
• Sven Apel
• Stefan Kronawitter
• Armin Größlinger
• Christian Lengauer
Project ExaStencils
● Multi-layered DSL for the specification of
● PDEs
● Discretizations
● (MG) Solver components
● Parallel behavior and application specifics
● Fully automatic code generation fueled by our Scala framework
● Automatic application of low-level optimizations
● Powerful performance characteristics prediction based on
● Local Fourier Analysis (LFA) and
● Software Product Lines (SPL) technology
From Poisson to Fluid Dynamics
Overall Goal
● Simulation of non-isothermal / non-Newtonian fluid flows
● Suspensions of particles or macromolecules
  ● E.g. pastes, gels, foams, drilling fluids, food products, blood, etc.
● Importance in mining, chemical and food industry as well as medical applications
https://www.youtube.com/watch?v=G1Op_1yG6lQ
Overall Goal cont’d
Fully generate a replica source code for “Parallel finite volume method simulation of three-dimensional fluid flow and convective heat transfer for viscoplastic non-Newtonian fluids”, D. A. Vasco, N. O. Moraga and G. Haase:
● FORTRAN90 code
● Finite volume discretization
● Staggered grid
● SIMPLE algorithm
● TDMA solvers
● OMP parallelization
Governing Equations (w/o BC)
Poisson:

  ∆u = f

NNF:

  −∇ᵀ(H ∇v) + D (vᵀ · ∇) v + ∇p + D (0, θ, 0)ᵀ = 0
  ∇ᵀ v = 0
  −∇ᵀ(∇θ) + G (vᵀ · ∇) θ = 0

with G, D dependent on physical properties and temperature-dependent density, and viscosity H = Γ(v) dependent on physical properties and computed w.r.t. various models
The SIMPLE Algorithm
● Semi-Implicit Method for Pressure Linked Equations
● Concept:
● Solve for velocity components (u, v, w)
● Solve for pressure correction
● Apply pressure correction
● Update properties
The SIMPLE Algorithm
● Temperature can be added as a separate step
● Solve for velocity components (u, v, w)
● Solve for pressure correction
● Apply pressure correction
● Solve for temperature
● Update properties
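The control flow of one SIMPLE iteration, with the optional temperature step, can be sketched as follows. The step names are illustrative stubs that only record the order of operations; they are not the ExaStencils API and carry no actual numerics.

```python
def simple_step(log, with_temperature=True):
    """Record one SIMPLE iteration's sequence of operations into log."""
    for c in ("u", "v", "w"):
        log.append(f"solve momentum {c}")        # solve for velocity components
    log.append("solve pressure correction")      # enforce discrete continuity
    log.append("apply pressure correction")      # correct p and the velocities
    if with_temperature:
        log.append("solve temperature")          # separate, optional step
    log.append("update properties")              # e.g. viscosity from the model
    return log
```

Iterating this step until the exit criteria discussed in the results section are met yields one time step of the simulation.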
Staggered grid
● Finite volume discretization on a staggered grid
● values associated with the x-staggered grid, e.g. U
● values associated with the y-staggered grid, e.g. V
● values associated with the cell centers, e.g. p and θ
● control volumes associated with cell-centered values
● control volumes associated with x-staggered values
● control volumes associated with y-staggered values
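In terms of storage, this layout boils down to differently-shaped arrays per quantity. A 2D sketch (grid size illustrative): p and θ at cell centers, U on x-staggered faces, V on y-staggered faces, so each control volume sees exactly the face values it needs.

```python
import numpy as np

n = 4
p = np.zeros((n, n))         # cell-centered values, e.g. p and theta
U = np.zeros((n, n + 1))     # x-staggered values: one extra face per row
V = np.zeros((n + 1, n))     # y-staggered values: one extra face per column

# The discrete divergence of each cell-centered control volume combines
# the four surrounding face values (unit spacing assumed):
div = (U[:, 1:] - U[:, :-1]) + (V[1:, :] - V[:-1, :])
```

The mismatched shapes are exactly what requires the adapted field layouts, communication routines and boundary conditions listed under the first challenge.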
Differences to Poisson
● From model problem to application
Poisson:  ∆u = f

NNF:  −∇ᵀ(H ∇v) + D (vᵀ · ∇) v + ∇p + D (0, θ, 0)ᵀ = 0,  ∇ᵀ v = 0,  −∇ᵀ(∇θ) + G (vᵀ · ∇) θ = 0

● Linear → Non-linear
● Scalar PDE (one unknown) → 3 velocity components, pressure and temperature with varying localizations
● Simple boundary conditions → Mixed boundary conditions
● Finite differences → Finite volumes
● One grid → Staggered grids
● Uniform spacing → Local refinement
1st Challenge – Code Generation
Extensions
Necessary preparations:
● Interface from/to FORTRAN
  ● Data and functions
  ● Allows replacing single portions of the code, one step at a time
● Data layouts for staggered grids
  ● Requires, apart from the language extension itself:
    ● adapted field layouts
    ● new communication routines
    ● specialized boundary conditions
    ● (refined coarsening and interpolation stencils)
Porting code
● Straightforward for simple kernels
FORTRAN90:

  subroutine advance_fields ()
    !$omp parallel do &
    !$omp private(i,j,k) &
    !$omp firstprivate(l1,m1,n1) &
    !$omp shared(rho,rho0) &
    !$omp schedule(static) default(none)
    do k=1,n1
      do j=1,m1
        do i=1,l1
          rho0(i,j,k)=rho(i,j,k)
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine advance_fields

ExaSlang:

  Function AdvanceFields@finest () : Unit {
    loop over rho@current {
      rho[next]@current = rho[active]@current
    }
    advance rho@current
  }
Porting code
● But what about more complicated code? How do we get from this …
! if not at the boundary
fl  = xcvi(i) * v(i,jp,k) * (fy(jp)*rho(i,jp,k) + fym(jp)*rho(i,j,k))
flm = xcvip(im) * v(im,jp,k) * (fy(jp)*rho(im,jp,k) + fym(jp)*rho(im,j,k))
flownu = zcv(k) * (fl+flm)
gm  = xcvi(i) * vis(i,j,k) * vis(i,jp,k) / (ycv(j)*vis(i,jp,k) + ycv(jp)*vis(i,j,k) + 1.e-30)
gmm = xcvip(im) * vis(im,j,k) * vis(im,jp,k) / (ycv(j)*vis(im,jp,k) + &
      ycv(jp)*vis(im,j,k) + 1.e-30)
diff = 2. * zcv(k) * (gm+gmm)
call diflow(flownu,diff,acof)
adc = acof + max(0.,flownu)
anu(i,j,k) = adc - flownu
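The kernel above calls diflow(flownu, diff, acof); in Patankar-style SIMPLE codes this routine typically implements the power-law scheme. That mapping is an assumption about the reference code, not something stated on the slides, so the following is a hedged sketch only.

```python
def diflow(flow, diff):
    """Power-law diffusion weighting (assumed form of the reference routine)."""
    if diff <= 0.0:
        return 0.0
    return diff * max(0.0, (1.0 - 0.1 * abs(flow) / diff) ** 5)

def neighbor_coefficient(flow, diff):
    """Combine diffusion with upwinded convection, mirroring
    adc = acof + max(0., flownu); anu = adc - flownu  from the kernel above."""
    return diflow(flow, diff) + max(0.0, flow) - flow
```

Note that max(0, flow) - flow equals max(0, -flow), i.e. only flow entering from the neighbor contributes to its coefficient, which is the usual upwinding.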
Porting code
● … to this?
flownu@current = integrateOverXStaggeredNorthFace (
  v[active]@current * rho[active]@current )

Val diffnu : Real = integrateOverXStaggeredNorthFace (
  evalAtXStaggeredNorthFace ( vis@current, "harmonicMean" ) )
  / vf_stagCVWidth_y@current@[0, 1, 0]

AuStencil@current:[0, 1, 0] = -1.0 * ( calc_diflow ( flownu@current, diffnu )
  + max ( 0.0, flownu@current ) - flownu@current )
Extended boundary conditions
Function ApplyBC_u@finest ( ) : Unit {
  loop over u@current only duplicate [ 1, 0] on boundary {
    u@current = 0.0
  }
  loop over u@current only duplicate [-1, 0] on boundary {
    u@current = 0.0
  }
  loop over u@current only duplicate [ 0, 1] on boundary {
    u@current = wall_velocity
  }
  loop over u@current only duplicate [ 0, -1] on boundary {
    u@current = -1.0 * wall_velocity
  }
}

Field Solution < global, DefNodeLayout, ApplyBC_u >@finest
Virtual fields
● Rework and extension of the way geometric information is used
● Many virtual fields (positions of (staggered) nodes/ faces/ cells,
interpolation and integration parameters, etc.)
● Tradeoff between on-the-fly calculation and explicit storage
// previous access through functions doesn’t allow offset access
rhs@current = sin ( nodePosition_x@current ( ) )

// newly introduced virtual fields allow virtually identical behavior ...
rhs@current = sin ( vf_nodePosition_x@current )

// ... and in addition allow offset accesses
Val dif : Real = (
    vf_nodePosition_x@current@[ 1, 0, 0]
  - vf_nodePosition_x@current@[ 0, 0, 0] )
Specialized functions
● Specialized evaluation and integration functions
// evaluate with respect to cells of the grid
evalAtSouthFace ( rho[active]@current )

// integrate expressions across faces of grid cells
integrateOverEastFace (
  u[active]@current * rho[active]@current )

// integrate expressions across faces of cells of the staggered grid
integrateOverXStaggeredEastFace (
  u[active]@current * rho[active]@current )

// integrate expressions using specialized interpolation schemes
integrateOverZStaggeredEastFace (
  evalAtZStaggeredEastFace ( vis@current, "harmonicMean" ) )
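The "harmonicMean" evaluation can be sketched in isolation: the viscosity at a face is the (distance-weighted) harmonic mean of the two adjacent cell values, which keeps diffusive fluxes continuous across jumps in the coefficient. The eps guard mirrors the 1.e-30 term in the Fortran reference kernel; the weight parameters are illustrative stand-ins for the cell widths.

```python
def harmonic_mean_face(vis_l, vis_r, w_l=1.0, w_r=1.0, eps=1e-30):
    """Distance-weighted harmonic mean of two cell-centered viscosities."""
    return (w_l + w_r) * vis_l * vis_r / (w_r * vis_l + w_l * vis_r + eps)
```

With equal weights this reduces to the classic 2ab/(a+b); if either cell has zero viscosity, the face value is zero, so no spurious diffusive flux crosses the face.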
2nd Challenge – Optimization
Optimization of Solver Components
● Very similar to what we usually do (when using SIMPLE)
● 7-point stencils with variable coefficients
● ‘Standard’ options for low-level optimization are viable
● address pre-calculation
● (temporal and spatial) blocking
● vectorization
● polyhedral loop transformations
● etc.
Performance distribution
● Rough breakdown of time spent
● Preliminary test with 64^3 cells without applied optimizations
[Pie charts: time split between LSE update, solve and property update; the solve portion broken down into velocity u, velocity v, velocity w, pressure correction and temperature]
Optimization of LSE Updates
● Has to be redone many times -> costly
● No temporal blocking
● Reuse due to symmetry in some subexpressions possible
● Vectorization challenging (but possible)
Other Open Points
● Choices of discretization specifics, grid spacing and numerical components are crucial
● Representation of the LSEs on coarser levels
● Optimization of solver considerably harder when handling all
components at once
● Multiple unknowns are combined in vectors
● Stencil coefficients become matrices (e.g. 8x8)
● Inverting the center coefficient becomes a matrix inversion
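The point-coupled update can be shown in miniature: with several unknowns per grid point, every stencil coefficient is a small matrix, and the usual "divide by the center coefficient" of a Jacobi or Gauss-Seidel update becomes a small matrix solve. All sizes and values below are illustrative.

```python
import numpy as np

k = 3                                      # unknowns coupled per point
center = 4.0 * np.eye(k)                   # center coefficient: a k x k matrix
b = np.array([1.0, 2.0, 3.0])              # rhs minus off-diagonal couplings
update = np.linalg.solve(center, b)        # replaces the scalar division
```

Solving (rather than explicitly inverting) the center block is the standard choice; for the 8x8 blocks mentioned above, a factorization per point can be reused across smoothing sweeps.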
3rd Challenge – Parallelization
Parallelization
● Shared memory parallelism comes for free
● OpenMP code is emitted by our generator without further effort
● Distributed memory parallelism is also manageable
● Communication patterns and routines are available
● Points of communication have to be annotated (to be automated in the
future)
● More difficult: domain partitioning of non-uniform grids
=> approach from the reference work is strictly serial
Domain Partitioning
● Easy for regular domains
● Each domain consists of one or more blocks
● Each block consists of one or more fragments
● Each fragment consists of several data points / cells
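For a regular domain this hierarchy is pure integer arithmetic, as in this sketch (all numbers illustrative; e.g. one block per MPI rank):

```python
def partition_1d(cells, blocks, fragments_per_block):
    """Split a 1D run of cells into blocks, and blocks into fragments."""
    assert cells % (blocks * fragments_per_block) == 0, "must divide evenly"
    cells_per_block = cells // blocks
    return cells_per_block, cells_per_block // fragments_per_block
```

The same divide-evenly logic applies per dimension in 3D; non-uniform grids break it, which is why their partitioning is listed as the hard part.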
Parallelization
● Communication statements are added automatically when
transforming Layer 3 to Layer 4 where they may be reviewed or
adapted
/* communicates all applicable layers */
communicate Solution@current

/* communicates only ghost layers */
communicate ghost of Solution[active]@current

/* communicates duplicate and first two ghost layers */
communicate dup, ghost[0, 1] of Solution[active]@current

/* asynchronous communicate */
begin communicate Residual@current
// ...
finish communicating Residual@current
Preliminary results
● Initial version without non-Newtonian model
● Comparison is difficult due to the different exit criteria, e.g.
● Solving the single components
● Reference: fixed to 1 solve step (TDMA)
● Our implementation
● At least one step (RBGS or GMG)
● SIMPLE
● Reference: a certain percentage of all cells (including boundaries) doesn’t change beyond a fixed threshold
● Our implementation: convergence for all components is achieved after updating the LSE but before starting the solve routine
‖r‖ ≤ α (1 + β ‖b‖)        |rᵢ₊₁ − rᵢ| < α
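The two kinds of exit criteria can be sketched directly (alpha and beta values are illustrative): a component solve stops when its residual is small relative to the right-hand side, while the outer SIMPLE loop stops when residuals stagnate between iterations.

```python
import numpy as np

def component_converged(r, b, alpha=1e-8, beta=1.0):
    """Relative residual test for a single component solve."""
    return np.linalg.norm(r) <= alpha * (1.0 + beta * np.linalg.norm(b))

def simple_converged(r_next, r_prev, alpha=1e-8):
    """Stagnation test for the outer SIMPLE iteration."""
    return abs(r_next - r_prev) < alpha
```

Because the reference code uses a cell-count-based criterion instead, the iteration counts of the two codes are not directly comparable, which is the difficulty the slide points out.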
Preliminary results
● One socket of our local cluster (Intel Xeon E7-4830)
● 16 OMP threads
● 100 timesteps
[Figure: execution time per timestep [s], log scale from 0.01 to 1000, over problem sizes 16³, 32³, 64³ and 128³; series: Reference, RBGS, Multigrid]
Conclusion and Outlook
Conclusion
● Generating solvers for simulating non-Newtonian fluids: 95 %
● Optimization: 30 %
● Parallelization: 75 %
Outlook
● Enhanced performance optimizations
● Comparison with analytical models
● Study of parallel performance characteristics
● DSL extensions for relevant concepts
● More complex numeric components
● (Multi-) GPU support
Thank you for your attention!
Questions?