VORtech
Floating point arithmetic
The consistency, accuracy and performance dilemma
Werner Kramer
17-05-2016
VORtech 20 years (and treated us to knowledge)
http://www.meetup.com/VORtech-Scientific-Software-Engineering/
Floating point arithmetic
• IEEE 754 Binary floating-point standard
• Pick two:
• Reproducibility
• Accuracy
• Performance
• Testing results with finite accuracy
SIMONA-4315: Results depend on the location of the grid file
SIMONA-4328: Unexpected differences when cleaning rough-combination files
PS. The RMM-model is known to be slightly unstable.
SIMONA-4256: test models give different results for release 2014 and trunk
Change in revision r5784 (waquaref.tab):

 #14 RESTART NCHAR = 7 OPT
 #
  IREP = 1 NAME = EXP_RESTART NCHAR = 3 JREP = 1 TYPE = 3 OPT
  IREP = 1 NAME = SDS_RESTART NCHAR = 3 JREP = 1 TYPE = 3 MAND
 +IREP = 1 NAME = TIME_EPS NCHAR = 8 JREP = 1 TYPE = 2 DEF = 1e-4
 #
Triggered a difference in a matrix factorization (with duplicate eigenvalues), which resulted in up to 10 cm difference in the water level.
Differences disappear when compiling with debug flags.
IEEE Floating point number
32-bit memory value for the decimal number 0.15625

Converting the fraction to binary:
0.15625 × 2 = 0.31250 → 0
0.31250 × 2 = 0.62500 → 0
0.62500 × 2 = 1.25000 → 1
0.25000 × 2 = 0.50000 → 0
0.50000 × 2 = 1.00000 → 1

0.15625 = 0.00101₂ = 1.01₂ × 2⁻³

sign bit: 0 = positive, 1 = negative
biased exponent = exponent + 127 (here: −3 + 127 = 124)
fraction between 1 and 2 (significand); most significant bit (MSB) not stored
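The bit layout above can be checked programmatically. A minimal Python sketch (the helper name `float_bits` is mine) that prints the sign, biased exponent and fraction fields of a binary32 value:

```python
import struct

def float_bits(x: float) -> str:
    """IEEE 754 binary32 fields of x: sign | biased exponent | fraction."""
    # round x to binary32, then reinterpret the 4 bytes as an unsigned integer
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

print(float_bits(0.15625))  # 0 01111100 01000000000000000000000
```

Note that the biased exponent 01111100 is 124 = −3 + 127, matching the derivation above.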
Some examples
decimal      S  exponent  fraction
 1.000000    0  01111111  00000000000000000000000
-1.000000    1  01111111  00000000000000000000000
 0.5000000   0  01111110  00000000000000000000000
 0.1000000   0  01111011  10011001100110011001101
 0.7578125   0  01111110  10000100000000000000000
 0.0000E+00  0  00000000  00000000000000000000000
 0.0000E+00  1  00000000  00000000000000000000000
 Infinity    0  11111111  00000000000000000000000
-Infinity    1  11111111  00000000000000000000000
 (s)NaN      *  11111111  (0)1**********************
• 1.0/0.0 = +∞
• -1.0/0.0 = -∞
• 0.0/0.0 = NaN
• 0.0 × ∞ = NaN
• sqrt(-1.0) = NaN
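These special values behave the same in any IEEE 754 environment. A small Python sketch (note that Python raises ZeroDivisionError for a literal 1.0/0.0 instead of returning infinity, so infinity is constructed directly here):

```python
import math

inf = float("inf")
nan = float("nan")

print(0.0 * inf)   # nan
print(inf - inf)   # nan
print(inf + 1.0)   # inf: infinity absorbs finite values
print(nan == nan)  # False: NaN compares unequal to everything, including itself
```

The last line is a common source of bugs; use math.isnan() to test for NaN.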
Distribution of values
• Spacing is dependent on the value of the exponent
• Without denormals there would be a gap around zero
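The value-dependent spacing is easy to see with math.ulp (Python ≥ 3.9), which returns the gap to the next representable double:

```python
import math

print(math.ulp(1.0))      # spacing just above 1.0: 2**-52
print(math.ulp(2.0))      # doubles when the exponent increases by one
print(math.ulp(1.0e300))  # enormous spacing at large magnitudes
print(math.ulp(0.0))      # 5e-324: the smallest denormal, filling the gap around zero
```

Without denormals, the smallest normal number (about 2.2e-308 for doubles) would sit next to an abrupt jump to zero.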
Rounding errors
Addition: 3.0 + 6.0, with a 3-bit significand

3.0 = 1.10 × 2¹
6.0 = 1.10 × 2²

  0.11 × 2²
+ 1.10 × 2²
 10.01 × 2²  =  1.001 × 2³  → one bit too many, must be rounded

a) round to nearest (roundTiesToEven): 1.00 × 2³ = 8
b) round down (towards −∞):            1.00 × 2³ = 8
c) round up (towards +∞):              1.01 × 2³ = 10
d) round towards zero:                 1.00 × 2³ = 8
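roundTiesToEven is the IEEE 754 default mode. Python's built-in round() uses the same tie-breaking rule for decimal rounding, which shows why exact halves do not always round upwards:

```python
# halfway cases go to the nearest even integer, not always up
print([round(x) for x in (0.5, 1.5, 2.5, 3.5)])  # [0, 2, 2, 4]
```

This matches case (a) above: 1.001 × 2³ is exactly halfway, and the tie goes to the candidate with an even last bit (1.00 × 2³ = 8).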
Factors that affect reproducibility
• Floating-point semantics
• Use of higher-precision intermediate results
  • fused multiply-add instruction (fma): A*x + y with a single rounding
• Differences in math libraries (e.g. sin function)
  • -fimf-precision=(high, medium, low)
• Data alignment changing vectorization
• Parallelism changing operation order
• Implementation differences between processors
  • -fimf-arch-consistency=true : math library gives the same results across processors
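The fma effect can be illustrated without compiler support. The sketch below (the values are my own choice) uses exact rational arithmetic to emulate the single rounding an fma instruction performs for a*b + c:

```python
from fractions import Fraction

a = 2.0**27 + 1   # exactly representable doubles
b = 2.0**27 - 1
c = -(2.0**54)

# separate multiply and add: a*b = 2**54 - 1 rounds up to 2**54, so the sum is 0.0
two_roundings = a * b + c

# single rounding of the exact result, as fma would do: -1.0
one_rounding = float(Fraction(a) * Fraction(b) + Fraction(c))

print(two_roundings, one_rounding)  # 0.0 -1.0
```

Whether a compiler emits fma for an expression like this is exactly the kind of choice that makes results differ between builds and processors.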
Reassociation
• Addition and multiplication are mathematically associative, but not computationally associative:
  • (a+b)+c ≠ a+(b+c)
  • (a*b)*c ≠ a*(b*c)
• Divide using multiply by reciprocal: x/y => x*(1/y)
• C and C++ disallow reassociation and specify left-to-right evaluation order
• Fortran allows reordering as long as parentheses are honored (–assume protect_parens)
• The compiler may not obey these rules by default
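A one-line Python check makes the non-associativity concrete:

```python
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False
```

Neither result is "wrong": both are correctly rounded sums, computed in a different order.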
Reassociation

original code:

  integer :: i, n
  real, dimension(n) :: A = 1.0
  real :: C = -1.0, tiny = 1e-20

  do i = 1, n
    A(i) = A(i) + C + tiny
  end do

optimized code (compiler hoists C + tiny out of the loop):

  integer :: i, n
  real, dimension(n) :: A = 1.0
  real :: C = -1.0, tiny = 1e-20

  C = C + tiny
  do i = 1, n
    A(i) = A(i) + C
  end do
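The same effect can be shown in Python (double precision here, so tiny = 1e-20 is still far below the rounding threshold of 1.0):

```python
A, C, tiny = 1.0, -1.0, 1e-20

left_to_right = (A + C) + tiny   # 1e-20: the cancellation happens first
hoisted = A + (C + tiny)         # 0.0: C + tiny rounds back to -1.0

print(left_to_right, hoisted)    # 1e-20 0.0
```

The hoisted version is faster (one addition per iteration instead of two) but changes every element of the result.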
-fp-model keyword
• fast : value-unsafe optimizations (default)
• precise(source): value-safe optimizations only
• strict : precise + disable fma
Vectorization
• Vector operation works on multiple data at once (e.g. 16-byte block = 4 reals)
• Vectorized math functions are very slightly less accurate but faster than the scalar versions
• Unaligned data -> both scalar and vector versions are called
  • Can change results run-to-run!
  • OS stack alignment
  • Address Space Layout Randomization
Vectorization
https://software.intel.com/en-us/articles/what-are-peel-and-remainder-loops-fortran-vectorization-support
[Figure: memory layout of real(kind=4), dimension(DIM_A) :: x at addresses 0x00-0x28, split into peel, vectorized and remainder sections at 16-byte boundaries]

  do i = 1, DIM_A
    a(i) = sin(a(i))
  end do

peel loop: loop iterations in scalar mode until a 16-byte boundary is reached
vectorized loop: vector operation works on a 16-byte block at once (SSE2)
remainder: scalar operation on the remaining array elements
SIMONA testing & improvements
Field                  max(dif)  time     99%(dif)  rms(dif)  mean(dif)
solution_flow.sep.sep  0.001162  1395.00  0.000525  0.000170  0.000119
solution_flow.up.up    0.006302   975.00  0.001386  0.000284  0.000106
solution_flow.vp.vp    0.011412  1185.00  0.002075  0.000657  0.000127
• test suite containing a large number of models
• test suite has a quantify-random option (-qr) that inverts the loop order when solving the matrix system
• use -fp-model source to validate code modifications
• change sensitive parts to double precision
• release with -fp-model source?
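The "change sensitive parts to double precision" step can be illustrated by emulating single-precision accumulation in Python (the helper name `to_single` is mine; it rounds a double to the nearest binary32 value via struct):

```python
import struct

def to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# accumulate 0.1 ten thousand times in single and in double precision
s32, s64 = 0.0, 0.0
for _ in range(10000):
    s32 = to_single(s32 + to_single(0.1))
    s64 = s64 + 0.1

print(s32)  # noticeably off from 1000.0
print(s64)  # correct to roughly 10 significant digits
```

Accumulators and residual norms are typical "sensitive parts" where this kind of drift matters.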
Why Reproducibility

Technical/legacy
Software correctness is determined by comparison to previous (baseline) results.

Debugging/porting
When developing and debugging, a higher degree of run-to-run stability is required to find potential problems.

Legal
Accreditation or approval of software might require exact reproduction of previously defined results.

Customer perception
Developers may understand the technical issues with reproducibility but still require reproducible results, since end users or customers will be disconcerted by the inconsistencies.