![Page 1: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/1.jpg)
High-Performance SVD for big data
Andreas Stathopoulos Eloy Romero Alcalde
Computer Science Department, College of William & Mary, USA
Bigdata’18
High-Performance SVD for big data Computer Science Department College of William & Mary 1/50
![Page 2: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/2.jpg)
The SVD problem
Given a matrix A with dimensions m × n find k singular triplets (σi ,ui , vi ) such that
Avi ≈ σiui and {ui}, {vi} are orthonormal bases.
Applications:
applied math (norm computation, reduction of dimensionality or variance)
combinatorial scientific computing, social network analysis
computational statistics and Machine Learning (PCA, SVM, LSI)
low dimensionality embeddings (ISOMAP, MDS)
SVD updating of streaming data
High-Performance SVD for big data Computer Science Department College of William & Mary 2/50
![Page 3: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/3.jpg)
Notation for SVD
σi , i-th singular value, Σ = diag([σ1 . . . σk ]), σ1 ≤ · · · ≤ σkui , i-th left singular vector, U = [u1 . . . uk ]
vi , i-th right singular vector, V = [v1 . . . vk ]
Full decomposition,k = min{m, n}:
A = UΣV T =
min{m,n}∑i=1
ui σi vTi
Partial decomposition,k < min{m, n}:
A ≈ UΣV T =k∑
i=1
ui σi vTi
Given vi , then σi = ‖Avi‖2, and ui = Avi/σi .
Given ui , then σi = ‖ATui‖2, and vi = ATui/σi .
High-Performance SVD for big data Computer Science Department College of William & Mary 3/50
![Page 4: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/4.jpg)
Relevant Aspects
The performance of various methods depends on the next aspects:
Matrix dimensions, m rows and n columns
Sparsity
Number of singular triplets sought
If few singular triplets are sought, which ones
Accuracy required
Spectral distribution/gaps/decay
High-Performance SVD for big data Computer Science Department College of William & Mary 4/50
![Page 5: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/5.jpg)
Accuracy of singular values
Accuracy of σi : distance between the exact singular value σi and an approximation σi
It can be bound as
|σi − σi | ≤√‖rli‖2
2 + ‖rri ‖22
≤ ‖ATAvi − σi vi‖σi + σi
≤ ‖AAT ui − σi ui‖σi + σi
,
where rli = Avi − σi ui and rri = AT ui − σi viWe say a method obtains accuracy ‖A‖2f (ε) if |σi − σi | . ‖A‖2f (ε) where ε is themaximum of the rounding error of the arithmetic used in the method and the matrix
High-Performance SVD for big data Computer Science Department College of William & Mary 5/50
![Page 6: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/6.jpg)
Accuracy of singular vectors
Individual vectors:
Angle between the exact singular vector vi and an approximation vi
Bound: sin∠(vi , vi ) .
√‖rli‖2 + ‖rri ‖2
min{|σi − σi−1|, |σi − σi+1|}Not useful when singular values are close
As low rank approximations, A ≈ UUTA:
‖A− UUTA‖2: the largest singular value not included in U
‖A− UUTA‖F : the sum of all singular values not included in U
These norms do not converge to zero unless U captures the entire rank of A
Accuracy means closeness to ‖A− UUTA‖ or smaller than a certain fraction of ‖A‖.
Note that ‖A‖2 = maxσi and ‖A‖F =√∑
σ2i
High-Performance SVD for big data Computer Science Department College of William & Mary 6/50
![Page 7: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/7.jpg)
Methods
High-Performance SVD for big data Computer Science Department College of William & Mary 7/50
![Page 8: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/8.jpg)
Direct Methods: Bidiagonal reduction
Aka svd() in Matlab/Octave and R, numpy.linalg.svd() in NumPy
Based on reducing the input matrix into a bidiagonal matrix by similarity transformations
Cost: spatial O(mn), temporal O(mn2)
Limitations:
Challenging parallelization in shared and distributed memory
Densification. The nonzero structure of the input matrix is not exploited
Cost does not reduce if few singular triplets needed or if matrix is low rank
Best used when:
Small matrix dimensions (m = n = 4000 takes ∼ 20 sec on a 2018 MacBook Pro)
More than 10% of the spectrum is needed
High-Performance SVD for big data Computer Science Department College of William & Mary 8/50
![Page 9: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/9.jpg)
Direct Methods: Squaring the matrix
1 Form M = ATA
2 Decompose M = V ΛV T
3 Solution: V = V , Σ = Λ12 , U = AV Λ−
12
Cost: spatial O(n2 + mk + nk), temporal O(mn2), for k wanted singular triplets
Improvement:
Easy parallelization
The nonzero structure can be exploited
The spatial cost reduces with k and if left and/or right singular vectors are not required
Limitations:
Accuracy is ‖σi − σi‖2 ≤‖A‖2
σiε, where ε is the rounding error for A
Best used when:
One of the dimensions is small, n . 4000
Approximate solutions are enough
High-Performance SVD for big data Computer Science Department College of William & Mary 9/50
![Page 10: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/10.jpg)
Direct Methods: QR
1 Decompose A = QR
2 Decompose R = UΣV T
3 Solution: V = V , Σ = Σ, U = QU
Improvement:
Easy parallelization using Tall Skinny QR (TSQR)
Better accuracy than squaring the matrix using proper variant of TSQR
Limitations:
More expensive than the previous method for sparse matrices
Best used when:
Dense input matrix
One of the dimensions is small, n . 4000
High-accurate solutions are required (TSQR variant may affect accuracy)
High-Performance SVD for big data Computer Science Department College of William & Mary 10/50
![Page 11: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/11.jpg)
Iterative Methods
Warning: Risk of missing triplets. See perturbation analysis
In practice, this is usually not an issue, probably because of the random nature of themethods
They introduce options that have impact on time and quality of the computed solution
Oversampling/restarting parametersStopping criterion
Critical kernels:
Matrix-multi vector product, AX and ATXOrthogonalization
High-Performance SVD for big data Computer Science Department College of William & Mary 11/50
![Page 12: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/12.jpg)
Subspace Iteration (aka Randomized SVD)
Basic method:
V0 = orth(rand(n, k ′)), for k ′ ≥ k, Vi = orth(ATAVi−1)
Vi converges towards the k largest singular vectors of ATA
Detailed method that achieve full precision:
1 Set i = 0 and V0 = orth(rand(n, k ′))
2 Set Ui = orth(AVi )
3 Stop if Vi and Ui are accurate enough
4 Set Vi+1 = orth(ATUi )
5 Increment i and go to step 2
Stopping criterion:
After some number of iterations
Monitor the convergence of singular values or ‖A− UUTA‖Monitor the residual minR‖ATUi − ViR‖2, where R is a k × k matrix
High-Performance SVD for big data Computer Science Department College of William & Mary 12/50
![Page 13: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/13.jpg)
Subspace Iteration (aka Randomized SVD)
1 Set i = 0 and V0 = orth(rand(n, k ′))
2 Set Ui = orth(AVi )
3 Stop if Vi and Ui are accurate enough
4 Set Vi+1 = orth(ATUi )
5 Increment i and go to step 2
Comments:
Requires a high-performance orthogonalization kernel, ie., TSQR or SVQB
Can take advantage of high-performance matrix-multivector product
Can be accelerated by replacing A by p(A), where the polynomial p boosts the wanted part of thespectrum, eg, p(x) = x3
Tuning of oversampling, k ′ ≥ k, may be required
High-Performance SVD for big data Computer Science Department College of William & Mary 12/50
![Page 14: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/14.jpg)
RSVD Oversampling: Example
20 40 600
0.5
1
Matrix’s spectrum
0 5 10 15
10−6
10−4
10−2
100
Iterations
Err
or,‖(UU
T−
UU
T)A‖ F
Convergence of the 10 largest singular values with RSVD
k′ = 10
k′ = 20
k′ = 30
k′ = 40
Note: ‖(UUT − UUT )A‖F =
√√√√ 10∑i=1
σ2i − σ
2i
High-Performance SVD for big data Computer Science Department College of William & Mary 13/50
![Page 15: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/15.jpg)
Lanczos Bidiagonalization
Basic method:
v0 = orth(rand(n, 1)), Vi = orth([v0, ATA v0, (ATA)2 v0, . . . , (ATA)i v0])
Vi converges towards the largest and smallest singular vectors of A
Comments:
Robust implementations are challenging: several choices for orthogonalization andrestarting
Popular variant: IRLBA
Suitable for finding the largest and the smallest singular values
Usually converges in less MV than RSVD, but not necessarily faster in time
High-Performance SVD for big data Computer Science Department College of William & Mary 14/50
![Page 16: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/16.jpg)
RSVD vs LBD
Goal: satisfy ‖A− UUTA‖2/F = ‖A− UUTA‖2/F (1 + ε)
Gap Convergence Analysis:
RSVD converges in O
(log(nε
)( σkσk+1
− 1
)−1)
iterations [Musco & Musco,’15]
Block Lanczos converges in O
(log(nε
)( σkσk+1
− 1
)−1/2)
iterations [Saad]
Gap-Free Convergence Analysis (with high probability):
RSVD converges in O
(log n
ε
)iterations [Rokhlin et al.’09, Halko et al.’11]
Block Lanczos in O
(log n√ε
)iterations [Musco & Musco,’15]
High-Performance SVD for big data Computer Science Department College of William & Mary 15/50
![Page 17: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/17.jpg)
The interplay between rank of the random block and gaps
If p is oversampling, q the # of RSVD iterations, then [Ming Gu, ’16]
‖A− UUTA‖22/F − ‖A− UUTA‖2
2/F
F 2≤ k C
(σk+p+1
F
)2(σk+p+1
σk
)4q
Normalization choices explain the behavior of different norms:
F = ‖A‖2 givesσk+p+1
σ1F = ‖A− UUTA‖2 gives
σk+p+1
σk+1
F = ‖A‖F givesσk+p+1∑
i σiF = ‖A− UUTA‖F gives
σk+p+1∑i>k σi
More importantly, theσk+p+1
F factor can be much smaller thanσk+p+1
σk
For low accuracy, sometimes it is faster to just increase p!
High-Performance SVD for big data Computer Science Department College of William & Mary 16/50
![Page 18: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/18.jpg)
RSVD vs LBD: Example of Matrix-Vector Cost
50 100 150 200 250 300
10−2
10−1
100
101
Number of matrix-vector products
Err
or,‖(UU
T−
UU
T)A‖ F
Convergence of the 10 largest singular values with RSVD
RSVD k′ = 10
RSVD k′ = 20
RSVD k′ = 30
RSVD k′ = 40
Unrestarted LBD
Restarted LBD 10,20
High-Performance SVD for big data Computer Science Department College of William & Mary 17/50
![Page 19: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/19.jpg)
RSVD vs LBD: Example of Orthogonalization Cost (flops)
100 101 102 103 104
10−6
10−4
10−2
100
Number of inner products in orthogonalization (proportional to flops)
Err
or,‖(UU
T−
UU
T)A‖ F
Convergence of the 10 largest singular values with RSVD
RSVD k′ = 10
RSVD k′ = 20
RSVD k′ = 30
RSVD k′ = 40
Unrestarted LBD
Restarted LBD 10,20
High-Performance SVD for big data Computer Science Department College of William & Mary 18/50
![Page 20: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/20.jpg)
RSVD vs LBD: Example of Orthogonalization Cost (loads)
100 101 102 103 104
10−6
10−4
10−2
100
Number of vectors read in orthogonalization (proportional to memory loads)
Err
or,‖(UU
T−
UU
T)A‖ F
Convergence of the 10 largest singular values with RSVD
RSVD k′ = 10
RSVD k′ = 20
RSVD k′ = 30
RSVD k′ = 40
Unrestarted LBD
Restarted LBD 10,20
High-Performance SVD for big data Computer Science Department College of William & Mary 19/50
![Page 21: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/21.jpg)
Davidson-type Methods
1 V ← V0, where V0 are initial guesses
2 Find approximate eigenpair (λ, x), x ∈ span{V }3 Stop if the error of (λ, x) is small enough
4 Compute a correction t for (λ, x)
5 Add t to V
6 Restarting: If size(V ) ≥ m
1 Find k approximate eigenpair (λi , xi ), x ∈ span{V }2 V ← span{xi}3 Optionally add x from previous iteration to V
7 Go to step 2
High-Performance SVD for big data Computer Science Department College of William & Mary 20/50
![Page 22: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/22.jpg)
Davidson-type Methods
Suitable for:
Problems with slow convergence
Computing very accurate triplets
Computing a few singular triplets, eg., the norm, the condition number
Problems with educated guesses and preconditioners, eg, diag(ATA)
Comments:
Can find several eigenpairs one by one (with locking) or at the same time (block method)
Flexible: can be configured as RSVD, as LBD, or anywhere in-between
Possible corrections to expand the space:
Generalized-Davidson: t = Pr , where P a preconditioner and r = Ax− xλ the residualJacobi-Davidson: solve iteratively (I − xxT )(A− λI )(I − xxT )t = r
Not straightforward to implement. Use software packages
High-Performance SVD for big data Computer Science Department College of William & Mary 20/50
![Page 23: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/23.jpg)
Davidson-type Methods: LOBPCG, PHSVDS, GKDLOBPCG basic method:
v0 = orth(rand(n, 1)), vi+1 = Rayleigh-Ritz({vi−1, vi , ATA vi})vi converges towards the largest/smallest singular vector of A
PHSVDS basic method:
1 Compute eigenpairs from ATA
2 If more accuracy needed, compute eigenpairs from
0 AT
A 0
with initial guesses from
step 1
GKD basic method:
Davidson-type extension to LBD
Possible corrections
Generalized Davidson: t = Pr , with P a preconditioner and r = ATu− vσ the residualJacobi-Davidson: solve approximately (I − vvT )(ATA− σI )(I − vvT )t = r
All methods are randomized in terms of their initial guesses
High-Performance SVD for big data Computer Science Department College of William & Mary 21/50
![Page 24: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/24.jpg)
Streaming/Online Methods
Often the rows of A are streamed, and n is too large to form ATA
Several online or incremental SVD (or PCA) algorithms[Weng et al. ’03, Ross et al.’08, Baker et al.’12, Degras&Cardot’16]
Window of m’−k rows
SVD of the m−row buffer
update k−bufferTruncate and
Best current k space
A m x n
High-Performance SVD for big data Computer Science Department College of William & Mary 22/50
![Page 25: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/25.jpg)
Fast Frequent Directions (FFD) vs Incremental SVD (iSVD)
1 B = []; choose m′ larger than the number of wanted singular triplets, k
2 Take the next m′ rows from A and append them to B for a total of 2m′ rows
3 Factorize B, UΣV T = B, descending σi4 Truncation options
B =√
Σ21:m′,1:m′ − I σ2
m′+1VT1:end,1:m′ (FDD)1
B = Σ1:m′,1:m′ VT1:end,1:m′ (iSVD)2
5 Go to step 2 if A has new rows
6 Factorize B, UΣV T = B
7 V = V , Σ = Σ, U = AB−1
1See [Gashami-Liberty-Phillips-Woodruff, ’15]2See [Baker et al.’12]
High-Performance SVD for big data Computer Science Department College of William & Mary 23/50
![Page 26: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/26.jpg)
Fast Frequent Directions (FFD) vs Incremental SVD (iSVD)
FFD:
bounds the error of the low rank approximation:
‖A− UUTA‖2/F ≤ ‖A− UUTA‖2/F
(1 + k
m−k
)multiple passes cannot improve the error
iSVD:
guarantees an improvement to the singular vector error
multiple passes converge to singular space
Online methods vs iterative methods
Online methods much lower accuracy/slower convergence
Best approach known if A is read once, and computing ATA or QR of A is too expensive
Suitable for dense rectangular matrices and sparse tall-skinny matrices
High-Performance SVD for big data Computer Science Department College of William & Mary 24/50
![Page 27: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/27.jpg)
Illustrative experiments
A synthetic modelA(25000× 5000), dense, constructed as
A = SDU +N(0, 1)
ζ
with S ,U unitary and N(0, 1) noise.
D = diag(linspace(1/m, 1)), for ranks m = 10 and m = 200
When ζ = 0.1 the matrix is dominated by noiseWhen ζ �
√d/m the matrix is basically rank m
High-Performance SVD for big data Computer Science Department College of William & Mary 25/50
![Page 28: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/28.jpg)
Methods
PRIMME’s GD+K (soft locking, block size 1)PRIMME’s LOBPCG
both PRIMME methods stopped when ‖ri‖ < 10−2‖A‖2
RSVD (Halko et al.)
performs always 2 iterations
FD (fast frequent directions)
1 pass
High-Performance SVD for big data Computer Science Department College of William & Mary 26/50
![Page 29: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/29.jpg)
A = SDU + N/ζ, m = 10, ζ = 0.1 Noise dominates
PRIMMEprimmeLOBRSVDFD
PRIMMEprimmeLOBRSVDFD
Relatively good projection error without singular space
High-Performance SVD for big data Computer Science Department College of William & Mary 27/50
![Page 30: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/30.jpg)
A = SDU + N/ζ, m = 10, ζ = 22.3 Rank-10 dominates
PRIMMEprimmeLOBRSVDFD
PRIMMEprimmeLOBRSVDFD
Low rank is captured by RSVD, less by FD
High-Performance SVD for big data Computer Science Department College of William & Mary 28/50
![Page 31: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/31.jpg)
A = SDU + N/ζ, m = 200, ζ = 0.1 Noise dominates
PRIMMEprimmeLOBRSVDFD
PRIMMEprimmeLOBRSVDFD
Similar to m = 10, but longer times for eigensolvers (and better spaces)
High-Performance SVD for big data Computer Science Department College of William & Mary 29/50
![Page 32: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/32.jpg)
A = SDU + N/ζ, m = 200, ζ = 3 Noise less strong
PRIMMEprimmeLOBRSVDFD
PRIMMEprimmeLOBRSVDFD
Increased spectral decay helps all methods, especially FD
High-Performance SVD for big data Computer Science Department College of William & Mary 30/50
![Page 33: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/33.jpg)
A = SDU + N/ζ, m = 200, ζ = 5 Rank-200 stronger than noise
PRIMMEprimmeLOBRSVDFD
PRIMMEprimmeLOBRSVDFD
Quality of eigenspaces is unnecessary for low projection error
High-Performance SVD for big data Computer Science Department College of William & Mary 31/50
![Page 34: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/34.jpg)
A = SDU + N/ζ, m = 200, ζ = 50 Rank-200 dominates
PRIMMEprimmeLOBRSVDFD
PRIMMEprimmeLOBRSVDFD
Single-vector PRIMME method fluctuates in runtime
High-Performance SVD for big data Computer Science Department College of William & Mary 32/50
![Page 35: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/35.jpg)
3D Laplacian 27000, looking for m = 200
PRIMMEprimmeLOBRSVDFD
PRIMMEprimmeLOBRSVDFD
RSVD faster and competitive in projection error — but the returned vectors after twoiterations are far from an eigenspace
High-Performance SVD for big data Computer Science Department College of William & Mary 33/50
![Page 36: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/36.jpg)
Comparison of PHSVDS with other methods
Ma
tve
c r
atio
ove
r p
rim
me
_svd
s(jd
qm
r)
0
2
4
6
8
10
lshp3025
jagmesh8
dw2048
lpbn12plddb
well1850
1 smallest w/o preconditioning, tol = 1e-14
primme_svds(gd+k)primme_svds(jdqmr)IRRHLBJDSVDIRLBASVDIFP
Ma
tve
c r
atio
ove
r p
rim
me
_svd
s(jd
qm
r)
0
2
4
6
8
10
lshp3025
jagmesh8
dw2048
lpbn12plddb
well1850
10 smallest w/o preconditioning, tol = 1e-14
primme_svds(gd+k)primme_svds(jdqmr)IRRHLBJDSVDIRLBASVDIFP
Ma
tve
c r
atio
ove
r p
rim
me
_svd
s(jd
qm
r)
0
2
4
6
8
10
fidap4
jagmesh8
lshp3025deter4
plddblpbnl2
10 smallest with P = ilutp(1e-3) or RIF(1e-3), tol = 1e-14
primme_svds(gd+k)primme_svds(jdqmr)JDSVDSVDIFP
Ma
tve
c r
atio
ove
r p
rim
me
_svd
s
0
2
4
6
8
10
fidap4
jagmesh8
lshp3025deter4
plddblpbnl2
Shift and Invert technique for 10 smallest
primme_svds(C-1
)
MATLAB svds(B-1
)
svdifp(C-1
)
High-Performance SVD for big data Computer Science Department College of William & Mary 34/50
![Page 37: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/37.jpg)
Software
High-Performance SVD for big data Computer Science Department College of William & Mary 35/50
![Page 38: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/38.jpg)
SVD Solvers
Software Method Parallel Precon. Languages
PRIMME PHSVDS M S G Y C Fortran Matlab Python R JuliaPROPACK IRLBD S N Fortran, Matlab
SLEPc TRLBD, JD/GD+k on ATA M S G Y C Fortran Matlab PythonSVDPACK Lanczos – N Fortran
MLlib Arn. ATA, RSVD Hadoop N Scala Python JavaScipy Arn. ATA, RSVD S N Python
pdbML RSVD M S N RonlinePCA Arn. ATA, RSVD, iSVD S N R
RBGen (Trilinos) iSVD M S G N C++
M: MPI (distributed memory), S: shared memory, G: GPUs
Arn. ATA: Arnoldi on ATA
High-Performance SVD for big data Computer Science Department College of William & Mary 36/50
![Page 39: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/39.jpg)
Eigenvalue Solvers
Software Method Parallel Precon. Languages
Anasazi (Trilinos) KS/GD/LOBPCG M S G Y C++ Python(P)ARPACK Arnoldi M S N Fortran Matlab Python R Julia
BLOPEX LOBPCG M S Y C MatlabMAGMA LOBPCG S G Y C++PRIMME JD(QMR)/GD+k/LOBPCG M S Y C Fortran Matlab Python R
SciPy LOBPCG S Y Python
M: MPI (distributed memory), S: shared memory, G: GPUs
High-Performance SVD for big data Computer Science Department College of William & Mary 37/50
![Page 40: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/40.jpg)
PRIMME
High-Performance SVD for big data Computer Science Department College of William & Mary 38/50
![Page 41: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/41.jpg)
Installation
Download:
get tarball from github and extracthttps://github.com/primme/primme/releases/latest
developer’s version git clone https://github.com/primme/primme
Compile:
make # build lib/libprimme.amake matlab # build lib/libprimme.a and Matlab’s modulemake octave # build lib/libprimme.a and Octave’s modulemake python # build lib/libprimme.a and Python’s module
High-Performance SVD for big data Computer Science Department College of William & Mary 39/50
![Page 42: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/42.jpg)
Installation
Set makefile variables, examples:
make CC=clang CFLAGS=-O2
make CC=icc CFLAGS="-O2 -xHost"
Alternatively, modify Make flags.
BLAS/LAPACK related flags in CFLAGS:
-DF77UNDERSCORE Fortran function symbols like dgemm , dsyevx ; if not, remove the flagfrom Make flags
-DPRIMME BLASINT SIZE=64 BLAS/LAPACK compiled with 64-bits integer (ILP64)
Example: make CC=clang CFLAGS="-O2 -DPRIMME_BLASINT_SIZE=64"
High-Performance SVD for big data Computer Science Department College of William & Mary 40/50
![Page 43: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/43.jpg)
Installation
Other make actions:
make lib (default action) build static library libprimme.a
make solib build dynamic library libprimme.so/.dylib
make clean remove all object files (*.o)
make test test libprimme.a
make all tests long test useful for developers
For testing you may need to set Fortran compiler and linking options for BLAS & LAPACK ifthey aren’t in defaults’ system paths. For instance
make test F77=gfortran \
LDFLAGS="-L/my/blaslapack -llapack -lblas"
High-Performance SVD for big data Computer Science Department College of William & Mary 41/50
![Page 44: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/44.jpg)
How to install BLAS & LAPACK
For Linux:
# Debian, Ubuntusudo apt -get install libblas -dev liblapack -dev
# Fedora, RHEL, Centossudo yum install blas -devel lapack -devel
# SuSEsudo zypper install blas -devel lapack -devel
For Windows you can use binary packages from OpenBLAS (bin/libopenblas.dll includesBLAS & LAPACK).
Optimized BLAS:
ATLAS https://sourceforge.net/projects/math-atlas/
OpenBLAS http://www.openblas.net/
Intel c© MKL https://software.intel.com/en-us/intel-mkl/
High-Performance SVD for big data Computer Science Department College of William & Mary 42/50
![Page 45: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/45.jpg)
Compile and Linking User Code
For compiling, include $PRIMME DIR/include
For linking, link with PRIMME, LAPACK and BLAS, in that order; PRIMME library is in$PRIMME DIR/lib
Eg to compile ex eigs dseq.c in $PRIMME DIR/examples:
gcc -c ex_eigs_dseq.c -I$PRIMME_DIR/include
gcc -o exe ex_eigs_dsec.o -L$PRIMME_DIR/lib \
-lprimme -llapack -lblas
High-Performance SVD for big data Computer Science Department College of William & Mary 43/50
![Page 46: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/46.jpg)
Matlab/Octave basic examples (I)
A = diag (1:50); A(200 ,1) = 0; % rectangular matrix of size 200x50
s = primme_svds(A,10) % the 10 largest singular values
s = primme_svds(A,10,’S’) % the 10 smallest singular values
s = primme_svds(A,10 ,25) % the 10 closest singular values to 25
opts = struct ();
opts.tol = 1e-4; % set toleranceopts.method = ’primme_svds_normalequations ’ % set svd solver methodopts.primme.method = ’DEFAULT_MIN_TIME ’ % set first stage eigensolver methodopts.primme.maxBlockSize = 2; % set block size for first stage[u,s,v] = primme_svds(A,10,’S’,opts); % find 10 smallest svd triplets
opts.orthoConst = {u,v};
[s,rnorms] = primme_svds(A,10,’S’,opts) % find another 10
High-Performance SVD for big data Computer Science Department College of William & Mary 44/50
![Page 47: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/47.jpg)
Matlab/Octave basic examples (II)
% Compute the 5 smallest singular values of a rectangular matrix using% Jacobi preconditioner on (A’*A)A = sparse(diag (1:50) + diag(ones(49,1), 1));
A(200 ,50) = 1; % size(A)=[200 50]P = diag(sum(abs(A).^2));
precond.AHA = @(x)P\x;
s = primme_svds(A,5,’S’,[],precond) % find the 5 smallest values
% Estimation of the smallest singular valueA = diag ([1 repmat (2 ,1 ,1000) 3:100]);
[~,sval ,~,rnorm] = primme_svds(A,1,’S’,struct(’convTestFun ’,@(s,u,v,r)r<s*.1));
sval - rnorm % approximate smallest singular value
High-Performance SVD for big data Computer Science Department College of William & Mary 45/50
![Page 48: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/48.jpg)
Matlab/Octave comparison% Download https://www.cise.ufl.edu/research/sparse/mat/Bates/sls.matA=load(’sls.mat’); A=A.Problem.A;
% Computing the 20 smallest singular values with default settingsopts.tol = 1e-6;
[U,S,V,resNorms ,stats]= primme_svds(A,20,’S’,opts); % 1601s
% Using Jacobi preconditioner on A∗A, P=diag(A’*A)P=diag(sum(abs(A).^2)); precond.AHA = @(x)P\x;
[U,S,V,resNorms ,stats]= primme_svds(A,20,’S’,opts ,precond ); % 139s
% If the preconditioner is good, Generalized Davidson may perform better% than Jacobi-Davidson. Change the method of the underneath eigensolver to GD.opts.primme.method = ’DEFAULT_MIN_MATVECS ’;
[U,S,V,resNorms ,stats]= primme_svds(A,20,’S’,opts ,precond ); % 130s
% Block methods and large basis size may improve convergence and performanceopts.maxBasisSize = 80; opts.maxBlockSize = 4;
[U,S,V,resNorms ,stats]= primme_svds(A,20,’S’,opts ,precond ); % 88s
% Recent MATLAB’s svds does LBD on R−1R−∗, where A = QR. It requires more than 16GiB to run[U,S,V]=svds(A,20,’S’,struct(’tol’,1e-6)); % more than 3000s
The matrix sls is the second largest matrix in number of nonzeros with kind least squareproblem in the University of Florida Sparse Matrix CollectionRuntimes reported on a machine with an Intel R© CoreTM CPU i7-6700K and 16GiB ofphysical memory
High-Performance SVD for big data Computer Science Department College of William & Mary 46/50
![Page 49: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/49.jpg)
Matlab/Octave Initial Space
A = diag (1:1000)+ diag(ones (999,1) ,-1)+ diag(ones (999 ,1) ,1);
opts.tol = 1e-12; % norm(r) <= norm(A)*tol% Compute the 10 closest eigenvalues to 99.5[x,d,r,stats] = primme_eigs(A, 10, 99.5, opts);
stats =
numMatvecs = 6302
elapsedTime = 7.9344
...
v0 = eye (1000); v0 = v0 (: ,95:104);
opts.v0 = v0;
[x,d,r,stats] = primme_eigs(A, 10, 99.5, opts);
stats =
numMatvecs = 4042
elapsedTime = 5.2255
...
High-Performance SVD for big data Computer Science Department College of William & Mary 47/50
![Page 50: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/50.jpg)
Distributed matrix-vector product
PRIMME assumes the matrix is distributed by rows
Every process shoud set how many local rows have, primme.nLocal
A
n × n
x
n × 1
= y
n × 1
nLocal for proc 0
nLocal for proc 1
nLocal for proc 2
High-Performance SVD for big data Computer Science Department College of William & Mary 48/50
![Page 51: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/51.jpg)
Versus SLEPc on a distributed memory system (I)
10-1
100
101
Tim
e r
atio o
ver
PH
SV
DS
(JD
QM
R) 5 largest w/o preconditioning, tol = 1e-6
cage15
atmosmodl
Rucci1
LargeRegFile sls
cont1_l
PHSVDS(JDQMR)
PHSVDS(GD+k)
SLEPc LBD
10-2
10-1
100
101
102
Tim
e r
atio o
ver
PH
SV
DS
(JD
QM
R) 5 smallest w & w/o preconditioning, tol = 1e-6
cage15
atmosmodl
Rucci1
LargeRegFile sls
PHSVDS(JDQMR)
PHSVDS(GD+k)
SLEPc LBD
PHSVDS(GD+k)+Prec
High-Performance SVD for big data Computer Science Department College of William & Mary 49/50
![Page 52: High-Performance SVD for big data - College of Computing & … · 2018-12-04 · Can take advantage of high-performance matrix-multivector product Can be accelerated by replacing](https://reader031.vdocuments.pub/reader031/viewer/2022041014/5ec5afcc07c68169ed08b62b/html5/thumbnails/52.jpg)
Versus SLEPc on a distributed memory system (II)
Number of converged singular triplets100 101 102
Runtim
e (
Second)
100
101
102
103
104
Seeking the largest 120 singular tripletswithout preconditioning
cage15
atmosmodl
Rucci1
LargeRegFile
sls
cont1_l
Number of converged singular triplets100 101 102
Runtim
e (
Second)
100
101
102
103
104
Seeking the smallest 120 singular tripletswith preconditioning
cage15
atmosmodl
Rucci1
LargeRegFile
sls
High-Performance SVD for big data Computer Science Department College of William & Mary 50/50