Supercomputing & Multi-core Have I/O Problems That Compression Can Solve

Samplify Systems, Inc.
160 Saratoga Ave., Suite 150
Santa Clara, CA 95051
www.samplify.com
(888) LESS-BITS
+1 (408) 249-1500

27 Sep 2011


Outline

• Introduction to Samplify Systems
• Samplify Prism Compression
• Prism Results on Integers
• Prism Results on IEEE-754 Floats
• “Good Enough” Results & Uncertainty Quantification
• High-Performance Computing & Multi-core Bottlenecks
• Why Compression Can Help

Samplify & NCAR Collaboration?

…simply the bits that matter® ©2011 Samplify Systems, Inc.

About Samplify

• Intellectual property company in Santa Clara, CA providing:
  • Intellectual property for leading FPGAs & ASICs
  • Semiconductors
  • Module and system level solutions
• Private company with >$22M raised from VCs & strategics (IDT & Schlumberger)
• Founded in March 2007
• 25 employees

Executive Team:

Al Wegener, Founder & CTO
• Industry-recognized compression expert
• Inventor of Samplify Prism compression
• TI, Graychip, Morphics, Studer ReVox

Tom Sparkman, CEO
• Sales and Marketing Semico Executive
• 19 years Maxim, Motorola

Richard Tobias, VP Engineering
• Engineering Semico Executive
• Toshiba Semi, Pixelworks, White Eagle (Quicksilver)

Allan Evans, VP Marketing
• Marketing & Technology Executive
• Successful exits at Savi (LMCO), Netro (NTRO), Stanford Telecom (Newbridge)

Applications for Samplify Technology

First Markets:

• Ultrasound: Higher-resolution ultrasound machines, lower-power portables, enable U/S “ODM” model in China
• CT: Double the number of x-ray sensors in existing hardware. Lower cost of data transport and storage
• Wireless Base Stations: Lowers cost of data transport in wireless infrastructure. Especially important for LTE.
• Wireless Repeaters: Dual-band over existing copper infrastructure

New Markets:

• High Speed Imaging: 2x frame rate, resolution
• HPC: Supercomputing
• Storage: 2x throughput & capacity
• Broadcast: Reduce SDI coax links, long-range HDMI over UTP
• Automotive: Driver assistance, collision avoidance, etc.

Samplify’s Prism™ Signal Compression

• No other solutions operate as fast as Samplify. We start where they stop.
• No psycho-visual/acoustic tricks. Samplify’s compression is free from artifacts.
• Operates in real time. Latency is very low, with only a few samples of delay.
• Validated by experts: Herfkens (Stanford), Senzig (GE), several wireless OEMs
• Samplify holds granted patents on integrating any lossless and lossy compression into data converters (US 7,088,276) and in wireless base stations (US 8,005,152)

[Figure: sample-rate axis from 1 ksample/sec to 40 Gsample/sec. Labels: Speech (ADPCM, LPC, Q-CELP) at 10 ksps; Audio at 100 ksps; Video to 50 Msps; Samplify spans 1 ksps to 40 Gsps.]

Samplify Prism Eliminates Signal Whitespace

• Time domain whitespace: peak to average ratio of signals
• Frequency domain whitespace: oversampling of narrowband signals
• Full resolution not delivered by ADCs and DSP algorithms
• No “a priori” signal information required

[Figure: time-domain signal plot, sample index 0 to 4000, amplitude -150 to +150, annotated “12 Bit Resolution” and “10.5 Effective Bits”.]

Using floating point does not repeal the Nyquist criterion!!

Prism Compression Algorithm & Modes

[Block diagram labels: Input samples, Compression Engine (US 5,839,100), Compressed packets, Adaptation Engine, Bit rate monitor, Samplify controller, Param. tracking, RateTrak, OptiBit, Veribit, Mode control, Results.]

Modes:
• LOSSLESS
• FIXED RATE
• FIXED QUALITY

Samplify’s Customer Signal Database

3000+ customer signal files; 700+ GB of data, including:

• Medical (CT, ultrasound, MRI, digital x-ray, PET)
• Wireless (GSM, W-CDMA, cdma2000, LTE, WiMax)
• Instrumentation (scopes, waveform generators, SerDes)
• Military/defense (radar, SAR, spectra)
• Automotive (RGB, infrared, ultrasound, radar)
• Geophysical (sonobuoys, oil/gas exploration)
• Video (NTSC, PAL, HD)
• Print and still images (CMYK, YCrCb, RGB, infrared)
• Floating-point data sets (seismic, drug discovery, molecular simulation, astrophysics, weather satellite, fluid dynamics)

Samplify Compression Results (Integers)

Signal Type | Sample rate @ sample width | Lossless C.R. | Fixed-rate C.R. (quality metrics) | Typical customers
Wireless baseband (3G, LTE) | 30.72 Msamp/sec @ 16 bits I & Q | 1.2:1 – 1.5:1 | 1.6:1 – 2.3:1 (EVM, PCDE, ACLR) | Ericsson, Huawei, ZTE
Wireless RF (3G, LTE) | 600 Msamp/sec @ 16 bits I & Q | 2:1 – 3:1 | 3:1 – 5:1 (EVM, PCDE, ACLR) | Ericsson, Huawei, ZTE
Computed tomography | 320,000 chans, 5 ksamp/sec @ 20 bits | 1.6:1 – 2.7:1 | 3:1 – 4.5:1 (Radiologists & SSIM) | GE, Philips, Toshiba
Ultrasound (ADC) | 64 - 256 chans, 50 Msamp/sec @ 12 bits | 1.5:1 – 2:1 | 2:1 – 3:1 (Sonographers & SSIM) | GE, Siemens, Sonosite
Ultrasound (beamformer) | 4 beams, 12 Msamp/sec @ 18 bits | 2:1 – 3:1 | 3:1 – 4:1 (Sonographers & SSIM) | GE, Siemens, Sonosite
Images & video | 60 frames/sec, 6 Msamp/sec @ 8 bits | 1.5:1 – 2:1 | 2:1 – 3:1 (viewers, PSNR, SSIM) | 1000+ frames/sec
Oscilloscope (SerDes & LVDS) | 60 Gsamp/sec @ 8 bits | 1.3:1 – 2:1 | 2:1 – 4:1 (BER, rise/fall time) | Agilent, Tektronix
Radar | 3 Gsamp/sec @ 10 bits | 2:1 – 3:1 | 3:1 – 5:1 (pd, pfa) | Lockheed, Northrop

Integer Compression: CT Scanners

Example 1: Compression of CT X-ray Sensor Values (20-bit integers)

20 bits/sample
x 3,000 samples/sec per detector
x 912 detectors per row
x 64 rows
= 3.5 Gbps
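The raw-rate arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `raw_rate_bps` is made up for illustration, and the 3:1 figure is borrowed from the CT result reported later in the deck rather than from this slide.

```python
def raw_rate_bps(bits_per_sample, samples_per_sec, channels):
    """Uncompressed data rate in bits per second."""
    return bits_per_sample * samples_per_sec * channels

# Numbers from the slide: 912 detectors per row, 64 rows.
detectors = 912 * 64
ct_rate = raw_rate_bps(20, 3_000, detectors)
print(f"raw:        {ct_rate / 1e9:.2f} Gbps")

# Assumption for illustration: a 3:1 compression ratio divides the rate by 3.
print(f"at 3:1:     {ct_rate / 3 / 1e9:.2f} Gbps")
```

The same helper applies to any of the sample-rate tables in this deck, since each row is just bits per sample times samples per second times channel count.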

Integer Compression: CT Scanners

Bottleneck 1: slip ring
Bottleneck 2: disk array

[Figure: CT scanner with x-ray source, patient, and x-ray sensors, with Bottleneck #1 and Bottleneck #2 marked. Inset plot: x-ray count (10^3 to 10^5) vs. sensor number (1 to 1000).]

Lossy Compression Methodology

[Flow diagram: CT projection data files are compressed (compression ratios: 3:1, 4:1, etc.) and decompressed to give “samplified” (compressed + decompressed) projection data. Image reconstruction is run on both the original projection data and the “samplified” projection data, giving the original image and the “samplified” image. The two reconstructed images (panels A and B, axes 0 to 500 pixels) are then compared.]

Image Pair (SSIM_min = 0.9307)

[Figure: pair of CT images, left and right.]

Success: 3:1 Compression for CT

Of 419 image pairs, Dr. Herfkens correctly identified 17 “samplified” images:

Radiologist Judgment | Number of Images | Pct of Images
“Left & right images look identical” | 402 of 419 | 95.9%
“Few minor streaks” | 1 of 419 | 0.2%
“Streaks in soft tissue” | 16 of 419 | 3.8%

…but no effect on the radiologist’s clinical diagnosis using images created from “samplified” x-rays!!

Integer Compression: 4G Wireless

Example 2: Compression of 4G Wireless Baseband Signals

16 bits/sample → 32 bits per (I, Q) sample pair
x 30.72 Msamples/sec per antenna-carrier
x 12 antenna-carriers per fiber-optic link
= 11.8 Gbps
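As a check on the figure above, the per-link rate works out as follows, using only the numbers on the slide:

```latex
\[
32\ \frac{\text{bits}}{\text{(I,Q) pair}}
\times 30.72\times 10^{6}\ \frac{\text{pairs}}{\text{s}}
\times 12\ \text{antenna-carriers}
= 11.796\times 10^{9}\ \frac{\text{bits}}{\text{s}}
\approx 11.8\ \text{Gbps}
\]
```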

LTE Requires Distributed Base Stations

• Remote radio units required for macro-cell deployments
• To maintain coverage, LTE radio units are deployed over metro fiber
• LTE requires up to 10 Gbps CPRI per sector
• MIMO technology for 4G makes passive antennas no longer feasible
• CPRI incompatible with SONET/SDH → dark fiber required: DWDM/CWDM/PON
• Each LTE RRU requires 8 wavelengths across DWDM (6 for CWDM)
• DWDM can support only 20 LTE RRUs; CWDM only 2

→ 10 Gbps CPRI links very expensive!
→ LTE fiber optic CAPEX & OPEX up to 12x greater than 3G!

[Diagram: LTE BBU connected over DWDM/CWDM/PON, up to 10 km, to LTE RRUs and a hybrid 3G/4G RRU.]

LTE Requires Distributed Base Stations

• Samplify Prism IQ eliminates 10 Gbps CPRI links, saving CAPEX
• Samplify Prism IQ reduces OPEX of DWDM backhaul by 75%
• Quadruple the number of LTE RRUs deployed across dark fiber

→ LTE fiber optic CAPEX & OPEX up to 12x greater than 3G!

[Diagram: LTE BBU connected over DWDM, up to 10 km, to LTE RRUs and a hybrid 3G/4G RRU.]

Success: Save ~$1500 per 4G CPRI Link

Component | No Compression | Compression
Fiber Optic Line Rate | 9.8 Gbps | 3.027 Gbps
Radio Head FPGA | Stratix IV GX | Cyclone IV GX
FPGA Price (1K, 2009) | $560.00 | $65.00
Fiber Optic Transceivers | $590.00 | $100.00
Baseband FPGA | Stratix IV GX | Cyclone IV GX
BB FPGA Price (1K, 2009) | $560.00 | $65.00
Total | $1,710.00 | $230.00
Cost Savings per Sector | $1,480.00 |

Callouts:
• 2 fibers at 6.144 Gbps required without compression
• 4 x 6.144 Gbps SFP fiber optic modules
• Installation cost of 2nd fiber optic cable (150 ft)

• SAM2308 enables deployment of LTE-capable RRUs today with a single fiber optic cable
• No tower climbing required to install a second fiber optic cable to upgrade to LTE
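The totals in the cost table can be re-derived from its three priced rows. The sketch below does only that; the dictionary layout and variable names are illustrative and not part of any Samplify tool.

```python
# Per-link component prices in USD, copied from the table above
# as (without compression, with compression).
bom = {
    "radio head FPGA":          (560.00, 65.00),
    "fiber optic transceivers": (590.00, 100.00),
    "baseband FPGA":            (560.00, 65.00),
}

total_plain = sum(plain for plain, _ in bom.values())
total_comp = sum(comp for _, comp in bom.values())
savings = total_plain - total_comp

print(f"no compression: ${total_plain:,.2f}")
print(f"compression:    ${total_comp:,.2f}")
print(f"savings:        ${savings:,.2f}")
```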

Compression Saves Mobile Industry $13.5B for LTE Deployment

• Industry expects 1M LTE base stations to be deployed worldwide per year
• 3 sectors/CPRI links per base station
• LTE peak deployment years 2012-2014

→ Compression saves $13.5B over the three peak deployment years

# LTE base stations deployed per year | 1M
Number of sectors/CPRI links per base station | 3
Number of years of peak deployment | 3
Number of LTE CPRI links | 9M
Cost savings per link | $1,500
Total savings | $13.5B
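The table's bottom line is a product of four inputs, which the short sketch below reproduces. Variable names are illustrative; the per-year figure is derived here and does not appear on the slide.

```python
base_stations_per_year = 1_000_000
links_per_base_station = 3        # sectors / CPRI links per base station
peak_years = 3                    # 2012-2014
savings_per_link_usd = 1_500

links = base_stations_per_year * links_per_base_station * peak_years
total_savings = links * savings_per_link_usd

print(f"CPRI links:    {links / 1e6:.0f}M")
print(f"total savings: ${total_savings / 1e9:.1f}B")
# Derived, not on the slide: the average saving per peak year.
print(f"per year:      ${total_savings / peak_years / 1e9:.1f}B")
```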

Virtually Lossless at 7.5 Effective Bits (2:1 compression)

Configuration:
• TD-LTE Downlink
• 20 MHz BW
• E-TM 3.1 per 3GPP TS 36.141

Results:
• EVM = 0.55% rms

→ Virtually lossless: equivalent to Agilent test equipment

4:1 Compression for LTE (Downlink)

[Figure: EVM (%), 0 to 8, vs. effective number of bits, 3 to 10.]

• No compression = 15 bits
• EVM limit for LTE downlink at 64 QAM is 8%
• Prism IQ achieves 3.75 effective bits at 8% EVM = 4:1 compression
• At 7.5 effective bits (2:1 compression), EVM performance is equivalent to Agilent test equipment
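The ratios quoted above follow from dividing the 15-bit uncompressed sample width by the effective bits per sample. Reading "effective bits" as the average number of compressed bits per sample is an interpretation of the slide, not something it states explicitly.

```latex
\[
\text{compression ratio}
= \frac{\text{uncompressed bits per sample}}{\text{effective bits per sample}},
\qquad
\frac{15}{7.5} = 2{:}1,
\qquad
\frac{15}{3.75} = 4{:}1
\]
```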

Integer Compression: Imaging

Example 3: Compression for 40 Mpixel and 2k frames/sec Cameras

16 bits/pixel x 40 Mpixel/frame x 30 fps = 19 Gbps

16 bits/pixel x 1 Mpixel/frame x 2k fps = 32 Gbps

Prism Lossless Compression

• Lossless means bit-exact replica of original
• Samplify SignalZIP lossless compression achieved minimum 1.76:1 compression
• Algorithm operates in real time on FPGA
• Switch from lossless to lossy with a register setting

[Figure: four example images with lossless compression ratios 2.09:1, 1.83:1, 1.90:1, and 1.76:1.]

9/28/2011 V1.1

Prism Fixed-Rate Compression

• Fixed rate provides high quality compression at a given rate
• Minimal image degradation between different steps of compression
• Algorithm operates in real time on FPGA
• Switch from lossless to lossy with a register setting

[Figure: original image alongside versions compressed at 2.65:1, 3.15:1, and 3.60:1.]

Infrared Imaging

Across 40 infrared images, Prism HD achieved ~4:1 lossless (12 grayscale bits per pixel)

Bayer Matrix Image Results

File Name | File Size (bytes) | CR lossless | SSIM @ 2.0:1 | SSIM @ 2.5:1 | SSIM @ 3.0:1 | SSIM @ 3.5:1 | SSIM @ 4.0:1
Cam1-b.bin | 3956064 | 1.70 | 0.9968 | 0.9887 | 0.9760 | 0.9598 | 0.9413
Cam1-g1.bin | 3956064 | 1.62 | 0.9953 | 0.9858 | 0.9690 | 0.9473 | 0.9245
Cam1-g2.bin | 3956064 | 1.62 | 0.9954 | 0.9860 | 0.9690 | 0.9482 | 0.9248
Cam1-r.bin | 3956064 | 1.55 | 0.9951 | 0.9811 | 0.9596 | 0.9279 | 0.9127
Cam2-b.bin | 3956064 | 2.12 | 1.0000 | 0.9946 | 0.9919 | 0.9853 | 0.9775
Cam2-g1.bin | 3956064 | 1.90 | 0.9980 | 0.9929 | 0.9837 | 0.9699 | 0.9566
Cam2-g2.bin | 3956064 | 1.90 | 0.9979 | 0.9928 | 0.9842 | 0.9696 | 0.9590
Cam2-r.bin | 3956064 | 1.84 | 0.9967 | 0.9927 | 0.9827 | 0.9669 | 0.9469
Cam3-b.bin | 3956064 | 1.73 | 0.9960 | 0.9894 | 0.9775 | 0.9606 | 0.9449
Cam3-g1.bin | 3956064 | 1.65 | 0.9955 | 0.9858 | 0.9692 | 0.9480 | 0.9275
Cam3-g2.bin | 3956064 | 1.65 | 0.9958 | 0.9866 | 0.9688 | 0.9496 | 0.9282
Cam3-r.bin | 3956064 | 1.61 | 0.9950 | 0.9840 | 0.9651 | 0.9381 | 0.9180
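One way to use a table like this is to pick, per file, the highest fixed-rate ratio whose SSIM stays above a chosen floor. The sketch below does that for four rows copied from the table; the 0.98 floor and the helper name `max_ratio` are assumptions chosen for illustration, not a Samplify recommendation.

```python
RATIOS = (2.0, 2.5, 3.0, 3.5, 4.0)

# SSIM at each ratio in RATIOS, copied from four rows of the table above.
SSIM = {
    "Cam1-b":  (0.9968, 0.9887, 0.9760, 0.9598, 0.9413),
    "Cam1-r":  (0.9951, 0.9811, 0.9596, 0.9279, 0.9127),
    "Cam2-b":  (1.0000, 0.9946, 0.9919, 0.9853, 0.9775),
    "Cam2-g1": (0.9980, 0.9929, 0.9837, 0.9699, 0.9566),
}

def max_ratio(scores, floor):
    """Highest tabulated ratio with SSIM >= floor, or None if none qualifies."""
    ok = [r for r, s in zip(RATIOS, scores) if s >= floor]
    return max(ok) if ok else None

for name, scores in SSIM.items():
    print(name, max_ratio(scores, floor=0.98))
```

With this floor the smoother Cam2 channels tolerate higher ratios than the Cam1 channels, which matches their higher lossless ratios in the table.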

Example: HD Video @ 2.5:1 compression

[Figure: three video frames annotated {-2, +5}, {-3, +3}, {-3, +6}.]

Compression of Floats: Prism FP*

Compression for High-Performance Computing (HPC)

• Compressing Integers and Floating-Pt Values
• For HPC Scientific, Technical & Multi-core Apps

* FP = floating point

Prism FP Compression for HPC

Prism FP features:

• User-selectable lossless & lossy modes
• Compresses integers and floating-point values
• Low complexity (“fits under a bond pad or two”)
• Low latency (< 6 clks to comp or decomp 4 numbers)
• Trade higher latency for better compression
• Scalable to PCIe Gen3, DDR3, & optical rates
• Targeted at HPC applications:

>> Prism FP solves multi-core I/O problems <<

Floating-point Basics

The ONLY Standard That Matters: IEEE-754-2008

[Figure: IEEE-754 bit-field diagram; surviving label: “mantissa”.]
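For readers without the figure, the three fields of a 32-bit IEEE-754 float (1 sign bit, 8 exponent bits, 23 mantissa bits) can be pulled apart with the standard `struct` module. This is a minimal sketch; the helper name `float32_fields` is made up here, and the unbiased exponent it returns is only meaningful for normal numbers (not zero, denormals, Inf, or NaN).

```python
import struct

def float32_fields(x):
    """Return (sign, unbiased exponent, 23-bit mantissa field) of x as a float32."""
    # Round x to float32 and reinterpret its 4 bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127   # remove the bias of 127
    mantissa = bits & 0x7FFFFF               # fraction bits, implicit leading 1 omitted
    return sign, exponent, mantissa

print(float32_fields(1.0))
print(float32_fields(-6.5))
```

Here -6.5 is -1.625 x 2^2, so the sign is 1, the exponent is 2, and the mantissa field holds 0.625 x 2^23.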

Prism FP Concept

Using floating-point representation:
• doesn’t repeal the Nyquist criterion
• doesn’t reduce dynamic range requirements !!

[Figure: exponent scale shown in base 2 and base 10, from +127 (max exp, 10^+38; ± Inf, NaN) through 0 (10^0 = 1.0000) down to -128 (min exp, 10^-38; denorm, ± zero). Each value's 23 mantissa bits cover a window of bit positions starting at its exponent: exp = 5 covers {5 .. -17}, exp = -1 covers {-1 .. -23}, exp = -7 covers {-7 .. -29}. An equivalent “noise floor” is marked. Example exponent sequence: 5 5 4 2 -1 -2 -3 -5 -5 -7 -9 …]

Prism FP Results on Nvidia CUDA SDK

Signal & Datatype | Prism Real-time Compression Rate | Prism Lossless Comp Ratio | Prism Lossy Comp Ratios (Quality Metrics)
3G & 4G wireless, 16-bit integers | 3 to 10 Gbps | 1.2:1 – 1.5:1 | 1.6:1 – 2.3:1 (EVM, PCDE, ACLR)
Computed tomography, 20-bit integers | 20 to 80 Gbps | 1.6:1 – 2.7:1 | 3:1 – 4.5:1 (Radiologists & SSIM)
Medical ultrasound, 12-bit integers | 50 to 300 Gbps | 2:1 – 3:1 | 3:1 – 4:1 (Sonographers & SSIM)
Image sensors, 12-bit integers | 0.6 to 10 Gbps | 1.5:1 – 2:1 | 2:1 – 3:1 (Viewers, PSNR, SSIM)
Oscilloscopes, 8-bit integers | 100 to 600 Gbps | 1.3:1 – 2:1 | 2:1 – 4:1 (BER, rise/fall time)
k-means clustering, 32-bit floats | 300 Mfloat/sec | 1.4:1 – 2:1 | 2:1 – 4.5:1 (SSIM, % error)
Black-Scholes financial, 32-bit floats | 100 Mfloat/sec | 1.6:1 – 2.2:1 | 3:1 – 4:1 (% error of mean and std)
3D wireframe model, 32-bit floats | 60 Mfloat/sec | 1.9:1 – 2.6:1 | 2:1 – 3.5:1 (visual inspection, SSIM)

[Callouts: “Example 1”, “Example 2”]

k-means Clustering (from CUDA SDK)

At 2.5:1 compression, the resulting oval measurements:

• location (xi, yi) and
• axis length (Lx, Ly)

differ in the 6th significant digit, e.g.: 3.55873 vs. 3.55875

Graphics: FP Wireframe & Textures

[Figure: original vs. decompressed rendering.]

2.75:1 compression, SSIM = 0.99

Geophysical Exploration Data Bottlenecks From Acquisition to Data Processing

1. Seismic sensor acquisition
2. Remote data transmission
3. Data storage & intermediate results
4. Computation

Formats:
• LIS, DLIS
• SEG-D, -Y
• WellLog ML

→ Data sets are petabytes in size!

Prism FP Results for HPC Seismic

Signal Type | Signal Description | Lossy Comp Ratio & Quality Metric or Resolution
Images | Downhole imaging | 20:1 to 60:1 @ SSIM > 0.99
Acoustic traces | 5 acoustic files | 2:1 to 4:1 @ 80+ dB
Acoustic archives | Trace headers & signals | 2:1 @ 99.1 dB; 3:1 @ 69.6 dB
Earth models | Delta, epsilon, velocity | 2:1 @ 137 dB; 3:1 @ 70 dB
Forward path RTM | Reverse Time Migration intermediate signal | 3:1 to 4:1 @ 55 - 75 dB
Noise-reduced acoustic traces | Reverse Time Migration input signal | 2:1 to 4:1 @ 45 - 60 dB
Pressure (Type 1) | 4 pressure waveforms | 2.66:1 to 3.47:1 @ 0.01 psi; 5.24:1 to 6.57:1 @ 0.1 psi
Pressure (Type 2) | 1 pressure waveform | 4.33:1 @ 0.01 psi; 6.2:1 @ 0.1 psi
Temperature | 4 temperature waveforms | 15.9:1 to 19.3:1 @ 0.01 °C; 21.9:1 to 22.6:1 @ 0.1 °C

Objective Metrics of Signal Quality:

How to Quantify “Good Enough” Results

Prism Compression’s Effects on Results?

Q: How does compression affect users’ signal quality?
A: IT’S COMPLICATED – JUST TRY IT!

• Medical imaging:
  • computed tomography (CT): SSIM + radiologists’ assessment
  • ultrasound: working with 10+ Asian and 2 US ultrasound mfrs (sonographer assessment)
• Wireless:
  • Measure EVM, ACLR, spectral emissions masks, PCDE
• Seismic:
  • Ask geophysicists to assess the quality of 3D Earth images
  • SSIM on 3D Earth “slices”
  • Try on both input signals (acoustic traces) and intermediate sigs

Simple Signal Quality Metrics

x(i) = original signal
y(i) = decompressed signal
d(i) = x(i) – y(i) <<< difference signal

Some representative signal quality metrics include:

1. mean(d): error mean
2. std(d): error standard deviation
3. max(abs(d)): worst-case error
4. SNR(x) – SNR(y): decrease in SNR
5. 100 * rms(d) / rms(x): percent error
6. FFT(y) – FFT(x): spectral effects

CAVEAT: These quality metrics are easy to measure, BUT they don’t tell you how the final results are affected !!
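Metrics 1, 2, 3, and 5 from the list above can be computed directly from the original and decompressed samples. The sketch below uses only the Python standard library; the function name `quality_metrics` and the choice of a population (divide-by-N) standard deviation are assumptions. Metrics 4 and 6 are left out because they need a noise estimate and an FFT, which the slide does not specify.

```python
import math

def rms(v):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(vi * vi for vi in v) / len(v))

def quality_metrics(x, y):
    """Error statistics between original x and decompressed y (equal lengths)."""
    d = [xi - yi for xi, yi in zip(x, y)]          # difference signal d(i)
    mean_d = sum(d) / len(d)
    std_d = math.sqrt(sum((di - mean_d) ** 2 for di in d) / len(d))
    return {
        "mean": mean_d,                            # 1. error mean
        "std": std_d,                              # 2. error standard deviation
        "max_abs": max(abs(di) for di in d),       # 3. worst-case error
        "pct_error": 100.0 * rms(d) / rms(x),      # 5. percent error
    }

m = quality_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(m)
```

As the caveat on the slide says, small values here do not by themselves show that downstream results are unaffected; they are a first screen before application-level checks.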

Image Quality Metrics

• Difference image: $D_{i,j} = O_{i,j} - P_{i,j}$
• HU diffs: $\min(D_{i,j})$ and $\max(D_{i,j})$, vs.
• Percentile-based HU diff thresholds
• Local contrast ratio: $\mathrm{Contrast}_{\mathrm{RMS}} = \sqrt{\operatorname{mean}\left(\sum (O_{i,j} - \bar{O})^2\right)}$
• Peak signal-to-noise ratio (PSNR) << not useful
• Just-noticeable differences (JND) << not available
• Masking effects (bone, air, etc.)
• Structural Similarity (SSIM) << next page

Structural Similarity Metric (SSIM)

$$
\mathrm{SSIM}(O, P) = l(O, P) \cdot c(O, P) \cdot s(O, P)
= \left(\frac{2\mu_O \mu_P}{\mu_O^2 + \mu_P^2}\right)
\cdot \left(\frac{2\sigma_O \sigma_P}{\sigma_O^2 + \sigma_P^2}\right)
\cdot \left(\frac{\sigma_{OP}}{\sigma_O \sigma_P}\right)
$$

• $l(O, P)$: brightness (µ)
• $c(O, P)$: contrast (σ)
• $s(O, P)$: “structure” (cross-correlation)

Ref: Wang & Bovik, IEEE Signal Processing Magazine, Jan 2009
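The three-term product above can be evaluated in a few lines. The sketch below computes it once over whole images given as flat lists of pixel values, exactly as the formula on the slide is written. That is a simplification: practical SSIM implementations work on local windows and add small stabilizing constants to each denominator, and this version divides by zero on a perfectly flat or all-zero image. The name `ssim_global` is made up for illustration.

```python
import math

def ssim_global(o, p):
    """Brightness x contrast x structure over two equal-length pixel lists."""
    n = len(o)
    mu_o, mu_p = sum(o) / n, sum(p) / n
    var_o = sum((a - mu_o) ** 2 for a in o) / n
    var_p = sum((b - mu_p) ** 2 for b in p) / n
    cov = sum((a - mu_o) * (b - mu_p) for a, b in zip(o, p)) / n
    sd_o, sd_p = math.sqrt(var_o), math.sqrt(var_p)

    l = 2 * mu_o * mu_p / (mu_o ** 2 + mu_p ** 2)   # brightness term
    c = 2 * sd_o * sd_p / (var_o + var_p)           # contrast term
    s = cov / (sd_o * sd_p)                         # structure term
    return l * c * s

original = [1.0, 2.0, 3.0, 4.0]
print(ssim_global(original, original))              # identical images
print(ssim_global(original, [2 * v for v in original]))
```

Doubling every pixel leaves the structure term at 1 but lowers both the brightness and contrast terms, which is why the second score drops well below the first.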

Uncertainty Quantification (1 of 2)

In general, uncertainty quantification has to incorporate research and development efforts in three key, irreducible technical areas:

(1) Characterization of uncertainty in system parameters and the external environment;

(2) Propagation of this uncertainty through large computational engineering models; and

(3) Verification and validation of the computational models and incorporating the uncertainty of the models themselves into the global uncertainty assessment.

Uncertainty Quantification (2 of 2)

“What a Long, Strange Trip It’s Been”

“Multi-core Needs Compression” – REALLY??

What is Numerical Data? Ints & Floats

[Figure 1 (prior art): panels a), b), c).]

HPC is “Just” Numerical Processing

[Block diagram: numerical input (ints / floats) → multi-core numerical processor → numerical output (ints / floats), with intermediate results (ints / floats).]

Two kinds of HPC algs:
1. compute-bound
2. I/O-bound

Samplify accelerates I/O-bound applications

I/O Is A Real HPC & Multi-core Problem

GPU and multi-core trends:

• Cores scale (Moore’s Law), but I/O (pins, clks, mem speed) doesn’t
• Core utilization (% busy) keeps decreasing (e.g. < 20% in seismic)
• Nvidia GPUs with 16 lanes of PCIe Gen2 (8 GB/sec)
  • In 2007: 192 SMPs (GeForce) → 41 MB/sec per core
  • In 2011: 512 SMPs (Fermi) → 15 MB/sec per core
• Intel x86
  • In 2006: 500 MB/sec per core
  • In 2011: 2 GB/sec for 4 cores → still 500 MB/sec per core

Int’l Supercomputing & Hot Chips Conferences:

• “Exascale is I/O-limited, while multi-core is easy” Jeffrey Vetter, DoE
• “Exascale is power-limited (20 MW/Exaflop)” Jack Dongarra, DoE
• “Communication-avoiding algorithms” Jim Demmel, UC Berkeley
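The per-core figures above are the shared link bandwidth divided by the core count. The sketch below reproduces them and adds an optional `compression_ratio` argument; treating compression as a simple multiplier on effective bandwidth is an assumption made here for illustration, and it ignores any compress/decompress overhead.

```python
def per_core_mb_per_sec(link_gb_per_sec, cores, compression_ratio=1.0):
    """Effective MB/sec available to each core (1 GB taken as 1000 MB)."""
    return link_gb_per_sec * 1000.0 * compression_ratio / cores

print(round(per_core_mb_per_sec(8, 192), 1))   # GeForce-era figure from the slide
print(round(per_core_mb_per_sec(8, 512), 1))   # Fermi-era figure from the slide
print(per_core_mb_per_sec(2, 4))               # x86, 2 GB/sec over 4 cores

# Hypothetical: the same Fermi link with 2:1 compression applied to the traffic.
print(round(per_core_mb_per_sec(8, 512, compression_ratio=2.0), 1))
```

The slide's 41 and 15 MB/sec values are these results truncated to whole numbers.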

Why Lossy Comp is OK for HPC (1 of 2)

1. The real world is inherently noisy:
  • Real-world (vs. idealized) measurements contain noise
  • Signal-to-noise ratio (SNR) measures what part of measurements are “useful” (ADC analogy: resolution vs. ENOB)
  • “Simulated real-world” computations add noise on purpose (Monte Carlo)

2. The real world is inherently lowpass:
  • To a DSP guy, the 2D Nyquist rate is analogous to choosing the grid/mesh size for HPC
  • Time series of adjacent HPC grid/mesh points are correlated
  • Distance and time attenuate signals, often as r^2 or r^3 (e.g. SerDes on backplanes, light in space, audio signals, etc.)
  • 2 kinds of HPC problems:
    • those that can be validated against the real world, and
    • those that can’t (“theoretical” HPC problems…)

Why Lossy Comp is OK for HPC (2 of 2)

3. Application dyn range vs. Computational dyn range:

• The required dynamic range of HPC signals (input, intermediate, output) is typically lower than the dynamic range provided by 32/64-bit computational float engines
• 32-bit and 64-bit floats are arbitrary:
  • Why not 21-bit or 16-bit mantissas?
  • Why 8-bit and 11-bit exponents?
  • Why not 5-bit or 16-bit exponents…

Simple goal: “good enough” answers … sooner and faster!
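To make the "why not a 16-bit mantissa?" question concrete, the sketch below zeroes the low mantissa bits of a float32 value and reports the resulting error. This is plain bit truncation chosen for illustration; it is not the Prism FP algorithm, whose internals the slides do not describe, and the name `truncate_mantissa` is hypothetical.

```python
import struct

def truncate_mantissa(x, keep_bits):
    """Round x to float32, then zero all but the top keep_bits mantissa bits."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    # Mask that clears the lowest (23 - keep_bits) bits of the 32-bit word.
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    (y,) = struct.unpack(">f", struct.pack(">I", bits & mask))
    return y

x = 3.14159274
for keep in (23, 16, 8):
    y = truncate_mantissa(x, keep)
    print(keep, y, f"{100.0 * abs(y - x) / x:.6f}% error")
```

Each dropped mantissa bit roughly doubles the worst-case relative error, so the acceptable mantissa width depends on the application's own noise floor.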

Future: Prism 4 for Multi-Core Engines

[Block diagram: six x86 cores (3 GHz cores, 1200 – 2000 pins) on a QPI or HT ring (≤ 200 GB/sec, 256-bit bus), a front side bus to DDRx & PCIe Gen2, two DDRx DIMMs (8 - 18 GB/sec), and a PCIe Gen2 bus (8 GB/sec). Blocks labeled C (Compress) and D (Decompress) appear throughout the diagram.]

x86 bottlenecks:
o DDR3 (off-chip RAM)
o PCIe (off-chip I/O)
o Inter-core communications
o QPI and HyperTransport

GPU bottlenecks:
o On-chip “shared RAM”
o GDDR5 (video RAM)
o PCIe (off-chip I/O)

Network bottlenecks:
o Infiniband
o 10 GbE, 40 GbE
o MPI
o RapidIO

How to Start? Send Samplify Signals, or Use Prism Software

Usual Samplify model: customers send Samplify (Al) their signals (> 700 GB)

Option 1: Existing Prism 3 (ints) and Prism FP (floats) SW:

• Prism 3 for Windows and Matlab
• Prism FP for Linux, Windows, and Matlab

Option 2: Easy Ports:

• fwrite_c, fread_c (for file I/O)
• memcpy_c (for memory moves)

Option 3: More work, but possible:

• MPI_SEND_C, MPI_RECV_C (MPI)
• What else?
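The "easy port" idea in Option 2 is a drop-in wrapper that compresses on write and decompresses on read. The sketch below shows the shape of such a wrapper pair in Python. Everything about it is an assumption: the slides give only the names `fwrite_c` and `fread_c`, not their signatures, and the standard-library `zlib` codec stands in for Prism FP, which is not publicly documented here. The 8-byte header (value count, compressed size) is likewise invented for this example.

```python
import io
import struct
import zlib

def fwrite_c(f, floats):
    """Write a list of float32 values to binary file f in compressed form."""
    raw = struct.pack(f"<{len(floats)}f", *floats)
    packed = zlib.compress(raw)                    # stand-in lossless codec
    f.write(struct.pack("<II", len(floats), len(packed)))
    f.write(packed)
    return len(packed)

def fread_c(f):
    """Read back one block written by fwrite_c and return the float list."""
    count, size = struct.unpack("<II", f.read(8))
    raw = zlib.decompress(f.read(size))
    return list(struct.unpack(f"<{count}f", raw))

# Round trip through an in-memory file.
data = [0.5 * i for i in range(1000)]
buf = io.BytesIO()
compressed_bytes = fwrite_c(buf, data)
buf.seek(0)
restored = fread_c(buf)
print(len(data) * 4, "raw bytes ->", compressed_bytes, "compressed bytes")
```

Because the caller still passes and receives plain arrays, application code outside the I/O layer does not change, which is what makes this kind of port "easy".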

Proposed Collaboration with NCAR

• Try Prism compression (Linux, Windows, Matlab)
• Quantify your application’s BW and/or storage bottlenecks
• Quantify your application’s sensitivity to input variations
• Quantify your application’s “good enough” results level

or

• Send Samplify your signals (in, intermediate, out) & we’ll do the work

Goal: publish collaboration results in 2012

Contact: Al [email protected]
408-221-1191