
Adaptive Memory System over Ethernet

Jun Suzuki, Teruyuki Baba, Yoichi Hidaka†, Junichi Higuchi, Nobuharu Kami, Satoshi Uchida††, Masahiko Takahashi,

Tomoyoshi Sugawara, and Takashi Yoshikawa

System Platforms Research Laboratories, NEC Corporation
†IP Network Division, NEC Corporation

††2nd Computer Software Division, NEC Corporation

© NEC Corporation 2010, HotStorage '10

Memory Scalability in Cloud Computing

• Cannot scale to match service requirements
• Service performance limited by memory
• Slow block I/O devices

Computer memory is limited to the resources individually installed in each machine, so there is a need to scale memory beyond the individually installed amount.



Resources Should Simultaneously Be High-Performance and Networked

Networked
• Resource sharing among multiple computers
• Ease of management

High-Performance
• Large throughput, low latency
• Avoids firmware processing and memory copies when transferring data


Related Work

(A) Intel Turbo Memory

• PCIe flash device used as a disk cache
• Device driver sits between the OS and the disk driver

J. Matthews et al., “Intel Turbo Memory: Nonvolatile Disk Caches in the Storage Hierarchy of Mainstream Computer Systems”, ACM Trans. on Storage, vol.4, no. 2, article 4, 2008.

(B) Remote Page Swap

• Uses the memory of a neighboring machine via swapping
• Standard interconnect, e.g., Ethernet

E. P. Markatos and G. Dramitinos, “Implementation of a Reliable Remote Memory Pager”, USENIX 1996 Annual Technical Conference, 1996.

[Figure: the two approaches plotted against high performance and resource sharing by network. In (A), the Turbo Memory driver and SATA driver sit below block I/O from the OS, driving the Turbo Memory and SATA devices locally; in (B), remote memory is reached through a network switch.]



Our Method: Ethernet-Attached SSD as High-Speed Swap Device

High-Performance AND Resource Sharing
• Standard Ethernet, PCIe SSD
• "ExpEther" interconnect: PCIe over Ethernet [1]

[1] J. Suzuki et al., “ExpressEther – Ethernet-Based Virtualization Technology for Reconfigurable Hardware Platform”, 14th IEEE Symposium on High-Performance Interconnects, pages 45-51, 2006.

[Figure: PCIe SSDs in I/O expansion boxes attach to computers through ExpEther bridges over standard Ethernet; PCIe DMA runs without firmware processing or memory copies, and devices hot-plug and remove.]



Extending the PCIe tree over Ethernet
• PCIe packets are encapsulated in Ethernet frames
• The Ethernet segment acts as a PCIe switch
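The encapsulation step can be sketched as below. The EtherType value and the bare header layout are assumptions for illustration only, not ExpEther's actual frame format:

```python
import struct

# Hypothetical EtherType for illustration; the real ExpEther value differs.
ETHERTYPE_PCIE = 0x88B5

def encapsulate(dst_mac: bytes, src_mac: bytes, tlp: bytes) -> bytes:
    """Wrap a raw PCIe TLP in an Ethernet II frame: 14-byte header + payload."""
    return struct.pack("!6s6sH", dst_mac, src_mac, ETHERTYPE_PCIE) + tlp

def decapsulate(frame: bytes) -> bytes:
    """Recover the TLP by stripping the 14-byte Ethernet header."""
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    assert ethertype == ETHERTYPE_PCIE
    return frame[14:]

# A dummy 16-byte "TLP" survives the round trip unchanged.
tlp = bytes(range(16))
frame = encapsulate(b"\x02\x00\x00\x00\x00\x01", b"\x02\x00\x00\x00\x00\x02", tlp)
assert len(frame) == 30 and decapsulate(frame) == tlp
```

Because the bridge does this wrapping in hardware, the host sees only ordinary PCIe transactions.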

High-speed Ethernet transport [1]
• Delay-based congestion control
• Less than 8.5% of TCP-based delay
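Delay-based congestion control has this general shape: infer queueing from RTT inflation and back off before loss. The cited algorithm's details differ; this is a Vegas-style sketch with assumed thresholds:

```python
def adjust_window(cwnd: float, base_rtt: float, sampled_rtt: float,
                  alpha: float = 2.0, beta: float = 4.0) -> float:
    """One Vegas-style update: estimate packets queued in the network from
    RTT inflation and steer the congestion window to keep the queue small."""
    queued = cwnd * (1.0 - base_rtt / sampled_rtt)  # ~packets sitting in queues
    if queued < alpha:      # little delay: probe for more bandwidth
        return cwnd + 1
    if queued > beta:       # delay building up: back off before any loss
        return cwnd - 1
    return cwnd

assert adjust_window(10, base_rtt=1.0, sampled_rtt=1.0) == 11   # no queueing: grow
assert adjust_window(100, base_rtt=1.0, sampled_rtt=2.0) == 99  # heavy queueing: shrink
```

Reacting to delay rather than loss is what keeps latency far below a TCP-based transport.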

[Figure: PCIe DMA over Ethernet. The CPU and host bridge of a computer connect through ExpEther bridges and standard Ethernet to SSD A and SSD B; the PCIe bus extends across the network, and DMA reaches the computer's memory with no extra driver.]

[1] H. Shimonishi et al., “A Congestion Control Algorithm for Data Center Area Communications”, 2008 International CQR Workshop, 2008.

No firmware processing, no memory copy


Hot-Plug and Remove

[Figure: Computers A, B, and C and SSDs A, B, and C each attach to the Ethernet through ExpEther bridges; VLAN 1 and VLAN 2 group computers with their assigned SSDs.]

SSDs are assigned to computers by VLAN grouping
• Adaptive assignment by a system manager
• PCIe-standard hot-plug and remove
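The system manager's bookkeeping for this VLAN grouping might look like the toy sketch below. The class and method names are invented for illustration; the real manager programs Ethernet switches, and the hot-plug event follows from the resulting PCIe tree change:

```python
class SystemManager:
    """Toy model of adaptive SSD assignment via VLAN grouping."""

    def __init__(self) -> None:
        self.vlan_of: dict[str, int] = {}  # endpoint name -> VLAN id

    def assign(self, computer: str, ssd: str, vlan: int) -> None:
        # Placing both ExpEther bridges on one VLAN makes the SSD appear
        # in the computer's PCIe tree as a standard hot-plug event.
        self.vlan_of[computer] = vlan
        self.vlan_of[ssd] = vlan

    def remove(self, ssd: str) -> None:
        # Dropping the SSD's bridge from the VLAN triggers a PCIe hot-remove.
        self.vlan_of.pop(ssd, None)

    def members(self, vlan: int) -> list[str]:
        return sorted(name for name, v in self.vlan_of.items() if v == vlan)

mgr = SystemManager()
mgr.assign("computer-A", "ssd-A", vlan=1)
mgr.assign("computer-B", "ssd-B", vlan=2)
assert mgr.members(1) == ["computer-A", "ssd-A"]
mgr.remove("ssd-A")
assert mgr.members(1) == ["computer-A"]
```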


Evaluations

• Block I/O Performance of Ethernet-Attached SSD

• System Evaluation: In-Memory DB


Evaluation Setups

[Figure: evaluation setups. (a) ExpEther (proposal): the computer and SSD connect through ExpEther bridges. (b) iSCSI (conventional): initiator with a NIC, target with a NIC and the SSD. (c) iSCSI with TOE (conventional): the same, with TOE NICs on both sides.]


[Figure: (a) random read IOPS and (b) random write IOPS vs. access size (512 B to 4 MB) for direct insertion, EE (ExpEther), iSCSI with TOE, and iSCSI.]

Block I/O Performance (IOPS) of Ethernet-Attached SSD

Read performance is close to a host I/O slot; write performance is about twice that of iSCSI with TOE.

Relative IOPS (host I/O slot = 100):

                          Host I/O Slot   ExpEther   iSCSI w/ TOE   iSCSI
Random read                    100            92           50         14
Random write                   100            74           42         14
Random read w/ switch          100            91           46         14
Random write w/ switch         100            68           39         14
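The headline claims can be checked against the table's relative numbers; note that "twice" is approximate (74 / 42 is about 1.76x):

```python
# Relative IOPS from the table above (host I/O slot = 100).
expether = {"read": 92, "write": 74}
toe_iscsi = {"read": 50, "write": 42}

# Read stays within 10% of a host I/O slot.
assert expether["read"] / 100 >= 0.9

# Write is roughly twice TOE iSCSI: 74 / 42 is about 1.76.
assert 1.7 < expether["write"] / toe_iscsi["write"] < 1.8
```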


Write IOPS Overhead and Its Solution

[Figure: to service writes, the SSD issues memory read requests to the computer's memory and receives completions carrying the data.]

The number of the SSD's outstanding read requests is limited by its implementation to 4. Increasing the number of outstanding requests would bring performance close to that of a host I/O slot.
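The effect of the outstanding-request limit follows from Little's law: with the request round trip fixed, throughput is proportional to the number of requests in flight. The latency below is an assumed round number for illustration, not a measurement:

```python
def iops_ceiling(outstanding: int, round_trip_us: float) -> float:
    """Little's law: completions per second is bounded by the number of
    requests in flight divided by the per-request round-trip time."""
    return outstanding * 1e6 / round_trip_us

# With an assumed 100 us round trip, 4 outstanding requests cap throughput
# near 40k IOPS; raising the limit raises the ceiling proportionally.
assert round(iops_ceiling(4, 100.0)) == 40_000
assert round(iops_ceiling(8, 100.0)) == 80_000
```

This is why the added Ethernet round trip hurts writes more than reads: the longer the round trip, the more in-flight requests are needed to keep the same IOPS.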


System Evaluation: In-Memory Database

RDB: PostgreSQL 8.1
Benchmark: pgbench (TPC-B-like)
CPU: Intel Core 2 Quad
OS: CentOS 5.3 (Linux 2.6.18)
Ethernet: 10GbE
SSD: 16-GB partition of a 160-GB Fusion-io drive (write-improve mode)
Clients: 100
Transactions per client: 1000
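The benchmark configuration above corresponds to a pgbench invocation along these lines (`testdb` is a placeholder database name; `-c` sets concurrent clients and `-t` transactions per client):

```python
clients, per_client = 100, 1000

# pgbench's -c/-t flags: 100 clients, 1000 TPC-B-like transactions each.
cmd = ["pgbench", "-c", str(clients), "-t", str(per_client), "testdb"]

assert " ".join(cmd) == "pgbench -c 100 -t 1000 testdb"
assert clients * per_client == 100_000  # total transactions per run
```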

[Figure: the computer's swap device is an SSD attached through ExpEther bridges over Ethernet; the RDB files are placed on a ramdisk in main memory.]


Scaling-Up beyond Main Memory

• Performance is maintained as the DB files grow beyond system memory
• More than 79% of all-in-memory performance in the 4-GB Mem + ExpEther case

[Figure: transactions per second vs. total size of RDB files (0 to 12 GB) for 4G Mem + EE (SSD), 2G Mem + EE (SSD), 8G Mem (HDD), 4G Mem (HDD), and 2G Mem (HDD); the EE configurations maintain more than 79% of all-in-memory performance.]


Comparison with Conventional Protocol

[Figure: transactions per second vs. total size of DB files (0 to 12 GB) for Mem 4G + EE, Mem 2G + EE, Mem 4G + iSCSI, and Mem 2G + iSCSI, showing the performance gain of the EE configurations.]

[Note] iSCSI with TOE could not be evaluated due to a software bug; calculation indicates the proposal would outperform it by 21%.

The proposal outperforms iSCSI by 139% in the best case.


Saving CPU Resource for Transaction Processing

High-speed swap saves CPU time for user processes

[Figure: CPU utilization (%) broken down into user, system, nice, wait, hardware irq, and software irq for Mem 4G + EE, Mem 2G + EE, and Mem 8G at total RDB file sizes of 6 GB and 10 GB; in the swapping configurations, part of the CPU time is consumed by page swap.]


Conclusion

Adaptive Memory Expansion with an Ethernet-Attached SSD as a High-Speed Swap Device

Standard components
• Standard Ethernet and PCIe SSD
• No software driver needed for the Ethernet expansion

High performance and resource sharing
• PCIe DMA over Ethernet
• Higher block I/O performance than conventional protocols
• PCIe hot-plug and remove

Proven system merits
• Maintains database performance beyond system memory


Future Work

Simultaneous sharing of an SSD among multiple computers
• Emerging PCIe I/O virtualization
• Efficient resource utilization
• High-speed data sharing

Solving the performance bottleneck of storage and database systems
• Network storage for system availability
• Removing the performance bottleneck caused by network storage