A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems

Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu

Kilmo Choi [email protected], Embedded System Lab.


Post on 25-Feb-2016



TRANSCRIPT

Page 1: Kilmo  Choi rlfah926@naver.com

Embedded System Lab.

Kilmo Choi (최길모) [email protected]

A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems

Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu

Page 2:

Contents

Background and Motivation

Bank-Level Partition Mechanism (BPM)

Results

Conclusion

References

Page 3:

Background and Motivation

Memory bank

- A set of memory with the same access speed

Multicore platform

- Multiple banks can serve memory requests independently and concurrently

Bank-Level Parallelism

Page 4:

Background and Motivation

Row buffer conflict

Causes performance degradation (throughput slowdown and unfairness)

e.g., the row-buffer hit rate decreases from over 60% with 1 core to 35% with 16 cores

[Figure: cores sharing one DRAM bank (rows 0 to 3). Core 0 accesses data in Row 1 (activate operation); Core 0 accesses the same page again (row-buffer hit); Core 1 accesses data in Row 3 (row-buffer conflict: precharge operation, then activate operation).]
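The hit and conflict cases on this slide can be sketched with a toy model of a single bank under an open-page policy. The function name `classify_accesses` and the `(core, row)` trace format are assumptions made for illustration; real DRAM timing is not modeled.

```python
from collections import Counter

def classify_accesses(accesses):
    """Classify each (core, row) access to ONE bank under an open-page policy.

    Returns a Counter with the keys 'empty' (first activation),
    'hit' (row already open) and 'conflict' (a different row is open).
    """
    open_row = None              # row currently held in the row buffer
    stats = Counter()
    for _core, row in accesses:
        if open_row is None:
            stats["empty"] += 1      # activate only
        elif open_row == row:
            stats["hit"] += 1        # served directly from the row buffer
        else:
            stats["conflict"] += 1   # precharge, then activate the new row
        open_row = row
    return stats

# Sequence from the slide: Core 0 reads Row 1, Core 0 reads Row 1 again,
# then Core 1 reads Row 3 in the same bank.
trace = [(0, 1), (0, 1), (1, 3)]
stats = classify_accesses(trace)
print(dict(stats))
```

The third access is the interesting one: Core 1 evicts the row that Core 0 was using, so Core 0's next access to Row 1 would also be a conflict.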

Page 5:

Bank-Level Partition Mechanism (BPM)

Numerous new memory scheduling algorithms have been proposed to address the interference problem.

However, these algorithms usually employ complex scheduling logic and need hardware modifications to memory controllers.

Overview of BPM

The OS memory management system uses a page-coloring mechanism to partition banks into several groups and maps each thread (process) to a specific bank group.

Address mapping policy
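A minimal sketch of the page-coloring idea in the overview, assuming 4 KiB pages and a hypothetical address map in which physical-address bits 13 to 15 select the bank. `ColoredAllocator`, `bank_color`, and the bit positions are illustrative names and values, not the kernel changes made by the authors.

```python
PAGE_SHIFT = 12               # 4 KiB pages (assumption)
BANK_BITS = (13, 14, 15)      # hypothetical physical-address bits that select the bank

def bank_color(frame_number):
    """Pack the bank-selecting bits of a page frame's base address into a small integer."""
    paddr = frame_number << PAGE_SHIFT
    color = 0
    for i, bit in enumerate(BANK_BITS):
        color |= ((paddr >> bit) & 1) << i
    return color

class ColoredAllocator:
    """Toy physical-page allocator that keeps one free list per bank color."""

    def __init__(self, num_frames):
        self.free = {}
        for frame in range(num_frames):
            self.free.setdefault(bank_color(frame), []).append(frame)
        self.group_of = {}        # thread id -> set of bank colors it may use

    def assign(self, tid, colors):
        self.group_of[tid] = set(colors)

    def alloc_page(self, tid):
        # Only hand out frames whose bank color belongs to the thread's group.
        for color in sorted(self.group_of[tid]):
            if self.free[color]:
                return self.free[color].pop(0)
        raise MemoryError(f"no free frame in the bank group of thread {tid}")

alloc = ColoredAllocator(num_frames=64)
alloc.assign(tid=0, colors=[0, 1, 2, 3])
alloc.assign(tid=1, colors=[4, 5, 6, 7])
frames_t0 = [alloc.alloc_page(0) for _ in range(8)]
frames_t1 = [alloc.alloc_page(1) for _ in range(8)]
print(frames_t0, frames_t1)
```

Because the two threads draw frames from disjoint color sets, no page of thread 0 can land in a bank used by thread 1.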

Page 6:

Bank-Level Partition Mechanism (BPM)

[Figure: four cores and eight banks; each bank holds rows 0 to 3 and has its own row buffer.]

Page 7:

Bank-Level Partition Mechanism (BPM)

Advantages

row buffer conflict ↓ row buffer hit ↑

BPM is an entirely software approach: flexible

It is easier for the OS than for hardware to monitor a thread's behavior

Bank-level conflicts can be fully eliminated by exclusively mapping a thread's data to specific banks

How much does the number of available banks influence a thread's performance?
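The effect of exclusive bank mapping can be illustrated with a toy simulation. It assumes two threads that issue accesses in lockstep, each streaming through its own rows in bursts of four accesses per bank; this worst-case interleaving, the function name `hit_rate`, and its parameters are assumptions, and the numbers it produces are not measurements from the paper.

```python
def hit_rate(banks_of_thread, accesses_per_thread=64, burst=4):
    """Row-buffer hit rate when threads interleave their accesses one by one.

    banks_of_thread[t] lists the banks thread t may use; each thread sends
    `burst` consecutive accesses to one row of a bank, then moves to the next bank.
    """
    open_row = {}                 # bank -> row currently held in its row buffer
    hits = total = 0
    for k in range(accesses_per_thread):
        for tid, banks in enumerate(banks_of_thread):
            bank = banks[(k // burst) % len(banks)]
            # Tag rows with the thread id so rows of different threads never coincide.
            row = (tid, k // (burst * len(banks)))
            hits += open_row.get(bank) == row
            open_row[bank] = row
            total += 1
    return hits / total

shared = hit_rate([list(range(8)), list(range(8))])      # both threads use all 8 banks
partitioned = hit_rate([[0, 1, 2, 3], [4, 5, 6, 7]])     # 4 private banks per thread
print(shared, partitioned)
```

In this model the shared configuration never hits, because each thread keeps closing the row the other one just opened, while the partitioned configuration misses only on the first access of each burst.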

Page 8:

Bank-Level Partition Mechanism (BPM)

Discover bank bits by a software method (algorithm)

[Figure: bank-bit discovery with uncached accesses.

Step 1: FOR x: flip address bit x between 0 and 1. Bits that give higher latency (row miss) go into Row{ }; the remaining bits (row hit) go into Remain{ }.

Step 2: FOR y {FOR x}: flip bit y from Remain{ } together with bit x from Row{ }. Bits that still give higher latency (row miss) go into Column{ }; the remaining bits, which are mapped to different banks, go into BANK{ }.]
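The two-step classification on this slide can be tried against a simulated memory. Everything below is a sketch under stated assumptions: the hidden address map, the latency values, and the names `alternating_latency` and `discover` are made up for illustration, and a real implementation would time uncached accesses on hardware instead.

```python
# Hypothetical address map hidden inside the simulated memory (not the real i7-860 layout).
COLUMN_BITS_TRUE = set(range(6, 13))
BANK_BITS_TRUE = {13, 14, 15}
ROW_BITS_TRUE = set(range(16, 24))
ADDR_BITS = sorted(COLUMN_BITS_TRUE | BANK_BITS_TRUE | ROW_BITS_TRUE)

HIT_NS, CONFLICT_NS = 50, 100     # made-up latencies

def field(addr, bits):
    """Values of the given address bits, as a tuple."""
    return tuple((addr >> b) & 1 for b in sorted(bits))

def alternating_latency(a, b):
    """Simulated latency of uncached accesses that alternate between a and b."""
    same_bank = field(a, BANK_BITS_TRUE) == field(b, BANK_BITS_TRUE)
    same_row = field(a, ROW_BITS_TRUE) == field(b, ROW_BITS_TRUE)
    # Only "same bank, different row" forces a precharge/activate on every access.
    return CONFLICT_NS if same_bank and not same_row else HIT_NS

def discover(latency, addr_bits, base=0, threshold=75):
    # Step 1: flipping a row bit alone causes row misses (higher latency).
    row, remain = set(), set()
    for x in addr_bits:
        (row if latency(base, base ^ (1 << x)) > threshold else remain).add(x)
    # Step 2: flip y together with a row bit x. If the row miss persists, the two
    # addresses still share a bank, so y is a column bit; otherwise y is a bank bit.
    column, bank = set(), set()
    for y in remain:
        still_conflict = all(
            latency(base, base ^ (1 << x) ^ (1 << y)) > threshold for x in row
        )
        (column if still_conflict else bank).add(y)
    return row, column, bank

row, column, bank = discover(alternating_latency, ADDR_BITS)
print(sorted(bank))
```

The simulated map uses plain bit fields; controllers that hash several address bits into the bank index would need a more careful test than this sketch performs.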


Page 10:

Results

Environments

4 cores, 2.8 GHz Intel Core i7-860 processor, 8 GB DDR3 main memory with 64 banks, 5 bank bits

CentOS Linux 5.4 with kernel 2.6.32.15

SPEC CPU2006 benchmarks

Page 11:

Results

Overall system performance

Page 12:

Results

Page-Policy and Power

Page 13:

Results

BPM vs. Cache-Partition-Only

The correlation between BPM improvements and per-core bandwidth

Page 14:

Conclusion

BPM is a new approach that eliminates interference between threads and improves overall system performance.

BPM achieves this by assigning different groups of banks to different threads, which eliminates inter-thread bank-level interference.

This reduces row-buffer misses as well as the energy consumption of the memory system.

Page 15:

References

J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems. In HPCA-14, 2008.

Junghoon Kim, Junghan Kim, and Youngik Eom. A Page Coloring Scheme through Page Cache Separation for Improving Cache Performance. In NIPA-2010.

Dimitris Kaseridis, Jeffrey Stuecheli, and Lizy Kurian John. Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era. In MICRO 44, 2011.

Page 16:

Appendix: Page Coloring

[Figure: address translation maps the virtual address (virtual page number, page offset) to the physical address (physical page number, page offset). The cache address of a physically indexed cache consists of cache tag, set index, and block offset. The page color bits are under OS control.]

• Physically indexed caches are divided into multiple regions (colors).

• All cache lines in a physical page are cached in one of those regions (colors).

The OS can control the page color of a virtual page through address mapping (by selecting a physical page with a specific value in its page color bits).
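As a worked example of the page color bits, the sketch below computes the color of a physical address for a physically indexed cache. It assumes power-of-two sizes and a plain (unhashed) set index; the cache geometry and the function names `num_page_colors` and `page_color` are hypothetical.

```python
def num_page_colors(cache_size, associativity, page_size):
    """Number of colors = bytes covered by one cache way / page size."""
    way_size = cache_size // associativity
    return max(1, way_size // page_size)

def page_color(paddr, cache_size, associativity, page_size=4096):
    """Color of the physical page containing paddr (the set-index bits above the page offset)."""
    colors = num_page_colors(cache_size, associativity, page_size)
    return (paddr // page_size) % colors

# Hypothetical cache: 8 MiB, 16-way set associative, with 4 KiB pages.
CACHE, WAYS, PAGE = 8 * 1024 * 1024, 16, 4096
colors = num_page_colors(CACHE, WAYS, PAGE)
print(colors, page_color(0x1000, CACHE, WAYS), page_color(colors * PAGE, CACHE, WAYS))
```

Two physical pages with the same color compete for the same cache sets, so an OS that hands out pages by color decides which cache region each process can occupy.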

Page 17:

Appendix: Page Coloring

[Figure: physical pages are grouped by color (1, 2, 3, 4, ..., i, i+1, i+2, ...) onto regions of the physically indexed cache, shown for Process 1 and Process 2.]