Embedded System Lab.
Kilmo Choi [email protected]
A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems
Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu
Contents
Background and Motivation
Bank-Level Partition Mechanism (BPM)
Results
Conclusion
Reference
Background and Motivation
Memory bank
- A set of memory with the same access speed
Multicore platform
- Multiple banks can serve memory requests independently and concurrently
Bank-Level Parallelism
Background and Motivation
Row buffer conflict
- Causes performance degradation (throughput slowdown and unfairness)
- e.g., the row buffer hit rate decreases from over 60% with 1 core to 35% with 16 cores
[Figure: several cores sharing one bank that holds Row 0 to Row 3]
- Core 0 accesses data in Row 1: activate operation, then R/W
- Core 0 accesses the same page again: row-buffer hit, R/W only
- Core 1 accesses data in Row 3: row-buffer conflict, so a precharge operation and a new activate operation precede the R/W
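
The hit and conflict cases on this slide can be sketched as a small state machine. This is a toy model only, assuming an open-page policy and a single row buffer per bank; the class name `Bank` and the operation labels are illustrative, not taken from the paper.

```python
class Bank:
    """Toy model of one DRAM bank with a single row buffer (open-page policy)."""

    def __init__(self):
        self.open_row = None  # row currently held in the row buffer, if any

    def access(self, row):
        """Return the list of DRAM operations needed to read or write `row`."""
        if self.open_row == row:
            return ["R/W"]  # row-buffer hit: the row is already open
        ops = []
        if self.open_row is not None:
            ops.append("precharge")  # row-buffer conflict: close the old row first
        ops.append("activate")  # bring the requested row into the row buffer
        ops.append("R/W")
        self.open_row = row
        return ops


bank = Bank()
first = bank.access(1)   # Core 0 accesses data in Row 1
second = bank.access(1)  # Core 0 accesses the same page again
third = bank.access(3)   # Core 1 accesses data in Row 3
print(first, second, third)
```

The third access is the expensive one: it pays for a precharge and an activate before the actual read or write.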
Bank-Level Partition Mechanism (BPM)
- Numerous new memory scheduling algorithms have been proposed to address the interference problem
- However, these algorithms usually employ complex scheduling logic and need hardware modifications to memory controllers
Overview of BPM
- The OS memory management system uses a page-coloring mechanism to partition banks into several groups and maps each thread (process) to a specific bank group
- Address mapping policy
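
As a rough illustration of the overview, the sketch below groups free physical frames by a "bank color" and hands each thread frames only from its own bank group. It is a minimal sketch under stated assumptions, not the kernel implementation: the bank bit positions in `BANK_BITS`, the 4 KiB page size, and the `BankPartitionAllocator` class are all hypothetical.

```python
from collections import defaultdict

PAGE_SHIFT = 12           # assumption: 4 KiB pages
BANK_BITS = (13, 14, 15)  # hypothetical physical-address bits that select the bank


def bank_color(frame_number):
    """Pack the bank bits of a physical frame into a small integer color."""
    addr = frame_number << PAGE_SHIFT
    color = 0
    for i, bit in enumerate(BANK_BITS):
        color |= ((addr >> bit) & 1) << i
    return color


class BankPartitionAllocator:
    """Keeps one free list per bank color and one bank group per thread."""

    def __init__(self, frames):
        self.free = defaultdict(list)
        for frame in frames:
            self.free[bank_color(frame)].append(frame)
        self.groups = {}

    def assign(self, thread, colors):
        """Map a thread to a specific bank group (a list of colors)."""
        self.groups[thread] = list(colors)

    def alloc_page(self, thread):
        """Return a free frame whose bank color belongs to the thread's group."""
        for color in self.groups[thread]:
            if self.free[color]:
                return self.free[color].pop()
        raise MemoryError(f"no free frame left in the bank group of {thread!r}")


allocator = BankPartitionAllocator(range(64))
allocator.assign("thread A", [0, 1, 2, 3])
allocator.assign("thread B", [4, 5, 6, 7])
colors_a = {bank_color(allocator.alloc_page("thread A")) for _ in range(10)}
colors_b = {bank_color(allocator.alloc_page("thread B")) for _ in range(10)}
print(sorted(colors_a), sorted(colors_b))
```

Because the assumed bank bits lie above the page offset, the color of a frame is fixed by its frame number, which is what lets the OS steer a thread to a bank group purely through page allocation.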
Bank-Level Partition Mechanism (BPM)
[Figure: four cores and eight banks; each bank holds Row 0 to Row 3 and has its own row buffer]
Bank-Level Partition Mechanism (BPM)
Advantages
- Row buffer conflicts ↓, row buffer hits ↑
- BPM is an entirely software approach: flexible
- It is easier for the OS than for hardware to monitor a thread's behavior
- Bank-level conflicts can be fully eliminated by exclusively mapping a thread's data to specific banks
- How much does the number of available banks affect a thread's performance?
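
To make the claim about exclusive mapping concrete, the toy simulation below counts row-buffer conflicts for two interleaved threads, first with shared banks and then with disjoint bank groups. The access pattern, the helper names, and the bank counts are assumptions chosen for illustration; they are not the workloads or the methodology of the paper.

```python
def count_conflicts(trace):
    """Count row-buffer conflicts in a trace of (bank, row) accesses."""
    open_rows = {}
    conflicts = 0
    for bank, row in trace:
        if bank in open_rows and open_rows[bank] != row:
            conflicts += 1  # a different row was open in this bank
        open_rows[bank] = row
    return conflicts


def interleaved_trace(banks_of_thread, accesses_per_thread=16):
    """Round-robin interleaving of threads.

    Thread t cycles over the banks it may use and always touches its own
    row (row id = t), so any conflict is caused by another thread.
    """
    trace = []
    for i in range(accesses_per_thread):
        for thread, banks in enumerate(banks_of_thread):
            trace.append((banks[i % len(banks)], thread))
    return trace


shared = interleaved_trace([[0, 1, 2, 3], [0, 1, 2, 3]])
partitioned = interleaved_trace([[0, 1], [2, 3]])
shared_conflicts = count_conflicts(shared)
partitioned_conflicts = count_conflicts(partitioned)
print(shared_conflicts, partitioned_conflicts)
```

The model deliberately ignores the cost of giving each thread fewer banks (less bank-level parallelism), which is exactly the open question raised in the last bullet.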
Bank-Level Partition Mechanism (BPM)
Discovering bank bits by a software method (algorithm, using uncached accesses)
- Step 1: FOR each address bit x, flip x between 0 and 1 and measure the access latency. Higher latency (row miss instead of row hit): x goes into Row{ }. The remaining bits go into Remain{ }.
- Step 2: FOR each bit y in Remain{ }, flip y together with a bit x from Row{ } and measure the latency. Higher latency (row miss): y goes into Column{ }. The remaining bits go into BANK{ }, because the two addresses are mapped to different banks.
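
The two steps of the discovery algorithm can be tried out against a simulated memory system. The sketch below is an assumption-laden illustration: the address layout (`COLUMN_BITS`, `BANK_BITS`, `ROW_BITS`), the two latency values, and the function names are hypothetical, and the latency "measurement" is replaced by a simple model. On real hardware the measurement would use uncached accesses and timing, and XOR-based address mappings would need extra care.

```python
ADDRESS_BITS = 16
COLUMN_BITS = range(0, 6)   # hypothetical layout, hidden from the algorithm
BANK_BITS = range(6, 9)
ROW_BITS = range(9, 16)
LOW, HIGH = 50, 100         # made-up latencies: row hit vs. row miss


def field(addr, bits):
    """Extract the given bit positions of an address as a tuple."""
    return tuple((addr >> b) & 1 for b in bits)


def alternating_latency(a, b):
    """Simulated latency of alternately accessing addresses a and b.

    Only two rows in the same bank keep evicting each other from the
    row buffer, so only that case shows the higher latency.
    """
    same_bank = field(a, BANK_BITS) == field(b, BANK_BITS)
    same_row = field(a, ROW_BITS) == field(b, ROW_BITS)
    return HIGH if same_bank and not same_row else LOW


def discover_bits(base=0):
    # Step 1: flip one bit x at a time; higher latency marks a row bit.
    row, remain = [], []
    for x in range(ADDRESS_BITS):
        target = row if alternating_latency(base, base ^ (1 << x)) == HIGH else remain
        target.append(x)
    # Step 2: flip a remaining bit y together with a known row bit x.
    # Higher latency: same bank, so y is a column bit; otherwise y is a bank bit.
    column, bank = [], []
    x = row[0]
    for y in remain:
        flipped = base ^ (1 << y) ^ (1 << x)
        target = column if alternating_latency(base, flipped) == HIGH else bank
        target.append(y)
    return row, column, bank


row_bits, column_bits, bank_bits = discover_bits()
print("bank bits:", bank_bits)
```

In step 2 a single row bit is enough in this model; the slide's nested loop over x suggests the original procedure repeats the test for several row bits.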
Results
Environments
- 4 cores, 2.8 GHz Intel Core i7-860 processor, 8 GB DDR3 main memory with 64 banks, 5 bank bits
- CentOS Linux 5.4 with kernel 2.6.32.15
- SPEC CPU2006 benchmarks
Results
Overall system performance
Results
Page-Policy and Power
Results
BPM vs. Cache-Partition-Only
- The correlation between BPM improvements and per-core bandwidth
Conclusion
- BPM is a new approach that eliminates interference between threads and improves overall system performance
- BPM achieves this by assigning different groups of banks to different threads, which eliminates inter-thread bank-level interference
- This reduces row buffer misses as well as the energy consumption of the memory system
Reference
- J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems. In HPCA-14, 2008.
- Junghoon Kim, Junghan Kim, and Youngik Eom. A Page Coloring Scheme through Page Cache Separation for Improving Cache Performance. In NIPA-2010.
- Dimitris Kaseridis, Jeffrey Stuecheli, and Lizy Kurian John. Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era. In MICRO 44, 2011.
Appendix: Page Coloring
[Figure: address translation maps the virtual address (virtual page number, page offset) to the physical address (physical page number, page offset). The cache address of a physically indexed cache consists of cache tag, set index, and block offset. The page color bits are the bits that the physical page number and the set index have in common; they are under OS control.]
- Physically indexed caches are divided into multiple regions (colors).
- All cache lines in a physical page are cached in one of those regions (colors).
- The OS can control the page color of a virtual page through address mapping (by selecting a physical page with a specific value in its page color bits).
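
The page color of a physical address follows directly from the cache geometry. The sketch below is a minimal example under assumed parameters: 4 KiB pages, 64-byte cache lines, and 8192 sets are hypothetical values, and `page_color` is an illustrative helper, not an OS interface.

```python
def page_color(phys_addr, page_size=4096, line_size=64, num_sets=8192):
    """Return the page color of a physical address for a physically indexed cache.

    Assumes power-of-two sizes and that the set index plus block offset
    cover at least one page (line_size * num_sets >= page_size).
    """
    bytes_per_way = line_size * num_sets     # range addressed by set index + block offset
    num_colors = bytes_per_way // page_size  # number of pages that fit in that range
    return (phys_addr // page_size) % num_colors


# With the assumed geometry there are (64 * 8192) / 4096 = 128 colors.
print(page_color(0x0000), page_color(0x1000), page_color(128 * 4096))
```

Two physical pages with the same color compete for the same cache sets, so choosing which colors a process receives is how the OS partitions the cache.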
Appendix: Page Coloring
[Figure: physical pages are grouped (1, 2, 3, 4, ..., i, i+1, i+2, ...) and the pages of Process 1 and Process 2 are mapped onto the physically indexed cache]