Performance and Energy Implications of Many-Core Caches for Throughput Computing
C. J. Hughes, C. Kim, Y. Chen, Intel Labs. IEEE Micro 2010.
2013010654 유승요
Throughput Computing
• Throughput computing focuses on maximizing the throughput of workloads rather than latency
• Huge numbers of calculations with parallelism, a natural fit for many-core processors
• To keep many cores busy, the memory system must feed them and facilitate efficient core-to-core communication
• Caches help by hiding the latency of lower-level memory, providing fast core-to-core communication, and supplying sufficient bandwidth
Research Objective
• A many-core cache design for throughput computing, considering both power and performance
• Different from traditional CPU caches: more cores, more inter-core communication, and latency-tolerant workloads, so minimizing average access time may not be best
Throughput Applications
• Working set size: the model uses a 256 kB L2 cache, so benchmarks with larger working sets may be slowed by the L2 cache size
• Cache miss rate: the L1 cache miss rate; a high miss rate means a high access rate to the lower-level cache
• Prefetch coverage: the percentage reduction in L1 misses when a stride prefetcher is added; a high reduction means a strong streaming pattern
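The prefetch-coverage idea can be sketched as a toy model. This is a hypothetical simplification for illustration, not the paper's simulator: after two misses with the same stride, the prefetcher predicts the next miss address, and a miss matching the prediction counts as covered.

```python
def stride_prefetch_coverage(miss_addrs):
    """Fraction of L1 misses a simple stride prefetcher would cover.

    After two consecutive misses with the same stride, the prefetcher
    predicts the next address; a later miss matching the prediction
    counts as covered.  Toy model for illustration only.
    """
    covered = 0
    last, predicted = None, None
    for addr in miss_addrs:
        if addr == predicted:
            covered += 1
        if last is not None:
            stride = addr - last
            predicted = addr + stride
        last = addr
    return covered / len(miss_addrs)

# A streaming pattern (constant 64-byte stride) gives high coverage:
print(stride_prefetch_coverage([0, 64, 128, 192, 256]))  # 0.6
# An irregular pattern gives no coverage:
print(stride_prefetch_coverage([0, 512, 64, 900]))       # 0.0
```

A strongly streaming benchmark drives this ratio toward 1, which is what the slide calls a strong streaming pattern.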
Throughput Applications
• Data sharing characteristics: the percentage of L2 cache lines (spatial domain) and of L2 cache accesses (frequency domain) to data shared by a given number of cores
• Sharing degree: the number of cores that access a line
• Most data is private: 12 of 15 benchmarks have more than 70% unshared lines
• Frequency-domain sharing is larger than the spatial-domain percentage
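The spatial/frequency distinction can be computed from an access trace. A minimal sketch, assuming a toy trace of (core, line) pairs rather than the paper's actual workloads:

```python
from collections import defaultdict

def sharing_profile(accesses):
    """Spatial vs. frequency view of sharing from (core, line) accesses.

    spatial: fraction of distinct lines touched by exactly one core.
    freq:    fraction of accesses that go to those private lines.
    """
    cores_per_line = defaultdict(set)
    for core, line in accesses:
        cores_per_line[line].add(core)
    private = {l for l, cs in cores_per_line.items() if len(cs) == 1}
    spatial = len(private) / len(cores_per_line)
    freq = sum(1 for _, l in accesses if l in private) / len(accesses)
    return spatial, freq

# Line B is shared by cores 0 and 1; A and C are private:
print(sharing_profile([(0, "A"), (0, "A"), (0, "B"), (1, "B"), (1, "C")]))
```

In this toy trace the shared line draws a larger share of accesses (40%) than of lines (33%), matching the slide's observation that frequency-domain sharing exceeds the spatial-domain percentage.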
Cache Design - Constraints
• Two-level caching: a private L1 cache for each core and a Last-Level Cache (LLC)
• Directory-based hardware cache coherence: each entry contains a tag and directory state information
• Tiled processor design
• Flexibility is the key design criterion: it determines where a given line can reside in the LLC, which affects the distance an LLC request and reply must travel through the on-die network
How flexibility affects each metric:
• Access latency: more flexible is better
• On-die bandwidth usage: more flexible is better
• Number of unique lines: less flexible is better
• Off-die bandwidth usage: less flexible is better
• Effective cache capacity: less flexible is better
Cache Design - Private
• Keeps a core's data close to it in a private LLC slice; fewer unique cache lines overall
• Each line has a home tile, mapped by an address-hashing function
• The tag directory in the home tile maintains the line's state
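The home-tile mapping can be sketched as follows. The slides do not give the actual hash, so a simple modulo over the line address is assumed; real designs use a stronger hash to spread hot lines evenly:

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_TILES = 64   # one tile per ring switch, as in the experimental setup

def home_tile(addr):
    """Map a physical byte address to its home tile.

    Drop the offset bits to get the line address, then hash it with a
    simple modulo over the tile count (a stand-in for the real hash).
    """
    line_addr = addr // LINE_SIZE
    return line_addr % NUM_TILES

# Two addresses inside the same 64-byte line share a home tile:
print(home_tile(0x1000), home_tile(0x103F))  # same tile for both
```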
Private Design Variations
• Replication policy, to increase the number of unique cache lines: uncontrolled replication, controlled replication, or no replication
• Uncontrolled replication: allows unlimited replication; when a unique line is evicted and it is not in its home tile, it is moved to the home tile (migration)
• Controlled replication: allows replication but deprioritizes replicas via the replacement policy, using one reuse bit per line; the bit follows the cache line when it is evicted or transferred to another LLC, and when a new line is inserted, a line with reuse bit 0 is evicted
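The controlled-replication victim choice can be sketched as a toy replacement policy. The set layout and the LRU fallback are assumptions; the slides only state that a line with reuse bit 0 is evicted first:

```python
def pick_victim(cache_set):
    """Choose an eviction victim from a set of (tag, reuse_bit) pairs.

    Lines with reuse bit 0 (replicas never reused) are evicted first;
    if every line has been reused, fall back to position 0, standing
    in for the baseline replacement order.
    """
    for i, (tag, reuse) in enumerate(cache_set):
        if reuse == 0:
            return i
    return 0  # all lines reused: baseline fallback

print(pick_victim([("A", 1), ("B", 0), ("C", 1)]))  # 1 (line B)
```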
Private Design Variations
• No replication: shared lines live in the home tile; private lines live in the accessing core's tile
• The LLC controller works as a directory controller, tracking private lines with a Roaming Data Pointer (RDP)
• When a line becomes shared, the RDP entry is removed and the line migrates to the home tile
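The no-replication bookkeeping can be sketched with a dictionary standing in for the RDP table. This is a hypothetical software model of a hardware structure, covering only the migrate-on-share behavior the slide describes:

```python
class RDPDirectory:
    """Toy model of no-replication tracking.

    A private line lives in the accessing core's tile while the home
    tile keeps a Roaming Data Pointer to it; on a second sharer the
    RDP entry is dropped and the line migrates to the home tile.
    """
    def __init__(self):
        self.rdp = {}         # line -> tile currently holding it
        self.at_home = set()  # lines migrated back to their home tile

    def access(self, line, tile):
        if line in self.at_home:
            return "home"              # shared line, served from home tile
        holder = self.rdp.get(line)
        if holder is None:
            self.rdp[line] = tile      # first accessor keeps the line
            return "private"
        if holder == tile:
            return "private"           # same core, still private
        del self.rdp[line]             # second sharer: drop the RDP entry
        self.at_home.add(line)         # and migrate the line home
        return "migrated"

d = RDPDirectory()
print(d.access(0x40, tile=3))  # private
print(d.access(0x40, tile=7))  # migrated
print(d.access(0x40, tile=9))  # home
```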
Shared Design
• Keeps every line in its home tile
• Maximizes the number of unique lines
• Increases average access latency and on-die traffic
Experimental Setup
• L1 cache with a hardware stride prefetcher
• 64 switches on the ring interconnect
• Energy components: LLC, tag directory, and RDP accesses; on-die data messages; on-die coherence messages; off-die accesses
Performance and Energy Consumption
(a) Performance of the 5 LLC designs relative to shared
• The least flexible design performs best
• Flexible designs are intended to minimize access latency, but in throughput computing cache miss latency is hidden via multithreading or prefetching
• The critical path heavily involves read-write shared lines
• Least flexible designs have centralized storage: the home tile responds, requiring no acknowledgment
• Flexible designs keep data in private caches only: they do not allow multiple cache-to-cache transfers, and the tag directory needs acknowledgments from the tiles
Energy Consumption
• A flexible design is better for energy: it saves on-die traffic, and the increased off-die traffic has only a small effect
• The unique-line policies in the private design (controlled and uncontrolled replication) reduce off-die accesses
Energy consumption of the 5 LLC designs relative to shared. P: private, U: uncontrolled replication, C: controlled replication, N: no replication, S: shared, T: tag directory buffer
Designing for Performance and Energy
• Tag Directory Buffer (TDB): a small fully associative buffer that holds clean lines and handles read requests for them, acting like the shared design for those lines
• Adds a bit to each line's directory entry to track concurrent sharing, saving space and traffic
Tag directory buffer hit rates for different buffer sizes
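The TDB behavior can be sketched as a small fully associative buffer with LRU replacement. The capacity, LRU policy, and method names are assumptions for illustration; only "small FA buffer of clean lines, answers reads directly" comes from the slide:

```python
from collections import OrderedDict

class TagDirectoryBuffer:
    """Toy fully associative buffer of clean lines at the tag directory.

    A read that hits is answered directly, like the shared design;
    a miss must be forwarded to the tile holding the line.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # line -> data, in LRU order

    def read(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)    # refresh LRU position
            return "hit"
        return "forward"                    # ask the owning tile

    def fill(self, line, data):
        self.lines[line] = data
        self.lines.move_to_end(line)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the LRU clean line

tdb = TagDirectoryBuffer(capacity=2)
tdb.fill(0x100, b"data")
print(tdb.read(0x100))  # hit
print(tdb.read(0x200))  # forward
```

A hit avoids both the cache-to-cache transfer and its acknowledgment, which is why the buffer's hit rate (plotted in the figure above) drives its benefit.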
Alternatives
• Sharing migration: copies read-shared lines to the home LLC tile; also needs an acknowledgment from the home tile
• Parallel reads: modifies the coherence protocol and directory hardware to allow simultaneous transfers; does not increase data traffic, but requires protocol and hardware changes and still needs cache-to-cache transfers, so it is slower than the tag directory buffer
Impact of Increased Read Parallelism
(a) Performance and (b) energy consumption for designs that attempt to increase read parallelism
• Tag Directory Buffer: faster, but increases energy due to copying data to the home tile and the longer path of data replies from the tag directory
• Parallel reads: slower than the TDB, with the same energy consumption as controlled replication
• Sharing migration: no performance boost and an increase in energy
• Read throughput alone isn't sufficient
Conclusion
• The Tag Directory Buffer is 10% faster than the private designs and saves 55% energy compared to the shared design
• Future work: a more complex hierarchy, and more fundamental changes to the hierarchy