memory models in software and in hardware practical considerations
Post on 21-Dec-2015
![Page 1: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/1.jpg)
Memory Models
In Software and in Hardware
Practical Considerations
![Page 2: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/2.jpg)
Agenda
• Motivation
• Factors
• Levels of Memory Models
– Models for software: Java, CLI
– Models for hardware: IA-32, IA-64
![Page 3: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/3.jpg)
MM Motivation and Factors
http://citeseer.nj.nec.com/adve95shared.html
![Page 4: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/4.jpg)
MM Motivation
• Multithreaded programming
– Shared memory
• An example: producer/consumer queue
• Does it work correctly?
– The program performs the operations in the correct order!
Thread 1:
Task t = new Task();
queue.insert(t);
Thread 2:
Task t = queue.get();
t.run();
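One way to make the producer/consumer example safe in Java is sketched below. It is a minimal sketch under assumptions: the `Task` class, its `payload` field and the `runOnce` helper are hypothetical names, and `java.util.concurrent.LinkedBlockingQueue` stands in for the slide's unspecified `queue`. The library documents that actions before placing an object into a concurrent collection happen-before actions after its removal in another thread, so the consumer sees the initialized task.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ProducerConsumerSketch {
    // Hypothetical task type: its plain field is written before the task is published.
    static final class Task {
        int payload;
        int run() { return payload * 2; }
    }

    // Runs one producer and one consumer; returns what the consumer computed.
    static int runOnce(int payload) {
        BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
        int[] result = new int[1];

        Thread producer = new Thread(() -> {
            Task t = new Task();
            t.payload = payload;        // initialize first ...
            queue.add(t);               // ... then publish through the queue
        });
        Thread consumer = new Thread(() -> {
            try {
                Task t = queue.take();  // blocks until a task is available
                result[0] = t.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();            // join makes result[0] visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce(21));
    }
}
```

With a hand-written queue that uses no synchronization, nothing would order the write to `payload` before the consumer's read, which is the problem the slide raises.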
![Page 5: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/5.jpg)
Memory Model Levels
Programmer-Level Models: Java MM, CLI MM, SC, Coherence, Release Consistency, etc.
↓ Compiler
Implementor-Level Models (Virtual Machine): Java Memory Model (Implementor View), Microsoft CLI
↓ VM
Implementor-Level Models (Hardware): IA-32, IA-64, Alpha, PowerPC, TSO, PSO, etc.
![Page 6: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/6.jpg)
Factors that Affect MM
• Compiler: performs optimizations
• [Virtual Machine]: yet more optimizations
• Processor: performs operations out of order
• Memory subsystem: delivers updates out of order
![Page 7: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/7.jpg)
MM Factors: Compiler & VM
• Compilers
– Store values in registers
– Reorder operations
• Example
Source:
int x = 0, answer = 0;
void f() { while (!answer) { x = x+1; } }
As compiled (values cached in registers):
int x = 0, answer = 0;
void f() { int tmp1 = x; int tmp2 = answer; while (!tmp2) { tmp1 = tmp1+1; } x = tmp1; }
Inside the loop: no read from memory, no write to memory; values held in registers all the time
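In Java, declaring the flag `volatile` rules out the register-caching transformation for that variable. The sketch below is an illustration under assumptions (the class name and the `workerStops` helper are hypothetical): a worker spins on the flag and another thread sets it. With a plain `boolean`, the loop could legally be compiled into the register-only form shown above and never terminate.

```java
class VolatileFlagSketch {
    // volatile: every loop iteration must observe the latest written value
    static volatile boolean answer = false;
    static int x = 0;   // touched only by the worker thread

    // Starts a spinning worker, sets the flag, and reports whether the worker stopped.
    static boolean workerStops(long timeoutMillis) {
        answer = false;
        Thread worker = new Thread(() -> {
            while (!answer) { x = x + 1; }
        });
        worker.setDaemon(true);   // do not keep the JVM alive if the loop hangs
        worker.start();
        answer = true;            // volatile write, visible to the worker
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(workerStops(5000));
    }
}
```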
![Page 8: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/8.jpg)
MM Factors: Processor
• Includes a lot of features that help it tolerate memory latency
– Most of them change the order of memory operations
• Examples
– Out-of-order execution: the most important performance enabler of modern processors
– Write combining: reads/writes to the same cache line
– Read/write buffers
– Many more
![Page 9: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/9.jpg)
MM Factors: Memory Subsystem
• Hardware
– Cache Coherence Protocols
• Software
– DSM Coherence Protocols
![Page 10: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/10.jpg)
The Tradeoff
The more optimizations there are in the system, the less transparent it is to the programmer
Sequential Consistency ←→ Any Order
Transparency ←→ Performance
![Page 11: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/11.jpg)
Programmer View Models
Java – Original specification
Java – New specification
Microsoft’s CLI (.NET) specification
![Page 12: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/12.jpg)
Java MM – Original Spec
• Java Language Specification, Chapter 17 http://java.sun.com/docs/books/jls/
• A. Gontmakher, A. Schuster, ACM TOCS, vol. 18, No. 4, pp. 333-386 http://www.cs.technion.ac.il/~assaf/publications/java.ps
• Defines an abstract virtual machine
– Really hard to understand
– Non-compliant implementation by SUN (!!!)
– Many other problems
![Page 13: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/13.jpg)
Java MM: Motivation
• Built-in synchronization
– Modeled after monitors
– Integrated with memory model
• Performance: Avoid synchronization
– Immutable objects
![Page 14: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/14.jpg)
Java MM: The Abstract Model
[Figure: Thread 1 and Thread 2 each consist of an execution engine and a local memory; both threads share the main memory. Operations: use/assign between the execution engine and local memory, load/store on the local memory side, read/write on the main memory side.]
![Page 15: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/15.jpg)
Java MM: The Constraints
read x,v → load x,v → use x,v
assign x,v → store x,v → write x,v
read x,v → load x,v
store x,v → write x,v
load x,v → use x,v
assign x,v → store x,v: not always (Prescient Stores)
… and more
[Figure: Thread 1 with execution engine and local memory, connected to main memory through use/assign, load/store, read/write]
![Page 16: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/16.jpg)
Java MM: Applying The Model
x==1   y==1   y=1   x=1
read y,1 read x,1
load y,1 load x,1
use y,1 use x,1
assign x,1 assign y,1
store x,1 store y,1
write x,1 write y,1
![Page 17: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/17.jpg)
Java MM: How To Deal With
• Determine the dependencies between use/assigns that follow from the constraints
• Then, ignore all the operations except for use/assigns
• Non-Operational Model!
![Page 18: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/18.jpg)
Java MM - Views
[Figure: two copies of the operation stack use/assign, load/store, read/write, labeled Programmer View (non-operational), Implementor View (non-operational), Programmer View (operational), Implementor View (operational)]
![Page 19: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/19.jpg)
Java MM: Characterizations
• Java is stronger than Coherence
– Proof below
• Volatile variables: Sequential Consistency
• Locks: variant of Release Consistency
– Semantics of locks not SC or PC (and not stated explicitly at all).
![Page 20: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/20.jpg)
Java MM – Characterizations 2
• Full definition: regular variables
– Based on Legal Serialization. Constraints:
  Same Variable rule: r/w x → r/w x
  Transistor rule: r x,v → w y,w, where r x,v sees a value written by another thread
– Excludes Prescient Stores
– Proof: 5+ pages
![Page 21: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/21.jpg)
Java MM – Characterizations 3
• Java: full definition (regular variables only)
– Constraints:
  r x,v   r y,1   r y,2   w y,w
  r x,v   w y,1   r y,2   w y,w
  r x,v   w y,2   w y,w
  r/w x   r/w x
  Legend: Writes a value seen by another thread
– Includes Prescient Stores
– Proof: 20+ pages!
– Coherence follows from the first Constraint
![Page 22: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/22.jpg)
Java MM – Coherence Proof 1: Java is not weaker than Coherence
• Take operations for variable X from all threads.
• Divide each thread into blocks:
load-block: load (use)*
store-block: assign (use|assign)* store (use)*
• Each block: one load/store operation.
• Sort the blocks by their memory accesses.
• Result: legal serialization of use/assigns to X.
![Page 23: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/23.jpg)
Java MM – Coherence Proof 2: Java is stronger than Coherence
Thread 1: 1. use x,1   2. assign y,1
Thread 2: 1. use y,1   2. assign x,1
• Coherence: easily shown
• Java (without Prescient Stores):
– Transistor Rule: 1.1 → 1.2, 2.1 → 2.2
– Legal Serialization: 2.2 → 1.1, 1.2 → 2.1
– Cycle of dependencies!
![Page 24: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/24.jpg)
Java MM – Coherence Proof 3: Prescient Stores
• A store can move presciently up
– Before its corresponding assign
– But not before another load/store
• The previous execution now valid
– But it can still be fixed…
Thread 1: read x,1; read y,0; read y,2; write y,1
Thread 2: read y,1; read x,0; read x,2; write x,1
Thread 3: write x,2; write y,2
Notes on the figure: "Necessarily has a load"; "The store, even prescient, now cannot move up"
![Page 25: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/25.jpg)
Java MM: Conclusions
• Programming with Locks: easy
• Programming with volatile variables: easy
• Programming with regular variables:
– Using just Coherence – OK
– Using full definition – hard
– Really accounting for Prescient Stores – nightmare
![Page 26: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/26.jpg)
New Java MM
In progress, by Bill Pugh et al.
http://www.javasoft.com/aboutJava/communityprocess/jsr/jsr_133.html
http://www.cs.umd.edu/~pugh/java/memoryModel/semantics.pdf
![Page 27: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/27.jpg)
New Java MM: Motivation
• Correctly synchronized programs must have SC semantics
• Incorrectly synchronized programs must have (safe) semantics
– Safety: JVM must never fail
– Security: Prevent attacks based on unsynchronized code
![Page 28: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/28.jpg)
New Java MM: Requirements
• Backward Compatibility
– No new language constructs
– No new VM instructions
– No system-specific artifacts, e.g. garbage collection
• Clear Distinction between compiler and VM
– No optimizations in the compiler
– Thus, VM model is the same as the one visible to the programmer
• Implementability
– No unrealistic requirements on software or hardware
![Page 29: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/29.jpg)
New Java MM: The Approach
• Exact semantics for all memory accesses
– Not really relevant
– Except that SC for Properly Labelled (no data races) programs can be shown
• Semantics for support of established idioms
– Final fields
– Volatile variables
– Locks
• Quite practical
![Page 30: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/30.jpg)
New Semantics of Final: Immutable objects
• Many objects in Java are designed to be immutable
– Rationale: avoiding synchronization
– Best known example – java.lang.String
• The problem: String not really immutable
– Can see writes to the buffer, but not to the length and offset!
• Security hole
![Page 31: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/31.jpg)
New Semantics of Final: Fixing immutable objects
• Solution 1: Make ALL String methods synchronized
– Serious hit at performance
– Not needed on single-processor machines
• Solution 2: Extending semantics of final fields
– Access that reads a final field, sees it initialized
– An object must not escape the constructor
• Problem: String: array elements cannot be final
– “weak acquire semantics”: reads dependent on the final field are seen initialized too
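The final-field rule can be illustrated with a small immutable class. This is a sketch only: `Text` is a hypothetical stand-in for `java.lang.String`, not its real implementation. All fields are `final`, the array is copied privately, and `this` is not leaked from the constructor. Under the semantics described above, a thread that obtains a reference to a `Text`, even through a data race, is meant to see `offset`, `length` and the array contents reached through the final `buffer` field as initialized.

```java
class FinalFieldSketch {
    // Hypothetical immutable string-like class; not the real java.lang.String.
    static final class Text {
        private final char[] buffer;   // reads through this final field are covered too
        private final int offset;
        private final int length;

        Text(char[] source, int offset, int length) {
            this.buffer = source.clone();  // private copy, never modified afterwards
            this.offset = offset;
            this.length = length;
            // 'this' is not stored anywhere: the object does not escape the constructor
        }

        @Override
        public String toString() {
            return new String(buffer, offset, length);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Text("hello world".toCharArray(), 6, 5));
    }
}
```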
![Page 32: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/32.jpg)
New Semantics for Volatile
• Previously: Sequential Consistency
– But: no relation with the regular operations
– Not really useful for synchronization (recall the producer/consumer example)
• Now: Acquire/Release Semantics
– Read works as Acquire
– Write works as Release
![Page 33: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/33.jpg)
New Semantics of Volatile: Double-Checked Locking
• An object s must be created the first time it is requested
synchronized(this) { if (s==null) s = new S(); }
– Slow! Locking on each access
• Double-Checking:
if (s==null) { synchronized(this) { if (s==null) s = new S(); } }
• The reader can reorder access to s and to its fields
• But, if s is volatile, it works!
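A complete version of the idiom might look like the sketch below. The names `DoubleCheckedSketch`, `S`, `constructed` and `singleInstance` are illustrative assumptions. The volatile read of `s` acts as the acquire and the volatile write as the release, so a reader that sees a non-null reference also sees the fields of the fully constructed object.

```java
import java.util.concurrent.atomic.AtomicInteger;

class DoubleCheckedSketch {
    static final class S {
        int value = 7;                 // plain field, published via the volatile write
    }

    final AtomicInteger constructed = new AtomicInteger(); // counts constructions
    private volatile S s;              // volatile is what makes the idiom work

    S get() {
        S local = s;                   // volatile read (acquire)
        if (local == null) {
            synchronized (this) {
                local = s;             // second check, under the lock
                if (local == null) {
                    constructed.incrementAndGet();
                    local = new S();
                    s = local;         // volatile write (release)
                }
            }
        }
        return local;
    }

    // Calls get() from several threads; true if all saw one instance built once.
    static boolean singleInstance(int threads) {
        DoubleCheckedSketch holder = new DoubleCheckedSketch();
        S[] seen = new S[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> seen[slot] = holder.get());
            workers[i].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        for (S each : seen) {
            if (each == null || each != seen[0] || each.value != 7) return false;
        }
        return holder.constructed.get() == 1;
    }

    public static void main(String[] args) {
        System.out.println(singleInstance(8));
    }
}
```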
![Page 34: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/34.jpg)
New Semantics of Volatile: Advanced Double-Checking
static volatile boolean initialized = false;
if (!initialized) {
  synchronized(this) {
    if (!initialized) {
      s1 = new S();
      s1.connect(…);
      initialized = true;
    }
  }
}
Final fields won’t help
![Page 35: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/35.jpg)
New Semantics of Locks
• Only locks on the same variable have acquire/release semantics
– Simplifies implementation
– Different locks do not synchronize anyway, so no need for acquire
• In original spec, each lock is a memory barrier
– Even synchronized(new Object()) {}
– Compiler cannot safely remove locks
– In the new semantics, recursive locks are no-op
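The "same lock" rule can be illustrated with a counter guarded by a single lock object. This is a sketch with hypothetical names (`SameLockSketch`, `runTwoThreads`). Every access to `counter` synchronizes on the same `lock`, so each release is paired with the next acquire and no increment is lost. If each thread instead used `synchronized (new Object())`, nothing would order the accesses.

```java
class SameLockSketch {
    private final Object lock = new Object();
    private int counter = 0;           // plain field, protected only by 'lock'

    void increment() {
        synchronized (lock) {          // acquire on entry, release on exit
            counter++;
        }
    }

    int get() {
        synchronized (lock) {
            return counter;
        }
    }

    // Two threads increment the shared counter perThread times each.
    static int runTwoThreads(int perThread) {
        SameLockSketch shared = new SameLockSketch();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) shared.increment();
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100_000));
    }
}
```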
![Page 36: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/36.jpg)
CLI Memory Model
The VM for Microsoft’s .NET
http://www.ecma.ch/ecma1/STAND/ecma-335.htm
Standard ECMA-335, Common Language Infrastructure
Chapter 11.6, Memory Model and Optimizations
![Page 37: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/37.jpg)
CLI Memory Model
• So Short!!! Just 4 pages
• The system
– Flat shared memory
– Threads access the same memory
• Any reordering of operations is permitted
– Except volatile reads/writes
– Except synchronous exceptions
• Atomic access defined for some operations
• Threading APIs define synchronization semantics
![Page 38: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/38.jpg)
CLI: Volatile Consistency
• Volatile reads and writes
– Accesses to volatile variables
– Explicit methods: Thread.VolatileRead, Thread.VolatileWrite
– Thread.MemoryBarrier – same as both VolatileRead and VolatileWrite
• Volatile read – acquire semantics, volatile write – release semantics
• Different threads can see different orders of volatile writes of different threads
![Page 39: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/39.jpg)
CLI: Locks
• Usual locking semantics: obtaining and releasing locks
– Synchronized methods
– System.Threading.Monitor class – simulates C.A.R. Hoare’s monitor (only tries to; simulation is no more complete than in Java)
• Acquiring lock has acquire semantics, releasing – release semantics
![Page 40: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/40.jpg)
CLI: Atomic Memory Accesses
• Word-length accesses, aligned 4-byte accesses are atomic
• System.Threading.Interlocked: atomic read-modify-write operations
– Increment, Decrement, Exchange, CompareExchange
• One- and two-byte reads are atomic. Byte writes may write the whole word
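For comparison, the same kind of atomic read-modify-write can be sketched in Java with `java.util.concurrent.atomic.AtomicInteger`. This is a Java analog rather than the CLI API: `compareAndSet` plays the role of `CompareExchange`, and `updateMax` and `maxOf` are hypothetical helpers. Unlike the CLI operations as characterized in these slides, Java documents `compareAndSet` as having the memory effects of a volatile read and write.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasLoopSketch {
    // Atomically raises 'target' to at least 'candidate' with a compare-and-set retry loop.
    static void updateMax(AtomicInteger target, int candidate) {
        while (true) {
            int current = target.get();
            if (candidate <= current) return;                       // nothing to do
            if (target.compareAndSet(current, candidate)) return;   // succeeded
            // another thread changed the value in between: retry
        }
    }

    // Each of 'threads' workers proposes i * 10; returns the final maximum.
    static int maxOf(int threads) {
        AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int value = i * 10;
            workers[i] = new Thread(() -> updateMax(max, value));
            workers[i].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(maxOf(8));
    }
}
```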
![Page 41: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/41.jpg)
Conclusions: Using CLI
• All concurrent accesses can be synchronized using synchronized methods or the Monitor class
• Volatile variables: no common order. Probably usable in the simplest cases
– Designed for accessing hardware registers. There it fits
• Atomic memory access: no memory barrier semantics
– Probably just forgotten
– Useful in some simple cases
![Page 42: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/42.jpg)
Conclusions: Implementing CLI
• Lots of disclaimers in the spec – no unimplementable requirements. Thus, implementation is straightforward
– For instance, Alpha has no instruction to write a byte – implementation of atomic write would be problematic. Java has this problem
• On the other hand, all low-level mechanisms are present (Interlocked)
![Page 43: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/43.jpg)
Conclusions: JVM vs. CLI
• Similar semantics for locks
– Except that in Java, nested locks are no-op, thus locks can be eliminated by the compiler
– In Java, acquire/release happens only if synchronizing on the same lock object. In CLI – full acquire/release.
• Similar semantics for volatiles
– Except that in CLI the consistency of volatiles is weaker. It is unclear if the Double Checked Locking idiom should work
• Similarly unusable semantics for regular variables
– Except for Java’s provisions for object construction (semantics of final fields)
• CLI adds low-level interlocked accesses
![Page 44: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/44.jpg)
Hardware Memory Models
IA-64 and IA-32
![Page 45: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/45.jpg)
IA-32
• Memory reads: acquire semantics
– Except that reads can see local writes early; see below
• Memory writes: release semantics
– Except that there is no global order of writes; see below
• Interlocked memory accesses: using processor lock prefix
![Page 46: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/46.jpg)
IA-64: Memory Accesses
• Regular memory accesses – unordered
• Attributes to memory accesses: release or acquire
– Acquire: ld.acq instruction
– Release: st.rel instruction
• Memory Fence (mf)
– AKA Memory Barrier, is both acquire and release.
![Page 47: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/47.jpg)
IA-64: Atomic Accesses
• CMPXCHG (Compare and Exchange)
– Compare memory with a given value. Exchange if equal
– Can have either acquire (cmpxchg.acq) or release (cmpxchg.rel) semantics
• FAA (fetch and add)
– Also acquire or release semantics
• XCHG (Exchange)
– Only acquire semantics
![Page 48: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/48.jpg)
IA-64: Semantics of ld.acq, st.rel
• Constraints (>> is program order, → is execution order):
– Acquire >> X ⇒ Acquire → X
– X >> Release ⇒ X → Release
– Fence >> X ⇒ Fence → X
– X >> Fence ⇒ X → Fence
• Global order of all the strong write operations
T1: st.rel [x]=1
T2: ld.acq r1=[x]; ld r2=[y]
T3: st.rel [y]=1
T4: ld.acq r3=[y]; ld r4=[x]
Forbidden: r1=1, r3=1, r2=0, r4=0
![Page 49: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/49.jpg)
IA-64 Semantics: Exceptions
• Load may see value from store buffer
• Inserting mf between st.rel and ld.acq solves the problem
• But: in Java semantics, this execution is OK!
T1 T2
st.rel [x]=1 st.rel [y]=1
ld.acq r1=[x] ld.acq r3=[y]
ld r2=[y] ld r4=[x]
Permitted: r1=1, r3=1, r2=0, r4=0
![Page 50: Memory Models In Software and in Hardware Practical Considerations](https://reader035.vdocuments.pub/reader035/viewer/2022062516/56649d635503460f94a45eab/html5/thumbnails/50.jpg)
IA-64 Semantics: Conclusion
• Simple. Clean
• Very usable: direct mapping to both Java and CLI memory models
– Especially fits the new Java Memory Model (or more reasonably, the new Java Memory Model especially fits IA-64 ;)
• IA-32: Obviously developed before MP systems became common (for Intel processors)
– Cannot change architecture now