asplos10&vee10 report-suzaki

ACM ASPLOS'10 & VEE'10 Report

At the 22nd Virtualization Implementation Technology Study Group (vimpl), April 20, 2010

Kuniyasu Suzaki (須崎有康)

Overview
• Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2010)
  – March 15-17, 2010
  – Pittsburgh, PA
  – 182 submissions (the highest to date), 32 accepted (18%), 3 Best Papers
• Poster session held. Five posters from Japan (Hiraki Lab, University of Tokyo; Nakajima Lab, Waseda University, 2 posters; Murakami Lab, Kyushu University; Kourai Lab, Kyushu Institute of Technology)
  – About 400 attendees.
  – The keynote speech was given by Eric Brewer (UCB), recipient of the ACM-Infosys Foundation Award
• Workshops
  – 2nd WIOV (Workshop on I/O Virtualization)
  – Workshop on Architecting Memory Technologies (this was a panel)
  – Workshop on General-Purpose Computation on Graphics Processing Units (not attended)
• ASPLOS 2011 will be held in Newport Beach, California, March 5 ~ 11, 2011
  – asplos11.cs.ucr.edu/
  – Abstract Deadline: Monday, July 19, 2010
  – Full Paper Deadline: Monday, July 26, 2010 (11:59pm EDT)

Program: Day 1
• Session 1: Novel Architectures (Session Chair: Luis Ceze)
  – Best Paper! Dynamically Replicated Memory: Building Reliable Systems from Nanoscale Resistive Memories
    • Engin Ipek, Jeremy Condit, Edmund B. Nightingale, Doug Burger and Thomas Moscibroda (University of Rochester / Microsoft Research)
  – A Power-efficient All-optical On-chip Interconnect Using Wavelength-based Oblivious Routing
    • Nevin Kirman and Jose Martinez (Cornell University)
• Session 2: Compilers and Runtime Systems (Session Chair: Michael Hind)
  – Best Paper! A Real System Evaluation of Hardware Atomicity for Software Speculation
    • Naveen Neelakantam, David Ditzel and Craig Zilles (University of Illinois at Urbana-Champaign; Intel)
  – Dynamic filtering: multi-purpose architecture support for language runtime systems
    • Tim Harris, Adrian Cristal, Sasa Tomic and Osman Unsal (Microsoft Research)
• Session 3: Parallel Programming 1 (Session Chair: Yuanyuan Zhou)
  – CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution
    • Tom Bergan, Owen Anderson, Joe Devietti, Luis Ceze and Dan Grossman (University of Washington)
  – Speculative Parallelization Using Software Multi-threaded Transactions
    • Arun Raman, Hanjun Kim, Thomas R. Mason, Thomas B. Jablin and David I. August (Princeton University)
  – Respec: Efficient online multiprocessor replay via speculation and external determinism
    • Dongyoon Lee, Benjamin Wester, Kaushik Veeraraghavan, Satish Narayanasamy, Peter Chen and Jason Flinn (University of Michigan)
• Session 4: Scheduling in Parallel Systems (Session Chair: Tim Harris)
  – Probabilistic Job Symbiosis Modeling for SMT Processor Scheduling
    • Stijn Eyerman and Lieven Eeckhout (Ghent University)
  – Request Behavior Variations
    • Kai Shen (University of Rochester)
  – Decoupling contention management from scheduling
    • Ryan Johnson, Radu Stoica, Anastasia Ailamaki and Todd Mowry (EPFL; Carnegie Mellon University)
  – Addressing Shared Resource Contention in Multicore Processors Via Scheduling
    • Sergey Zhuravlev, Sergey Blagodurov and Alexandra Fedorova (Simon Fraser University)

Program: Day 2 (1/2)
• Session 5. Software Reliability (Session Chair: Emery Berger)
  – SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
    • Ding Yuan, Haohui Mai, Weiwei Xiong, Lin Tan, Yuanyuan Zhou and Shankar Pasupathy (University of California, San Diego; University of Illinois at Urbana-Champaign)
  – Analyzing Multicore Dumps to Facilitate Concurrency Bug Reproduction
    • Dasarath Weeratunge, Xiangyu Zhang and Suresh Jagannathan (Purdue University)
  – A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs
    • Sebastian Burckhardt, Pravesh Kothari, Madanlal Musuvathi and Santosh Nagarakatte (Microsoft Research)
  – ConMem: Detecting Severe Concurrency Bugs Through an Effect-Oriented Approach
    • Wei Zhang, Chong Sun and Shan Lu (University of Wisconsin-Madison)
• Session 6. Hardware Power and Energy (Session Chair: David Wood)
  – Characterizing Processor Thermal Behavior
    • Francisco J. Mesa-Martínez, Ehsan K. Ardestani and Jose Renau (University of California, Santa Cruz)
  – Conservation Cores: Reducing the Energy of Mature Computations
    • Ganesh Venkatesh, John Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steve Swanson and Michael Taylor (University of California, San Diego)
  – Micro-Pages: Increasing DRAM Efficiency with Locality-Aware Data Placement
    • Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian and Al Davis (University of Utah)

Program: Day 2 (2/2)
• Session 7. Data Centers (Session Chair: Scott Mahlke)
  – Power Routing: Dynamic Power Provisioning in the Data Center
    • Steven Pelley, David Meisner, Pooya Zandevakili, Jack Underwood and Thomas Wenisch (University of Michigan)
  – Joint Optimization of Idle and Cooling Power in Data Centers While Maintaining Response Time
    • Faraz Ahmad and T. N. Vijaykumar (Purdue University)
• Session 8. Hardware Monitoring (Session Chair: Peter Chen)
  – Butterfly Analysis: Adapting Dataflow Analysis to Dynamic Parallel Monitoring
    • Michelle Goodstein, Evangelos Vlachos, Shimin Chen, Phillip Gibbons, Michael Kozuch and Todd Mowry (Carnegie Mellon University; Intel Labs Pittsburgh)
  – ParaLog: Enabling and Accelerating Online Parallel Monitoring of Multithreaded Applications
    • Evangelos Vlachos, Michelle Goodstein, Michael Kozuch, Shimin Chen, Babak Falsafi, Phillip Gibbons and Todd Mowry (Carnegie Mellon University; Intel Labs Pittsburgh; EPFL)
• Session 9. Parallel Programming 2 (Session Chair: Tim Harris)
  – MacroSS: Macro-SIMDization of Streaming Applications
    • Amir Hormati, Yoonseo Choi, Mark Woh, Manjunath Kudlur, Rodric Rabbah, Trevor Mudge and Scott Mahlke (University of Michigan)
  – COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders
    • Dong Hyuk Woo and Hsien-Hsin Lee (Georgia Institute of Technology)
  – Flexible Architectural Support for Fine-grain Scheduling
    • Daniel Sanchez, Richard Yoo and Christos Kozyrakis (Stanford University)

Program: Day 3
• Session 10. Parallel Memory Systems (Session Chair: Carl Waldspurger)
  – Specifying and Dynamically Verifying Address Translation-Aware Memory Consistency
    • Bogdan Romanescu, Alvin Lebeck and Daniel Sorin (Duke University)
  – Best Paper! Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems
    • Eiman Ebrahimi, Chang Joo Lee, Onur Mutlu and Yale Patt (The University of Texas at Austin)
  – An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems
    • Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro and Wen-mei Hwu (University of Illinois at Urbana-Champaign; UPC)
  – Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors
    • Abhishek Bhattacharjee and Margaret Martonosi (Princeton University)
• Session 11. Security and Hardware Reliability (Session Chair: Vikram Adve)
  – Orthrus: Efficient Software Integrity Protection on Multi-Cores
    • Ruirui Huang, Dan Deng and G. Edward Suh (Cornell University)
  – Shoestring: Probabilistic Soft-error Resilience on the Cheap
    • Shuguang Feng, Shantanu Gupta, Amin Ansari and Scott Mahlke (University of Michigan)
  – Virtualized and Flexible ECC for Main Memory
    • Doe Hyun Yoon and Mattan Erez (The University of Texas at Austin)

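Dynamically Replicated Memory: Building Reliable Systems from Nanoscale Resistive Memories

Engin Ipek, Jeremy Condit, Edmund B. Nightingale, Doug Burger and Thomas Moscibroda (University of Rochester / Microsoft Research)

• How to use PCM (Phase Change Memory), a candidate for next-generation main memory
  – PCM can be fabricated at the 40 nm scale and below, giving high density, but once a cell fails it cannot be repaired
  – A failed page (the primary) is recovered by pairing it with a backup page
  – The primary-to-backup mapping is done in the Physical -> Real address translation

[Figure: a primary page paired with a backup page. X marks a dead byte, which is identified by a parity failure.]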
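The primary/backup pairing above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the paper's hardware design: the type names (phys_page, real_page), the functions (drm_read, drm_write, drm_demo), the 8-byte page size, and the dead[] array standing in for a failed parity check are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 8 /* tiny page size, for illustration only */

/* One physical page; dead[i] stands in for "parity check failed on byte i". */
typedef struct {
    uint8_t data[PAGE_BYTES];
    bool dead[PAGE_BYTES];
} phys_page;

/* One entry of the (hypothetical) physical-to-real mapping. */
typedef struct {
    phys_page *primary;
    phys_page *backup; /* NULL while the primary needs no backup */
} real_page;

/* Read one byte: use the primary copy unless that byte is dead,
 * in which case fall back to the backup page. Returns false if
 * no live copy of the byte exists. */
bool drm_read(const real_page *rp, size_t off, uint8_t *out)
{
    if (off >= PAGE_BYTES)
        return false;
    if (!rp->primary->dead[off]) {
        *out = rp->primary->data[off];
        return true;
    }
    if (rp->backup != NULL && !rp->backup->dead[off]) {
        *out = rp->backup->data[off];
        return true;
    }
    return false;
}

/* Write one byte to every live copy so that a later read finds it. */
void drm_write(real_page *rp, size_t off, uint8_t value)
{
    if (off >= PAGE_BYTES)
        return;
    if (!rp->primary->dead[off])
        rp->primary->data[off] = value;
    if (rp->backup != NULL && !rp->backup->dead[off])
        rp->backup->data[off] = value;
}

/* Self-check: byte 3 of the primary is dead, so the value written
 * there has to come back from the backup page. */
int drm_demo(void)
{
    phys_page primary = {{0}, {false}};
    phys_page backup = {{0}, {false}};
    real_page rp = {&primary, &backup};
    uint8_t v = 0;

    primary.dead[3] = true;
    drm_write(&rp, 3, 0xAB);
    if (!drm_read(&rp, 3, &v))
        return -1;
    return v;
}
```

Calling drm_demo() exercises the fallback path: the write skips the dead primary byte, and the read is served from the backup.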

Dynamic filtering: multi-purpose architecture support for language runtime systems

Tim Harris, Adrian Cristal, Sasa Tomic and Osman Unsal (Microsoft Research)

• Adds "dyfl", a read/write barrier instruction that checks memory accesses, to make GC, Software Transactional Memory, and Control & Data Flow Integrity (XFI [OSDI06], WIT [SP08], DFI [OSDI06]) more efficient

Write barrier used by the GC:

void writeBarrier(void **addr, void *tgt) {
  if (inOldGen(addr) && inYoungGen(tgt)) { // T1
    log(addr); // L1
  }
}

Write barrier with dyfl added:

void writeBarrierDyfl(void **addr, void *tgt) {
  if ((!dyfl_card_pair(addr, tgt, 0x1)) && // A1
      (!dyfl_addr(addr, 0x2))) { // A2
    if (inOldGen(addr) && inYoungGen(tgt)) { // T1
      dyfl_set_addr(addr, 0x2); // S2
      log(addr); // L1
    } else {
      dyfl_set_card_pair(addr, tgt, 0x1); // S1
    }
  }
}

T = test, L = log, S = set, A = address

dyfl(i1, i2, mask, tag) // Test dynamic filter
dyfl_set(i1, i2, mask, tag) // Set dynamic filter
dyfl_clear(i1, i2, mask, tag) // Clear specific entry
dyfl_clear(tag) // Clear all with tag

Question: how does this differ from a hardware breakpoint?

Micro-Pages: Increasing DRAM Efficiency with Locality-Aware Data Placement

Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian and Al Davis (University of Utah)

• Motivation: with multi-core, memory accesses have become fine-grained, and the hit rate of the 8 KB DRAM row buffer is falling. (Figure: 64-byte cache blocks)
• Finds frequently accessed data and migrates it so that the hit rate rises (hardware-assisted migration)
• Sets the OS page size to 1 KB and uses 4 KB superpages (the mechanism that makes page granularity variable in the processor's TLB)
  – Reference: "Implementation and Performance Evaluation of Linux Super Page for the 2.6-series Kernel" (in Japanese) http://shimizu-lab.dt.u-tokai.ac.jp/thesis/master/6adgm007.pdf
• Average performance ↑ 9% (max. 18%)
• Average memory energy consumption ↓ 18% (max. 62%)
• Average row-buffer utilization ↑ 38%

Orthrus: Efficient Software Integrity Protection on Multi-Cores

Ruirui Huang, Dan Deng and G. Edward Suh (Cornell University)

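• Creates a replica process whose memory layout differs at fine granularity.
• Runs the two processes and checks that their memory accesses touch the same content (at different addresses), which detects buffer overflows and dangling pointers
  – Orthrus is the two-headed dog of Greek mythology, the brother of Cerberus.

Related work (both have released their source code):
• DieHard [PLDI06] http://prisms.cs.umass.edu/emery/
• N-variant [USENIX-Security06] http://www.cs.virginia.edu/nvariant/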

Virtualized and Flexible ECC for Main Memory

Doe Hyun Yoon and Mattan Erez (The University of Texas at Austin)

• ECC normally relies on dedicated check bits. This work virtualizes the check bits (Tier 1: simple, Tier 2: strong) so that they can be mapped into the ordinary memory space.
  – Benefits: limits the growth in bits; saves power
• Matched to the DIMM configuration (DDR2, burst 4):
  – x4 DDR2 burst 4: 64bit -> 4B T1EC
  – x8 DDR2 burst 4: 64bit -> 8B T1EC
• T2 uses chipkill-correct

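Impressions and Trends

• Unsurprisingly, the accepted papers combine OS work or debugger work with the latest hardware.

• Much of the latest-hardware work was also memory-related.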

WIOV 2010: Second Workshop on I/O Virtualization

• About 30 attendees; everyone introduced themselves
• Storage
  – SLIM: Network Decongestion for Storage Systems
    • Madalin Mihailescu, Gokul Soundararajan and Cristiana Amza (University of Toronto).
  – On Disk I/O Scheduling in Virtual Machines
    • Mukil Kesavan, Ada Gavrilovska and Karsten Schwan (Georgia Institute of Technology).
• Networking
  – Ally: OS-Transparent Packet Inspection Using Sequestered Cores
    • Jen-Cheng Huang (Georgia Tech), Matteo Monchiero and Yoshio Turner (HP Labs).
  – A Network Interface Card Architecture for I/O Virtualization in Embedded Systems
    • Holm Rauchfuss, Thomas Wild and Andreas Herkersdorf (Technische Universität München).
  – Architectural support for user-level network interfaces in heavily virtualized systems
    • Florian Auernhammer and Patricia Sagmeister (IBM Research).
• Keynote by Paul Congdon (HP)
  – Enabling Truly Converged Infrastructure
• Power and Performance Bottlenecks
  – Redesigning Xen's Memory Sharing Mechanism for Safe and Efficient I/O Virtualization
    • Kaushik Kumar Ram (Rice University), Jose Renato Santos and Yoshio Turner (HP Labs).
  – Power Aware I/O Virtualization
    • Kun Tian and Yaozu Dong (Intel).
  – I/O Virtualization Bottlenecks in Cloud Computing Today
    • Jeffrey Shafer (Rice University).
• Web page: http://sysrun.haifa.il.ibm.com/hrl/wiov2010/
  – The slides are available there

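Enabling Truly Converged Infrastructure

Keynote by Paul Congdon (HP)

• An introduction to the network virtualization standards currently under development
  – I/O virtualization in the hypervisor puts a heavy load on the CPU.
  – Adapter virtualization
    • I/O virtualization is done in hardware
      – Standardized by PCI-SIG
        » SR-IOV: Single Root I/O Virtualization
  – Edge virtualization
    • Switch virtualization is done in hardware
      – Standardized as IEEE 802.1Qbg and 802.1Qbh
        » VEB: Virtual Ethernet Bridge
        » VEPA: Virtual Ethernet Port Aggregator
• Reference: Nikkei Computer, 2010/03/31
  • "New network standards that support network virtualization behind the scenes"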

Workshop on Architecting Memory Technologies

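• Moderator: Shih-Lien Lu, Intel Labs
• Professor Mattan Erez, University of Texas at Austin
• Professor Bruce Jacob, University of Maryland
• Professor Hsien-Hsin Lee, Georgia Tech
• Professor Onur Mutlu, Carnegie Mellon University
• Professor Yuan Xie, Pennsylvania State University
  – Web page: http://web.engr.oregonstate.edu/~sllu/asplos2010 (slides are available)

• Topics: the move to non-volatile RAM, power consumption, and performance degradation caused by multi-core contention

• Optimal storage size per core
  – Mattan Erez (UT Austin)

Note: FIT (Failures In Time) is a way of expressing a failure rate. Its unit is the number of failures per one billion hours. For example, if 3 failures occur in one billion hours, the failure rate (FIT) is 3. Typical electronic components have a FIT of roughly 10 to 100. Because the failure rate of a whole system is the sum of the failure rates of its parts, the failure rate rises as the number of parts grows.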
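The FIT arithmetic in the note can be written down as a short C sketch. The function names are hypothetical, the figure of 1000 parts at 50 FIT each is an illustrative assumption rather than a number from the panel, and the MTBF conversion assumes a constant failure rate.

```c
#include <assert.h>
#include <stddef.h>

/* FIT = number of failures per 10^9 hours of operation. */
double fit_from_counts(double failures, double hours)
{
    return failures * 1e9 / hours;
}

/* The system failure rate is the sum of the component failure rates. */
double system_fit(const double *component_fit, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += component_fit[i];
    return total;
}

/* Mean time between failures in hours, assuming a constant failure rate. */
double mtbf_hours(double fit)
{
    return 1e9 / fit;
}

/* Illustrative system: 1000 identical parts at 50 FIT each. */
double fit_demo(void)
{
    double parts[1000];
    for (size_t i = 0; i < 1000; i++)
        parts[i] = 50.0;
    return system_fit(parts, 1000);
}
```

With these assumed numbers the system reaches 50,000 FIT, that is, a mean time between failures of 20,000 hours (a little over two years), even though each part on its own would last far longer.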

VEE Day 1

• Keynote Talk "Transistors to Toys: Teaching Systems to Freshmen"
  – Peter M. Chen (University of Michigan)
• Debugging and Replay
  – Capability Wrangling Made Easy: Debugging on a Microkernel with Valgrind
    • Aaron Pohle (Technische Universität Dresden), Björn Döbel, Michael Roitzsch, Hermann Härtig
  – Multi-Stage Replay with Crosscut
    • Jim Chow, Dominic Lucchetti, Tal Garfinkel, Geoffrey Lefebvre, Ryan Gardner, Joshua Mason, Sam Small, Peter M. Chen (University of Michigan)
  – Optimizing Crash Dump in Virtualized Environments
    • Yijian Huang (Fudan University), Haibo Chen, Binyu Zang

VEE Day 2

• Keynote Talk, "Looking Beyond a Singularity"
  – Galen C. Hunt (Microsoft Research)
• Compiler Infrastructure
  – Improving Compiler-Runtime Separation with XIR
    • Ben L. Titzer (Google), Thomas Würthinger, Doug Simon, Marcelo Cintra
  – VMKit: A Substrate for Managed Runtime Environments
    • Nicolas Geoffray (Université Pierre et Marie Curie), Gaël Thomas, Julia Lawall, Gilles Muller, Bertil Folliot
• Featured Talk "Spice up your browser: NaCl, Pepper, and beyond"
  – Robert Muth (Google)
• Applications of Virtualization
  – Neon: System Support for Derived Data Management
    • Qing Zhang (University of California, San Diego), John McCullough, Justin Ma, Nabil Schear, Michael Vrable (University of California, San Diego), Amin Vahdat, Alex C. Snoeren, Geoffrey M. Voelker, Stefan Savage
  – Energy-Efficient Storage in Virtual Machine Environments
    • Lei Ye (University of Arizona), Gen Lu, Sushanth Kumar, Chris Gniady, John H. Hartman
• Hypervisor Scheduling
  – AASH: An Asymmetry-Aware Scheduler for Hypervisors
    • Vahid Kazempour, Ali Kamali, Alexandra Fedorova (Simon Fraser University)
  – Supporting Soft Real-Time Tasks in the Xen Hypervisor
    • Min Lee (Georgia Institute of Technology), A. S. Krishnakumar (Avaya Laboratories), P. Krishnan, Navjot Singh, Shalini Yajnik

VEE Day 3

• Java
  – Efficient Runtime Tracking of Allocation Sites in Java
    • Rei Odaira (IBM Research - Tokyo), Kazunori Ogata, Kiyokuni Kawachiya, Tamiya Onodera (IBM Research - Tokyo), Toshio Nakatani
  – Evaluation of a Just-In-Time Compiler Retrofitted for PHP
    • Michiaki Tatsubori (IBM Research - Tokyo), Akihiko Tozawa, Toyotaro Suzumura, Scott Trent, Tamiya Onodera
  – Novel Online Profiling for Virtual Machines
    • Manjiri A. Namjoshi (University of Kansas), Prasad A. Kulkarni
• Dynamic Binary Translation
  – DBT Path Selection for Holistic Memory Efficiency and Performance
    • Apala Guha (University of Virginia), Kim Hazelwood, Mary Lou Soffa
  – Dynamic Binary Translation Specialized for Embedded Systems
    • Goh Kondoh (IBM Research - Tokyo), Hideaki Komatsu

"Looking Beyond a Singularity"

Galen C. Hunt (Microsoft Research)

• The three key features of Singularity
  – Software Isolated Processes (SIP)
  – Contract-Based Channels
  – Manifest-Based Programs
• Successor projects to Singularity
  – Menlo: mobile devices that go unnoticed
  – Drawbridge: sandboxing
  – SafeOS: verifies assembly code
  – BTL: combines static and dynamic analysis

Capability Wrangling Made Easy: Debugging on a Microkernel with Valgrind

Aaron Pohle (Technische Universität Dresden), Björn Döbel, Michael Roitzsch, Hermann Härtig

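• How Valgrind was ported to Fiasco.OC, an L4-family microkernel
• The memory management models differ, so a mechanism is needed to reconcile them
  – In Valgrind, Valgrind itself can manage the memory space of the application (client). The OS interface is POSIX
  – Fiasco.OC is capability-based
• CapCheck, built on Valgrind, makes it possible to check the delegation of capabilities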

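AASH: An Asymmetry-Aware Scheduler for Hypervisors

Vahid Kazempour, Ali Kamali, Alexandra Fedorova (Simon Fraser University)

• Proposes a hypervisor scheduler for asymmetric multi-core processors (same ISA; two kinds of core, fast and slow)
  – Basic policy:
    • Fast cores are shared fairly
    • The guest is aware of the core configuration
      – Scheduling threads onto the fast core is the guest OS's job
    • Fast-core allocation supports priorities
  – When a fast core is idle, it is allocated in preference to a slow core
  – Using an MSR (Model-Specific Register) to notify the guest OS of core changes is future work

[Figure: asymmetry awareness inside the guest]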
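The "fast cores are shared fairly" policy can be illustrated with a toy C simulation. This is a sketch under stated assumptions and not the AASH or Xen code: the vcpu type and the functions pick_for_fast_core and aash_demo are hypothetical, there is a single fast core, and fairness is approximated by always giving the fast core to the runnable vCPU that has used it least.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned long fast_ticks; /* time this vCPU has already spent on a fast core */
    bool runnable;
} vcpu;

/* Pick the runnable vCPU with the least fast-core time so far.
 * Ties go to the lowest index. Returns -1 if nothing is runnable. */
int pick_for_fast_core(const vcpu *v, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!v[i].runnable)
            continue;
        if (best < 0 || v[i].fast_ticks < v[(size_t)best].fast_ticks)
            best = (int)i;
    }
    return best;
}

/* Share one fast core among three vCPUs for nine ticks (the vCPUs
 * not chosen in a tick would run on slow cores). Returns the spread
 * (max - min) of fast-core time; 0 means an even split. */
unsigned long aash_demo(void)
{
    vcpu v[3] = {{0, true}, {0, true}, {0, true}};
    unsigned long lo, hi;

    for (int tick = 0; tick < 9; tick++) {
        int i = pick_for_fast_core(v, 3);
        if (i >= 0)
            v[i].fast_ticks++;
    }
    lo = hi = v[0].fast_ticks;
    for (size_t i = 1; i < 3; i++) {
        if (v[i].fast_ticks < lo)
            lo = v[i].fast_ticks;
        if (v[i].fast_ticks > hi)
            hi = v[i].fast_ticks;
    }
    return hi - lo;
}
```

In this simulation each of the three vCPUs ends up with three of the nine fast-core ticks. A real scheduler would also have to weigh priorities and migration cost, which the sketch leaves out.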

AASH: An Asymmetry-Aware Scheduler for Hypervisors

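• Implementation
  – Modifies the Credit Scheduler of Xen 3.0
  – Two 4-core AMD Opterons (8 cores in total)
    • 1 fast core at 2 GHz, 7 slow cores at 1 GHz
    • Presumably configured with DVFS (Dynamic Voltage and Frequency Scaling)?
• Evaluation
  – Results were 36% better than with Xen's original scheduler.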