asplos10&vee10 report-suzaki
Report on ASPLOS'10 & VEE'10
ACM ASPLOS'10 & VEE'10 Report
at the 22nd Virtualization Implementation Technology Study Group (vimpl), 2010/April/20
Kuniyasu Suzaki (須崎有康)
Overview
• Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2010)
– March 15-17, 2010
– Pittsburgh, PA
– 182 submissions (the highest so far), 32 accepted (18%), 3 Best Papers
• Poster session held. 5 posters from Japan (Hiraki Lab at the University of Tokyo, 2 from Nakajima Lab at Waseda University, Murakami Lab at Kyushu University, Kourai Lab at Kyushu Institute of Technology)
– About 400 attendees
– Keynote Speech by Eric Brewer (UCB), recipient of the ACM InfoSys Foundation Award
• Workshops
– 2nd WIOV (Workshop on I/O Virtualization)
– Workshop on Architecting Memory Technologies (this was a panel)
– Not attended: Workshop on General-Purpose Computation on Graphics Processing Units
• ASPLOS 2011 will be held in Newport Beach, California, March 5-11, 2011
– asplos11.cs.ucr.edu/
– Abstract Deadline: Monday, July 19, 2010
– Full Paper Deadline: Monday, July 26, 2010 (11:59pm EDT)
Program: Day 1
• Session 1: Novel Architectures (Session Chair: Luis Ceze)
– Best Paper! Dynamically Replicated Memory: Building Reliable Systems from Nanoscale Resistive Memories
• Engin Ipek, Jeremy Condit, Edmund B. Nightingale, Doug Burger and Thomas Moscibroda (University of Rochester / Microsoft Research)
– A Power-efficient All-optical On-chip Interconnect Using Wavelength-based Oblivious Routing
• Nevin Kirman and Jose Martinez (Cornell University)
• Session 2: Compilers and Runtime Systems (Session Chair: Michael Hind)
– Best Paper! A Real System Evaluation of Hardware Atomicity for Software Speculation
• Naveen Neelakantam, David Ditzel and Craig Zilles (University of Illinois at Urbana-Champaign; Intel)
– Dynamic filtering: multi-purpose architecture support for language runtime systems
• Tim Harris, Adrian Cristal, Sasa Tomic and Osman Unsal (Microsoft Research)
• Session 3: Parallel Programming 1 (Session Chair: Yuanyuan Zhou)
– CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution
• Tom Bergan, Owen Anderson, Joe Devietti, Luis Ceze and Dan Grossman (University of Washington)
– Speculative Parallelization Using Software Multi-threaded Transactions
• Arun Raman, Hanjun Kim, Thomas R. Mason, Thomas B. Jablin and David I. August (Princeton University)
– Respec: Efficient online multiprocessor replay via speculation and external determinism
• Dongyoon Lee, Benjamin Wester, Kaushik Veeraraghavan, Satish Narayanasamy, Peter Chen and Jason Flinn (University of Michigan)
• Session 4: Scheduling in Parallel Systems (Session Chair: Tim Harris)
– Probabilistic Job Symbiosis Modeling for SMT Processor Scheduling
• Stijn Eyerman and Lieven Eeckhout (Ghent University)
– Request Behavior Variations
• Kai Shen (University of Rochester)
– Decoupling contention management from scheduling
• Ryan Johnson, Radu Stoica, Anastasia Ailamaki and Todd Mowry (EPFL; Carnegie Mellon University)
– Addressing Shared Resource Contention in Multicore Processors Via Scheduling
• Sergey Zhuravlev, Sergey Blagodurov and Alexandra Fedorova (Simon Fraser University)
Program: Day 2 (1/2)
• Session 5. Software Reliability (Session Chair: Emery Berger)
– SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
• Ding Yuan, Haohui Mai, Weiwei Xiong, Lin Tan, Yuanyuan Zhou and Shankar Pasupathy (University of California, San Diego; University of Illinois at Urbana-Champaign)
– Analyzing Multicore Dumps to Facilitate Concurrency Bug Reproduction
• Dasarath Weeratunge, Xiangyu Zhang and Suresh Jagannathan (Purdue University)
– A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs
• Sebastian Burckhardt, Pravesh Kothari, Madanlal Musuvathi and Santosh Nagarakatte (Microsoft Research)
– ConMem: Detecting Severe Concurrency Bugs Through an Effect-Oriented Approach
• Wei Zhang, Chong Sun and Shan Lu (University of Wisconsin-Madison)
• Session 6. Hardware Power and Energy (Session Chair: David Wood)
– Characterizing Processor Thermal Behavior
• Francisco J. Mesa-Martínez, Ehsan K. Ardestani and Jose Renau (University of California, Santa Cruz)
– Conservation Cores: Reducing the Energy of Mature Computations
• Ganesh Venkatesh, John Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steve Swanson and Michael Taylor (University of California, San Diego)
– Micro-Pages: Increasing DRAM Efficiency with Locality-Aware Data Placement
• Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian and Al Davis (University of Utah)
Program: Day 2 (2/2)
• Session 7. Data Centers (Session Chair: Scott Mahlke)
– Power Routing: Dynamic Power Provisioning in the Data Center
• Steven Pelley, David Meisner, Pooya Zandevakili, Jack Underwood and Thomas Wenisch (University of Michigan)
– Joint Optimization of Idle and Cooling Power in Data Centers While Maintaining Response Time
• Faraz Ahmad and T. N. Vijaykumar (Purdue University)
• Session 8. Hardware Monitoring (Session Chair: Peter Chen)
– Butterfly Analysis: Adapting Dataflow Analysis to Dynamic Parallel Monitoring
• Michelle Goodstein, Evangelos Vlachos, Shimin Chen, Phillip Gibbons, Michael Kozuch and Todd Mowry (Carnegie Mellon University; Intel Labs Pittsburgh)
– ParaLog: Enabling and Accelerating Online Parallel Monitoring of Multithreaded Applications
• Evangelos Vlachos, Michelle Goodstein, Michael Kozuch, Shimin Chen, Babak Falsafi, Phillip Gibbons and Todd Mowry (Carnegie Mellon University; Intel Labs Pittsburgh; EPFL)
• Session 9. Parallel Programming 2 (Session Chair: Tim Harris)
– MacroSS: Macro-SIMDization of Streaming Applications
• Amir Hormati, Yoonseo Choi, Mark Woh, Manjunath Kudlur, Rodric Rabbah, Trevor Mudge and Scott Mahlke (University of Michigan)
– COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders
• Dong Hyuk Woo and Hsien-Hsin Lee (Georgia Institute of Technology)
– Flexible Architectural Support for Fine-grain Scheduling
• Daniel Sanchez, Richard Yoo and Christos Kozyrakis (Stanford University)
Program: Day 3
• Session 10. Parallel Memory Systems (Session Chair: Carl Waldspurger)
– Specifying and Dynamically Verifying Address Translation-Aware Memory Consistency
• Bogdan Romanescu, Alvin Lebeck and Daniel Sorin (Duke University)
– Best Paper! Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems
• Eiman Ebrahimi, Chang Joo Lee, Onur Mutlu and Yale Patt (The University of Texas at Austin)
– An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems
• Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro and Wen-mei Hwu (University of Illinois at Urbana-Champaign; UPC)
– Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors
• Abhishek Bhattacharjee and Margaret Martonosi (Princeton University)
• Session 11. Security and Hardware Reliability (Session Chair: Vikram Adve)
– Orthrus: Efficient Software Integrity Protection on Multi-Cores
• Ruirui Huang, Dan Deng and G. Edward Suh (Cornell University)
– Shoestring: Probabilistic Soft-error Resilience on the Cheap
• Shuguang Feng, Shantanu Gupta, Amin Ansari and Scott Mahlke (University of Michigan)
– Virtualized and Flexible ECC for Main Memory
• Doe Hyun Yoon and Mattan Erez (The University of Texas at Austin)
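Dynamically Replicated Memory: Building Reliable Systems from Nanoscale Resistive Memories
Engin Ipek, Jeremy Condit, Edmund B. Nightingale, Doug Burger and Thomas Moscibroda (University of Rochester / Microsoft Research)
• How to use PCM (Phase Change Memory), a candidate for next-generation main memory
– Can be fabricated at 40 nm scale and below for high density, but a cell cannot be repaired once it fails
– A failed page (primary) is recovered by pairing it with a backup page
– The Physical -> Real translation maps the primary page to its backup page
(Figure: a primary page paired with a backup page. X marks a dead byte, which is identified by its broken parity.)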
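The primary/backup pairing above can be sketched as a small software model. This is only an illustration under stated assumptions: the struct and function names (`real_page`, `page_pair`, `compatible`, `pair_read`, `pair_write`) are hypothetical, the page is shrunk to 16 bytes, the dead-byte flag stands in for the parity check mentioned on the slide, and the rule that two pages may be paired only when no byte is dead in both is an assumption of this sketch rather than something stated on the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 16  /* tiny page so the example is easy to follow */

/* Hypothetical model of one real page: data plus a per-byte "dead" flag.
 * On the slide the dead state is inferred from a broken parity bit. */
struct real_page {
    uint8_t data[PAGE_BYTES];
    bool    dead[PAGE_BYTES];
};

/* One entry of the physical-to-real mapping: a primary and a backup page. */
struct page_pair {
    struct real_page *primary;
    struct real_page *backup;
};

/* Assumed pairing rule: no byte position may be dead in both pages. */
bool compatible(const struct real_page *a, const struct real_page *b)
{
    for (size_t i = 0; i < PAGE_BYTES; i++)
        if (a->dead[i] && b->dead[i])
            return false;
    return true;
}

/* Read one byte: use the primary unless that byte is dead there. */
uint8_t pair_read(const struct page_pair *p, size_t off)
{
    assert(off < PAGE_BYTES);
    if (!p->primary->dead[off])
        return p->primary->data[off];
    assert(!p->backup->dead[off]);  /* holds when the pair is compatible */
    return p->backup->data[off];
}

/* Write one byte to every live copy so either page can serve later reads. */
void pair_write(struct page_pair *p, size_t off, uint8_t v)
{
    assert(off < PAGE_BYTES);
    if (!p->primary->dead[off])
        p->primary->data[off] = v;
    if (!p->backup->dead[off])
        p->backup->data[off] = v;
}
```

For example, with byte 3 dead in the primary and byte 5 dead in the backup, a read of offset 3 is served from the backup while every other offset is served from the primary.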
Dynamic filtering: multi-purpose architecture support for language runtime systems
Tim Harris, Adrian Cristal, Sasa Tomic and Osman Unsal (Microsoft Research)
• Adds "dyfl", a read/write barrier instruction that checks memory accesses, to make GC, Software Transactional Memory, and Control & Data Flow Integrity (XFI [OSDI06], WIT [SP08], DFI [OSDI06]) more efficient
Write barrier used by GC
    void writeBarrier(void **addr, void *tgt) {
        if (inOldGen(addr) && inYoungGen(tgt)) {         // T1
            log(addr);                                   // L1
        }
    }
Write barrier with dyfl added
    void writeBarrierDyfl(void **addr, void *tgt) {
        if ((!dyfl_card_pair(addr, tgt, 0x1)) &&         // A1
            (!dyfl_addr(addr, 0x2))) {                   // A2
            if (inOldGen(addr) && inYoungGen(tgt)) {     // T1
                dyfl_set_addr(addr, 0x2);                // S2
                log(addr);                               // L1
            } else {
                dyfl_set_card_pair(addr, tgt, 0x1);      // S1
            }
        }
    }
T = test, L = log, S = set, A = address
    dyfl(i1, i2, mask, tag)        // Test dynamic filter
    dyfl_set(i1, i2, mask, tag)    // Set dynamic filter
    dyfl_clear(i1, i2, mask, tag)  // Clear specific entry
    dyfl_clear(tag)                // Clear all with tag
Question: how is this different from a hardware breakpoint?
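To make the effect of the filter concrete, here is a minimal software stand-in for the address filter, not the hardware proposed in the paper. Everything prefixed `sw_` is hypothetical, as are the table size, the direct-mapped layout, and the `old_to_young` flag that replaces the `inOldGen`/`inYoungGen` tests. The point it shows is that once an address has been logged and set in the filter, later stores to the same address skip the test and the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FILTER_SLOTS 64

/* Hypothetical software stand-in for the hardware filter: a small
 * direct-mapped table of exact (address, tag) entries. */
struct filter_entry { const void *addr; unsigned tag; bool valid; };
struct filter_entry sw_filter[FILTER_SLOTS];

static size_t slot_of(const void *addr)
{
    return (size_t)(((uintptr_t)addr >> 3) % FILTER_SLOTS);
}

/* Emulates the test form: true if (addr, tag) was set earlier. */
bool sw_dyfl_addr(const void *addr, unsigned tag)
{
    const struct filter_entry *e = &sw_filter[slot_of(addr)];
    return e->valid && e->addr == addr && e->tag == tag;
}

/* Emulates the set form; a colliding older entry is simply evicted,
 * which only costs a repeated slow path, never a missed log. */
void sw_dyfl_set_addr(const void *addr, unsigned tag)
{
    struct filter_entry *e = &sw_filter[slot_of(addr)];
    e->addr = addr;
    e->tag = tag;
    e->valid = true;
}

/* Emulates clearing every entry that carries the given tag. */
void sw_dyfl_clear(unsigned tag)
{
    for (size_t i = 0; i < FILTER_SLOTS; i++)
        if (sw_filter[i].tag == tag)
            sw_filter[i].valid = false;
}

unsigned log_count;  /* number of entries appended to the remembered set */

static void log_slot(void **addr)
{
    (void)addr;
    log_count++;
}

/* Filtered barrier: the caller states whether the store is old-to-young. */
void write_barrier_filtered(void **addr, bool old_to_young)
{
    if (sw_dyfl_addr(addr, 0x2))       /* A2: already logged, nothing to do */
        return;
    if (old_to_young) {                /* T1 */
        sw_dyfl_set_addr(addr, 0x2);   /* S2 */
        log_slot(addr);                /* L1 */
    }
}
```

Calling `write_barrier_filtered` twice on the same slot logs it once; after `sw_dyfl_clear(0x2)` (for instance once a collection has emptied the log) the next store is logged again.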
Micro-Pages: Increasing DRAM Efficiency with Locality-Aware Data Placement
Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian and Al Davis (University of Utah)
• Motivation: with multicore, memory accesses have become fine-grained, and the hit rate of the 8 KB DRAM row buffer is falling. (Figure: 64-byte cache block)
• Find frequently accessed data and move it so that the hit rate rises (hardware-assisted migration)
• Set the OS page size to 1 KB and use 4 KB SuperPages (the mechanism that lets the processor TLB vary page granularity)
– Reference: "Implementation and Performance Evaluation of Linux Super Page for the 2.6 Kernel Series" http://shimizu-lab.dt.u-tokai.ac.jp/thesis/master/6adgm007.pdf
• Average performance ↑ 9% (max. 18%)
• Average memory energy consumption ↓ 18% (max. 62%)
• Average row-buffer utilization ↑ 38%
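The motivation above can be illustrated with a toy row-buffer model. This is a sketch under simplifying assumptions that are not from the paper: a single bank, an open-row policy, 32-bit addresses, and a hypothetical `remap` table that stands in for migration by mapping each 1 KB micro-page number to a new frame. The function name `row_hits` is also made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROW_BYTES        8192u  /* DRAM row buffer size from the slide */
#define MICRO_PAGE_BYTES 1024u  /* micro-page size from the slide */

/* Count row-buffer hits for an address trace on one bank with an
 * open-row policy: an access hits if it falls in the same row as the
 * previous access. remap (may be NULL) maps a micro-page number to its
 * new micro-page frame, modelling migration of hot micro-pages. */
size_t row_hits(const uint32_t *trace, size_t n, const uint32_t *remap)
{
    size_t hits = 0;
    uint32_t open_row = UINT32_MAX;  /* no row open yet */

    for (size_t i = 0; i < n; i++) {
        uint32_t mp  = trace[i] / MICRO_PAGE_BYTES;
        uint32_t off = trace[i] % MICRO_PAGE_BYTES;
        if (remap != NULL)
            mp = remap[mp];
        uint32_t row = (mp * MICRO_PAGE_BYTES + off) / ROW_BYTES;
        if (row == open_row)
            hits++;
        open_row = row;
    }
    return hits;
}
```

A trace that alternates between addresses 0 and 8192 touches two different rows and never hits. If micro-page 8 is remapped to frame 1, both addresses land in row 0 and every access after the first hits.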
Orthrus: Efficient Software Integrity Protection on Multi-Cores
Ruirui Huang, Dan Deng and G. Edward Suh (Cornell University)
• Creates a replica process whose memory layout differs at fine granularity.
• Runs the two processes and checks that their memory accesses see the same content (at different addresses), which detects buffer overflows and dangling pointers
– Orthrus is the two-headed dog of Greek mythology, brother of Cerberus.
Related work: both have released their source code
Diehard [PLDI06] http://prisms.cs.umass.edu/emery/
N-variant [USENIX-Security06] http://www.cs.virginia.edu/nvariant/
Virtualized and Flexible ECC for Main Memory
Doe Hyun Yoon and Mattan Erez (The University of Texas at Austin)
• Check bits are normally attached to memory for ECC. This work virtualizes the check bits (Tier 1: simple, Tier 2: strong) so that they can be mapped into the ordinary memory space.
– Benefit: limits the increase in bits and saves power
• Sized to match the DIMM configuration (DDR2, burst 4):
– x4 DDR2 burst 4: 64bit -> 4B T1EC
– x8 DDR2 burst 4: 64bit -> 8B T1EC
• T2 adopts chipkill-correct
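Impressions and trends
• Unsurprisingly, the accepted papers combine the OS with the latest hardware, or debuggers with the latest hardware.
• Much of the new hardware work was also memory-related.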
WIOV 2010: Second Workshop on I/O Virtualization
• About 30 attendees. Everyone introduced themselves
• Storage
– SLIM: Network Decongestion for Storage Systems
• Madalin Mihailescu, Gokul Soundararajan and Cristiana Amza (University of Toronto).
– On Disk I/O Scheduling in Virtual Machines
• Mukil Kesavan, Ada Gavrilovska and Karsten Schwan (Georgia Institute of Technology).
• Networking
– Ally: OS-Transparent Packet Inspection Using Sequestered Cores
• Jen-Cheng Huang (Georgia Tech), Matteo Monchiero and Yoshio Turner (HP Labs).
– A Network Interface Card Architecture for I/O Virtualization in Embedded Systems
• Holm Rauchfuss, Thomas Wild and Andreas Herkersdorf (Technische Universität München).
– Architectural support for user-level network interfaces in heavily virtualized systems
• Florian Auernhammer and Patricia Sagmeister (IBM Research).
• Keynote by Paul Congdon (HP)
– Enabling Truly Converged Infrastructure
• Power and Performance Bottlenecks
– Redesigning Xen's Memory Sharing Mechanism for Safe and Efficient I/O Virtualization
• Kaushik Kumar Ram (Rice University), Jose Renato Santos and Yoshio Turner (HP Labs).
– Power Aware I/O Virtualization
• Kun Tian and Yaozu Dong (Intel).
– I/O Virtualization Bottlenecks in Cloud Computing Today
• Jeffrey Shafer (Rice University).
• Website: http://sysrun.haifa.il.ibm.com/hrl/wiov2010/
– Slides are available there
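Enabling Truly Converged Infrastructure
Keynote by Paul Congdon (HP)
• Introduction to the network virtualization standards currently in progress
– I/O virtualization in the hypervisor puts a heavy load on the CPU.
– Adapter virtualization
• I/O virtualization is done in hardware
– Standardized by PCI-SIG
» SR-IOV: Single Root I/O Virtualization
– Edge virtualization
• Switch virtualization is done in hardware
– Standardized as IEEE 802.1Qbg and 802.1Qbh
» VEB: Virtual Ethernet Bridge
» VEPA: Virtual Ethernet Port Aggregator
• Reference: Nikkei Computer, 2010/03/31
• "Network virtualization: the new network standards that support it behind the scenes"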
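Workshop on Architecting Memory Technologies
• Moderator: Shih-Lien Lu, Intel Labs
• Professor Mattan Erez, University of Texas at Austin
• Professor Bruce Jacob, University of Maryland
• Professor Hsien-Hsin Lee, Georgia Tech University
• Professor Onur Mutlu, Carnegie Mellon University
• Professor Yuan Xie, Pennsylvania State University
– Website: http://web.engr.oregonstate.edu/~sllu/asplos2010 (slides available)
• The move to non-volatile RAM, power consumption problems, and performance degradation caused by multicore contention
• Optimal storage size per core
– Mattan Erez (Texas Austin)
FIT (Failure In Time) is a way of expressing failure rates. Its unit is the number of failures that occur in one billion hours. For example, if 3 failures occur in one billion hours, the failure rate (FIT) is 3. Typical electronic components have a FIT of roughly 10 to 100. Because the sum of the component failure rates is the failure rate of the whole system, the failure rate rises as the number of components grows.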
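The FIT arithmetic above can be written down in a few lines. This is a minimal sketch: the function names are hypothetical, the summation assumes a series system in which any component failure counts as a system failure, and the MTBF conversion additionally assumes a constant failure rate.

```c
#include <assert.h>
#include <stddef.h>

/* FIT = number of failures per 10^9 device-hours. */
double fit_from_observation(double failures, double device_hours)
{
    assert(device_hours > 0.0);
    return failures * 1e9 / device_hours;
}

/* System FIT when any component failure fails the system:
 * the component FIT values simply add up. */
double system_fit(const double *component_fit, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += component_fit[i];
    return total;
}

/* Mean time between failures in hours, assuming a constant failure rate. */
double mtbf_hours(double fit)
{
    assert(fit > 0.0);
    return 1e9 / fit;
}
```

With the example from the text, 3 failures in one billion hours gives a FIT of 3. Three components with FIT 10, 50 and 100 give a system FIT of 160, so adding parts can only raise the total.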
VEE Day 1
• Keynote Talk “Transistors to Toys: Teaching Systems to Freshmen”
– Peter M. Chen (University of Michigan)
• Debugging and Replay
– Capability Wrangling Made Easy: Debugging on a Microkernel with Valgrind
• Aaron Pohle (Technische Universität Dresden), Björn Döbel, Michael Roitzsch, Hermann Härtig
– Multi-Stage Replay with Crosscut
• Jim Chow, Dominic Lucchetti, Tal Garfinkel, Geoffrey Lefebvre, Ryan Gardner, Joshua Mason, Sam Small, Peter M. Chen (University of Michigan)
– Optimizing Crash Dump in Virtualized Environments
• Yijian Huang (Fudan University), Haibo Chen, Binyu Zang
VEE Day 2
• Keynote Talk, “Looking Beyond a Singularity”
– Galen C. Hunt (Microsoft Research)
• Compiler Infrastructure
– Improving Compiler-Runtime Separation with XIR
• Ben L. Titzer (Google), Thomas Würthinger, Doug Simon, Marcelo Cintra
– VMKit: A Substrate for Managed Runtime Environments
• Nicolas Geoffray (Université Pierre et Marie Curie), Gaël Thomas, Julia Lawall, Gilles Muller, Bertil Folliot
• Featured Talk “Spice up your browser: NaCl, Pepper, and beyond”
– Robert Muth (Google)
• Applications of Virtualization
– Neon: System Support for Derived Data Management
• Qing Zhang (University of California, San Diego), John McCullough, Justin Ma, Nabil Schear, Michael Vrable (University of California, San Diego), Amin Vahdat, Alex C. Snoeren, Geoffrey M. Voelker, Stefan Savage
– Energy-Efficient Storage in Virtual Machine Environments
• Lei Ye (University of Arizona), Gen Lu, Sushanth Kumar, Chris Gniady, John H. Hartman
• Hypervisor Scheduling
– AASH: An Asymmetry-Aware Scheduler for Hypervisors
• Vahid Kazempour, Ali Kamali, Alexandra Fedorova (Simon Fraser University)
– Supporting Soft Real-Time Tasks in the Xen Hypervisor
• Min Lee (Georgia Institute of Technology), A. S. Krishnakumar (Avaya Laboratories), P. Krishnan, Navjot Singh, Shalini Yajnik
VEE Day 3
• Java
– Efficient Runtime Tracking of Allocation Sites in Java
• Rei Odaira (IBM Research - Tokyo), Kazunori Ogata, Kiyokuni Kawachiya, Tamiya Onodera (IBM Research - Tokyo), Toshio Nakatani
– Evaluation of a Just-In-Time Compiler Retrofitted for PHP
• Michiaki Tatsubori (IBM Research - Tokyo), Akihiko Tozawa, Toyotaro Suzumura, Scott Trent, Tamiya Onodera
– Novel Online Profiling for Virtual Machines
• Manjiri A. Namjoshi (University of Kansas), Prasad A. Kulkarni
• Dynamic Binary Translation
– DBT Path Selection for Holistic Memory Efficiency and Performance
• Apala Guha (University of Virginia), Kim Hazelwood, Mary Lou Soffa
– Dynamic Binary Translation Specialized for Embedded Systems
• Goh Kondoh (IBM Research - Tokyo), Hideaki Komatsu
“Looking Beyond a Singularity”
Galen C. Hunt (Microsoft Research)
• The three keys of Singularity
– Software Isolated Processes (SIP)
– Contract-Based Channels
– Manifest-Based Programs
• Successor projects to Singularity
– Menlo: mobile devices that go unnoticed
– Drawbridge: sandbox
– SafeOS: verifies assembly code
– BTL: fusion of static and dynamic analysis
Capability Wrangling Made Easy: Debugging on a Microkernel with Valgrind
Aaron Pohle (Technische Universität Dresden), Björn Döbel, Michael Roitzsch, Hermann Härtig
• How to port Valgrind to Fiasco.OC, an L4-family microkernel
• Memory management differs between the two, so a mechanism to reconcile them is needed
– In Valgrind, the memory space of the application (client) can be handled by Valgrind itself. The OS interface is POSIX
– Fiasco.OC is capability-based
• CapCheck, built with Valgrind, makes it possible to check the delegation of capabilities
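AASH: An Asymmetry-Aware Scheduler for Hypervisors
Vahid Kazempour, Ali Kamali, Alexandra Fedorova (Simon Fraser University)
• Proposes a hypervisor scheduler for asymmetric multicore processors (same ISA, two kinds of cores: fast and slow)
– Basics:
• Fast cores are allocated fairly
• The core configuration is recognized inside the guest
– Scheduling threads onto fast cores is the job of the OS
• Fast-core allocation has priorities
– When a fast core is free, it is allocated in preference to a slow core
– Using an MSR (Model-Specific Register) to tell the guest OS about core changes is future work
(Figure: awareness inside the guest)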
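The "fast cores are allocated fairly" rule can be sketched as a least-served-first choice. This is an illustration only and not the AASH algorithm or Xen code: the `vcpu` struct, the tick accounting, and both function names are hypothetical, and the model has a single fast core.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-vCPU accounting for time spent on the fast core. */
struct vcpu {
    int      id;
    unsigned fast_ticks;  /* ticks already received on the fast core */
    int      runnable;    /* non-zero if the vCPU wants to run */
};

/* Pick the runnable vCPU that has had the least fast-core time so far.
 * Returns its index, or -1 when nothing is runnable. */
int pick_for_fast_core(const struct vcpu *v, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!v[i].runnable)
            continue;
        if (best < 0 || v[i].fast_ticks < v[best].fast_ticks)
            best = (int)i;
    }
    return best;
}

/* Simulate a number of scheduling ticks on the single fast core. */
void run_fast_core(struct vcpu *v, size_t n, unsigned ticks)
{
    for (unsigned t = 0; t < ticks; t++) {
        int i = pick_for_fast_core(v, n);
        if (i < 0)
            return;  /* nothing runnable: the fast core stays idle */
        v[i].fast_ticks++;
    }
}
```

With three runnable vCPUs and nine ticks, each receives three ticks on the fast core. If one vCPU then blocks, the remaining two split the following ticks evenly.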
AASH: An Asymmetry-Aware Scheduler for Hypervisors
• Implementation
– Modified the Credit Scheduler of Xen 3.0
– Two 4-core AMD Opterons (8 cores in total)
• 1 fast core at 2 GHz, 7 slow cores at 1 GHz
• Configured with DVFS (Dynamic Voltage and Frequency Scaling)?
• Evaluation
– Results were 36% better than with Xen's original scheduler.