薛智文 cwhsueh@csie.ntu.tw csie.ntu.tw/~cwhsueh


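Page 1:

Department of Computer Science and Information Engineering, National Taiwan University (國立台灣大學資訊工程學系)

薛智文 cwhsueh@csie.ntu.tw

http://www.csie.ntu.edu.tw/~cwhsueh/

101 Spring, March 22, Fri (periods 6-8), DTH 104

Frontier Information Technology (II) (前瞻資訊科技)

Virtualization (2)

Virtualization (V12N)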

Page 2:

Department of CSIE, Graduate Institute of Networking and Multimedia, NEWS Lab (資工系網媒所 NEWS 實驗室)

Outline
- Case Study
  - Xen: Architecture, Hypercall, CPU Virtualization, Memory Virtualization, I/O Device Virtualization, Hardware Virtual Machine, Benchmark
  - Domain X
  - KVM
  - Ubitus
  - BitCoin
  - WeOS
- Summary

Page 3:

How to Virtualize?
- Full Virtualization: binary translation
- Paravirtualization: hypercall
- Hardware-Assisted Virtualization: Intel VT-x & AMD SVM, trap and emulate

Page 4:

Virtual Machine Monitor (VMM) / Hypervisor
- Type I, Hypervisor (e.g. Xen, VMware): VM0, VM1, …, VMN run on the hypervisor, which runs directly on the hardware.
- Type II, Hosted VMM (e.g. KVM, VMware): VM0, VM1, …, VMN run on the VMM, which runs on a host operating system on the hardware.

VM: Virtual Machine = Guest OS + Virtual Devices

Page 5:

Hypervisor (VMM) Types
- Type I + microkernel: Xen (open source, Citrix), Microsoft Hyper-V
- Type I + integrated kernel: VMware ESX, KVM (Kernel-based VM)
- Type II (host OS + guest OS): VMware GSX, VMware Workstation, Microsoft Virtual PC, Microsoft Virtual Server, Sun VirtualBox

Page 6:

Xen Architecture (1/2)

[Figure: Domain 0 and several Domain U guests on the Xen hypervisor; QEMU]

Page 7:

Xen Architecture (2/2): compared to common Linux

Linux                 | Xen
System calls          | Hypercalls
Signals               | Events
Interrupts            | Physical + virtual interrupts
CPU                   | PCPU + VCPU
Filesystem            | XenStore
POSIX shared memory   | Grant tables / shared pages

Page 8:

Hypercall

A system call uses int 0x80; a hypercall uses int 0x82.

    // linux/include/asm/unistd.h (system call numbers)
    #define __NR_restart_syscall 0
    #define __NR_exit 1
    #define __NR_fork 2
    #define __NR_read 3
    …

    // xen/include/public/xen.h (hypercall numbers)
    #define __HYPERVISOR_set_trap_table 0
    #define __HYPERVISOR_mmu_update 1
    #define __HYPERVISOR_set_gdt 2
    #define __HYPERVISOR_stack_switch 3
    …

Hypercall flow, e.g. HYPERVISOR_sched_op: the guest OS issues the hypercall with int 0x82; the hypervisor looks up the handler in hypercall_table, runs do_sched_op, and returns with iret to resume the guest OS.

Page 9:

Grant Table
- Used for page mapping and page transferring
- The page is the unit
- A grant reference (GR) identifies a grant entry

Page mapping: Domain A creates a GR and sends it to Domain B; Domain B maps the page, accesses it, unmaps it, and informs Domain A, which then releases the GR.

Page transferring: the receiving domain creates a GR and sends it to the other domain, which transfers the page and informs the receiver; the receiver then receives the page and releases the GR.

Page 10:

Xen Architecture (1/2) [same figure as Page 6]

Page 11:

Event Channel
- A lightweight signal mechanism
- Uses "ports" as identifiers, each with a pending bit and a mask bit
- Four major purposes:
  - IPI: inter-processor interrupts between VCPUs
  - IDC: inter-domain communication
  - vIRQ: virtual interrupts
  - pIRQ: physical interrupts

[Figure: guest OSes with VCPUs run on the hypervisor (virtual CPU, virtual memory, scheduling) over the hardware (physical CPU, physical memory, Eth0, Eth1); event channels carry IPIs, IDC, vIRQs, and pIRQs]

Page 12:

CPU Virtualization
- Architecture: [Figure: applications run in guest OSes on VCPUs; the hypervisor schedules the VCPUs onto PCPUs]
- 2 scheduling algorithms (non-work-conserving):
  - Simple Earliest Deadline First (SEDF)
  - Credit

Page 13:

Interrupt
- Physical interrupt: for the hypervisor or for guest OSes
- Virtual interrupt: asks guest OSes to do something; 8 for now (max is 24)

[Figure: natively, a device raises IRQn through the PIC to the OS; under Xen, IRQn goes through the PIC to the hypervisor's ISR, which sends an event to the guest OS(es)]

Page 14:

Memory Virtualization (1/2)
- Two-level memory (native): the application uses virtual memory; the OS manages physical memory.
- Three-level memory (Xen): virtual, pseudo-physical, machine. The application uses virtual memory, the guest OS manages pseudo-physical memory, and the hypervisor manages machine memory. The P2M and M2P tables translate between pseudo-physical and machine frames.

Page 15:

Memory Virtualization (2/2)

168M memory for the hypervisor:

Area                                               | Size
MPT, Machine-to-Physical Translation Table (RO)   | 16M
Page-Frame Information                             | 96M
MPT, Machine-to-Physical Translation Table (R/W)  | 16M
Linear Page Table                                  | 8M
Shadow Linear Page Table                           | 8M
Per Domain Mappings                                | 8M
Direct Map                                         | 12M
I/O Remap                                          | 4M

[Figure labels: addresses 0xFFFFFFFF, 0xFC400000, 0xFC000000; Heap]
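As a quick consistency check, the eight areas in the table account exactly for the 168M stated on the slide. The heap label in the figure has no size and is not counted:

```latex
16\,\mathrm{M} + 96\,\mathrm{M} + 16\,\mathrm{M} + 8\,\mathrm{M} + 8\,\mathrm{M} + 8\,\mathrm{M} + 12\,\mathrm{M} + 4\,\mathrm{M} = 168\,\mathrm{M}
```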

Page 16:

Memory Virtualization - Translation
4 mechanisms to manipulate page tables:
- Paravirtualized page tables
- Writable page tables (only level 1 is writable)
- Shadow page tables
- Hardware-assisted paging (HAP)

[Figure: the guest page table maps virtual memory to pseudo-physical memory (VM->PFN), and P2M maps pseudo-physical to machine memory; with shadow page tables, a page fault lets the hypervisor fill a shadow page table used by the MMU (VM->MFN, or VM->P2M); with HAP (second-level paging), the MMU performs both translations]

Page 17:

Memory Virtualization - Shared Info Page

Structure: [Figure: per-VCPU information (max: 32 VCPUs), event channel, wall clock, TSC, memory]

Compared with start_info_page:

              | Start Info Page | Shared Info Page
Mapped by     | Domain builder  | Guest OS
Information   | Static          | Dynamically updated

Page 18:

I/O Device Virtualization
The hypervisor also provides three mechanisms for using devices:
- Emulated devices
- Paravirtualized drivers
- Pass-through

Page 19:

I/O Device Virtualization - Emulated Devices
- Implemented by QEMU, e.g. sound cards (ac97, sb16), etc.

[Figure: QEMU-DM]

Page 20:

I/O Device Virtualization - Paravirtualized Driver
- Split device driver model
- An example of sending packets: [Figure: front-end driver (Dom U) -> back-end driver (Dom 0) -> native driver]

Page 21:

I/O Device Virtualization - I/O Ring
- The ring carries no data; it only transfers requests and replies
- An example with GR: [Figure: Dom U passes GRs to Dom 0 over the I/O channel; the hypervisor's grant table and active grant table validate them; Dom 0 accesses the device]

Page 22:

I/O Device Virtualization - Pass-Through
- The device is passed to a domain, which uses it directly.

[Figure: Dom 0 and each Dom U run native drivers for their own devices (e.g. Eth0, Eth1) on the hardware, while the hypervisor still provides virtual CPU, virtual memory, and scheduling]

Page 23:

Hardware Virtual Machine: Intel Virtualization Technology

Technology | Description                          | Virtualization   | Implementation
VT-x       | Root/non-root, Extended Page Tables  | CPU, memory      | Instruction set
VT-i       | As VT-x, for Itanium                 |                  |
VT-d       | DMA, interrupt                       | Devices          | IOMMU (chipset)
VT-c       | Classify packets                     | Network devices  | VMDq, VMDc

Page 24:

CPU Benchmark (1/2)

[Chart: CPU benchmark results; annotated difference: 8.3%]
Average over 100 tests; deviation: 0.066% to 0.128%.

Page 25:

CPU Benchmark (2/2)

[Chart: annotated difference: 5%]
Calculating 32M digits of π.

Page 26:

Hard Disk Drive Benchmark

Page 27:

Network Benchmark (1/2)

[Chart: annotated difference: 59%]
Testing time: 180 seconds; deviation: 0.12% to 0.26%.

Page 28:

Network Benchmark (2/2)

Sample period: 2 seconds. Average: 9.82%.

Page 29:

Domain 1, X: A Fake Domain 0

Architecture: [Figure: BIOS, payload, hypervisor; Dom0 (Linux, with xend and drivers), Dom1 (Windows, with drivers), DomU (Android, …); non-assignable hardware and assignable hardware (VGA, eth, usb, …)]

Page 30:

KVM Architecture (1/2)
KVM (Kernel-based Virtual Machine) is a full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT or AMD-V).

[Figure: in user space, each guest OS runs inside a machine emulator, next to ordinary user processes; the Linux kernel with KVM runs in kernel space]

Page 31:

KVM Architecture (2/2)
KVM consists of:
- A loadable kernel module, kvm.ko, which provides the core virtualization infrastructure. Loading kvm.ko creates a device (/dev/kvm).
- A processor-specific module, kvm-intel.ko or kvm-amd.ko, which provides support for hardware virtualization.

Page 32:

Machine Emulator for KVM
- QEMU is a generic, open-source machine emulator and virtualizer.
- QEMU-KVM is QEMU modified to support KVM.

[Figure: QEMU-KVM components: user interface, event loop, CPU emulator, translation buffer, exception handler, memory management, emulated devices, ..., KVM API]

Page 33:

KVM API
There are three types, all implemented with ioctl:
- On the KVM device: KVM_CREATE_VM, KVM_CHECK_EXTENSION, … (creates VMs; returns a VM id)
- On a virtual machine (VM): KVM_CREATE_VCPU, KVM_ASSIGN_PCI_DEVICE, … (controls the VM and creates VCPUs; returns a VCPU id)
- On a virtual CPU (VCPU): KVM_RUN, KVM_GET_REGS, KVM_GET_SREGS, … (controls the VCPU)

Page 34:

Using KVM in QEMU-KVM

In cpus.c:

    static void *qemu_kvm_cpu_thread_fn(void *arg)
    {
        ...
        while (1) {
            r = kvm_cpu_exec(env);
            ...
            qemu_kvm_wait_io_event(env);
        }
    }

In kvm-all.c:

    int kvm_cpu_exec(CPUState *env)
    {
        ...
        run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
        ...
        switch (run->exit_reason) {
        case KVM_EXIT_IO:
        ...

In cpus.c:

    static void qemu_kvm_start_vcpu(CPUState *env)
    {
        ...
        qemu_thread_create(qemu_kvm_cpu_thread_fn, ...);
        ...
    }

[Figure: QEMU-KVM in user space runs one thread per VCPU; each thread enters the guest OS via KVM_RUN through KVM in kernel space]

Page 35:

Compared to Xen

                        | Xen                                              | KVM
Hypervisor type         | With microkernel                                 | With integrated kernel (a kernel module)
Managing VMs            | A modified Linux (Domain-0)                      | The integrated kernel (Linux)
Guest OS                | 1. Paravirtualized 2. HVM                        | HVM
VCPU scheduling         | 1. SEDF 2. Credit/Credit2                        | As Linux does, e.g. CFS
Paravirtualized device  | Split driver                                     | virtio
Management tool         | 1. xl (developed by Xen) 2. libvirt (3rd party)  | libvirt (3rd party)
Related mechanisms      | More (XenStore, Grant Table, ...)                | Fewer

In 2010, Chierici drew the following conclusions [1]:
1. KVM proved great stability and reliability.
2. Right now (2010), the Xen hypervisor seems to be the best solution, particularly when using the paravirtualized approach.

[1] Andrea Chierici, "A quantitative comparison between xen and kvm", Journal of Physics: Conference Series, IOP Publishing, vol. 219, no. 4, 2010.

Page 36:

Types of Virtualization
- Hardware/platform virtualization
- Desktop virtualization
- Software virtualization: OS-level, workspace, application
- Storage virtualization, e.g. Virtual Tape Library (1.2B USD sold to CA, 1996)
- Data virtualization
- Database virtualization
- Network virtualization

Page 41:

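WeOS: emerge Our Services. Netizens are in charge; let us create information value together!

[Figure: buyers and sellers in cities across Taiwan (Taipei, Tainan, Chiayi, ...), Japan (Kyoto, Osaka, Tokyo, ...), and the USA (Seattle, LA, DC, ...) connected over the Internet, with logistics, cash flow, ...]

Autonomous ID; Autonomous Distributed Match Engine

V12N to help G11N (I18N + L10N).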

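Page 42:

Computer Science and Information Engineering
- Information science
- Information engineering
- Information management
- Information education
- Bioinformatics
- Medical informatics
- Library and information science
- Financial informatics
- Information electronics
- Information processing
- Information communication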

Page 43:

Market Capitalization, 2013/06/30 (in multiples of 系微 Insyde Software)
Categories: system software, hardware, application software
- Insyde Software (系微): 1x (NT$2.07 billion)
- HTC (宏達電) 99x, ASUS (華碩) 99x, Quanta (廣達) 119x
- TSMC (台積電): 1307x
- Hon Hai (鴻海): 417x; CyberLink (訊連): 5x
- MediaTek (聯發科): 213x; Trend Micro (趨勢): 61x
- TI: 560x
- Google, Yahoo: 4233x, 394x
- IBM: 3071x
- Microsoft: 4181x; ARM: 244x
- Intel: 1746x
- Apple: 5395x
- VMware: 416x
- Citrix: 164x; Adobe: 332x
- Symantec: 227x
- Amazon: 1832x; Cisco: 1885x

Page 44:

Market Capitalization, 2013/10/08 (in multiples of 系微 Insyde Software)
Categories: system software, hardware, application software
- Insyde Software (系微): 1x (NT$2.01 billion)
- HTC (宏達電) 57x, ASUS (華碩) 88x, Quanta (廣達) 124x
- TSMC (台積電): 1342x
- Hon Hai (鴻海): 492x; CyberLink (訊連): 4x
- MediaTek (聯發科): 255x; Trend Micro (趨勢): 77x
- TI: 644x
- IBM: 2923x
- Microsoft: 4067x; ARM: 322x
- Intel: 1668x
- Apple: 6497x
- VMware: 508x
- Citrix: 194x; Adobe: 372x
- Symantec: 254x
- Amazon: 2077x; Cisco: 1799x
- Google, Yahoo: 4227x, 511x

Page 45:

Market Capitalization, 2013/12/27 (in multiples of 系微 Insyde Software)
Categories: system software, hardware, application software
- Insyde Software (系微): 1x (NT$1.83 billion)
- HTC (宏達電) 65x, ASUS (華碩) 108x, Quanta (廣達) 145x
- TSMC (台積電): 1462x
- Hon Hai (鴻海): 568x; CyberLink (訊連): 5x
- MediaTek (聯發科): 318x; Trend Micro (趨勢): 77x
- TI: 784x
- IBM: 3309x
- Microsoft: 5138x; ARM: 420x
- Intel: 2100x
- Apple: 8341x
- VMware: 629x
- Citrix: 188x; Adobe: 489x
- Symantec: 263x
- Amazon: 3043x; Cisco: 1916x
- Google, Yahoo: 6137x, 678x

Page 46:

Answers for Big Questions
- How fast can virtualization achieve? 95+%, 99.9%
- What kinds of applications? Well …
- What problems might it incur? Technical, Big Data?, Security
- How much? Business, Politics, Globalization (G11N) = Internationalization (I18N) + Localization (L10N) …

Page 47:

Homework
Refer to:
- Xen To-Do List, http://wiki.xen.org/wiki/Xen_Document_Days/TODO
- BitCoin, http://bitcoin.org/zh_TW/
- WeOS

Each of you should send a one-page report (<student ID>.pdf) to [email protected], answering any of the big or related questions in your own words: what problems would you like to solve, and how?

Due on Dec 29. Your reports will be posted on the course wiki on Dec 30.

Page 48:

假若真時真亦假，虛擬實處實還虛 (When the false is taken as true, the true becomes false too; where the virtual becomes real, the real is still virtual.)

[Figure: system type vs. platform (virtual, real), with test data]
- Virtual: simulation, evaluation
- Real: emulation, implementation

Page 49:

Summary
- Stay hungry to be full [of passion]. Stay foolish to be smart [on absorption].
- Virtualized reality vs. real virtualization.
- Life of Pi: trust yourself.
- Undergraduate project (專題) vs. PhD; creativity (創意) vs. entrepreneurship (創業): people, matters, time, place, things, capital (人事時地物本), e.g. 鼎王 1B, sesame oil (麻油) 1B, pineapple cake (鳳梨酥) 20+B, Taobao (淘寶), Evernote, Line, Ubitus, Whoscall (6M, 0.5B), Alibaba (阿里巴巴), Wanda (萬達), PTT?
- Virtualized to go anywhere? Just do it, NTU CSIE eSystem!
- For Taiwan industry: key is system, system is key.
