The Hardware
Revolution in Server
Virtualization

Mohan Parthasarathy (Hewlett-Packard)
ACM Compute 2009 Tutorial
9th Jan 2009




© 2008 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
Agenda
• Server Virtualization technologies ~15 min
     − Overview and history
     − VMM architectures
     − Criteria for a processor to be virtualizable
• X86 Virtualization ~30 min
     − The x86 processor architecture overview
     − Virtualization challenges in x86 processors
• Break 1 – Q&A
• Software techniques for virtualization ~45 min
     − CPU virtualization (Binary Translation/Para-virtualization)
     − Memory virtualization (shadow tables/Xen writeable page tables)
     − I/O virtualization (device emulation)
• Break 2 – Q&A
• Hardware techniques for virtualization ~45 min
     − CPU virtualization (VT-x/AMD-V)
     − Memory virtualization (Intel EPT/AMD NPT)
     − I/O virtualization (VT-d/VT-d2/PCI SIG SR-IOV/MR-IOV)
• Future Trends ~5 min
     − Manageability
     − Security
Did you ever wonder if the person in the puddle is real, and you're just
a reflection of him? ~Calvin and Hobbes
2     16 January 2009
Server Virtualization Technologies
[Figure: four approaches, ordered from most isolation (left) to most flexibility (right)]
• Hardware Partitioning
     − APP1/OS1 and APP2/OS2 each run directly on their own CPU and memory
     − Examples: HP nPar, Sun DSD
• Software/Firmware Partitioning
     − OS1 and OS2 run over a hypervisor layer (software/firmware) above the CPUs and memory
     − Examples: HP vPar, IBM DLPAR, Sun Logical Domains
• Software/Firmware Virtualization
     − OS1 and OS2 run over a hypervisor layer (software/firmware) above the CPUs and memory
     − Examples: HP Integrity VM, IBM SLPARS (micro-partitions), Hitachi Virtage, VMware ESX/GSX, Microsoft Hyper-V, Xen, KVM, xVM…
• Resource Virtualization
     − APP1 and APP2 share a single OS on the CPUs and memory
     − Examples: HP-UX SRP, Solaris Containers (Zones), PVC (earlier SWSoft), OpenVZ, IBM WPAR
A brief history lesson
[Figure: 1960’s: IBM VM/370 on an IBM mainframe running CMS and MVS guests, each with its own apps. 1996: VMware on an Intel / AMD x86 server running W2K3, W2K, WNT4 and Linux guests, each with its own apps]
• VMM on IBM Mainframe
     − Many apps on $$$ HW
• Stanford Research
     − DISCO project
     − VMM on cheap x86 HW
     − VMware in 1999

    Commodity hardware becomes powerful enough to support a virtual machine
    monitor (VMM) – so it’s back to the future with a proven technology!
VMM Architectures
• Type-1 VMM Hypervisor
     − Guests run on the VMM (hypervisor), which runs directly on the hardware
     − Examples: VMware ESX, Xen, MS Hyper-V
• Type-2 VMM Hypervisor
     − Guests run on a VMM that itself runs on top of a host OS
     − Examples: UML
• Hybrid VMM Hypervisor
     − Guests run on a VMM that sits beside the host OS on the hardware
     − Examples: HPVM, VMware GSX, Microsoft Virtual Server
Hosted VM Architecture




                      HP Integrity VM, Microsoft Virtual Server, VMware GSX
Virtualization Requirements – Popek and Goldberg
• A Model of Third Generation Machines
     − Two modes of execution
     − Protection mechanism for the supervisor mode
     − A method to automatically signal the supervisor when the VM executes a sensitive instruction.
• Properties for a Virtual Machine Monitor
     − Equivalence
     − Resource control
     − Efficiency
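The trap requirement above can be read as a set condition: every sensitive instruction must also be privileged, so that it signals the supervisor when the VM executes it. Below is a minimal sketch in C of that check. The `insn_class` record and both sample tables are hypothetical and only for illustration; the x86 entries reflect the well-known cases where POPF and SGDT are sensitive but do not trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification record, for illustration only. */
struct insn_class {
    const char *name;
    bool sensitive;   /* reads or changes privileged machine state */
    bool privileged;  /* traps when executed outside supervisor mode */
};

/* Returns true when every sensitive instruction is also privileged,
 * i.e. the supervisor is always signalled (trap-and-emulate works). */
bool is_virtualizable(const struct insn_class *set, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (set[i].sensitive && !set[i].privileged)
            return false;
    }
    return true;
}

/* A few x86 cases: POPF and SGDT are sensitive but do not trap. */
const struct insn_class x86_sample[] = {
    { "ADD",  false, false },
    { "LGDT", true,  true  },
    { "POPF", true,  false },
    { "SGDT", true,  false },
};

/* The same sample with the two problem instructions left out. */
const struct insn_class trapping_sample[] = {
    { "ADD",  false, false },
    { "LGDT", true,  true  },
};
```

Calling `is_virtualizable(x86_sample, 4)` reports failure because of POPF and SGDT, which is the situation the x86 slides that follow describe.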




VMM Requirements (Sensitive Instructions)




    Ref : Analyzing the Intel Pentium’s ability to support a secure VMM – John Scott Robin (1999)
X86 architecture – Privilege Levels
Data structures contain privilege levels
• DPL : Descriptor Privilege Level
• CPL : Current Privilege Level
     − DPL of the access rights byte in CS segment descriptor cache register
     − privilege level of the code and data segment for the current task
• RPL : Requested Privilege Level
     − the privilege level of the new selector loaded into a segment register
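The three levels combine when a data segment is loaded: the processor takes the weaker (numerically larger) of CPL and RPL and compares it with the descriptor's DPL. A minimal sketch in C of that one check follows. The function names are made up for illustration, and conforming code segments, gates and stack-segment rules are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege levels run from 0 (most privileged) to 3 (least privileged). */
typedef unsigned int priv_t;

/* Effective privilege level: the weaker (numerically larger) of CPL and RPL. */
priv_t effective_privilege(priv_t cpl, priv_t rpl)
{
    return cpl > rpl ? cpl : rpl;
}

/* Data-segment load check: allowed when max(CPL, RPL) <= DPL. */
bool data_segment_access_ok(priv_t cpl, priv_t rpl, priv_t dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    return effective_privilege(cpl, rpl) <= dpl;
}
```

For example, ring 0 code that uses a selector with RPL 3 is treated as ring 3 for that access, so `data_segment_access_ok(0, 3, 0)` is false.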




X86 memory management
                       CS, DS, SS, FS, ES, GS




X86 memory management - segmentation
• Upper 13 bits of segment selector are used to index the descriptor table
• TI = Table Indicator: selects the descriptor table
     − 0 = Global Descriptor Table
     − 1 = Local Descriptor Table
• Descriptor tables are located through GDTR, LDTR
• Segment register = selector + hidden part of segment register (segment base, segment limit, access rights)
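The selector layout described on this slide can be sketched in C as a small decoder. The struct and function names are assumptions for illustration, and the 8-byte descriptor size applies to legacy segment descriptors.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a 16-bit segment selector. */
struct selector_fields {
    uint16_t index; /* bits 15..3: descriptor table index (13 bits) */
    uint8_t  ti;    /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* bits 1..0: requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (uint8_t)((sel >> 2) & 0x1);
    f.rpl   = (uint8_t)(sel & 0x3);
    return f;
}

/* Byte offset of the descriptor inside its table (8 bytes per descriptor). */
uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)decode_selector(sel).index * 8u;
}
```

A selector of 0x001B, for instance, decodes to index 3 in the GDT with RPL 3.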
X86 Paging – 32 bit mode



                         Page Table




                       Page Table Entry




X86 paging - registers




X86 paging – 64 bit mode with 4KB pages
4-KB page translation of a linear address:
     Bits 63–48   Sign extend
     Bits 47–39   PML4E offset (9 bits, indexes the Page Map Level 4)
     Bits 38–30   PDPE offset (9 bits, indexes the Page Directory Pointer table)
     Bits 29–21   PDE offset (9 bits, indexes the Page Directory)
     Bits 20–12   PTE offset (9 bits, indexes the Page Table)
     Bits 11–0    Page offset (12 bits, within the 4-KB page in physical memory)
512 PML4E * 512 PDPE * 512 PDE * 512 PTE = 2^36 4-KB pages
X86 paging – 64 bit mode with 2MB pages
2-MB page translation of a linear address:
     Bits 63–48   Sign extend
     Bits 47–39   PML4E offset (9 bits, indexes the Page Map Level 4)
     Bits 38–30   PDPE offset (9 bits, indexes the Page Directory Pointer table)
     Bits 29–21   PDE offset (9 bits, indexes the Page Directory)
     Bits 20–0    Page offset (21 bits, within the 2-MB page in physical memory)
512 PML4E * 512 PDPE * 512 PDE = 2^27 2-MB pages
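The bit fields of the 4-KB and 2-MB translations can be sketched in C as shifts and masks. The struct and function names are hypothetical; the shift amounts and field widths are the ones shown on the two paging slides.

```c
#include <assert.h>
#include <stdint.h>

/* Table indexes and page offset extracted from a 64-bit linear address. */
struct walk_indexes {
    unsigned pml4;    /* bits 47..39 */
    unsigned pdp;     /* bits 38..30 */
    unsigned pd;      /* bits 29..21 */
    unsigned pt;      /* bits 20..12, unused for 2-MB pages */
    uint32_t offset;  /* bits 11..0 (4 KB) or bits 20..0 (2 MB) */
};

struct walk_indexes split_4k(uint64_t la)
{
    struct walk_indexes w;
    w.pml4   = (unsigned)((la >> 39) & 0x1FF);
    w.pdp    = (unsigned)((la >> 30) & 0x1FF);
    w.pd     = (unsigned)((la >> 21) & 0x1FF);
    w.pt     = (unsigned)((la >> 12) & 0x1FF);
    w.offset = (uint32_t)(la & 0xFFF);
    return w;
}

struct walk_indexes split_2m(uint64_t la)
{
    struct walk_indexes w = split_4k(la);
    w.pt     = 0;                          /* the PDE maps the page directly */
    w.offset = (uint32_t)(la & 0x1FFFFF);  /* 21-bit offset */
    return w;
}

/* Number of pages reachable through `levels` tables of 512 entries each. */
uint64_t pages_mapped(unsigned levels)
{
    return 1ULL << (9 * levels);
}
```

With four levels `pages_mapped` gives 2^36 and with three levels 2^27, matching the page counts on the slides.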


X86 virtualization challenges
[Figure: guest apps in ring 3, ring 1 below them, and the VMM in ring 0 over the hardware, annotated with the problem areas and instructions listed here]
• Problem areas
     − Non-faulting read of privileged registers (3B1)
     − Non-faulting write to privileged state (eflags.IF) (3B1)
     − Incorrect execution when run in ring level > 0 (3C1)
     − Leakage of privilege level (3C1)
     − Excessive faulting
     − Ring aliasing/compression
     − Address space compression
     − Segment reversibility issue on context switch
• Instructions shown in the figure
     − CPUID, Sysenter (ring 3)
     − SGDT/SIDT/SLDT/STR/PUSHF/SMSW/POP/PUSH
     − POPF, STR/POP/PUSH
     − LAR/LSL/VERR/VERW/CALL/INT/JMP/RET
     − CLI/STI
Dynamic Binary Translation
[Figure: an x86 binary is read from disk into data RAM. The translator (x86 parser & high level translator, high level optimization, low level code generation, low level optimization and scheduling) fills the code cache and code cache tags, which the runtime uses for execution]

Ref : Virtual Machines and Dynamic Translation: Implementing ISAs in Software – Joel Emer, Massachusetts Institute of Technology
Binary Translation - C Code Example
     /* Trial division: returns 0 as soon as a divisor in [2, a) is found, else 1. */
     int isPrime(int a) {
         for (int i = 2; i < a; i++) {
             if (a % i == 0) return 0;
         }
         return 1;
     }




Ref : Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86
virtualization. Operating Systems Review, 40(5):2–13, December 2006
Basic Block Translation
• Most instructions copied identically.
• Privileged instructions must be emulated.
• Jumps must be translated since translation can alter code layout.
• Each translated BB must end with jump to next translated BB.




Ref : Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86
virtualization. Operating Systems Review, 40(5):2–13, December 2006
Translation of isPrime(49)
      Note that the prime: basic block (BB) is never translated, since 49 is not prime.




Ref : Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86
virtualization. Operating Systems Review, 40(5):2–13, December 2006
Para-virtualization – Xen architecture




Memory Virtualization – Shadow Page Tables
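Memory Virtualization – Shadow PT vs Writeable page tables (Xen)
• Shadow PT
     − Guest PD/PT map VA->PA
     − VMM keeps a shadow PD/PT that is sync’ed with the guest tables
     − CR3 points to the shadow PT in the VMM
• Writeable page tables (Xen)
     − Guest PD/PT map VA->MA
     − A guest page table update causes a page fault into the VMM
     − VMM verifies that page table update is okay
     − CR3 points to the guest PD/PT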
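The shadow page table idea can be sketched in C as the composition of two maps: the guest's VA->PA table and the VMM's own PA->MA map, giving the VA->MA table that CR3 actually uses. This is a toy model with single-level arrays. All names and the demo data are made up for illustration, and a real VMM would also track permissions and trap guest writes to keep the tables in sync.

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES  8            /* toy address-space size, in pages */
#define INVALID UINT32_MAX   /* marks an unmapped entry */

/* guest_pt:  virtual page -> guest "physical" page  (VA->PA)
 * pa_to_ma:  guest physical page -> machine page    (PA->MA, owned by VMM)
 * shadow_pt: virtual page -> machine page           (VA->MA, used by CR3) */

/* Rebuild one shadow entry from the guest entry and the VMM's PA->MA map. */
uint32_t shadow_entry(const uint32_t guest_pt[NPAGES],
                      const uint32_t pa_to_ma[NPAGES], unsigned vpn)
{
    assert(vpn < NPAGES);
    uint32_t gpn = guest_pt[vpn];
    if (gpn == INVALID || gpn >= NPAGES)
        return INVALID;          /* guest has no valid mapping here */
    return pa_to_ma[gpn];
}

/* Resynchronise the whole shadow table after the guest edits its tables. */
void sync_shadow(const uint32_t guest_pt[NPAGES],
                 const uint32_t pa_to_ma[NPAGES],
                 uint32_t shadow_pt[NPAGES])
{
    for (unsigned vpn = 0; vpn < NPAGES; vpn++)
        shadow_pt[vpn] = shadow_entry(guest_pt, pa_to_ma, vpn);
}

/* Demo data (made up): the guest maps VPN 0->1 and VPN 1->3, and the VMM
 * backs guest physical page n with machine page 40 + n. */
const uint32_t demo_guest_pt[NPAGES] = {
    1, 3, INVALID, INVALID, INVALID, INVALID, INVALID, INVALID
};
const uint32_t demo_pa_to_ma[NPAGES] = { 40, 41, 42, 43, 44, 45, 46, 47 };
```

In the Xen writeable page table scheme the guest would instead store machine pages directly, and the VMM's role reduces to validating each update.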
Dynamic memory resizing - Ballooning

• Inflating a balloon
     − Used when the server wants to reclaim memory
     − Driver allocates pinned physical pages within the VM
     − This increases memory pressure in the guest OS, which reclaims space to satisfy the driver allocation request
     − Driver communicates the physical page number for each allocated page to VMM
• Deflating
     − Frees up memory for general use within the guest OS
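The inflate and deflate steps can be sketched in C as a toy model of the guest's page pool. Every name here is hypothetical, and the sketch only takes pages that are already free: the guest's own reclaim under memory pressure (paging out used pages) is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define GUEST_PAGES 16

/* Toy guest: each physical page is free, in use by the guest, or ballooned. */
enum page_state { PAGE_FREE, PAGE_USED, PAGE_BALLOONED };

struct guest {
    enum page_state page[GUEST_PAGES];
};

/* Inflate: pin up to `want` free guest pages and report their page numbers
 * (written to `reported`) so the VMM can reclaim the backing memory.
 * Returns how many pages were actually ballooned. */
size_t balloon_inflate(struct guest *g, size_t want, size_t reported[])
{
    size_t got = 0;
    for (size_t pfn = 0; pfn < GUEST_PAGES && got < want; pfn++) {
        if (g->page[pfn] == PAGE_FREE) {
            g->page[pfn] = PAGE_BALLOONED;
            reported[got++] = pfn;
        }
    }
    return got;
}

/* Deflate: return up to `count` ballooned pages to the guest's free pool. */
size_t balloon_deflate(struct guest *g, size_t count)
{
    size_t freed = 0;
    for (size_t pfn = 0; pfn < GUEST_PAGES && freed < count; pfn++) {
        if (g->page[pfn] == PAGE_BALLOONED) {
            g->page[pfn] = PAGE_FREE;
            freed++;
        }
    }
    return freed;
}

/* Pages the guest can still allocate for itself. */
size_t guest_free_pages(const struct guest *g)
{
    size_t n = 0;
    for (size_t pfn = 0; pfn < GUEST_PAGES; pfn++)
        n += (g->page[pfn] == PAGE_FREE);
    return n;
}
```

The page numbers written to `reported` stand in for the physical page numbers that the driver communicates to the VMM.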

I/O system architecture overview (PCI/PCI-e)
[Figure: OS drivers on CPUs, with and without a VMM underneath, reaching devices through the root complex and memory. Labels: configuration space; TX/RX; devices 0 1 2; 3, 0, 0 (BDF); CHAOS!!]
I/O Virtualization Architecture
• Monolithic Model (VMware ESX)
     − VM0 … VMn run guest OS and apps; I/O services and device drivers sit in the hypervisor; shared devices
     − Pro: Higher Performance
     − Pro: I/O Device Sharing
     − Pro: VM Migration
     − Con: Larger Hypervisor
• Service VM Model (Xen)
     − Service VMs hold the I/O services and device drivers; guest VMs VM0 … VMn run guest OS and apps; shared devices below the hypervisor
     − Pro: High Security
     − Pro: I/O Device Sharing
     − Pro: VM Migration
     − Con: Lower Performance
• Pass-through Model
     − VM0 … VMn each run guest OS, apps and their own device drivers; assigned devices below the hypervisor
     − Pro: Highest Performance
     − Pro: Smaller Hypervisor
     − Pro: Device assisted sharing
     − Con: Migration Challenges
Network Virtualization (VMware GSX example)
Xen I/O Architecture - Safe Hardware Interface
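Networking in Xen
[Figure: the driver domain holds the NIC driver, an Ethernet bridge and the back-end drivers; guest domains 1, 2, ... each hold a front-end driver. Packet data moves between back-end and front-end drivers by page flipping through the hypervisor. Hardware interrupts from the NIC reach the hypervisor's interrupt dispatch, which delivers virtual interrupts to the domains. Driver control and packet data pass between the driver domain and the NIC; the hypervisor handles control + data for CPU / memory / disk / other devices]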
CPU Virtualization with Intel VT-x
[Figure: virtual machines (VMs), each with apps in ring 3 and an OS in ring 0, above the VM Monitor (VMM) in VMX root; VM exit and VM entry cross between the two]
• Two new VT-x operating modes
     − Less-privileged mode (VMX non-root) for guest OSes
     − More-privileged mode (VMX root) for VMM
• Two new transitions
     − VM entry to non-root operation
     − VM exit to root operation
• Execution controls determine when exits occur
     − Access to privileged state, occurrence of exceptions, etc.
     − Flexibility provided to minimize unwanted exits
• VM Control Structure (VMCS) controls VT-x operation
     − Also holds guest and host state
VT-x Operations
[Figure: VM 1, VM 2 … VM n each run ring 3 and ring 0 in VMX non-root operation, and each has its own VMCS (1, 2 … n). A VM exit transfers control to VMX root operation (ring 3 and ring 0). VMXON enters VMX operation from IA-32 operation; VMLAUNCH and VMRESUME enter a VM]
VT-x new instructions
• VMXON and VMXOFF
     − To enter and exit VMX-root mode.
• VMLAUNCH: Used on initial transition from VMM to Guest
     − Enters VMX non-root operation mode
• VMRESUME: Used on subsequent entries
     − Enters VMX non-root operation mode
     − Loads Guest state and Exit criteria from VMCS
• VM exit (a hardware transition, not an instruction)
     − Used on transition from Guest to VMM
     − Enters VMX root operation mode
     − Saves Guest state in VMCS
     − Loads VMM state from VMCS
• VMPTRST and VMPTRLD
     − To read (store) and write (load) the VMCS pointer.
• VMREAD, VMWRITE, VMCLEAR
     − Read from, write to and clear a VMCS
• VMCALL
     − Hypervisor entry point for hypercall from guest
VT-x Data Structures (VMCS)
• VMCS is a 4K table which specifies the VM environment
• It uses physical addressing only, and is accessed through the VMREAD/VMWRITE interface
• Loads and stores to the current VMCS pointer go through VMPTRLD and VMPTRST
• VMRESUME is used if the same VMCS is being resumed on a processor. Else, VMCLEAR followed by VMLAUNCH.

     VMCS area               Purpose                                    Example fields
     VM execution controls   Controls processor behaviour in            External interrupt exiting, interrupt window exiting,
                             non-root mode                              CR3 load/store exiting, VPID enable, VPID value,
                                                                        EPT enable, EPTP…
     Guest save state        Processor state saved on VM exits and      EIP, ESP, EFLAGS, IDTR, segment registers etc.
                             loaded on VM entries
     Host save state         Processor state loaded on VM exits         CR3, EIP set to monitor entry, EFLAGS etc.
     VM exit controls        These fields control VM exits              MSR save etc.
     VM entry controls       These fields control VM entries            Interrupts on entry, MSR loads etc.
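The VMRESUME versus VMCLEAR plus VMLAUNCH rule can be sketched in C as a small decision function. This is a model only: it executes no VMX instructions, and the `vcpu` bookkeeping and all names are assumptions for illustration, not an actual VMM's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the launch state associated with a VMCS. */
enum launch_state { VMCS_CLEAR, VMCS_LAUNCHED };

enum entry_insn { USE_VMLAUNCH, USE_VMRESUME };

/* Hypothetical bookkeeping a VMM might keep per virtual CPU. */
struct vcpu {
    enum launch_state state;
    int last_cpu;          /* CPU the VMCS was last run on, -1 if none */
};

/* Pick the VM-entry instruction for running `v` on physical CPU `cpu`.
 * Sets *need_vmclear when the VMCS was launched on a different CPU and
 * must be VMCLEARed first; clearing resets the launch state. */
enum entry_insn choose_entry(struct vcpu *v, int cpu, bool *need_vmclear)
{
    *need_vmclear = (v->state == VMCS_LAUNCHED && v->last_cpu != cpu);
    if (*need_vmclear)
        v->state = VMCS_CLEAR;

    enum entry_insn insn =
        (v->state == VMCS_LAUNCHED) ? USE_VMRESUME : USE_VMLAUNCH;

    /* After a successful entry the VMCS counts as launched on this CPU. */
    v->state = VMCS_LAUNCHED;
    v->last_cpu = cpu;
    return insn;
}
```

The first entry on a CPU therefore uses VMLAUNCH, later entries on the same CPU use VMRESUME, and a migration to another CPU goes back through VMCLEAR and VMLAUNCH.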


VT-x solution to x86 virtualization challenges
[Figure: guest apps in ring 3 and ring 0 for the guest, with the VMM below in "ring -1" over the hardware, annotated with the points and instructions listed here]
• How VT-x addresses each challenge
     − All reads return privilege level 0, GDT/LDT owned by guest OS, CPUID can be made to trap into VMM
     − Guest OS in full control of segment/task descriptors
     − Sysenter calls into guest OS. CLI/STI optimized to deliver virtual interrupts to VM
     − Eflags.IF is no longer used for interrupt masking
     − No ring compression – all rings available
     − No need for VMM to share address space with guest – no address space compression
     − Clean context switch on VM entry/exit
• Instructions shown in the figure
     − CPUID, Sysenter (ring 3)
     − SGDT/SIDT/SLDT/STR/PUSHF/SMSW/POP/PUSH
     − POPF
     − LAR/LSL/VERR/VERW/CALL/INT/JMP/RET
     − CLI/STI
Intel EPT/AMD NPT
[Figure: a guest linear address is translated by the x86 guest page tables (rooted at gCR3) into a guest physical address, which the host GPT page tables (rooted at the GPT base pointer, hCR3) translate into a host physical address; the TLB & caches hold the results]
• GPT directly translates Guest Virtual addresses into Host Physical addresses on the fly.
     − Uses Guest Page Table and Host-based Page Table
• Significant reduction in “exit frequency”
     − Primary page table modifications are as fast as native
     − Page faults require no exits
     − Context switches require no exits
     − No shadow page table memory overhead
• However, results in more expensive TLB misses - The “memsweep effect” – mitigated by large guest pages
• AMD ASID/Intel VPID - segments the TLB, reduces TLB purge overheads.
•

38    16 January 2009
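The cost of a TLB miss under nested paging can be estimated with a short calculation. The sketch below is only an illustration under stated assumptions: no paging-structure caches, every table entry read counts as one memory reference, and a large guest page is modelled as one fewer guest level. The function names are hypothetical.

```python
def native_walk_references(levels: int) -> int:
    """Memory references for one TLB miss with a single page table."""
    return levels


def nested_walk_references(guest_levels: int, host_levels: int) -> int:
    """Memory references for one TLB miss when guest and host tables are both walked.

    Each guest page-table entry sits at a guest-physical address, so reading it
    costs one host walk (host_levels references) plus the read itself. The final
    guest-physical address of the data then needs one more host walk.
    """
    per_guest_level = host_levels + 1
    return guest_levels * per_guest_level + host_levels


if __name__ == "__main__":
    print("native, 4 levels:", native_walk_references(4))
    print("nested, 4 guest x 4 host levels:", nested_walk_references(4, 4))
    print("nested, large guest page (3 guest levels):", nested_walk_references(3, 4))
```

Under these assumptions a 4-level guest over a 4-level host needs 24 references per miss instead of 4, and dropping one guest level brings that down to 19.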
VT-x extension: Extended Page Table (EPT)
• All guest-physical addresses go through extended page tables
  − Includes the address in CR3, the address in a PDE, the address in a PTE, etc.
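A toy model can make the point concrete: the guest's CR3, PDE and PTE values are all guest-physical, so each one is passed through the extended page table before memory is touched. This is a minimal sketch, not the hardware algorithm. It assumes a two-level 32-bit guest page table, a flat dictionary standing in for the EPT, and table entries that hold plain addresses without flag bits; all names and values are made up for illustration.

```python
PAGE_SIZE = 0x1000


def ept_translate(ept, gpa):
    """Translate a guest-physical address via a flat {guest frame: host frame} map."""
    return ept[gpa // PAGE_SIZE] * PAGE_SIZE + gpa % PAGE_SIZE


def guest_walk(host_mem, ept, gcr3, gva):
    """Walk a two-level guest page table; return (host-physical address, EPT lookups)."""
    pd_index = (gva >> 22) & 0x3FF
    pt_index = (gva >> 12) & 0x3FF
    offset = gva & 0xFFF
    ept_lookups = 0

    # The address in gCR3 is guest-physical: translate it before reading the PDE.
    pde_hpa = ept_translate(ept, gcr3 + pd_index * 4)
    ept_lookups += 1
    pt_gpa = host_mem[pde_hpa]

    # The address in the PDE is guest-physical: translate it before reading the PTE.
    pte_hpa = ept_translate(ept, pt_gpa + pt_index * 4)
    ept_lookups += 1
    page_gpa = host_mem[pte_hpa]

    # The address in the PTE is guest-physical: translate it to reach the data.
    hpa = ept_translate(ept, page_gpa + offset)
    ept_lookups += 1
    return hpa, ept_lookups


# Hypothetical layout: guest frames 0x10..0x12 are backed by host frames 0x80..0x82.
EPT = {0x10: 0x80, 0x11: 0x81, 0x12: 0x82}
HOST_MEM = {0x80004: 0x11000, 0x81004: 0x12000}  # PDE[1] and PTE[1], stored in host memory
GCR3 = 0x10000

if __name__ == "__main__":
    hpa, lookups = guest_walk(HOST_MEM, EPT, GCR3, 0x00401234)
    print(hex(hpa), lookups)
```

In this model a two-level guest walk needs three EPT translations: one for each table address and one for the final data address.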
VT-x extension: Virtual Processor IDs (VPID)
• The idea of a tagged TLB is that each TLB entry is “tagged” with an identifier
• Having such a tag means the TLB entries need not be flushed when switching between the host and a guest
• VPID is activated if the new “enable VPID” control bit is set in the VMCS

Tag      Virtual Address   Physical Address
Host     0x1000            0x10001000
Host     0x2000            0x10002000
Host     0x3000            0x10003000
Host     0x4000            0x10004000
Guest    0x1000            0xFFF01000
Guest    0x2000            0xFFF02000
Guest    0x3000            0xFFF03000
Guest    0x4000            0xFFF04000
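The table above can be modelled as a lookup keyed by (tag, virtual address) rather than by virtual address alone. The sketch below is a software illustration of that idea only. The class name and the use of the strings "Host" and "Guest" as tags are assumptions made for the example, while the addresses are the ones from the table.

```python
class TaggedTLB:
    """Toy TLB whose entries are keyed by (tag, virtual address)."""

    def __init__(self):
        self._entries = {}

    def insert(self, tag, virtual, physical):
        self._entries[(tag, virtual)] = physical

    def lookup(self, tag, virtual):
        """Return the cached physical address, or None on a TLB miss."""
        return self._entries.get((tag, virtual))


tlb = TaggedTLB()
for page in (0x1000, 0x2000, 0x3000, 0x4000):
    tlb.insert("Host", page, 0x10000000 + page)
    tlb.insert("Guest", page, 0xFFF00000 + page)

if __name__ == "__main__":
    # The same virtual address resolves differently per tag, with no flush in between.
    print(hex(tlb.lookup("Host", 0x1000)))
    print(hex(tlb.lookup("Guest", 0x1000)))
```

Because the tag is part of the key, host and guest entries for the same virtual address coexist, so a switch between them does not have to discard either set.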
VT-x extension: CPUID spoofing (Flex Migration)
• Allows software to “spoof” the CPUID feature bits (i.e. make the value of the CPUID feature bits appear different from what they really are).
• This is the same as the CPUID spoofing feature that the current VT processors have.
[Figure: live VM migration across processor generations. Older / existing servers: pre-2004 (32-bit single core) and 2004+ (64-bit single core). Newer servers: 2006+ (Intel® Core™, 64-bit dual- and quad-core).]
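One way a VMM might use CPUID spoofing for migration is to report only the feature bits that every host in a pool supports. The slide does not describe this policy, so the sketch below is an assumption made for illustration. The function names are hypothetical, and the bit positions are those of CPUID leaf 1, register ECX.

```python
from functools import reduce

# Feature bits in CPUID leaf 1, ECX.
SSE3 = 1 << 0
SSSE3 = 1 << 9
SSE4_1 = 1 << 19
SSE4_2 = 1 << 20


def common_feature_mask(host_feature_words):
    """AND together the feature words of all hosts (the list must be non-empty)."""
    return reduce(lambda a, b: a & b, host_feature_words)


def spoofed_cpuid(real_features, pool_mask):
    """Feature word shown to the guest: real bits limited to the pool-wide mask."""
    return real_features & pool_mask


if __name__ == "__main__":
    older_host = SSE3
    newer_host = SSE3 | SSSE3 | SSE4_1
    mask = common_feature_mask([older_host, newer_host])
    # A guest started on the newer host sees only what the older host also offers.
    print(bin(spoofed_cpuid(newer_host, mask)))
```

A guest that never sees a feature bit should not come to depend on it, which is what lets it move later to a host that lacks the feature.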
Intel VT-d Architecture Detail

[Figure: VT-d DMA remapping architecture]
• DMA requests carry a Device ID, a Virtual Address, a Length, …
• The DMA Remapping Engine contains a Translation Cache, a Context Cache, and Fault Generation logic
• Device Assignment Structures are indexed by bus (Bus 0 … Bus N … Bus 255) and then by device and function (Dev 0, Func 0 … Dev P, Func 1 … Dev P, Func 2 … Dev 31, Func 7)
• Each assigned device (Device D1, Device D2) points to its own Address Translation Structures (4KB page tables leading to a page frame)
• The partitioning and translation structures are memory-resident
• Result: memory access with a System Physical Address
VT-d: Remapping Structures
• VT-d hardware selects the page table based on the source of the DMA request
  − The Requestor ID (bus / device / function) in the request identifies the DMA source
• VT-d Device Assignment Entry (128 bits; fields listed from high bit to low bit)
  − Bits 127..64: Rsvd | Domain ID | Rsvd | Address Width
  − Bits 63..0: Address Space Root Pointer | Rsvd | Ext. Controls | Controls | P
• VT-d supports hierarchical page tables for address translation
  − Page directories and page tables are 4 KB in size
  − 4 KB base page size, with support for larger page sizes
  − Support for DMA snoop control through page table entries
• VT-d Page Table Entry (64 bits; fields listed from high bit to low bit)
  − Bits 63..0: Rsvd | Page-Frame / Page-Table Address | Available | SP | Rsvd | Ext. Controls | W | R
VT-d: Hardware Page Walk
[Figure: VT-d hardware page walk]
• Requestor ID (16 bits): Bus (bits 15..8) | Device (bits 7..3) | Func (bits 2..0)
• DMA Virtual Address (64 bits): zero (bits 63..57) | zero (bits 56..48) | Level-4 table offset (bits 47..39) | Level-3 table offset (bits 38..30) | Level-2 table offset (bits 29..21) | Level-1 table offset (bits 20..12) | Page Offset (bits 11..0)
• Walk: Base → Device Assignment Tables (indexed by the Requestor ID) → Level-4 Page Table → Level-3 Page Table → Level-2 Page Table → Level-1 Page Table → Page
• Example: a Device Assignment Table Entry specifying a 4-level page table
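The bit fields in the walk above can be extracted with shifts and masks. The sketch below is an illustrative software model, not the hardware implementation. It uses nested dictionaries in place of memory-resident tables, a single lookup keyed by (bus, device, function) in place of the device assignment tables, and a leaf entry with `frame`, `R` and `W` keys. Those names and the sample values are assumptions.

```python
PAGE_SIZE = 4096


def split_requester_id(rid):
    """Return (bus, device, function) from a 16-bit Requestor ID."""
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


def split_dma_address(addr):
    """Return ([level-4, level-3, level-2, level-1 table offsets], page offset)."""
    indices = [(addr >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indices, addr & 0xFFF


def translate_dma(device_tables, rid, addr, write):
    """Translate a DMA virtual address; raise PermissionError to model a DMA fault."""
    try:
        table = device_tables[split_requester_id(rid)]  # select tables by DMA source
        indices, offset = split_dma_address(addr)
        for index in indices[:-1]:                      # levels 4, 3 and 2
            table = table[index]
        pte = table[indices[-1]]                        # level-1 entry
    except KeyError:
        raise PermissionError("DMA fault: no translation") from None
    if not (pte["W"] if write else pte["R"]):
        raise PermissionError("DMA fault: access not permitted")
    return pte["frame"] * PAGE_SIZE + offset


# Hypothetical setup: bus 3, device 5, function 2 owns one read-only page.
RID = (3 << 8) | (5 << 3) | 2
ADDR = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x56
TABLES = {(3, 5, 2): {1: {2: {3: {4: {"frame": 0x1234, "R": True, "W": False}}}}}}

if __name__ == "__main__":
    print(hex(translate_dma(TABLES, RID, ADDR, write=False)))
```

A request from an unassigned device, to an unmapped address, or with the wrong access type ends in a fault rather than a memory access, which is the isolation property the remapping structures provide.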
PCI SIG IOV Overview
[Figure: PCIe Single-Root IOV (SIs above a VI on one system) and PCIe Multi-Root IOV (SIs and VIs on several systems)]
• PCI SIG is standardizing mechanisms that enable PCIe devices to be directly shared
  − Single-Root IOV: direct sharing between SIs on a single system
  − Multi-Root IOV: direct sharing between SIs on multiple systems
• The PCI-SIG IOV specification covers the “north side” of the device
PCI SIG IOV Terminologies
[Figure: SIs above VIs and the SR-PCIM, connected through a PCIe switch to device functions (F)]
• System Image (SI)
  − Software, e.g. a guest OS, to which virtual and physical devices can be assigned
• Virtual Intermediary (VI)
  − Performs resource allocation, isolation, management and event handling
• PCIM – PCI Manager
  − Controls configuration, management and error handling of PFs and VFs
  − May be in software and/or firmware
  − May be integrated into a VI
• Translation Agent (TA)
  − Uses the ATPT to translate PCI bus addresses into platform addresses
• Address Translation and Protection Table (ATPT)
  − Validates access rights of incoming PCI memory transactions
  − Translates PCI addresses into platform physical addresses
VT-c: Virtual Machine Device Queues (VMDq)
• On the receive path, VMDq provides a hardware “sorter” or classifier that does the pre-work for the VMM of deciding which destination VM each packet should go to. The NIC or LAN silicon performs a hardware assist for the VMM layer.
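The receive-path sorting can be pictured as placing each incoming packet into a per-VM queue before the VMM sees it. The sketch below models that in software. Classifying by destination MAC address, the packet dictionary layout and the "default" queue are assumptions made for illustration, since the slide does not say which fields the hardware matches on.

```python
from collections import defaultdict


def sort_into_queues(packets, mac_to_vm):
    """Group packets into per-VM receive queues keyed by destination MAC."""
    queues = defaultdict(list)
    for packet in packets:
        vm = mac_to_vm.get(packet["dst_mac"], "default")  # unknown MACs fall back
        queues[vm].append(packet)
    return dict(queues)


if __name__ == "__main__":
    mac_to_vm = {"00:00:00:00:00:01": "vm1", "00:00:00:00:00:02": "vm2"}
    packets = [
        {"dst_mac": "00:00:00:00:00:01", "payload": b"a"},
        {"dst_mac": "00:00:00:00:00:02", "payload": b"b"},
        {"dst_mac": "00:00:00:00:00:01", "payload": b"c"},
        {"dst_mac": "ff:ff:ff:ff:ff:ff", "payload": b"d"},
    ]
    for vm, queue in sort_into_queues(packets, mac_to_vm).items():
        print(vm, [p["payload"] for p in queue])
```

With the sorting done ahead of time, the VMM only has to hand each queue to its VM instead of inspecting every packet itself.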
Intel / AMD Comparison
[Timeline: 2005, 2006, 2007, 2008, unknown]
Intel
• VT-x (2005): VMLAUNCH, VMRESUME, VMREAD, VMWRITE; VMCS (VM control structure)
• LT (2006): SENTER, AC
• VT-d (2007): IOMMU
• VT-x2 (2008): Extended Page Tables (EPT), Virtual Processor IDs (VPID)
• VT-d2 (date unknown): IOMMU
AMD
• SVM (2005): VMRUN; VMCB (VM control block); ASID tagged TLB (performance); paged real mode; SKINIT (security); DMA exclusion vector (security)
• SVM-2 (2007): nested page tables; improved #VMEXIT; decode assist
• IOMMU, PCI-SIG (2008): ATS
• SVM-3 (date unknown): ?
Deja-Vu – Back to the future

• What VT calls "non-root mode", and Pacifica calls "guest mode", was called "interpretive execution" on the IBM VM/370 and VM/ESA mainframes.
• VT's "vmlaunch" instruction and Pacifica's "vmrun" were called "sie".
• Intel's "VMCS" and AMD's "VMCB" were called the "state description" on the IBM mainframes.
• IBM also defined the concept of shadow translation tables and a dual page-table walk in hardware.
• IBM also defined an interpreted SIE for nested hypervisor support (not yet in Intel/AMD).
Agenda
• Server Virtualization technologies
  − Overview and history
  − VMM architectures
  − Criteria for a processor to be virtualizable
• X86 Virtualization
  − The x86 processor architecture overview
  − Virtualization challenges in x86 processors
• Software techniques for virtualization
  − CPU virtualization (Binary Translation/Para-virtualization)
  − Memory virtualization (shadow tables/Xen writeable page tables)
  − I/O virtualization (device emulation)
• Hardware techniques for virtualization
  − CPU virtualization (VT-x/AMD-V)
  − Memory virtualization (Intel EPT/AMD NPT)
  − I/O virtualization (VT-d/VT-d2/PCI SIG SR-IOV/MR-IOV)
• Future Trends
  − Manageability
  − Security
Future Trends
• Secure hypervisors: the hypervisor itself, like an OS, can have holes.
• Blue Pill attacks: subverting the hypervisor
• Trusted virtualization: virtualizing TPMs for use by guest virtual machines
• Trusted virtualization: how do we trust the VMM? Intel’s LT (LaGrande) and AMD’s Presidio introduce architectural extensions for security.
• Firewalls to protect guests; XenMotion security hole
• Storage QoS: FC NPIV, Storage vMotion
• Datacenter/lifecycle management (Virtualization 2.0)
  − OpsWare PAS (now HP Operations Orchestrator)
  − Novell ZENworks Orchestrator
  − VMware Lifecycle Manager
References
• D. L. Osisek, K. M. Jackson, and P. H. Gum. ESA/390 interpretive-execution architecture, foundation for VM/ESA. IBM Systems Journal, 30(1):34–51, 1991.
• John Scott Robin and Cynthia E. Irvine. Analysis of the Intel Pentium’s ability to support a secure virtual machine monitor. In Proceedings of the Ninth USENIX Security Symposium, Denver, Colorado, August 14–17, 2000, page 275.
• Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. Operating Systems Review, 40(5):2–13, December 2006.
• PCI IOV talks at WinHEC and HP by Michael Krause
• VMworld 2007 talk by Ole Agesen
• Intel IDF 2007/2008 presentations

More Related Content

What's hot

Xen server 6.1 customer presentation
Xen server 6.1 customer presentationXen server 6.1 customer presentation
Xen server 6.1 customer presentationNuno Alves
 
VMware vSphere 5 seminar
VMware vSphere 5 seminarVMware vSphere 5 seminar
VMware vSphere 5 seminarMarkiting_be
 
All About Virtualization
All About VirtualizationAll About Virtualization
All About VirtualizationEMC
 
Ws08 r2 hyper v overview r2
Ws08 r2 hyper v overview r2Ws08 r2 hyper v overview r2
Ws08 r2 hyper v overview r2Omid Koushki
 
Linux On V Mware ESXi
Linux On V Mware ESXiLinux On V Mware ESXi
Linux On V Mware ESXiMasafumi Ohta
 
VMware vSphere 5.1 Overview
VMware vSphere 5.1 OverviewVMware vSphere 5.1 Overview
VMware vSphere 5.1 OverviewESXLab
 
Esx Server 3i Presentation[1]
Esx Server 3i Presentation[1]Esx Server 3i Presentation[1]
Esx Server 3i Presentation[1]Rishi Sharma
 
Cloud Computing and Virtualization
Cloud Computing and Virtualization Cloud Computing and Virtualization
Cloud Computing and Virtualization Mahbub Noor Bappy
 
VMware Esx Short Presentation
VMware Esx Short PresentationVMware Esx Short Presentation
VMware Esx Short PresentationBarcamp Cork
 
Windows Server 2012 - Dynamische opslag met Storage Pools
Windows Server 2012 - Dynamische opslag met Storage PoolsWindows Server 2012 - Dynamische opslag met Storage Pools
Windows Server 2012 - Dynamische opslag met Storage PoolsCompuTrain. De IT opleider.
 
The Storage Hypervisor: The missing link for the Software Defined Datacenter
The Storage Hypervisor:  The missing link for the Software Defined Datacenter The Storage Hypervisor:  The missing link for the Software Defined Datacenter
The Storage Hypervisor: The missing link for the Software Defined Datacenter Virsto Software
 
Virsto Software Extends Storage Hypervisor Leadership with Release of Virsto ...
Virsto Software Extends Storage Hypervisor Leadership with Release of Virsto ...Virsto Software Extends Storage Hypervisor Leadership with Release of Virsto ...
Virsto Software Extends Storage Hypervisor Leadership with Release of Virsto ...Virsto Software
 
V mware v sphere boot camp
V mware v sphere boot campV mware v sphere boot camp
V mware v sphere boot campbestip
 
VMWARE VS MS-HYPER-V
VMWARE VS MS-HYPER-VVMWARE VS MS-HYPER-V
VMWARE VS MS-HYPER-VDavid Ramirez
 
20090911 virtualizationandcloud
20090911 virtualizationandcloud20090911 virtualizationandcloud
20090911 virtualizationandcloudSupratik Ghatak
 
virtualization and cloud
virtualization and cloudvirtualization and cloud
virtualization and cloudsankarimsc
 
V mware v sphere advanced administration
V mware v sphere advanced administrationV mware v sphere advanced administration
V mware v sphere advanced administrationbestip
 
What’s New in VMware vSphere 7?
What’s New in VMware vSphere 7?What’s New in VMware vSphere 7?
What’s New in VMware vSphere 7?Insight
 

What's hot (20)

Xen server 6.1 customer presentation
Xen server 6.1 customer presentationXen server 6.1 customer presentation
Xen server 6.1 customer presentation
 
VMware vSphere 5 seminar
VMware vSphere 5 seminarVMware vSphere 5 seminar
VMware vSphere 5 seminar
 
All About Virtualization
All About VirtualizationAll About Virtualization
All About Virtualization
 
Ws08 r2 hyper v overview r2
Ws08 r2 hyper v overview r2Ws08 r2 hyper v overview r2
Ws08 r2 hyper v overview r2
 
Linux On V Mware ESXi
Linux On V Mware ESXiLinux On V Mware ESXi
Linux On V Mware ESXi
 
VMware vSphere 5.1 Overview
VMware vSphere 5.1 OverviewVMware vSphere 5.1 Overview
VMware vSphere 5.1 Overview
 
Esx Server 3i Presentation[1]
Esx Server 3i Presentation[1]Esx Server 3i Presentation[1]
Esx Server 3i Presentation[1]
 
Cloud Computing and Virtualization
Cloud Computing and Virtualization Cloud Computing and Virtualization
Cloud Computing and Virtualization
 
VMware Esx Short Presentation
VMware Esx Short PresentationVMware Esx Short Presentation
VMware Esx Short Presentation
 
Windows Server 2012 - Dynamische opslag met Storage Pools
Windows Server 2012 - Dynamische opslag met Storage PoolsWindows Server 2012 - Dynamische opslag met Storage Pools
Windows Server 2012 - Dynamische opslag met Storage Pools
 
VMware vSphere
VMware vSphereVMware vSphere
VMware vSphere
 
The Storage Hypervisor: The missing link for the Software Defined Datacenter
The Storage Hypervisor:  The missing link for the Software Defined Datacenter The Storage Hypervisor:  The missing link for the Software Defined Datacenter
The Storage Hypervisor: The missing link for the Software Defined Datacenter
 
VMWARE
VMWAREVMWARE
VMWARE
 
Virsto Software Extends Storage Hypervisor Leadership with Release of Virsto ...
Virsto Software Extends Storage Hypervisor Leadership with Release of Virsto ...Virsto Software Extends Storage Hypervisor Leadership with Release of Virsto ...
Virsto Software Extends Storage Hypervisor Leadership with Release of Virsto ...
 
V mware v sphere boot camp
V mware v sphere boot campV mware v sphere boot camp
V mware v sphere boot camp
 
VMWARE VS MS-HYPER-V
VMWARE VS MS-HYPER-VVMWARE VS MS-HYPER-V
VMWARE VS MS-HYPER-V
 
20090911 virtualizationandcloud
20090911 virtualizationandcloud20090911 virtualizationandcloud
20090911 virtualizationandcloud
 
virtualization and cloud
virtualization and cloudvirtualization and cloud
virtualization and cloud
 
V mware v sphere advanced administration
V mware v sphere advanced administrationV mware v sphere advanced administration
V mware v sphere advanced administration
 
What’s New in VMware vSphere 7?
What’s New in VMware vSphere 7?What’s New in VMware vSphere 7?
What’s New in VMware vSphere 7?
 

Viewers also liked

Best Practices For Using Virtualization In Development Environments
Best Practices For Using Virtualization In Development EnvironmentsBest Practices For Using Virtualization In Development Environments
Best Practices For Using Virtualization In Development EnvironmentsKnowledge Management Associates, LLC
 
XPDS14 - Zero-Footprint Guest Memory Introspection from Xen - Mihai Dontu, Bi...
XPDS14 - Zero-Footprint Guest Memory Introspection from Xen - Mihai Dontu, Bi...XPDS14 - Zero-Footprint Guest Memory Introspection from Xen - Mihai Dontu, Bi...
XPDS14 - Zero-Footprint Guest Memory Introspection from Xen - Mihai Dontu, Bi...The Linux Foundation
 
Searchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionSearchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionOpenStack Foundation
 
Scalability, Fidelity and Stealth in the DRAKVUF Dynamic Malware Analysis System
Scalability, Fidelity and Stealth in the DRAKVUF Dynamic Malware Analysis SystemScalability, Fidelity and Stealth in the DRAKVUF Dynamic Malware Analysis System
Scalability, Fidelity and Stealth in the DRAKVUF Dynamic Malware Analysis SystemTamas K Lengyel
 
Virtualization presentation
Virtualization presentationVirtualization presentation
Virtualization presentationMangesh Gunjal
 
Populaarikulttuuri ja mainonta
Populaarikulttuuri ja mainontaPopulaarikulttuuri ja mainonta
Populaarikulttuuri ja mainontadynamo&son
 
Modified maximum tangential stress criterion for fracture behavior of zirconi...
Modified maximum tangential stress criterion for fracture behavior of zirconi...Modified maximum tangential stress criterion for fracture behavior of zirconi...
Modified maximum tangential stress criterion for fracture behavior of zirconi...dentalid
 
SharePoint Fest Denver - Is Your SharePoint Really Healthy?
SharePoint Fest Denver - Is Your SharePoint Really Healthy?SharePoint Fest Denver - Is Your SharePoint Really Healthy?
SharePoint Fest Denver - Is Your SharePoint Really Healthy?Richard Harbridge
 
Ultizing Online Space: Virtual Fairs and Online Conversion Tools (with poll r...
Ultizing Online Space: Virtual Fairs and Online Conversion Tools (with poll r...Ultizing Online Space: Virtual Fairs and Online Conversion Tools (with poll r...
Ultizing Online Space: Virtual Fairs and Online Conversion Tools (with poll r...Marty Bennett
 
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 TutorialPrivacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 TutorialKun Liu
 
Roxana Ivan - Buget mic pentru evenimente mari (Impact Hub Bucharest, 2014.02...
Roxana Ivan - Buget mic pentru evenimente mari (Impact Hub Bucharest, 2014.02...Roxana Ivan - Buget mic pentru evenimente mari (Impact Hub Bucharest, 2014.02...
Roxana Ivan - Buget mic pentru evenimente mari (Impact Hub Bucharest, 2014.02...Lumea SEO PPC
 
Шобанов Константин "Боль и удовольствие в продажах"
Шобанов Константин "Боль и удовольствие в продажах"Шобанов Константин "Боль и удовольствие в продажах"
Шобанов Константин "Боль и удовольствие в продажах"PechaKucha-Cheboksary
 
Zoekwoordenselectie
ZoekwoordenselectieZoekwoordenselectie
ZoekwoordenselectieFrank Krepel
 
Ne the god_of_religion_of_islam
Ne the god_of_religion_of_islamNe the god_of_religion_of_islam
Ne the god_of_religion_of_islamLoveofpeople
 
органи виділення
органи  виділенняоргани  виділення
органи виділенняltasenko
 
מנור גינדי בשבוע האופנה תל אביב
מנור גינדי בשבוע האופנה תל אביבמנור גינדי בשבוע האופנה תל אביב
מנור גינדי בשבוע האופנה תל אביבManor Gindi מנור גינדי
 

Viewers also liked (20)

Best Practices For Using Virtualization In Development Environments
Best Practices For Using Virtualization In Development EnvironmentsBest Practices For Using Virtualization In Development Environments
Best Practices For Using Virtualization In Development Environments
 
XPDS14 - Zero-Footprint Guest Memory Introspection from Xen - Mihai Dontu, Bi...
XPDS14 - Zero-Footprint Guest Memory Introspection from Xen - Mihai Dontu, Bi...XPDS14 - Zero-Footprint Guest Memory Introspection from Xen - Mihai Dontu, Bi...
XPDS14 - Zero-Footprint Guest Memory Introspection from Xen - Mihai Dontu, Bi...
 
Searchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionSearchlight Updates - Liberty Edition
Searchlight Updates - Liberty Edition
 
Scalability, Fidelity and Stealth in the DRAKVUF Dynamic Malware Analysis System
Scalability, Fidelity and Stealth in the DRAKVUF Dynamic Malware Analysis SystemScalability, Fidelity and Stealth in the DRAKVUF Dynamic Malware Analysis System
Scalability, Fidelity and Stealth in the DRAKVUF Dynamic Malware Analysis System
 
Virtualization
VirtualizationVirtualization
Virtualization
 
Virtualization presentation
Virtualization presentationVirtualization presentation
Virtualization presentation
 
Populaarikulttuuri ja mainonta
Populaarikulttuuri ja mainontaPopulaarikulttuuri ja mainonta
Populaarikulttuuri ja mainonta
 
Snr 2012 ee020344
Snr 2012 ee020344Snr 2012 ee020344
Snr 2012 ee020344
 
Modified maximum tangential stress criterion for fracture behavior of zirconi...
Modified maximum tangential stress criterion for fracture behavior of zirconi...Modified maximum tangential stress criterion for fracture behavior of zirconi...
Modified maximum tangential stress criterion for fracture behavior of zirconi...
 
SharePoint Fest Denver - Is Your SharePoint Really Healthy?
SharePoint Fest Denver - Is Your SharePoint Really Healthy?SharePoint Fest Denver - Is Your SharePoint Really Healthy?
SharePoint Fest Denver - Is Your SharePoint Really Healthy?
 
Ultizing Online Space: Virtual Fairs and Online Conversion Tools (with poll r...
Ultizing Online Space: Virtual Fairs and Online Conversion Tools (with poll r...Ultizing Online Space: Virtual Fairs and Online Conversion Tools (with poll r...
Ultizing Online Space: Virtual Fairs and Online Conversion Tools (with poll r...
 
20120319 aws meister-reloaded-s3
20120319 aws meister-reloaded-s320120319 aws meister-reloaded-s3
20120319 aws meister-reloaded-s3
 
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 TutorialPrivacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
 
Roxana Ivan - Buget mic pentru evenimente mari (Impact Hub Bucharest, 2014.02...
Roxana Ivan - Buget mic pentru evenimente mari (Impact Hub Bucharest, 2014.02...Roxana Ivan - Buget mic pentru evenimente mari (Impact Hub Bucharest, 2014.02...
Roxana Ivan - Buget mic pentru evenimente mari (Impact Hub Bucharest, 2014.02...
 
Шобанов Константин "Боль и удовольствие в продажах"
Шобанов Константин "Боль и удовольствие в продажах"Шобанов Константин "Боль и удовольствие в продажах"
Шобанов Константин "Боль и удовольствие в продажах"
 
Zoekwoordenselectie
ZoekwoordenselectieZoekwoordenselectie
Zoekwoordenselectie
 
Ne the god_of_religion_of_islam
Ne the god_of_religion_of_islamNe the god_of_religion_of_islam
Ne the god_of_religion_of_islam
 
органи виділення
органи  виділенняоргани  виділення
органи виділення
 
מנור גינדי בשבוע האופנה תל אביב
מנור גינדי בשבוע האופנה תל אביבמנור גינדי בשבוע האופנה תל אביב
מנור גינדי בשבוע האופנה תל אביב
 
Shepherd Elementary School Community Meeting Flyer
Shepherd Elementary School Community Meeting FlyerShepherd Elementary School Community Meeting Flyer
Shepherd Elementary School Community Meeting Flyer
 

Similar to virtualization tutorial at ACM bangalore Compute 2009

Hyper V - Minasi Forum 2009
Hyper V - Minasi Forum 2009Hyper V - Minasi Forum 2009
Hyper V - Minasi Forum 2009Aidan Finn
 
Virtualization Technology Overview
Virtualization Technology OverviewVirtualization Technology Overview
Virtualization Technology OverviewOpenCity Community
 
Virtualization tutorial at ACM Bangalore Compute 2009

  • 1. The Hardware Revolution in Server Virtualization Mohan Parthasarathy (Hewlett-Packard) ACM Compute 2009 Tutorial 9th Jan 2009 © 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
  • 2. Agenda. Server virtualization technologies (~15 min): overview and history; VMM architectures; criteria for a processor to be virtualizable. X86 virtualization (~30 min): the x86 processor architecture overview; virtualization challenges in x86 processors. Break 1: Q&A. Software techniques for virtualization (~45 min): CPU virtualization (binary translation/para-virtualization); memory virtualization (shadow tables/Xen writeable page tables); I/O virtualization (device emulation). Break 2: Q&A. Hardware techniques for virtualization (~45 min): CPU virtualization (VT-x/AMD-V); memory virtualization (Intel EPT/AMD NPT); I/O virtualization (VT-d/VT-d2/PCI-SIG SR-IOV/MR-IOV). Future trends (~5 min): manageability; security. "Did you ever wonder if the person in the puddle is real, and you're just a reflection of him?" ~Calvin and Hobbes
  • 3. Server Virtualization Technologies. [Figure: four approaches arranged along a spectrum from isolation to flexibility: Hardware Partitioning, Software/Firmware Partitioning, Software/Firmware Virtualization (guest OSes on a software/firmware hypervisor layer), and Resource Virtualization (applications sharing one OS); each stack sits on CPU and memory hardware.] Products named on the slide: HP nPar, HP vPar, HP Integrity VM, HP-UX SRP, IBM DLPAR, IBM SLPARS (micro-partitions), IBM WPAR, Sun DSD, Sun Logical Domains, Solaris Containers (Zones), Hitachi Virtage, PVC (earlier SWSoft), OpenVZ, VMware ESX/GSX, Microsoft Hyper-V, Xen, KVM, xVM…
  • 4. A brief history lesson. 1960s: IBM VM/370 on the IBM mainframe, running guests such as CMS and MVS; VMM on IBM mainframe; many apps on $$$ HW. 1996: Stanford Research DISCO project; VMM on cheap x86 HW; VMware in 1999, running guests such as W2K3, W2K, WNT4 and Linux on an Intel/AMD x86 server. Commodity hardware becomes powerful enough to support a virtual machine monitor (VMM), so it's back to the future with a proven technology!
  • 5. VMM Architectures. Type-1 (hypervisor) VMM: guests run on the VMM (hypervisor), which runs directly on the hardware; examples: VMware ESX, Xen, MS Hyper-V. Type-2 VMM: guests run on a VMM that runs on a host OS; example: UML. Hybrid VMM: guests run on a VMM that sits alongside a host OS on the hardware; examples: HPVM, VMware GSX, Microsoft Virtual Server.
  • 6. Hosted VM Architecture: HP Integrity VM, Microsoft Virtual Server, VMware GSX
  • 7. Virtualization Requirements (Popek and Goldberg). A model of third-generation machines: two modes of execution; a protection mechanism for the supervisor mode; a method to automatically signal the supervisor when the VM executes a sensitive instruction. Properties for a Virtual Machine Monitor: equivalence; resource control; efficiency.
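The Popek and Goldberg condition is commonly summarised as: trap-and-emulate virtualization works when every sensitive instruction is also privileged, so that it traps when run outside supervisor mode. A minimal sketch of that set check follows; the function names and the two instruction sets are illustrative assumptions and only a small subset of x86, not a complete classification.

```python
def non_virtualizable(sensitive, privileged):
    """Return the sensitive instructions that would NOT trap in user mode."""
    return sorted(set(sensitive) - set(privileged))


def is_classically_virtualizable(sensitive, privileged):
    """True when every sensitive instruction is also privileged."""
    return not non_virtualizable(sensitive, privileged)


# Illustrative subsets only (assumption): a few privileged x86 instructions,
# plus three sensitive instructions that execute in ring 3 without faulting.
PRIVILEGED = {"LGDT", "LIDT", "HLT", "MOV_TO_CR3"}
SENSITIVE = PRIVILEGED | {"POPF", "SGDT", "SMSW"}

if __name__ == "__main__":
    print(non_virtualizable(SENSITIVE, PRIVILEGED))
    print(is_classically_virtualizable(SENSITIVE, PRIVILEGED))
```

The instructions left over by the set difference are exactly the ones a software VMM has to handle by other means, such as binary translation or para-virtualization.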
  • 8. VMM Requirements (Sensitive Instructions). Ref: Analyzing the Intel Pentium's ability to support a secure VMM, John Scott Robin (1999)
  • 9. Agenda. Server virtualization technologies: overview and history; VMM architectures; criteria for a processor to be virtualizable. X86 virtualization: the x86 processor architecture overview; virtualization challenges in x86 processors. Software techniques for virtualization: CPU virtualization (binary translation/para-virtualization); memory virtualization (shadow tables/Xen writeable page tables); I/O virtualization (device emulation). Hardware techniques for virtualization: CPU virtualization (VT-x/AMD-V); memory virtualization (Intel EPT/AMD NPT); I/O virtualization (VT-d/VT-d2/PCI-SIG SR-IOV/MR-IOV). Future trends: manageability; security.
  • 10. X86 architecture: Privilege Levels. Data structures containing privilege levels. DPL (Descriptor Privilege Level). CPL (Current Privilege Level): the DPL of the access rights byte in the CS segment descriptor cache register; the privilege level of the code and data segment for the current task. RPL (Requested Privilege Level): the privilege level of the new selector loaded into a segment register.
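As a hedged illustration of how the three privilege levels interact: for a load of a data-segment selector, x86 allows the access only when both the CPL and the RPL are at least as privileged as the descriptor's DPL (numerically, max(CPL, RPL) <= DPL). The function name below is an assumption for this sketch, and the rule shown covers data segments only, not code-segment transfers or gates.

```python
def data_access_allowed(cpl, rpl, dpl):
    """Privilege check for loading a data-segment selector.

    Lower numbers are more privileged (0 = most, 3 = least). Access is
    allowed when the weaker of CPL and RPL is still at least as
    privileged as the descriptor's DPL.
    """
    for name, level in (("CPL", cpl), ("RPL", rpl), ("DPL", dpl)):
        if level not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be 0..3, got {level!r}")
    return max(cpl, rpl) <= dpl


if __name__ == "__main__":
    print(data_access_allowed(cpl=3, rpl=3, dpl=0))  # user code, kernel data
    print(data_access_allowed(cpl=0, rpl=0, dpl=0))  # kernel code, kernel data
```

Note how a ring 0 caller using a selector with RPL 3 is still refused access to a DPL 0 segment: the RPL lets privileged code act on behalf of a less privileged requester.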
  • 11. X86 memory management: segment registers CS, DS, SS, ES, FS, GS
  • 12. X86 memory management: segmentation. The upper 13 bits of the segment selector are used to index the descriptor table (located through GDTR or LDTR). TI = Table Indicator, which selects the descriptor table: 0 = Global Descriptor Table, 1 = Local Descriptor Table. Each segment register holds the selector plus a hidden part: access rights, segment base, and segment limit.
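The selector layout described above can be sketched as a small decoder: bits 15-3 are the descriptor index, bit 2 is the table indicator, and bits 1-0 are the RPL. The type and function names are assumptions for illustration; the 8-byte descriptor size applies to legacy GDT/LDT entries.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # upper 13 bits: index into the descriptor table
    table: str   # "GDT" when TI = 0, "LDT" when TI = 1
    rpl: int     # requested privilege level, low 2 bits


def decode_selector(selector: int) -> Selector:
    """Split a 16-bit segment selector into index, table indicator and RPL."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a segment selector is 16 bits wide")
    index = selector >> 3
    table = "LDT" if (selector >> 2) & 1 else "GDT"
    rpl = selector & 0b11
    return Selector(index, table, rpl)


def descriptor_offset(selector: int) -> int:
    """Byte offset of the descriptor within its table (8 bytes per entry)."""
    return decode_selector(selector).index * 8


if __name__ == "__main__":
    print(decode_selector(0x2B))
    print(descriptor_offset(0x10))
```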
  • 13. X86 Paging: 32 bit mode [figure: page table and page table entry]
  • 14. X86 paging: registers
  • 15. X86 paging: 64 bit mode with 4 KB pages. 4 KB page translation of the linear address: bits 63-48 sign extend; bits 47-39 PML4E offset (9 bits, into the Page Map Level 4); bits 38-30 PDPE offset (9 bits, into the page directory pointer table); bits 29-21 PDE offset (9 bits, into the page directory); bits 20-12 PTE offset (9 bits, into the page table); bits 11-0 page offset (12 bits) within the 4 KB page in physical memory. 512 PML4E * 512 PDPE * 512 PDE * 512 PTE = 2^36 4 KB pages.
  • 16. X86 paging: 64 bit mode with 2 MB pages. 2 MB page translation of the linear address: bits 63-48 sign extend; bits 47-39 PML4E offset (9 bits, into the Page Map Level 4); bits 38-30 PDPE offset (9 bits, into the page directory pointer table); bits 29-21 PDE offset (9 bits, into the page directory); bits 20-0 page offset (21 bits) within the 2 MB page in physical memory. 512 PML4E * 512 PDPE * 512 PDE = 2^27 2 MB pages.
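The two 64 bit translation layouts can be sketched as bit-field extraction on a linear address. This is only an illustration of the field boundaries; the function name and the dictionary keys are assumptions, and no page tables are actually walked.

```python
def split_linear_address(addr, large_page=False):
    """Split a 48-bit linear address into page-table indices and an offset.

    With 4 KB pages there are four 9-bit indices and a 12-bit offset;
    with 2 MB pages the PTE level is skipped and the offset is 21 bits.
    """
    index_mask = 0x1FF  # 9 bits: 512 entries per table
    fields = {
        "pml4e": (addr >> 39) & index_mask,
        "pdpe": (addr >> 30) & index_mask,
        "pde": (addr >> 21) & index_mask,
    }
    if large_page:
        fields["offset"] = addr & 0x1FFFFF   # 21-bit offset in a 2 MB page
    else:
        fields["pte"] = (addr >> 12) & index_mask
        fields["offset"] = addr & 0xFFF      # 12-bit offset in a 4 KB page
    return fields


if __name__ == "__main__":
    example = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
    print(split_linear_address(example))
    print(split_linear_address(example, large_page=True))
    # Page counts quoted on the slides: 512^4 = 2^36 and 512^3 = 2^27.
    print(512 ** 4 == 2 ** 36, 512 ** 3 == 2 ** 27)
```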
  • 17. X86 virtualization challenges. [Figure: layers shown are guest apps (ring 3), ring 1, VMM (ring 0), and hardware.] Challenges labelled on the slide: incorrect execution when run in ring level > 0 (3C1); non-faulting read of privileged registers (3B1); excessive faulting; ring aliasing/compression; address space compression; leakage of privilege level (3C1); non-faulting write to privileged state (eflags.IF) (3B1); segment reversibility issue on context switch. Instructions shown: CPUID; Sysenter; POPF; CLI/STI; LAR/LSL/VERR/VERW/CALL/INT/JMP/RET; SGDT/SIDT/SLDT/STR/PUSHF/SMSW/POP/PUSH; STR/POP/PUSH.
  • 18. Agenda. Server virtualization technologies: overview and history; VMM architectures; criteria for a processor to be virtualizable. X86 virtualization: the x86 processor architecture overview; virtualization challenges in x86 processors. Software techniques for virtualization: CPU virtualization (binary translation/para-virtualization); memory virtualization (shadow tables/Xen writeable page tables); I/O virtualization (device emulation). Hardware techniques for virtualization: CPU virtualization (VT-x/AMD-V); memory virtualization (Intel EPT/AMD NPT); I/O virtualization (VT-d/VT-d2/PCI-SIG SR-IOV/MR-IOV). Future trends: manageability; security.
  • 19. Dynamic Binary Translation. [Figure: translator stages shown are x86 parser and high-level translator, high-level optimization, low-level code generation, and low-level optimization and scheduling; other elements are the x86 binary, data RAM, disk, the code cache with its code cache tags, and the translator runtime (execution).] Ref: Virtual Machines and Dynamic Translation: Implementing ISAs in Software, Joel Emer, Massachusetts Institute of Technology
  • 20. Binary Translation: C Code Example. int isPrime(int a) { for (int i = 2; i < a; i++) { if (a % i == 0) return 0; } return 1; } Ref: Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. Operating Systems Review, 40(5):2-13, December 2006
  • 21. Basic Block Translation. Most instructions are copied identically. Privileged instructions must be emulated. Jumps must be translated, since translation can alter code layout. Each translated basic block (BB) must end with a jump to the next translated BB. Ref: Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. Operating Systems Review, 40(5):2-13, December 2006
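The four rules above can be sketched with a toy translator over symbolic instructions. Everything here is a simplifying assumption for illustration: instructions are (opcode, operand) pairs, translated targets are named with a hypothetical `tc_` prefix, emulation routines with a hypothetical `emulate_` prefix, and the privileged and jump sets are small samples. It is not how any particular VMM encodes its translation cache.

```python
PRIVILEGED = {"cli", "sti", "popf"}     # illustrative subset (assumption)
JUMPS = {"jmp", "jz", "jnz", "jge"}     # illustrative subset (assumption)


def translate_block(instructions, fallthrough):
    """Translate one guest basic block.

    instructions: list of (opcode, operand) pairs in guest code.
    fallthrough: label of the guest block that follows this one.
    """
    out = []
    for opcode, operand in instructions:
        if opcode in PRIVILEGED:
            # Privileged instructions are replaced by a call into the VMM.
            out.append(("call", f"emulate_{opcode}"))
        elif opcode in JUMPS:
            # Jump targets are rewritten, since translated code has a
            # different layout from the guest code.
            out.append((opcode, f"tc_{operand}"))
        else:
            # Everything else is copied identically.
            out.append((opcode, operand))
    # Each translated block ends with a jump to the next translated block.
    out.append(("jmp", f"tc_{fallthrough}"))
    return out


if __name__ == "__main__":
    block = [("mov", "eax, 1"), ("cli", None), ("jnz", "loop")]
    for line in translate_block(block, fallthrough="next"):
        print(line)
```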
  • 22. Translation of isPrime(49). Note that the prime: basic block is never translated, since 49 is not prime. Ref: Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. Operating Systems Review, 40(5):2-13, December 2006
  • 23. Para-virtualization: Xen architecture
  • 24. Memory Virtualization: Shadow Page Tables
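  • 25. Memory Virtualization: Shadow PT vs writeable page tables (Xen). Shadow PT: the guest's page directory (PD) and page tables (PT) map VA->PA; the VMM keeps a shadow PD/PT, sync'ed on page faults, and CR3 points at the shadow. Writeable page tables (Xen): the guest's PD/PT map VA->MA directly and CR3 points at them; the VMM verifies that each page table update is okay.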
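A minimal sketch of the two schemes, using dictionaries of page numbers in place of real page tables. The names `guest_pt`, `p2m`, `build_shadow` and `update_is_ok` are assumptions for illustration: the shadow table is the composition of the guest's VA->PA mapping with the VMM's PA->MA mapping, while in the writeable-page-table scheme the guest writes machine frames itself and the VMM only checks them.

```python
def build_shadow(guest_pt, p2m):
    """Compose guest VA->PA with the VMM's PA->MA map to get VA->MA.

    Guest entries whose physical page has no machine page are left out,
    so touching them would fault into the VMM.
    """
    return {vpn: p2m[ppn] for vpn, ppn in guest_pt.items() if ppn in p2m}


def update_is_ok(machine_frame, frames_owned_by_guest):
    """Writeable page tables: accept an update only if the guest owns the frame."""
    return machine_frame in frames_owned_by_guest


if __name__ == "__main__":
    guest_pt = {0x10: 0x1, 0x11: 0x2, 0x12: 0x9}   # virtual page -> guest-physical page
    p2m = {0x1: 0xA0, 0x2: 0xB7}                   # guest-physical page -> machine page
    print(build_shadow(guest_pt, p2m))
    print(update_is_ok(0xA0, set(p2m.values())))
```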
  • 26. Dynamic memory resizing: Ballooning. Inflating a balloon: used when the server wants to reclaim memory; the driver allocates pinned physical pages within the VM; this increases memory pressure in the guest OS, which reclaims space to satisfy the driver's allocation request; the driver communicates the physical page number of each allocated page to the VMM. Deflating: frees up memory for general use within the guest OS.
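The inflate and deflate steps can be sketched as a toy model in which the guest's free memory is a list of page numbers. The class and method names are assumptions; a real balloon driver relies on the guest OS allocator and its reclaim path, which this sketch replaces with a simple error when too few free pages remain.

```python
class BalloonDriver:
    """Toy model of a balloon driver inside one guest."""

    def __init__(self, free_pages):
        self.free = list(free_pages)   # pages available to the guest OS
        self.pinned = []               # pages currently held by the balloon

    def inflate(self, count):
        """Pin `count` pages and return their numbers for the VMM to reclaim."""
        if count > len(self.free):
            # A real guest would page out or shrink caches to make room.
            raise MemoryError("guest cannot satisfy the balloon request")
        taken = [self.free.pop() for _ in range(count)]
        self.pinned.extend(taken)
        return taken

    def deflate(self, count):
        """Release up to `count` pinned pages back to the guest OS."""
        released = [self.pinned.pop() for _ in range(min(count, len(self.pinned)))]
        self.free.extend(released)
        return released


if __name__ == "__main__":
    driver = BalloonDriver(range(8))
    print(driver.inflate(3))   # page numbers reported to the VMM
    print(driver.deflate(2))   # pages returned for general guest use
```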
  • 27. I/O system architecture overview (PCI/PCI-e). [Figure: several OS drivers and the VMM on multiple CPUs, the root complex, memory, the device configuration space and RX/TX queues, with the device at bus, device, function (BDF) 3, 0, 0; the picture is labelled "CHAOS!!".]
  • 28. I/O Virtualization Architecture. Monolithic model (guest VMs VM0..VMn with guest OS and apps; I/O services and device drivers in the hypervisor; shared devices). Pro: higher performance; pro: I/O device sharing; pro: VM migration; con: larger hypervisor; example: VMware ESX. Service VM model (I/O services and device drivers in service VMs; guest VMs VM0..VMn; shared devices). Pro: high security; pro: I/O device sharing; pro: VM migration; con: lower performance; example: Xen. Pass-through model (device drivers in each guest VM; assigned devices). Pro: highest performance; pro: smaller hypervisor; pro: device assisted sharing; con: migration challenges.
  • 29. Network Virtualization (VMware GSX example)
  • 30. Xen I/O Architecture: Safe Hardware Interface
  • 31. Networking in Xen. [Figure: guest domains 1, 2, ... each with a front-end driver; a driver domain with back-end drivers, an Ethernet bridge and the NIC driver; the hypervisor providing page flipping, virtual interrupts, driver control and interrupt dispatch; hardware consisting of the NIC and CPU / memory / disk / other devices. Flows shown: packet data, control + data, interrupts.]
  • 32. Agenda. Server virtualization technologies (15 min): overview and history; VMM architectures; criteria for a processor to be virtualizable. X86 virtualization (30 min): the x86 processor architecture overview; virtualization challenges in x86 processors. Software techniques for virtualization (30 min): CPU virtualization (binary translation/para-virtualization); memory virtualization (shadow tables/Xen writeable page tables); I/O virtualization (device emulation). Hardware techniques for virtualization: CPU virtualization (VT-x/AMD-V); memory virtualization (Intel EPT/AMD NPT); I/O virtualization (VT-d/VT-d2/PCI-SIG SR-IOV/MR-IOV). Future trends (5 min): manageability; security.
  • 33. CPU Virtualization with Intel VT-x. Two new VT-x operating modes: a less-privileged mode (VMX non-root) for guest OSes; a more-privileged mode (VMX root) for the VMM. Two new transitions: VM entry to non-root operation; VM exit to root operation. Execution controls determine when exits occur: access to privileged state, occurrence of exceptions, etc.; flexibility is provided to minimize unwanted exits. The VM Control Structure (VMCS) controls VT-x operation and also holds guest and host state. [Figure: virtual machines (VMs) with apps in ring 3 and the OS in ring 0, above the VM monitor (VMM) in VMX root, connected by VM exit and VM entry.]
  • 34. VT-x Operations. [Figure: VM 1, VM 2, ... VM n, each with rings 3 and 0, run in VMX non-root operation, each with its own VMCS (1, 2, ... n); the VMM runs in VMX root operation with rings 3 and 0. Labelled transitions: VMXON from IA-32 operation, VMLAUNCH and VMRESUME into a VM, and VM exit back to root operation.]
  • 35. VT-x new instructions. VMXON and VMXOFF: enter and exit VMX root mode. VMLAUNCH: used on the initial transition from VMM to guest; enters VMX non-root operation mode. VMRESUME: used on subsequent entries; enters VMX non-root operation mode; loads guest state and exit criteria from the VMCS. VM exit (a hardware transition rather than an instruction): used on the transition from guest to VMM; enters VMX root operation mode; saves guest state in the VMCS; loads VMM state from the VMCS. VMPTRST and VMPTRLD: read and write the VMCS pointer. VMREAD, VMWRITE, VMCLEAR: read from, write to and clear a VMCS. VMCALL: hypervisor entry point for a hypercall from the guest.
  • 36. VT-x Data Structures (VMCS)
      • VMCS is a 4K table which specifies the processor behaviour in the VM environment
      • Physical addressing only, and is accessed through the VMREAD/VMWRITE interface
      • Loads and stores to the current VMCS pointer go through VMPTRLD and VMPTRST
      • VMRESUME is used if the same VMCS is being resumed on a processor. Else, VMCLEAR followed by VMLAUNCH.
      • VMCS field groups
          − VM execution controls: control non-root mode. External interrupt exiting, interrupt window exiting, CR3 load/store exiting, VPID enable, VPID value, EPT enable, EPTP…
          − Guest save state: processor state saved on VM exits and loaded on VM entries. EIP, ESP, EFLAGS, IDTR, segment registers etc.
          − Host save state: processor state loaded on VM exits. CR3, EIP set to monitor entry, EFLAGS etc.
          − VM exit controls: these fields control VM exits. MSR save etc.
          − VM entry controls: these fields control VM entries. Interrupts on entry, MSR loads etc.
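The VMRESUME / VMCLEAR + VMLAUNCH rule on this slide can be sketched as a small state model. This is illustrative Python, not VMX code: `ToyVMCS` and `enter_guest` are hypothetical names, only the launch state and last processor are tracked, and the VMPTRLD step that makes a VMCS current is omitted for brevity.

```python
# Toy model of choosing between VMRESUME and VMCLEAR + VMLAUNCH.
# Assumption: all names here are invented for illustration.

class ToyVMCS:
    """Tracks only what the rule needs: launch state and last processor."""

    def __init__(self):
        self.launched = False
        self.cpu = None


def enter_guest(vmcs: ToyVMCS, cpu: int) -> list:
    """Return the instruction sequence a VMM would issue to enter the guest."""
    if vmcs.launched and vmcs.cpu == cpu:
        # Same VMCS resumed on the same processor.
        return ["VMRESUME"]
    # First entry, or the VMCS moved to another processor:
    # clear it, then launch it there.
    vmcs.launched = True
    vmcs.cpu = cpu
    return ["VMCLEAR", "VMLAUNCH"]


vmcs = ToyVMCS()
first = enter_guest(vmcs, cpu=0)     # initial entry
second = enter_guest(vmcs, cpu=0)    # re-entry on the same processor
migrated = enter_guest(vmcs, cpu=1)  # vCPU moved to another processor
```

The sketch makes the cost of moving a virtual CPU between processors visible: the cheaper VMRESUME path is only available while the VMCS stays where it was launched.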
  • 37. VT-x solution to x86 virtualization challenges
      • Sysenter calls into guest OS. CLI/STI optimized to deliver virtual interrupts to VM
      • All reads return privilege level 0, GDT/LDT owned by guest OS, CPUID can be made to trap into VMM
      • Guest OS in full control of segment/task descriptors
      • No ring compression: all rings available
      • No need for VMM to share address space with guest: no address compression
      • Clean context switch on VM entry/exit
      • Eflags.IF is no longer used for interrupt masking
      [Figure: Guest Apps in Ring 3 and the guest OS in Ring 0, with the VMM in "Ring -1" above the hardware. Instructions shown: CPUID, Sysenter, CLI/STI/POPF, SGDT/SIDT/SLDT/STR/PUSHF/SMSW/POP/PUSH, LAR/LSL/VERR/VERW/CALL/INT/JMP/RET.]
  • 38. Intel EPT/AMD NPT
      • GPT directly translates Guest Virtual addresses into Host Physical addresses on the fly.
          − Uses Guest Page Table and Host-based Page Table
      • Significant reduction in "exit frequency"
          − Primary page table modifications are as fast as native
          − Page faults require no exits
          − Context switches require no exits
      • No shadow page table memory overhead
      • However, results in more expensive TLB misses (the "memsweep effect"), mitigated by large guest pages
      • AMD ASID/Intel VPID: segments the TLB, reduces TLB purge overheads.
      [Figure: gCR3 points to the guest page tables (guest linear address to guest physical address); the GPT base pointer (hCR3) points to the host page tables (guest physical address to host physical address); translations are held in the TLB and caches.]
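Why TLB misses become more expensive under EPT/NPT can be shown with a short reference count. The sketch below assumes a worst-case two-dimensional walk with no paging-structure caches or nested TLB: every guest page-table pointer is a guest-physical address that must itself be translated through the host tables before the guest entry can be read. The function names are hypothetical.

```python
# Worst-case memory references per TLB miss (illustrative count only).
# Assumption: no paging-structure caches, no nested TLB.

def native_walk_refs(levels: int) -> int:
    """A native walk reads one entry per page-table level."""
    return levels


def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Count references for a guest walk nested inside host translation."""
    refs = 0
    for _ in range(guest_levels):
        refs += host_levels  # translate the guest table's guest-physical address
        refs += 1            # read the guest page-table entry itself
    refs += host_levels      # translate the final guest-physical data address
    return refs


native_4k = native_walk_refs(4)            # 4-level native walk
nested_4k = nested_walk_refs(4, 4)         # 4-level guest over 4-level host
nested_large_guest = nested_walk_refs(3, 4)  # guest uses large pages (one level fewer)
```

With four levels on both sides the count is 4 × 5 + 4 = 24 references against 4 for a native walk; dropping one guest level through large guest pages reduces it to 19, which is the mitigation the slide mentions.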
  • 39. VT-x extension: Extended Page Table (EPT)
      • All guest-physical addresses go through extended page tables
          − Includes address in CR3, address in PDE, address in PTE, etc.
  • 40. VT-x extension: Virtual Processor IDs (VPID)
      • The idea of a tagged TLB is that each TLB entry is "tagged" with an identifier
      • Having such a tag means TLB entries need not be flushed when switching between the host and a guest
      • VPID is activated if the new "enable VPID" control bit is set in the VMCS

      Tag     Virtual Address   Physical Address
      Host    0x1000            0x10001000
      Host    0x2000            0x10002000
      Host    0x3000            0x10003000
      Host    0x4000            0x10004000
      Guest   0x1000            0xFFF01000
      Guest   0x2000            0xFFF02000
      Guest   0x3000            0xFFF03000
      Guest   0x4000            0xFFF04000
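The table on this slide can be modelled as a lookup keyed by (tag, virtual address). The sketch below is a toy Python model using the slide's own values; `TaggedTLB` is a hypothetical name, and real TLBs are set-associative hardware structures rather than dictionaries.

```python
# Toy tagged TLB populated with the slide's example entries.
# Assumption: class and method names are invented for this sketch.

class TaggedTLB:
    def __init__(self):
        self.entries = {}

    def insert(self, tag: str, vaddr: int, paddr: int) -> None:
        self.entries[(tag, vaddr)] = paddr

    def lookup(self, tag: str, vaddr: int):
        """Return the cached physical address, or None on a TLB miss."""
        return self.entries.get((tag, vaddr))


tlb = TaggedTLB()
for page in range(1, 5):
    vaddr = page * 0x1000
    tlb.insert("Host", vaddr, 0x10000000 + vaddr)
    tlb.insert("Guest", vaddr, 0xFFF00000 + vaddr)

# The same virtual address resolves differently depending on the tag,
# so host and guest entries can coexist without a flush on each switch.
host_hit = tlb.lookup("Host", 0x2000)
guest_hit = tlb.lookup("Guest", 0x2000)
```

Because the tag is part of the key, switching between host and guest only changes which tag is used for lookups; no entry has to be discarded.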
  • 41. VT-x extension: CPUID spoofing (Flex Migration)
      • Allows software to "spoof" the CPUID feature bits (i.e. make the value of the CPUID feature bits appear different than they really are).
      • This is the same as the CPUID spoofing feature that the current VT processors have.
      [Figure: Live VM Migration between Older / Existing Servers and Newer Servers. Generations shown: Pre 2004 (32 bit, single core); 2004+ (64 bit, single core); 2006+ (Intel® Core™; 64 bit, dual and quad-core).]
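The idea behind CPUID spoofing for live migration can be sketched as bit masking: the guest is shown only the feature bits that every server in the migration pool supports. The Python below is a sketch under that assumption. The bit positions are the CPUID leaf 1 ECX positions for SSE3, SSSE3 and SSE4.1; the function names and the two example hosts are hypothetical, and the slide does not describe how the hardware mechanism itself is programmed.

```python
# Illustrative CPUID feature masking for a migration pool.
# Assumption: helper names and example hosts are invented for this sketch.
from functools import reduce

SSE3 = 1 << 0     # CPUID.1:ECX bit 0
SSSE3 = 1 << 9    # CPUID.1:ECX bit 9
SSE4_1 = 1 << 19  # CPUID.1:ECX bit 19


def common_feature_mask(host_feature_words):
    """Features supported by every host in the pool (bitwise AND)."""
    return reduce(lambda a, b: a & b, host_feature_words)


def spoofed_features(real_features: int, mask: int) -> int:
    """What the guest sees: real feature bits with non-common ones hidden."""
    return real_features & mask


older_server = SSE3
newer_server = SSE3 | SSSE3 | SSE4_1
pool_mask = common_feature_mask([older_server, newer_server])
guest_view_on_newer = spoofed_features(newer_server, pool_mask)
```

A guest started on the newer server sees the same feature set it would see on the older one, so software that probes CPUID does not start using instructions that would be missing after a migration.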
  • 42. Intel VT-d Architecture Detail
      [Figure: DMA requests (device ID, virtual address, length) from devices D1 and D2 enter the DMA remapping engine, which contains a context cache and a translation cache. The engine consults memory-resident partitioning and translation structures: device assignment structures indexed by bus (Bus 0 ... Bus N ... Bus 255), device and function (Dev 0, Func 0 ... Dev P, Func 1, Dev P, Func 2 ... Dev 31, Func 7), and address translation structures (4KB page tables, page frames). Outputs are a memory access with system physical address, or fault generation.]
  • 43. VT-d: Remapping Structures
      • VT-d hardware selects page-table based on source of DMA request
          − Requestor ID (bus / device / function) in request identifies DMA source
      • VT-d Device Assignment Entry
          − Bits 127:64: Rsvd | Domain ID | Rsvd | Address Width
          − Bits 63:0: Address Space Root Pointer | Rsvd | Ext. Controls | Controls | P
      • VT-d supports hierarchical page tables for address translation
          − Page directories and page tables are 4 KB in size
          − 4KB base page size with support for larger page sizes
          − Support for DMA snoop control through page table entries
      • VT-d Page Table Entry
          − Bits 63:0: Rsvd | Page-Frame / Page-Table Address | Available | S | Rsvd | Ext. Controls | W | R | P
  • 44. VT-d: Hardware Page Walk
      • Requestor ID (16 bits)
          − Bits 15:8: Bus
          − Bits 7:3: Device
          − Bits 2:0: Func
      • DMA Virtual Address (64 bits)
          − Bits 63:48: zero
          − Bits 47:39: Level-4 table offset
          − Bits 38:30: Level-3 table offset
          − Bits 29:21: Level-2 table offset
          − Bits 20:12: Level-1 table offset
          − Bits 11:0: Page Offset
      [Figure: the Requestor ID indexes the Device Assignment Tables; the example Device Assignment Entry specifies a 4-level page table, and the walk proceeds from the base through the Level-4, Level-3, Level-2 and Level-1 page tables to the page.]
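The bit fields on this slide can be extracted with shifts and masks. The following Python sketch splits a requestor ID and a DMA virtual address using the slide's field boundaries; the function names are hypothetical, and the sketch only decodes the indices rather than walking any real tables.

```python
# Decode the VT-d page-walk inputs using the field layout from the slide.
# Assumption: function names are invented for this sketch.

def split_requestor_id(rid: int) -> dict:
    """Bits 15:8 = bus, 7:3 = device, 2:0 = function."""
    return {
        "bus": (rid >> 8) & 0xFF,
        "device": (rid >> 3) & 0x1F,
        "function": rid & 0x7,
    }


def split_dma_address(addr: int) -> dict:
    """Four 9-bit table offsets followed by a 12-bit page offset."""
    return {
        "level4": (addr >> 39) & 0x1FF,
        "level3": (addr >> 30) & 0x1FF,
        "level2": (addr >> 21) & 0x1FF,
        "level1": (addr >> 12) & 0x1FF,
        "offset": addr & 0xFFF,
    }


# Example values built from known fields so the decoding can be checked.
example_rid = (3 << 8) | (2 << 3) | 1
example_addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
```

Each 9-bit offset selects one of 512 eight-byte entries in a 4 KB table, which matches the 4 KB page-directory and page-table size given on the previous slide.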
  • 45. PCI SIG IOV Overview
      • PCI SIG is standardizing mechanisms that enable PCIe Devices to be directly shared
          − Single-Root IOV: direct sharing between SIs on a single system
          − Multi-Root IOV: direct sharing between SIs on multiple systems
      • PCI-SIG IOV Specification covers the "north-side" of the Device
      [Figure: PCIe Single-Root IOV and PCIe Multi-Root IOV topologies, each showing SIs above VIs.]
  • 46. PCI SIG IOV Terminologies
      • System Image (SI)
          − SW, e.g., a guest OS, to which virtual and physical devices can be assigned
      • Virtual Intermediary (VI)
          − Performs resource allocation, isolation, management and event handling
      • PCIM: PCI Manager
          − Controls configuration, management and error handling of PFs and VFs
          − May be in SW and/or firmware
          − May be integrated into a VI
      • Translation Agent (TA)
          − Uses the ATPT to translate PCI bus addresses into platform addresses
      • Address Translation and Protection Table (ATPT)
          − Validates access rights of incoming PCI memory transactions
          − Translates PCI addresses into platform physical addresses
      [Figure: SIs and VIs with an SR-PCIM, connected through a PCIe switch to devices labelled F.]
  • 47. VT-c: Virtual Machine Device Queues (VMDq)
      • On the receive path, VMDq provides a hardware "sorter" or classifier that does the VMM's pre-work of deciding which VM each packet should go to. The NIC or LAN silicon thereby provides a hardware assist for the VMM layer.
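The receive-path sorting described on this slide can be sketched in software. The Python below assumes that the classifier keys on the destination MAC address, which the slide does not state; the packet format, the `"default"` queue name and the function name are all hypothetical.

```python
# Toy model of a VMDq-style receive classifier.
# Assumption: classification by destination MAC; all names are illustrative.
from collections import defaultdict


def sort_into_queues(packets, mac_to_vm):
    """Place each packet in the queue of the VM that owns its destination MAC."""
    queues = defaultdict(list)
    for packet in packets:
        # Unknown destinations fall back to a default queue for the VMM.
        vm = mac_to_vm.get(packet["dst_mac"], "default")
        queues[vm].append(packet)
    return dict(queues)


mac_to_vm = {"00:00:00:00:00:01": "vm1", "00:00:00:00:00:02": "vm2"}
packets = [
    {"dst_mac": "00:00:00:00:00:01", "payload": "a"},
    {"dst_mac": "00:00:00:00:00:02", "payload": "b"},
    {"dst_mac": "00:00:00:00:00:01", "payload": "c"},
    {"dst_mac": "00:00:00:00:00:09", "payload": "d"},
]
queues = sort_into_queues(packets, mac_to_vm)
```

With VMDq this per-packet loop runs in the NIC, so the VMM receives packets already grouped by destination VM instead of inspecting each one itself.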
  • 48. Intel / AMD Comparison
      • Intel
          − VT-x: VMLAUNCH, VMRESUME, VMREAD, VMWRITE; VMCS (VM control structure)
          − VT-x2: Extended Page Tables (EPT), Virtual Processor IDs (VPID)
          − VT-d: IOMMU
          − VT-d2: IOMMU
          − LT: SENTER, AC
      • AMD
          − SVM: VMRUN; VMCB (VM control block); ASID tagged TLB; Paged realmode; SKINIT (security); DMA exclusion vector (security)
          − SVM-2: Nested page tables; Improved #VMEXIT decode assist (performance)
          − SVM-3: ?
          − IOMMU
          − PCI-SIG ATS
      [Figure: the features are laid out along a timeline labelled 2005, 2006, 2007, 2008, unknown.]
  • 49. Deja-Vu: Back to the future
      • What VT calls "non-root mode", and Pacifica calls "guest mode", was called "interpretive execution" on the IBM VM/370 and VM/ESA mainframes.
      • VT's "vmlaunch" instruction and Pacifica's "vmrun" correspond to what IBM called "SIE" (Start Interpretive Execution).
      • Intel's "VMCS" and AMD's "VMCB" were called the "state description" on the IBM mainframes.
      • IBM also defined the concept of shadow translation tables and a dual page-table walk in hardware.
      • IBM also defined an interpreted SIE for nested hypervisor support (not yet in Intel/AMD)
  • 50. Agenda
      • Server Virtualization technologies
          − Overview and history
          − VMM architectures
          − Criteria for a processor to be virtualizable
      • X86 Virtualization
          − The x86 processor architecture overview
          − Virtualization challenges in x86 processors
      • Software techniques for virtualization
          − CPU virtualization (Binary Translation/Para-virtualization)
          − Memory virtualization (shadow tables/Xen writeable page tables)
          − I/O virtualization (device emulation)
      • Hardware techniques for virtualization
          − CPU virtualization (VT-x/AMD-V)
          − Memory virtualization (Intel EPT/AMD NPT)
          − I/O virtualization (VT-d/VT-d2/PCI SIG SR-IOV/MR-IOV)
      • Future Trends
          − Manageability
          − Security
  • 51. Future Trends
      • Secure Hypervisors: the hypervisor itself, like an OS, can have holes.
      • BluePill attacks: subverting the hypervisor
      • Trusted Virtualization: virtualizing TPMs for use by guest virtual machines
      • Trusted Virtualization: how do we trust the VMM? Intel's LT (LaGrande) and AMD's Presidio introduce architectural extensions for security
      • Firewalls to protect guests. XenMotion security hole
      • Storage QoS: FC NPIV, Storage vMotion
      • Datacenter/Lifecycle Management (Virtualization 2.0)
          − OpsWare PAS (now HP Operations Orchestrator)
          − Novell ZENworks Orchestrator
          − VMware Lifecycle Manager
  • 52. References
      • D. L. Osisek, K. M. Jackson, and P. H. Gum. ESA/390 interpretive-execution architecture, foundation for VM/ESA. IBM Systems Journal, 30(1):34–51, 1991.
      • John Scott Robin and Cynthia E. Irvine. Analysis of the Intel Pentium's ability to support a secure virtual machine monitor. In Proceedings of the Ninth USENIX Security Symposium, Denver, Colorado, August 14–17, 2000, page 275.
      • Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. Operating Systems Review, 40(5):2–13, December 2006.
      • PCI IOV talks at WinHEC and HP by Michael Krause
      • VMWorld 2007 talk by Ole Agesen
      • Intel IDF 2007/2008 presentations