SlideShare a Scribd company logo
1 of 34
Download to read offline
Experiences Porting
KVM to SmartOS


Bryan Cantrill
VP, Engineering

bryan@joyent.com
@bcantrill
WTF is SmartOS?

   • illumos-derived OS that is the foundation of both
    Joyentʼs public cloud and SmartDataCenter product
   • As an illumos derivative, has several key features:
      • ZFS: Enterprise-class copy-on-write filesystem featuring
        constant time snapshots, writable clones, built-in
        compression, checksumming, volume management, etc.

      • DTrace: Facility for dynamic, ad hoc instrumentation of
        production systems that supports in situ data aggregation,
        user-level instrumentation, etc. — and is absolutely safe

      • OS-based virtualization (Zones): Entirely secure virtual OS
        instances offering hardware performance, high multi-tenancy

      • Network virtualization (Crossbow): Virtual NIC Infrastructure
        for easy bandwidth management and resource control
KVM on SmartOS?

   • Despite its rich feature-set, SmartOS was missing an
    essential component: hardware virtualization
   • Thanks to Intel and AMD, hardware virtualization can
    now be remarkably high performing...
   • We firmly believe that the best hypervisor is the
    operating system — anyone attempting to implement a
    “thin” hypervisor will end up retracing OS history
   • KVM shares this vision — indeed, pioneered it!
   • Moreover, KVM is best-of-breed: highly competitive
    performance and a community with critical mass
   • Imperative was clear: needed to port KVM to SmartOS!
Constraining the port

    • For business and resourcing reasons, elected to focus
     exclusively on Intel VT-x with EPT...
    • ...but to not make decisions that would make later AMD
     SVM work impossible
    • Only ever interested in x86-64 host support
    • Only ever interested in x86 and x86-64 guests
    • Willing to diverge as needed to support illumos
     constructs or coding practices…
    • ...but wanted to maintain compatibility with QEMU/KVM
     interface as much as possible
Starting the port

    • KVM was (rightfully) not designed to be portable in any
     real sense — it is specific to Linux and Linux facilities
    • Became clear that emulating Linux functionality would
     be insufficient — there is simply too much divergence
    • Given the stability of KVM in Linux 2.6.34, we felt
     confident that we could diverge from the Linux
     implementation — while still being able to consume and
     contribute patches as needed
    • Also clear that just getting something to compile would
     be a significant (and largely serial) undertaking
    • Joyent engineer Max Bruning started on this in late fall...
Getting to successful compilation

   • To expedite compilation, unported blocks of code would
     be “XXXʼd out” by being enclosed in #ifdef XXX
   • To help understand when/where we hit XXXʼd code
     paths, we put a special DTrace probe with __FILE__
     and __LINE__ as arguments in the #else case
   • We could then use simple DTrace enablings to
     understand what of these cases we were hitting to
     prioritize work:
     kvm-xxx
     {
               @[stringof(arg0), probefunc, arg1] = count();
     }

     tick-10sec
     {
             printf("%-12s %-40s %-8s %8sn",
                 "FILE", "FUNCTION", "LINE", "COUNT");
             printa("%20s %8d %@8dn", @);
     }
Accelerating the port

    • By late March, Max could launch a virtual machine that
     could run in perpetuity without panicking…
    • ...but also was not making any progress booting
    • At this point, the work was more readily parallelized:
     Joyentʼs Robert Mustacchi and I joined Max in April
    • Added tooling to understand guest behavior, e.g.:
       • MDB support to map guest PFNs to QEMU VAs
       • MDB support for 16-bit disassembly (!)
       • DTrace probes on VM entry/exit and the ability to pull VM
         state in DTrace with a new vmregs[] variable
Making progress...

   • To make forward progress, we would debug the issue
     blocking us (inducing either guest or host panic)…
   • ...which was usually due to a piece that hadnʼt yet been
     ported or re-implemented
   • We would implement that piece (usually eliminating an
     XXXʼd block in the process), and debug the next issue
   • The number of XXXʼs over time tell the tale...
The tale of the port
Port milestones


                  Boots KMDB

                       Boots Linux




                           Boots Windows
Notable bugs

   • In the course of this port, we did not discover any bug
     that one would call a bug in KVM — itʼs very solid!
   • Our bugs were (essentially) all self-inflicted, e.g.:
       • We erroneously configured QEMU such that both QEMU and
         KVM thought they were responsible for the 8254/8259!

       • We use a per-CPU GSBASE where Linux does not — Linux
         KVM doesnʼt have any reason to reload the hostʼs GSBASE
         on CPU migration, but not doing so induces host GSBASE
         corruption: two physical CPUs have the same CPU pointer
         (one believes itʼs the other), resulting in total mayhem

       • We reimplemented the FPU save code in terms of our native
         equivalent — and introduced a nasty corruption bug in the
         process by plowing TS in CR0!
Port performance

   • Not surprisingly, our port performs at baremetal speeds
     for entirely CPU-bound workloads:




   • But it took us a surprising amount of time to get to this
     result: due to dynamic overclocking, SmartOS KVM was
     initially operating 5% faster than baremetal!
Port performance

   • Our port of KVM seems to at least be in the hunt on
     other workloads, e.g.:
Port status

    • Port is publicly available:
        • Github repo for KVM itself:
           https://github.com/joyent/illumos-kvm

        • Github repo for our branch of QEMU 0.14.1:
           https://github.com/joyent/illumos-kvm-cmd

        • illumos-kvm-cmd repo contains minor QEMU 0.14.1
          patches to support our port, all of which we intend to
          upstream

    • Within its scope, this port is at or near production quality
    • Worthwhile to discuss the limitations of our port, the
      divergences of our port from Linux KVM, and the
      enhancements to KVM that our port allows...
Limitation: guest memory is locked down

   • As a cloud provider, we have something of an opinion
     on this: overselling memory is only for idle workloads
   • In our experience, the dissatisfaction from QoS
     variability induced by memory oversell is not paid for by
     the marginal revenue of that oversell
   • We currently lock down guest memory; failure to lock
     down memory will result in failure to start
   • For those high multi-tenancy environments, we believe
     that hardware is the wrong level at which to virtualize...
Limitation: no memory deduplication

   • We donʼt currently have an analog to the kernel same-
     page mapping (KSM) found in Linux
   • This is technically possible, but we donʼt see an acute
     need (for the same reason we lock down guest memory)
   • We are interested to hear experiences with this:
      • What kind of memory savings does one see?
      • Is one kind of guest (Windows?) more likely to see savings?
      • What kind of performance overhead from page scanning?
Limitation: no nested virtualization

    • We donʼt currently support nested virtualization — and
     weʼre not sure that weʼre ever going to implement it
    • While for our own development purposes, we would like
     to see VMware Fusion support nested virtualization, we
     donʼt see an acute need to support it ourselves
    • Would be curious to hear about experiences with nested
     virtualization; is it being used in production, or is it
     primarily for development?
Divergence: User/kernel interface

    • To minimize patches floated on QEMU, wanted to
     minimize any changes to the user/kernel interface
    • ...but we have no anon_inode_getfd() analog
    • This is required to implement the model of a 1-to-1
     mapping between a file descriptor and a VCPU
    • Added a new KVM_CLONE ioctl that makes the driver
     state in the operated-upon instance point to another
    • To create a VCPU, QEMU (re)opens /dev/kvm, and
     calls KVM_CLONE on the new instance, specifying the
     extant instance
Divergence: Context ops

   • illumos has the ability to install context ops that are
     executed before and after a thread is scheduled on CPU
   • Context ops were originally implemented to support
     CPU performance counter virtualization
   • Context ops are installed with installctx()
   • This facility proved essential — we use it to perform the
     equivalent of kvm_sched_in()/kvm_sched_out()
Divergence: Timers

   • illumos has arbitrary resolution interval timer support via
     the cyclic subsystem
   • Cyclics can be bound to a CPU or processor set and
     can be configured to fire at different interrupt levels
   • While originally designed to be a high resolution interval
     timer facility (the system clock is implemented in terms
     of it), cyclics may also be used as a dynamically
     reprogrammable one-shots
   • All KVM timers are implemented as cyclics
   • We do not migrate cyclics when a VCPU migrates from
     one CPU to another, choosing instead to poke the target
     CPU from the cyclic handler
Enhancement: ZFS

   • Strictly speaking, we have done nothing specifically for
    ZFS: running KVM on a ZFS volume (a zvol) Just Works
   • But the presence of ZFS allows for KVM enhancements:
      • Constant time cloning allows for nearly instant provisioning
        of new KVM guests (assuming that the reference image is
        already present)

      • The ZFSʼs unified adaptive replacement cache (ARC) allows
        for guest I/O to be efficiently cached in the host — resulting
        in potentially massive improvements in random I/O
        (depending, of course, on locality)

      • We believe that ZFS remote replication can provide an
        efficient foundation for WAN-based cloning and migration
Enhancement: OS Virtualization

   • illumos has deep support for OS virtualization
   • While our implementation does not require it, we run
     KVM guests in a local zone, with the QEMU process as
     the only process
   • This was originally for reasons of accounting (we use
     the zone as the basis for QoS, resource management,
     I/O throttling, billing, instrumentation, etc.)…
   • ...but given the recent KVM vulnerabilities, it has
     become a matter of security
   • OS virtualization neatly containerizes QEMU and
     drastically reduces attack surface for QEMU exploits
Enhancement: Network virtualization

   • illumos has deep support for network virtualization
   • We create a virtual NIC (VNIC) per KVM guest
   • We wrote simple glue to connect this to virtio — and
     have been able to push 1 Gb line to/from a KVM guest
   • VNICs give us several important enhancements, all with
     minimal management overhead:
      • Anti-spoofing confines guests to a specified IP (or IPs)
      • Flow management allows guests to be capped at specified
        levels of bandwidth — essential in overcommitted networks

      • Resource management allows for observability into per-
        VNIC (and thus, per-guest) throughput from the host
Enhancement: Kernel statistics

   • illumos has the kstat facility for kernel statistics
   • We reimplemented kvm_vcpu_stat as a kstat
   • We added a kvmstat tool to illumos that consumes these
     kstats, displaying them per-second and per-VCPU
   • For example, one second of kvmstat output with two
     VMs running — one idle 2 VCPU Linux guest, with one
     booting 4 VCPU SmartOS guest:
       pid vcpu |   exits   :   haltx   irqx   irqwx   iox   mmiox   |   irqs    emul   eptv
      4668    0 |      23   :       6      0       0     1       0   |      6      16      0
      4668    1 |      25   :       6      1       0     1       0   |      6      16      0
      5026    0 |   17833   :     223   2946     707   106       0   |   3379   13315      0
      5026    1 |   18687   :     244   2761     512     0       0   |   3085   14803      0
      5026    2 |   15696   :     194   3452     542     0       0   |   3568   11230      0
      5026    3 |   16822   :     244   2817     487     0       0   |   3100   12963      0
Enhancement: DTrace

   • As of QEMU 0.14, QEMU has DTrace probes — we lit
    those up on illumos
   • Added a bevy of SDT probes to KVM itself, including all
    of the call-sites of the trace_*() routines
   • Added vmregs[] variable that queries current VMCS,
    allowing for guest behavior to be examined
   • Can all be enabled dynamically and safely, and
    aggregated on an arbitrary basis (e.g., per-VCPU, per-
    VM, per-CPU, etc.)
   • Pairs well with kvmstat to understand workload
    characteristics in production deployments
Enhancement: DTrace, cont.

   • Example D script:
     kvm-guest-exit
     {
             @[pid, tid, strexitno[vmregs[VMX_VM_EXIT_REASON]] = count();
     }

     tick-1sec
     {
             printf("%10s %10s %-50s %sn",
                 "PID", "TID", "REASON", "COUNT");
             printa("%10d %10d %-50s %@dn", @);
             printf("n");
             clear(@);
     }


   • e.g., output from fork()/exit()-heavy workload:
          PID     TID   REASON                               COUNT
         3949       3   EXIT_REASON_CR_ACCESS                0
         3949       3   EXIT_REASON_HLT                      0
         3949       3   EXIT_REASON_IO_INSTRUCTION           2
         3949       3   EXIT_REASON_EXCEPTION_NMI            11
         3949       3   EXIT_REASON_EXTERNAL_INTERRUPT       14
         3949       3   EXIT_REASON_APIC_ACCESS              202
         3949       3   EXIT_REASON_CPUID                    8440      WTF?!
Enhancement: DTrace, cont.

   • Orthogonal to this work, we have developed a real-time
     analytics framework that instruments the cloud using
     DTrace and visualizes the result
   • We have extended this facility to the new DTrace probes
     in our KVM port
   • We have only been experimenting with this very
     recently, but the results have been fascinating!
   • For example...
Enhancement: Visualizing DTrace on KVM

   • Observing ext3 write offsets in a logical volume on a
     workload that creates and removes a 3 GB file:
Enhancement: Visualizing DTrace on KVM

   • Decomposing by guest CR3 and millisecond offset
     within-the-second, sampled at 99 hertz with two
     compute-bound processes:
Enhancement: Visualizing DTrace on KVM

   • Same view, but now sampled at 999 hertz — and with
     one of the compute-bound processes reniced:
Enhancement: Visualizing DTrace on KVM

   • Same view, same sample frequency — but horsing
     around with nice values:
Enhancement: Visualizing DTrace on KVM

   • Interrupt requests decomposed by IRQ vector and offset
     within-the-second:
Engaging the community

   • We are very excited to engage the KVM community;
    potential areas of collaboration:
      • Working on KVM performance. With DTrace, we have much
        better visibility into guest behavior; it seems possible (if not
        likely!) that resulting improvements to KVM will carry from
        one host system to the other

      • Collaborating on testing. We would love to participate in
        automated KVM testing infrastructure; we dream of a farm of
        oddball ISOs and the infrastructure to boot and execute
        them!

      • Collaborating on benchmarking. We have not examined
        SPECvirt_sc2010 in detail, but would like to work with the
        community to develop standard benchmarks
Thank you!

   • Josh Wilsdon and Rob Gulewich of Joyent for their
       instrumental assistance in this effort
   • Brendan Gregg of Joyent for examining the performance
       of KVM — and for his tenacity in discovering the effects
       of dynamic overclocking!
   •   Fabrice Bellard for lighting the path with QEMU
   •   Intel for a rippinʼ fast CPU (+ EPT!) in Nehalem
   •   Avi Kivity and team for putting it all together with KVM!
   •   The illumos community for their enthusiastic support

More Related Content

What's hot

The DIY Punk Rock DevOps Playbook
The DIY Punk Rock DevOps PlaybookThe DIY Punk Rock DevOps Playbook
The DIY Punk Rock DevOps Playbookbcantrill
 
Why it’s (past) time to run containers on bare metal
Why it’s (past) time to run containers on bare metalWhy it’s (past) time to run containers on bare metal
Why it’s (past) time to run containers on bare metalbcantrill
 
node.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornnode.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornbcantrill
 
Leaping the chasm from proprietary to open: A survivor's guide
Leaping the chasm from proprietary to open: A survivor's guideLeaping the chasm from proprietary to open: A survivor's guide
Leaping the chasm from proprietary to open: A survivor's guidebcantrill
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondbcantrill
 
The Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of dataThe Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of databcantrill
 
open source virtualization
open source virtualizationopen source virtualization
open source virtualizationKris Buytaert
 
Containers and Cloud: From LXC to Docker to Kubernetes
Containers and Cloud: From LXC to Docker to KubernetesContainers and Cloud: From LXC to Docker to Kubernetes
Containers and Cloud: From LXC to Docker to KubernetesShreyas MM
 
BACD July 2012 : The Xen Cloud Platform
BACD July 2012 : The Xen Cloud Platform BACD July 2012 : The Xen Cloud Platform
BACD July 2012 : The Xen Cloud Platform The Linux Foundation
 
Xen, XenServer, and XAPI: What’s the Difference?-XPUS13 Bulpin,Pavlicek
Xen, XenServer, and XAPI: What’s the Difference?-XPUS13 Bulpin,PavlicekXen, XenServer, and XAPI: What’s the Difference?-XPUS13 Bulpin,Pavlicek
Xen, XenServer, and XAPI: What’s the Difference?-XPUS13 Bulpin,PavlicekThe Linux Foundation
 
Scaling and Managing Cassandra with docker, CoreOS and Presto
Scaling and Managing Cassandra with docker, CoreOS and PrestoScaling and Managing Cassandra with docker, CoreOS and Presto
Scaling and Managing Cassandra with docker, CoreOS and PrestoVali-Marius Malinoiu
 
Xen: Hypervisor for the Cloud - CCC13
Xen: Hypervisor for the Cloud - CCC13Xen: Hypervisor for the Cloud - CCC13
Xen: Hypervisor for the Cloud - CCC13The Linux Foundation
 
4 virtual router CloudStack Developer Day
4 virtual router CloudStack Developer Day4 virtual router CloudStack Developer Day
4 virtual router CloudStack Developer DayKimihiko Kitase
 

What's hot (20)

The DIY Punk Rock DevOps Playbook
The DIY Punk Rock DevOps PlaybookThe DIY Punk Rock DevOps Playbook
The DIY Punk Rock DevOps Playbook
 
Why it’s (past) time to run containers on bare metal
Why it’s (past) time to run containers on bare metalWhy it’s (past) time to run containers on bare metal
Why it’s (past) time to run containers on bare metal
 
node.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornnode.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicorn
 
Leaping the chasm from proprietary to open: A survivor's guide
Leaping the chasm from proprietary to open: A survivor's guideLeaping the chasm from proprietary to open: A survivor's guide
Leaping the chasm from proprietary to open: A survivor's guide
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
 
The Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of dataThe Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of data
 
Understanding LXC & Docker
Understanding LXC & DockerUnderstanding LXC & Docker
Understanding LXC & Docker
 
Xen and Apache cloudstack
Xen and Apache cloudstack  Xen and Apache cloudstack
Xen and Apache cloudstack
 
open source virtualization
open source virtualizationopen source virtualization
open source virtualization
 
Xen @ Google, 2011
Xen @ Google, 2011Xen @ Google, 2011
Xen @ Google, 2011
 
IPv6 & Containers
IPv6 & ContainersIPv6 & Containers
IPv6 & Containers
 
Containers and Cloud: From LXC to Docker to Kubernetes
Containers and Cloud: From LXC to Docker to KubernetesContainers and Cloud: From LXC to Docker to Kubernetes
Containers and Cloud: From LXC to Docker to Kubernetes
 
BSDcon Asia 2015: Xen on FreeBSD
BSDcon Asia 2015: Xen on FreeBSDBSDcon Asia 2015: Xen on FreeBSD
BSDcon Asia 2015: Xen on FreeBSD
 
BACD July 2012 : The Xen Cloud Platform
BACD July 2012 : The Xen Cloud Platform BACD July 2012 : The Xen Cloud Platform
BACD July 2012 : The Xen Cloud Platform
 
Xen 4.3 Roadmap
Xen 4.3 RoadmapXen 4.3 Roadmap
Xen 4.3 Roadmap
 
Xen, XenServer, and XAPI: What’s the Difference?-XPUS13 Bulpin,Pavlicek
Xen, XenServer, and XAPI: What’s the Difference?-XPUS13 Bulpin,PavlicekXen, XenServer, and XAPI: What’s the Difference?-XPUS13 Bulpin,Pavlicek
Xen, XenServer, and XAPI: What’s the Difference?-XPUS13 Bulpin,Pavlicek
 
Xen Cloud Platform Update
Xen Cloud Platform UpdateXen Cloud Platform Update
Xen Cloud Platform Update
 
Scaling and Managing Cassandra with docker, CoreOS and Presto
Scaling and Managing Cassandra with docker, CoreOS and PrestoScaling and Managing Cassandra with docker, CoreOS and Presto
Scaling and Managing Cassandra with docker, CoreOS and Presto
 
Xen: Hypervisor for the Cloud - CCC13
Xen: Hypervisor for the Cloud - CCC13Xen: Hypervisor for the Cloud - CCC13
Xen: Hypervisor for the Cloud - CCC13
 
4 virtual router CloudStack Developer Day
4 virtual router CloudStack Developer Day4 virtual router CloudStack Developer Day
4 virtual router CloudStack Developer Day
 

Viewers also liked

Open solaris
Open solarisOpen solaris
Open solarisTiago
 
Docker, and Why Containers Matter
Docker, and Why Containers MatterDocker, and Why Containers Matter
Docker, and Why Containers MatterKarl Matthias
 
Introdução ao OpenSolaris
Introdução ao OpenSolarisIntrodução ao OpenSolaris
Introdução ao OpenSolarisJoão Longo
 
Introdução ao OpenSolaris
Introdução ao OpenSolarisIntrodução ao OpenSolaris
Introdução ao OpenSolarisguest830f1
 
BayLISA meetup: 8/16/12
BayLISA meetup: 8/16/12BayLISA meetup: 8/16/12
BayLISA meetup: 8/16/12bcantrill
 
The Kitchen Cloud How To: Automating Joyent SmartMachines with Chef
The Kitchen Cloud How To: Automating Joyent SmartMachines with ChefThe Kitchen Cloud How To: Automating Joyent SmartMachines with Chef
The Kitchen Cloud How To: Automating Joyent SmartMachines with ChefChef Software, Inc.
 
Fi fo euc 2014
Fi fo euc 2014Fi fo euc 2014
Fi fo euc 2014Licenser
 
Chef on SmartOS
Chef on SmartOSChef on SmartOS
Chef on SmartOSEric Saxby
 
SmartOS ZFS Architecture
SmartOS ZFS ArchitectureSmartOS ZFS Architecture
SmartOS ZFS ArchitectureBill Pijewski
 
Rabbitmq Boot System
Rabbitmq Boot SystemRabbitmq Boot System
Rabbitmq Boot SystemAlvaro Videla
 
Integrating PostgreSql with RabbitMQ
Integrating PostgreSql with RabbitMQIntegrating PostgreSql with RabbitMQ
Integrating PostgreSql with RabbitMQGavin Roy
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSTomas Vondra
 
LF Collaboration Summit: Xen Project 4 4 Features and Futures
LF Collaboration Summit: Xen Project 4 4 Features and FuturesLF Collaboration Summit: Xen Project 4 4 Features and Futures
LF Collaboration Summit: Xen Project 4 4 Features and FuturesThe Linux Foundation
 
Shipping Applications to Production in Containers with Docker
Shipping Applications to Production in Containers with DockerShipping Applications to Production in Containers with Docker
Shipping Applications to Production in Containers with DockerJérôme Petazzoni
 

Viewers also liked (20)

SmartOS Primer
SmartOS PrimerSmartOS Primer
SmartOS Primer
 
Open solaris
Open solarisOpen solaris
Open solaris
 
Docker, and Why Containers Matter
Docker, and Why Containers MatterDocker, and Why Containers Matter
Docker, and Why Containers Matter
 
Introdução ao OpenSolaris
Introdução ao OpenSolarisIntrodução ao OpenSolaris
Introdução ao OpenSolaris
 
Introdução ao OpenSolaris
Introdução ao OpenSolarisIntrodução ao OpenSolaris
Introdução ao OpenSolaris
 
BayLISA meetup: 8/16/12
BayLISA meetup: 8/16/12BayLISA meetup: 8/16/12
BayLISA meetup: 8/16/12
 
The Kitchen Cloud How To: Automating Joyent SmartMachines with Chef
The Kitchen Cloud How To: Automating Joyent SmartMachines with ChefThe Kitchen Cloud How To: Automating Joyent SmartMachines with Chef
The Kitchen Cloud How To: Automating Joyent SmartMachines with Chef
 
Fi fo euc 2014
Fi fo euc 2014Fi fo euc 2014
Fi fo euc 2014
 
Chef on SmartOS
Chef on SmartOSChef on SmartOS
Chef on SmartOS
 
SmartOS ZFS Architecture
SmartOS ZFS ArchitectureSmartOS ZFS Architecture
SmartOS ZFS Architecture
 
Taming the rabbit
Taming the rabbitTaming the rabbit
Taming the rabbit
 
Rabbitmq Boot System
Rabbitmq Boot SystemRabbitmq Boot System
Rabbitmq Boot System
 
PostgreSQL: meet your queue
PostgreSQL: meet your queuePostgreSQL: meet your queue
PostgreSQL: meet your queue
 
OpenStack on SmartOS
OpenStack on SmartOSOpenStack on SmartOS
OpenStack on SmartOS
 
Integrating PostgreSql with RabbitMQ
Integrating PostgreSql with RabbitMQIntegrating PostgreSql with RabbitMQ
Integrating PostgreSql with RabbitMQ
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
LF Collaboration Summit: Xen Project 4 4 Features and Futures
LF Collaboration Summit: Xen Project 4 4 Features and FuturesLF Collaboration Summit: Xen Project 4 4 Features and Futures
LF Collaboration Summit: Xen Project 4 4 Features and Futures
 
Performance Tuning Xen
Performance Tuning XenPerformance Tuning Xen
Performance Tuning Xen
 
Linux PV on HVM
Linux PV on HVMLinux PV on HVM
Linux PV on HVM
 
Shipping Applications to Production in Containers with Docker
Shipping Applications to Production in Containers with DockerShipping Applications to Production in Containers with Docker
Shipping Applications to Production in Containers with Docker
 

Similar to Experiences porting KVM to SmartOS

Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0guest72e8c1
 
The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)Casey Bisson
 
Unikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library HypervisorUnikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library HypervisorAnil Madhavapeddy
 
State of virtualisation -- 2012
State of virtualisation -- 2012State of virtualisation -- 2012
State of virtualisation -- 2012Jonathan Sinclair
 
Unikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOSUnikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOSDocker, Inc.
 
Bridging the Semantic Gap in Virtualized Environment
Bridging the Semantic Gap in Virtualized EnvironmentBridging the Semantic Gap in Virtualized Environment
Bridging the Semantic Gap in Virtualized EnvironmentAndy Lee
 
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...rhatr
 
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18Casey Bisson
 
Virtualization 101 - DeepDive
Virtualization 101 - DeepDiveVirtualization 101 - DeepDive
Virtualization 101 - DeepDiveAmit Agarwal
 
macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~
macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~
macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~NTT Communications Technology Development
 
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...Mihai Criveti
 
OpenNebulaConf 2016 - Hypervisors and Containers Hands-on Workshop by Jaime M...
OpenNebulaConf 2016 - Hypervisors and Containers Hands-on Workshop by Jaime M...OpenNebulaConf 2016 - Hypervisors and Containers Hands-on Workshop by Jaime M...
OpenNebulaConf 2016 - Hypervisors and Containers Hands-on Workshop by Jaime M...OpenNebula Project
 
OSv at Usenix ATC 2014
OSv at Usenix ATC 2014OSv at Usenix ATC 2014
OSv at Usenix ATC 2014Don Marti
 
3. CPU virtualization and scheduling
3. CPU virtualization and scheduling3. CPU virtualization and scheduling
3. CPU virtualization and schedulingHwanju Kim
 

Similar to Experiences porting KVM to SmartOS (20)

RMLL / LSM 2009
RMLL / LSM 2009RMLL / LSM 2009
RMLL / LSM 2009
 
Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0
 
The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)
 
Unikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library HypervisorUnikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library Hypervisor
 
State of virtualisation -- 2012
State of virtualisation -- 2012State of virtualisation -- 2012
State of virtualisation -- 2012
 
Unikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOSUnikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOS
 
17-virtualization.pptx
17-virtualization.pptx17-virtualization.pptx
17-virtualization.pptx
 
Bridging the Semantic Gap in Virtualized Environment
Bridging the Semantic Gap in Virtualized EnvironmentBridging the Semantic Gap in Virtualized Environment
Bridging the Semantic Gap in Virtualized Environment
 
OpenVZ Linux Containers
OpenVZ Linux ContainersOpenVZ Linux Containers
OpenVZ Linux Containers
 
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
 
Elatt Presentation
Elatt PresentationElatt Presentation
Elatt Presentation
 
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
 
Virtualization 101 - DeepDive
Virtualization 101 - DeepDiveVirtualization 101 - DeepDive
Virtualization 101 - DeepDive
 
macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~
macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~
macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~
 
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
 
OpenNebulaConf 2016 - Hypervisors and Containers Hands-on Workshop by Jaime M...
OpenNebulaConf 2016 - Hypervisors and Containers Hands-on Workshop by Jaime M...OpenNebulaConf 2016 - Hypervisors and Containers Hands-on Workshop by Jaime M...
OpenNebulaConf 2016 - Hypervisors and Containers Hands-on Workshop by Jaime M...
 
virtual machine.ppt
virtual machine.pptvirtual machine.ppt
virtual machine.ppt
 
OSv at Usenix ATC 2014
OSv at Usenix ATC 2014OSv at Usenix ATC 2014
OSv at Usenix ATC 2014
 
3. CPU virtualization and scheduling
3. CPU virtualization and scheduling3. CPU virtualization and scheduling
3. CPU virtualization and scheduling
 
MIPS-X
MIPS-XMIPS-X
MIPS-X
 

More from bcantrill

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Presentbcantrill
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmakingbcantrill
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...bcantrill
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsbcantrill
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systemsbcantrill
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolutionbcantrill
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Agebcantrill
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesbcantrill
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Lawbcantrill
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineeringbcantrill
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemapsbcantrill
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarebcantrill
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?bcantrill
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the unionbcantrill
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsbcantrill
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after darkbcantrill
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadershipbcantrill
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathbcantrill
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindbcantrill
 
Down Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab AllocatorDown Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab Allocatorbcantrill
 

More from bcantrill (20)

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Present
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmaking
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systems
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systems
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolution
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Age
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Law
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemaps
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system software
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the union
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systems
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after dark
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data path
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
 
Down Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab AllocatorDown Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab Allocator
 

Recently uploaded

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Experiences porting KVM to SmartOS

  • 1. Experiences Porting KVM to SmartOS Bryan Cantrill VP, Engineering bryan@joyent.com @bcantrill
  • 2. WTF is SmartOS? • illumos-derived OS that is the foundation of both Joyentʼs public cloud and SmartDataCenter product • As an illumos derivative, has several key features: • ZFS: Enterprise-class copy-on-write filesystem featuring constant time snapshots, writable clones, built-in compression, checksumming, volume management, etc. • DTrace: Facility for dynamic, ad hoc instrumentation of production systems that supports in situ data aggregation, user-level instrumentation, etc. — and is absolutely safe • OS-based virtualization (Zones): Entirely secure virtual OS instances offering hardware performance, high multi-tenancy • Network virtualization (Crossbow): Virtual NIC Infrastructure for easy bandwidth management and resource control
  • 3. KVM on SmartOS? • Despite its rich feature-set, SmartOS was missing an essential component: hardware virtualization • Thanks to Intel and AMD, hardware virtualization can now be remarkably high performing... • We firmly believe that the best hypervisor is the operating system — anyone attempting to implement a “thin” hypervisor will end up retracing OS history • KVM shares this vision — indeed, pioneered it! • Moreover, KVM is best-of-breed: highly competitive performance and a community with critical mass • Imperative was clear: needed to port KVM to SmartOS!
  • 4. Constraining the port • For business and resourcing reasons, elected to focus exclusively on Intel VT-x with EPT... • ...but to not make decisions that would make later AMD SVM work impossible • Only ever interested in x86-64 host support • Only ever interested in x86 and x86-64 guests • Willing to diverge as needed to support illumos constructs or coding practices… • ...but wanted to maintain compatibility with QEMU/KVM interface as much as possible
  • 5. Starting the port • KVM was (rightfully) not designed to be portable in any real sense — it is specific to Linux and Linux facilities • Became clear that emulating Linux functionality would be insufficient — there is simply too much divergence • Given the stability of KVM in Linux 2.6.34, we felt confident that we could diverge from the Linux implementation — while still being able to consume and contribute patches as needed • Also clear that just getting something to compile would be a significant (and largely serial) undertaking • Joyent engineer Max Bruning started on this in late fall...
  • 6. Getting to successful compilation • To expedite compilation, unported blocks of code would be “XXXʼd out” by being enclosed in #ifdef XXX • To help understand when/where we hit XXXʼd code paths, we put a special DTrace probe with __FILE__ and __LINE__ as arguments in the #else case • We could then use simple DTrace enablings to understand what of these cases we were hitting to prioritize work: kvm-xxx { @[stringof(arg0), probefunc, arg1] = count(); } tick-10sec { printf("%-12s %-40s %-8s %8sn", "FILE", "FUNCTION", "LINE", "COUNT"); printa("%20s %8d %@8dn", @); }
  • 7. Accelerating the port • By late March, Max could launch a virtual machine that could run in perpetuity without panicking… • ...but also was not making any progress booting • At this point, the work was more readily parallelized: Joyentʼs Robert Mustacchi and I joined Max in April • Added tooling to understand guest behavior, e.g.: • MDB support to map guest PFNs to QEMU VAs • MDB support for 16-bit disassembly (!) • DTrace probes on VM entry/exit and the ability to pull VM state in DTrace with a new vmregs[] variable
  • 8. Making progress... • To make forward progress, we would debug the issue blocking us (inducing either guest or host panic)… • ...which was usually due to a piece that hadnʼt yet been ported or re-implemented • We would implement that piece (usually eliminating an XXXʼd block in the process), and debug the next issue • The number of XXXʼs over time tell the tale...
  • 9. The tale of the port
  • 10. Port milestones Boots KMDB Boots Linux Boots Windows
  • 11. Notable bugs • In the course of this port, we did not discover any bug that one would call a bug in KVM — itʼs very solid! • Our bugs were (essentially) all self-inflicted, e.g.: • We erroneously configured QEMU such that both QEMU and KVM thought they were responsible for the 8254/8259! • We use a per-CPU GSBASE where Linux does not — Linux KVM doesnʼt have any reason to reload the hostʼs GSBASE on CPU migration, but not doing so induces host GSBASE corruption: two physical CPUs have the same CPU pointer (one believes itʼs the other), resulting in total mayhem • We reimplemented the FPU save code in terms of our native equivalent — and introduced a nasty corruption bug in the process by plowing TS in CR0!
  • 12. Port performance • Not surprisingly, our port performs at baremetal speeds for entirely CPU-bound workloads: • But it took us a surprising amount of time to get to this result: due to dynamic overclocking, SmartOS KVM was initially operating 5% faster than baremetal!
  • 13. Port performance • Our port of KVM seems to at least be in the hunt on other workloads, e.g.:
  • 14. Port status • Port is publicly available: • Github repo for KVM itself: https://github.com/joyent/illumos-kvm • Github repo for our branch of QEMU 0.14.1: https://github.com/joyent/illumos-kvm-cmd • illumos-kvm-cmd repo contains minor QEMU 0.14.1 patches to support our port, all of which we intend to upstream • Within its scope, this port is at or near production quality • Worthwhile to discuss the limitations of our port, the divergences of our port from Linux KVM, and the enhancements to KVM that our port allows...
  • 15. Limitation: guest memory is locked down • As a cloud provider, we have something of an opinion on this: overselling memory is only for idle workloads • In our experience, the dissatisfaction from QoS variability induced by memory oversell is not paid for by the marginal revenue of that oversell • We currently lock down guest memory; failure to lock down memory will result in failure to start • For those high multi-tenancy environments, we believe that hardware is the wrong level at which to virtualize...
  • 16. Limitation: no memory deduplication • We donʼt currently have an analog to the kernel same- page mapping (KSM) found in Linux • This is technically possible, but we donʼt see an acute need (for the same reason we lock down guest memory) • We are interested to hear experiences with this: • What kind of memory savings does one see? • Is one kind of guest (Windows?) more likely to see savings? • What kind of performance overhead from page scanning?
  • 17. Limitation: no nested virtualization • We donʼt currently support nested virtualization — and weʼre not sure that weʼre ever going to implement it • While for our own development purposes, we would like to see VMware Fusion support nested virtualization, we donʼt see an acute need to support it ourselves • Would be curious to hear about experiences with nested virtualization; is it being used in production, or is it primarily for development?
  • 18. Divergence: User/kernel interface • To minimize patches floated on QEMU, wanted to minimize any changes to the user/kernel interface • ...but we have no anon_inode_getfd() analog • This is required to implement the model of a 1-to-1 mapping between a file descriptor and a VCPU • Added a new KVM_CLONE ioctl that makes the driver state in the operated-upon instance point to another • To create a VCPU, QEMU (re)opens /dev/kvm, and calls KVM_CLONE on the new instance, specifying the extant instance
  • 19. Divergence: Context ops • illumos has the ability to install context ops that are executed before and after a thread is scheduled on CPU • Context ops were originally implemented to support CPU performance counter virtualization • Context ops are installed with installctx() • This facility proved essential — we use it to perform the equivalent of kvm_sched_in()/kvm_sched_out()
  • 20. Divergence: Timers • illumos has arbitrary resolution interval timer support via the cyclic subsystem • Cyclics can be bound to a CPU or processor set and can be configured to fire at different interrupt levels • While originally designed to be a high resolution interval timer facility (the system clock is implemented in terms of it), cyclics may also be used as a dynamically reprogrammable one-shots • All KVM timers are implemented as cyclics • We do not migrate cyclics when a VCPU migrates from one CPU to another, choosing instead to poke the target CPU from the cyclic handler
  • 21. Enhancement: ZFS • Strictly speaking, we have done nothing specifically for ZFS: running KVM on a ZFS volume (a zvol) Just Works • But the presence of ZFS allows for KVM enhancements: • Constant time cloning allows for nearly instant provisioning of new KVM guests (assuming that the reference image is already present) • The ZFSʼs unified adaptive replacement cache (ARC) allows for guest I/O to be efficiently cached in the host — resulting in potentially massive improvements in random I/O (depending, of course, on locality) • We believe that ZFS remote replication can provide an efficient foundation for WAN-based cloning and migration
  • 22. Enhancement: OS Virtualization • illumos has deep support for OS virtualization • While our implementation does not require it, we run KVM guests in a local zone, with the QEMU process as the only process • This was originally for reasons of accounting (we use the zone as the basis for QoS, resource management, I/O throttling, billing, instrumentation, etc.)… • ...but given the recent KVM vulnerabilities, it has become a matter of security • OS virtualization neatly containerizes QEMU and drastically reduces attack surface for QEMU exploits
  • 23. Enhancement: Network virtualization • illumos has deep support for network virtualization • We create a virtual NIC (VNIC) per KVM guest • We wrote simple glue to connect this to virtio — and have been able to push 1 Gb line to/from a KVM guest • VNICs give us several important enhancements, all with minimal management overhead: • Anti-spoofing confines guests to a specified IP (or IPs) • Flow management allows guests to be capped at specified levels of bandwidth — essential in overcommitted networks • Resource management allows for observability into per- VNIC (and thus, per-guest) throughput from the host
  • 24. Enhancement: Kernel statistics • illumos has the kstat facility for kernel statistics • We reimplemented kvm_vcpu_stat as a kstat • We added a kvmstat tool to illumos that consumes these kstats, displaying them per-second and per-VCPU • For example, one second of kvmstat output with two VMs running — one idle 2 VCPU Linux guest, with one booting 4 VCPU SmartOS guest: pid vcpu | exits : haltx irqx irqwx iox mmiox | irqs emul eptv 4668 0 | 23 : 6 0 0 1 0 | 6 16 0 4668 1 | 25 : 6 1 0 1 0 | 6 16 0 5026 0 | 17833 : 223 2946 707 106 0 | 3379 13315 0 5026 1 | 18687 : 244 2761 512 0 0 | 3085 14803 0 5026 2 | 15696 : 194 3452 542 0 0 | 3568 11230 0 5026 3 | 16822 : 244 2817 487 0 0 | 3100 12963 0
  • 25. Enhancement: DTrace • As of QEMU 0.14, QEMU has DTrace probes — we lit those up on illumos • Added a bevy of SDT probes to KVM itself, including all of the call-sites of the trace_*() routines • Added vmregs[] variable that queries current VMCS, allowing for guest behavior to be examined • Can all be enabled dynamically and safely, and aggregated on an arbitrary basis (e.g., per-VCPU, per- VM, per-CPU, etc.) • Pairs well with kvmstat to understand workload characteristics in production deployments
  • 26. Enhancement: DTrace, cont. • Example D script: kvm-guest-exit { @[pid, tid, strexitno[vmregs[VMX_VM_EXIT_REASON]] = count(); } tick-1sec { printf("%10s %10s %-50s %sn", "PID", "TID", "REASON", "COUNT"); printa("%10d %10d %-50s %@dn", @); printf("n"); clear(@); } • e.g., output from fork()/exit()-heavy workload: PID TID REASON COUNT 3949 3 EXIT_REASON_CR_ACCESS 0 3949 3 EXIT_REASON_HLT 0 3949 3 EXIT_REASON_IO_INSTRUCTION 2 3949 3 EXIT_REASON_EXCEPTION_NMI 11 3949 3 EXIT_REASON_EXTERNAL_INTERRUPT 14 3949 3 EXIT_REASON_APIC_ACCESS 202 3949 3 EXIT_REASON_CPUID 8440 WTF?!
  • 27. Enhancement: DTrace, cont. • Orthogonal to this work, we have developed a real-time analytics framework that instruments the cloud using DTrace and visualizes the result • We have extended this facility to the new DTrace probes in our KVM port • We have only been experimenting with this very recently, but the results have been fascinating! • For example...
  • 28. Enhancement: Visualizing DTrace on KVM • Observing ext3 write offsets in a logical volume on a workload that creates and removes a 3 GB file:
  • 29. Enhancement: Visualizing DTrace on KVM • Decomposing by guest CR3 and millisecond offset within-the-second, sampled at 99 hertz with two compute-bound processes:
  • 30. Enhancement: Visualizing DTrace on KVM • Same view, but now sampled at 999 hertz — and with one of the compute-bound processes reniced:
  • 31. Enhancement: Visualizing DTrace on KVM • Same view, same sample frequency — but horsing around with nice values:
  • 32. Enhancement: Visualizing DTrace on KVM • Interrupt requests decomposed by IRQ vector and offset within-the-second:
  • 33. Engaging the community • We are very excited to engage the KVM community; potential areas of collaboration: • Working on KVM performance. With DTrace, we have much better visibility into guest behavior; it seems possible (if not likely!) that resulting improvements to KVM will carry from one host system to the other • Collaborating on testing. We would love to participate in automated KVM testing infrastructure; we dream of a farm of oddball ISOs and the infrastructure to boot and execute them! • Collaborating on benchmarking. We have not examined SPECvirt_sc2010 in detail, but would like to work with the community to develop standard benchmarks
  • 34. Thank you! • Josh Wilsdon and Rob Gulewich of Joyent for their instrumental assistance in this effort • Brendan Gregg of Joyent for examining the performance of KVM — and for his tenacity in discovering the effects of dynamic overclocking! • Fabrice Bellard for lighting the path with QEMU • Intel for a rippinʼ fast CPU (+ EPT!) in Nehalem • Avi Kivity and team for putting it all together with KVM! • The illumos community for their enthusiastic support