SlideShare a Scribd company logo
1 of 22
Download to read offline
Instrumenting the
real-time web:
Running node.js in production

Bryan Cantrill
VP, Engineering

bryan@joyent.com
@bcantrill
“Real-time web?”

   • The term has enjoyed some popularity, but there is
     clearly confusion about the definition of “real-time”
   • A real-time system is one in which the correctness of the
     system is relative to its timeliness
   • A hard real-time system is one which the latency
     constraints are rigid: violation constitutes total system
     failure (e.g., an actuator on a physical device)
   • A soft real-time system is one in which latency
     constraints are more flexible: violation is undesirable but
     non-fatal (e.g., a video game or MP3 player)
   • Historically, the only real-time aspect of the web has
     been in some of its static content (e.g. video, audio)
The rise of the real-time web

    • The rise of mobile + HTML5 has given rise to a new
     breed of web application: ones in which dynamic data
     has real-time semantics
    • These data-intensive real-time applications present new
     semantics for web-facing applications
    • These present new data semantics for web applications:
     CRUD, ACID, BASE, CAP — meet DIRT!
The challenge of DIRTy apps

   • DIRTy applications tend to have the human in the loop
      • Good news: deadlines are soft — microseconds only
        matter when they add up to tens of milliseconds

      • Bad news: because humans are in the loop, demand
        for the system can be non-linear

   • One must deal not only with the traditional challenge of
     scalability, but also the challenge of a real-time system!
Building DIRTy apps

   • Embedded real-time systems are sufficiently controlled
     that latency bubbles can be architected away
   • Web-facing systems are far too sloppy to expect this!
   • Focus must shift from preventing latency bubbles to
     preventing latency bubbles from cascading
   • Operations that can induce latency (network, I/O, etc.)
     must not be able to take the system out with them!
   • Implies purely asynchronous and evented architectures,
     which are notoriously difficult to implement...
Enter node.js

   • node.js is a JavaScript-based framework for building
     event-oriented servers:
      var http = require(‘http’);

      http.createServer(function (req, res) {
             res.writeHead(200, {'Content-Type': 'text/plain'});
             res.end('Hello Worldn');
      }).listen(8124, "127.0.0.1");

      console.log(‘Server running at http://127.0.0.1:8124!’);
node.js as building block

    • node.js is a confluence of three ideas:
       • JavaScriptʼs rich support for asynchrony (i.e. closures)
       • High-performance JavaScript VMs (e.g. V8)
       • The system abstractions that God intended (i.e. UNIX)
    • Because everything is asynchronous, node.js is ideal for
     delivering scale in the presence of long-latency events!
The primacy of latency

   • As the correctness of the system is its timeliness, we
     must be able to measure the system to verify it
   • In a real-time system, it does not make sense to
     measure operations per second!
   • The only metric that matters is latency
   • This is dangerous to distill to a single number; the
     distribution of latency over time is essential
   • This poses both instrumentation and visualization
     challenges!
Instrumenting for latency

    • Instrumenting for latency requires modifying the system
     twice: as an operation starts and as it finishes
    • During an operation, the system must track — on a per-
     operation basis — the start time of the operation
    • Upon operation completion, the resulting stored data
     cannot be a scalar — the distribution is essential when
     understanding latency
    • Instrumentation must be systemic; must be able to
     reach to the sources of latency deep within the system
    • These constraints eliminate static instrumentation; we
     need a better way to instrument the system
Enter DTrace

   • Facility for dynamic instrumentation of production
     systems originally developed circa 2003 for Solaris 10
   • Open sourced (along with the rest of Solaris) in 2005;
     subsequently ported to many other systems (MacOS X,
     FreeBSD, NetBSD, QNX, nascent Linux port)
   • Support for arbitrary actions, arbitrary predicates, in
     situ data aggregation, statically-defined instrumentation
   • Designed for safe, ad hoc use in production: concise
     answers to arbitrary questions
   • Particularly well suited to real-time: the original design
     center was the understanding of latency bubbles
DTrace + Node?

   • DTrace instruments the system holistically, which is to
    say, from the kernel, which poses a challenge for
    interpreted environments
   • User-level statically defined tracing (USDT) providers
    describe semantically relevant points of instrumentation
   • Some interpreted environments (e.g., Ruby, Python,
    PHP, Erlang) have added USDT providers that
    instrument the interpreter itself
   • This approach is very fine-grained (e.g., every function
    call) and doesnʼt work in JITʼd environments
   • We decided to take a different tack for Node
DTrace for node.js

    • Given the nature of the paths that we wanted to
     instrument, we introduced a function into JavaScript that
     Node can call to get into USDT-instrumented C++
    • Introduces disabled probe effect: calling from JavaScript
     into C++ costs even when probes are not enabled
    • We use USDT is-enabled probes to minimize disabled
     probe effect once in C++
    • If (and only if) the probe is enabled, we prepare a
     structure for the kernel that allows for translation into a
     structure that is familiar to node programmers
Node USDT Provider

   • Example one-liners:
     dtrace -n ‘node*:::http-server-request{
        printf(“%s of %s from %sn”, args[0]->method,
            args[0]->url, args[1]->remoteAddress)}‘

     dtrace -n http-server-request’{@[args[1]->remoteAddress] = count()}‘

     dtrace -n gc-start’{self->ts = timestamp}’ 
        -n gc-done’/self->ts/{@ = quantize(timestamp - self->ts)}’



   • A script to measure HTTP latency:
     http-server-request
     {
            self->ts[args[1]->fd] = timestamp;
     }

     http-server-response
     /self->ts[args[0]->fd]/
     {
            @[zonename] = quantize(timestamp - self->ts[args[0]->fd]);
     }
User-defined USDT probes in node.js

   • Our USDT technique has been generalized by Chris
     Andrews in his node-dtrace-provider npm module:
       https://github.com/chrisa/node-dtrace-provider
   • Used by Joyentʼs Mark Cavage in his ldap.js to measure
     and validate operation latency
   • But how to visualize operation latency?
Visualizing latency

    • Could visualize latency as a scalar (i.e., average):




    • This hides outliers — and in a real-time system, it is the
     outliers that you care about!
    • Using percentiles helps to convey distribution — but
     crucial detail remains hidden
Visualizing latency as a heatmap

    • Latency is much better visualized as a heatmap, with
     time on the x-axis, latency on the y-axis, and frequency
     represented with color saturation:




    • Many patterns are now visible (as in this example of
     MySQL query latency), but critical data is still hidden
Visualizing latency as a 4D heatmap

   • Can use hue to represent higher dimensionality: time on
     the x-axis, latency on the y-axis, frequency via color
     saturation, and hue representing the new dimension:




   • In this example, the higher dimension is the MySQL
     database table associated with the operation
Visualizing node.js latency

    • Using the USDT probes as foundation, we developed a
     cloud analytics facility that visualizes latency in real-time
     via four dimensional heatmaps:




    • Facility is available via Joyentʼs no.de service, Joyentʼs
     public cloud, or Joyentʼs SmartDataCenter
Debugging latency

   • Latency visualization is essential for understanding
     where latency is being induced in a complicated system,
     but how can we determine why?
   • This requires associating an external event — an I/O
     request, a network packet, a profiling interrupt — with
     the code thatʼs inducing it
   • For node.js — like other dynamic environments — this is
     historically very difficult: the VM is opaque to the OS
   • Using DTraceʼs helper mechanism, we have developed
     a V8 ustack helper that allows OS-level events to be
     correlated to the node.js-backtrace that induced them
   • Available for node 0.6.7 on Joyentʼs SmartOS
Visualizing node.js CPU latency

   • Using the node.js ustack helper and the DTrace profile
     provider, we can determine the relative frequency of
     stack backtraces in terms of CPU consumption
   • Stacks can be visualized with flame graphs, a stack
     visualization developed by Joyentʼs Brendan Gregg:
node.js in production

    • node.js is particularly amenable for the DIRTy apps that
     typify the real-time web
    • The ability to understand latency must be considered
     when deploying node.js-based systems into production!
    • Understanding latency requires dynamic instrumentation
     and novel visualization
    • At Joyent, we have added DTrace-based dynamic
     instrumentation for node.js to SmartOS, and novel
     visualization into our cloud and software offerings
    • Better production support — better observability, better
     debuggability — remains an important area of node.js
     development!
Thank you!

   • @ryah and @rmustacc for Node DTrace USDT
    integration
   • @dapsays, @rmustacc, @rob_ellis and @notmatt for
    cloud analytics
   • @chrisandrews for node-dtrace-provider and
    @mcavage for putting it to such great use in ldap.js
   • @dapsays for the V8 DTrace ustack helper
   • @brendangregg for both the heatmap and flame graph
    visualizations
   • More information: http://dtrace.org/blogs/dap,
    http://dtrace.org/blogs/brendan and http://smartos.org

More Related Content

What's hot

Understanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicJoseph Lu
 
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...Unity Technologies
 
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with schedulerLCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with schedulerLinaro
 
Linux Interrupts
Linux InterruptsLinux Interrupts
Linux InterruptsKernel TLV
 
Introduction to the LLVM Compiler System
Introduction to the LLVM  Compiler SystemIntroduction to the LLVM  Compiler System
Introduction to the LLVM Compiler Systemzionsaint
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Valeriy Kravchuk
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 
Memory hierarchy unit 2 by ram k paliwal
Memory hierarchy  unit 2 by ram k paliwalMemory hierarchy  unit 2 by ram k paliwal
Memory hierarchy unit 2 by ram k paliwalRam Paliwal
 
WALT vs PELT : Redux - SFO17-307
WALT vs PELT : Redux  - SFO17-307WALT vs PELT : Redux  - SFO17-307
WALT vs PELT : Redux - SFO17-307Linaro
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time SystemsDeepak John
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahAMD Developer Central
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux SystemJian-Hong Pan
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3Electronic Arts / DICE
 
Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernellcplcp1
 
Parallel Futures of a Game Engine
Parallel Futures of a Game EngineParallel Futures of a Game Engine
Parallel Futures of a Game EngineJohan Andersson
 

What's hot (20)

Understanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panic
 
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
 
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with schedulerLCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
 
Linux Interrupts
Linux InterruptsLinux Interrupts
Linux Interrupts
 
Introduction to the LLVM Compiler System
Introduction to the LLVM  Compiler SystemIntroduction to the LLVM  Compiler System
Introduction to the LLVM Compiler System
 
GCC, GNU compiler collection
GCC, GNU compiler collectionGCC, GNU compiler collection
GCC, GNU compiler collection
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
Embedded Linux on ARM
Embedded Linux on ARMEmbedded Linux on ARM
Embedded Linux on ARM
 
Memory hierarchy unit 2 by ram k paliwal
Memory hierarchy  unit 2 by ram k paliwalMemory hierarchy  unit 2 by ram k paliwal
Memory hierarchy unit 2 by ram k paliwal
 
WALT vs PELT : Redux - SFO17-307
WALT vs PELT : Redux  - SFO17-307WALT vs PELT : Redux  - SFO17-307
WALT vs PELT : Redux - SFO17-307
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time Systems
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux System
 
Linux programming - Getting self started
Linux programming - Getting self started Linux programming - Getting self started
Linux programming - Getting self started
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Linux Audio Drivers. ALSA
Linux Audio Drivers. ALSALinux Audio Drivers. ALSA
Linux Audio Drivers. ALSA
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
 
Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernel
 
Parallel Futures of a Game Engine
Parallel Futures of a Game EngineParallel Futures of a Game Engine
Parallel Futures of a Game Engine
 

Viewers also liked

FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)Mícheál Ó Foghlú
 
ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale Subbu Allamaraju
 
Probabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitProbabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitTyler Treat
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF SuperpowersBrendan Gregg
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 

Viewers also liked (8)

Node Summit 2012
Node Summit 2012Node Summit 2012
Node Summit 2012
 
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
 
ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale
 
Rqa14 secondary
Rqa14 secondaryRqa14 secondary
Rqa14 secondary
 
Probabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitProbabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profit
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 

Similar to Instrumenting the real-time web: Node.js in production

John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenParticular Software
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source Nitesh Jadhav
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
Birmingham-20060705
Birmingham-20060705Birmingham-20060705
Birmingham-20060705Miguel Vidal
 
node.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontiernode.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontierbcantrill
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World  (Ram, ITSF 2016)Sync in an NFV World  (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)Adam Paterson
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)Calnex Solutions
 
Onboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud JourneyOnboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud JourneyMarius Zaharia
 
Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Kieran Kunhya
 
Fiware: Connecting to robots
Fiware: Connecting to robotsFiware: Connecting to robots
Fiware: Connecting to robotsJaime Martin Losa
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Docker, Inc.
 
Tech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentationTech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentationJisc
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...DataStax Academy
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsJulien Anguenot
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 

Similar to Instrumenting the real-time web: Node.js in production (20)

John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves Goeleven
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Birmingham-20060705
Birmingham-20060705Birmingham-20060705
Birmingham-20060705
 
node.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontiernode.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontier
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World  (Ram, ITSF 2016)Sync in an NFV World  (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)
 
Onboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud JourneyOnboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud Journey
 
Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...
 
Fiware: Connecting to robots
Fiware: Connecting to robotsFiware: Connecting to robots
Fiware: Connecting to robots
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
Tech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentationTech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentation
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
 
Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival
 
Intro to Databases
Intro to DatabasesIntro to Databases
Intro to Databases
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 

More from bcantrill

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Presentbcantrill
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmakingbcantrill
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...bcantrill
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsbcantrill
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systemsbcantrill
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolutionbcantrill
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Agebcantrill
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesbcantrill
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Lawbcantrill
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineeringbcantrill
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemapsbcantrill
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarebcantrill
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?bcantrill
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the unionbcantrill
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsbcantrill
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after darkbcantrill
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadershipbcantrill
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathbcantrill
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondbcantrill
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindbcantrill
 

More from bcantrill (20)

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Present
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmaking
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systems
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systems
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolution
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Age
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Law
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemaps
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system software
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the union
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systems
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after dark
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data path
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
 

Recently uploaded

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Instrumenting the real-time web: Node.js in production

  • 1. Instrumenting the real-time web: Running node.js in production Bryan Cantrill VP, Engineering bryan@joyent.com @bcantrill
  • 2. “Real-time web?” • The term has enjoyed some popularity, but there is clearly confusion about the definition of “real-time” • A real-time system is one in which the correctness of the system is relative to its timeliness • A hard real-time system is one which the latency constraints are rigid: violation constitutes total system failure (e.g., an actuator on a physical device) • A soft real-time system is one in which latency constraints are more flexible: violation is undesirable but non-fatal (e.g., a video game or MP3 player) • Historically, the only real-time aspect of the web has been in some of its static content (e.g. video, audio)
  • 3. The rise of the real-time web • The rise of mobile + HTML5 has given rise to a new breed of web application: ones in which dynamic data has real-time semantics • These data-intensive real-time applications present new semantics for web-facing applications • These present new data semantics for web applications: CRUD, ACID, BASE, CAP — meet DIRT!
  • 4. The challenge of DIRTy apps • DIRTy applications tend to have the human in the loop • Good news: deadlines are soft — microseconds only matter when they add up to tens of milliseconds • Bad news: because humans are in the loop, demand for the system can be non-linear • One must deal not only with the traditional challenge of scalability, but also the challenge of a real-time system!
  • 5. Building DIRTy apps • Embedded real-time systems are sufficiently controlled that latency bubbles can be architected away • Web-facing systems are far too sloppy to expect this! • Focus must shift from preventing latency bubbles to preventing latency bubbles from cascading • Operations that can induce latency (network, I/O, etc.) must not be able to take the system out with them! • Implies purely asynchronous and evented architectures, which are notoriously difficult to implement...
  • 6. Enter node.js • node.js is a JavaScript-based framework for building event-oriented servers: var http = require(‘http’); http.createServer(function (req, res) { res.writeHead(200, {'Content-Type': 'text/plain'}); res.end('Hello Worldn'); }).listen(8124, "127.0.0.1"); console.log(‘Server running at http://127.0.0.1:8124!’);
  • 7. node.js as building block • node.js is a confluence of three ideas: • JavaScriptʼs rich support for asynchrony (i.e. closures) • High-performance JavaScript VMs (e.g. V8) • The system abstractions that God intended (i.e. UNIX) • Because everything is asynchronous, node.js is ideal for delivering scale in the presence of long-latency events!
  • 8. The primacy of latency • As the correctness of the system is its timeliness, we must be able to measure the system to verify it • In a real-time system, it does not make sense to measure operations per second! • The only metric that matters is latency • This is dangerous to distill to a single number; the distribution of latency over time is essential • This poses both instrumentation and visualization challenges!
  • 9. Instrumenting for latency • Instrumenting for latency requires modifying the system twice: as an operation starts and as it finishes • During an operation, the system must track — on a per- operation basis — the start time of the operation • Upon operation completion, the resulting stored data cannot be a scalar — the distribution is essential when understanding latency • Instrumentation must be systemic; must be able to reach to the sources of latency deep within the system • These constraints eliminate static instrumentation; we need a better way to instrument the system
  • 10. Enter DTrace • Facility for dynamic instrumentation of production systems originally developed circa 2003 for Solaris 10 • Open sourced (along with the rest of Solaris) in 2005; subsequently ported to many other systems (MacOS X, FreeBSD, NetBSD, QNX, nascent Linux port) • Support for arbitrary actions, arbitrary predicates, in situ data aggregation, statically-defined instrumentation • Designed for safe, ad hoc use in production: concise answers to arbitrary questions • Particularly well suited to real-time: the original design center was the understanding of latency bubbles
  • 11. DTrace + Node? • DTrace instruments the system holistically, which is to say, from the kernel, which poses a challenge for interpreted environments • User-level statically defined tracing (USDT) providers describe semantically relevant points of instrumentation • Some interpreted environments (e.g., Ruby, Python, PHP, Erlang) have added USDT providers that instrument the interpreter itself • This approach is very fine-grained (e.g., every function call) and doesnʼt work in JITʼd environments • We decided to take a different tack for Node
  • 12. DTrace for node.js • Given the nature of the paths that we wanted to instrument, we introduced a function into JavaScript that Node can call to get into USDT-instrumented C++ • Introduces disabled probe effect: calling from JavaScript into C++ costs even when probes are not enabled • We use USDT is-enabled probes to minimize disabled probe effect once in C++ • If (and only if) the probe is enabled, we prepare a structure for the kernel that allows for translation into a structure that is familiar to node programmers
  • 13. Node USDT Provider • Example one-liners: dtrace -n ‘node*:::http-server-request{ printf(“%s of %s from %sn”, args[0]->method, args[0]->url, args[1]->remoteAddress)}‘ dtrace -n http-server-request’{@[args[1]->remoteAddress] = count()}‘ dtrace -n gc-start’{self->ts = timestamp}’ -n gc-done’/self->ts/{@ = quantize(timestamp - self->ts)}’ • A script to measure HTTP latency: http-server-request { self->ts[args[1]->fd] = timestamp; } http-server-response /self->ts[args[0]->fd]/ { @[zonename] = quantize(timestamp - self->ts[args[0]->fd]); }
  • 14. User-defined USDT probes in node.js • Our USDT technique has been generalized by Chris Andrews in his node-dtrace-provider npm module: https://github.com/chrisa/node-dtrace-provider • Used by Joyentʼs Mark Cavage in his ldap.js to measure and validate operation latency • But how to visualize operation latency?
  • 15. Visualizing latency • Could visualize latency as a scalar (i.e., average): • This hides outliers — and in a real-time system, it is the outliers that you care about! • Using percentiles helps to convey distribution — but crucial detail remains hidden
  • 16. Visualizing latency as a heatmap • Latency is much better visualized as a heatmap, with time on the x-axis, latency on the y-axis, and frequency represented with color saturation: • Many patterns are now visible (as in this example of MySQL query latency), but critical data is still hidden
  • 17. Visualizing latency as a 4D heatmap • Can use hue to represent higher dimensionality: time on the x-axis, latency on the y-axis, frequency via color saturation, and hue representing the new dimension: • In this example, the higher dimension is the MySQL database table associated with the operation
  • 18. Visualizing node.js latency • Using the USDT probes as foundation, we developed a cloud analytics facility that visualizes latency in real-time via four dimensional heatmaps: • Facility is available via Joyentʼs no.de service, Joyentʼs public cloud, or Joyentʼs SmartDataCenter
  • 19. Debugging latency • Latency visualization is essential for understanding where latency is being induced in a complicated system, but how can we determine why? • This requires associating an external event — an I/O request, a network packet, a profiling interrupt — with the code thatʼs inducing it • For node.js — like other dynamic environments — this is historically very difficult: the VM is opaque to the OS • Using DTraceʼs helper mechanism, we have developed a V8 ustack helper that allows OS-level events to be correlated to the node.js-backtrace that induced them • Available for node 0.6.7 on Joyentʼs SmartOS
  • 20. Visualizing node.js CPU latency • Using the node.js ustack helper and the DTrace profile provider, we can determine the relative frequency of stack backtraces in terms of CPU consumption • Stacks can be visualized with flame graphs, a stack visualization developed by Joyentʼs Brendan Gregg:
  • 21. node.js in production • node.js is particularly amenable for the DIRTy apps that typify the real-time web • The ability to understand latency must be considered when deploying node.js-based systems into production! • Understanding latency requires dynamic instrumentation and novel visualization • At Joyent, we have added DTrace-based dynamic instrumentation for node.js to SmartOS, and novel visualization into our cloud and software offerings • Better production support — better observability, better debuggability — remains an important area of node.js development!
  • 22. Thank you! • @ryah and @rmustacc for Node DTrace USDT integration • @dapsays, @rmustacc, @rob_ellis and @notmatt for cloud analytics • @chrisandrews for node-dtrace-provider and @mcavage for putting it to such great use in ldap.js • @dapsays for the V8 DTrace ustack helper • @brendangregg for both the heatmap and flame graph visualizations • More information: http://dtrace.org/blogs/dap, http://dtrace.org/blogs/brendan and http://smartos.org