Node.js was initially challenging to use in production due to memory leaks and lack of debugging tools. Over three years, Joyent developed tools like DTrace probes, MDB for debugging core dumps, Bunyan for logging, and node-restify for building HTTP services to make node.js more reliable and observable in production. These tools helped Joyent successfully deploy many internal services using node.js and identify issues through postmortem analysis. Joyent continues working to improve node.js for production use.
Node.js in production: Reflections on three years of riding the unicorn
1. node.js in production: Reflections on three years of riding the unicorn
Bryan Cantrill
SVP, Engineering
bryan@joyent.com
@bcantrill
Tuesday, December 3, 13
2. Production systems
• Production systems are ones doing real work: when they misbehave, users or other systems are affected
• Production systems value reliability, performance and ease of deployment — usually in that order
• Contrast to development systems, which value ease of development and speed of development — in that order
• These values can be in tension: new languages and environments typically arise for their development values, not their production ones
• Would node.js be any different?
3. node.js advantages
• In terms of production suitability, node.js had — and still has — a couple of major advantages going for it:
• It’s not a new language
• It’s built on a VM (V8) that itself was designed for performance
• It leverages extant (Unix) abstractions
• Its pure event-oriented model aligns ease of programming with scalability with respect to load
• As the stewards of both node and SmartOS, Joyent had another advantage: we could change, improve or leverage SmartOS to accommodate node in production
4. node.js challenges
• But node.js also has a couple of major challenges:
• JavaScript closures make it easy to accidentally reference memory
• Because node.js is often used to connect backend components, failure to propagate back pressure can induce memory explosion and death
• Single-threaded execution of JavaScript means that compute-bound code can entirely impede progress
• High performance VM also implies inscrutable core dumps and very limited instrumentation
5. August 2010: DTrace in node.js
• Added simple user-level statically defined tracing (USDT) probes for node.js on platforms that support DTrace (e.g., Mac OS X, SmartOS)
• Probes were around connection establishment, serving HTTP requests, etc.
• Allowed questions to be dynamically asked of running, production node.js servers, e.g.:

    dtrace -n 'node*:::http-server-request{
        printf("%s of %s from %s\n", args[0]->method,
            args[0]->url, args[1]->remoteAddress)}'

    dtrace -n http-server-request'{
        @[args[1]->remoteAddress] = count()}'

    dtrace -n gc-start'{self->ts = timestamp}' \
        -n gc-done'/self->ts/{@ = quantize(timestamp - self->ts)}'
6. August 2010: Deploying 0.2.x
• In August 2010, we deployed our first node.js-based service into production: a Node Knockout leaderboard that used node.js DTrace probes to geolocate connections to contestants in real-time
• Results were promising: it was surprisingly easy to develop and deploy a node.js-based service — and the service consumed very little CPU
• Watching the Node Knockout contestants in production revealed they were all light on CPU
• But there was a storm cloud...
7. August 2010: Deploying 0.2.x, cont.
• We had a memory leak that resulted in heap exhaustion after several hours under heavy load
• Our service was stateless and load balanced for HA, so this was more disconcerting than debilitating...
• ...but we also had quite a few contestants that would run their RSS up and crash; there was clearly a larger issue
8. February 2011: 0.4.0
• In February 2011, we deployed our first major node.js-based service (on 0.4.0)
• The service was able to be built remarkably quickly — but with some pain-points around Connect
• Despite being potentially a compute-bound service, CPU consumption was (again) a non-issue
• And with an updated node (and many fixed node leaks), memory consumption wasn’t necessarily as acute...
• …but we hit our first “spinning black hole” problem
9. January 2011: node-dtrace-provider
• Our DTrace probes in node were proving to be too low-level for higher-level services — we needed to allow USDT probes to be expressed in JavaScript
• Fortunately, DTrace community member Chris Andrews extended his libusdt to node.js, allowing statically defined probes in JavaScript, e.g.:

    var d = require('dtrace-provider');
    var dtp = d.createDTraceProvider('foo');
    var probe = dtp.addProbe('foo-start');
    probe.fire(function (p) {
        return ([ { bar: 123, baz: 'bar' } ]);
    });
10. April 2011: Restify
• Based on our experiences with Connect/Express, we wanted to build a node module that was purpose-built to implement HTTP-based API endpoints
• Based on Chris Andrews’ work, we wanted to have first-class support for DTrace
• Joyent’s Mark Cavage developed node-restify, which quickly became the foundation for all of our services
• Built-in DTrace support allows full observability into per-route/per-handler latency — a capability that we could not live without at this point
11. November 2011: MDB support for V8
• In mid-2011, Joyent’s Dave Pacheco dared to dream the impossible dream: full postmortem support for V8 for MDB, the debugger native to SmartOS
• Via several unspeakable layer violations, mdb_v8 brought postmortem debugging to node.js
• ::jsstack prints the full stack, including both native C++ frames and JavaScript frames
• ::jsprint prints JavaScript objects — from the dump
• Thanks to mdb_v8, we were able to go back to a core dump from that infinite loop in our service deployed several months earlier — and nail it
12. December 2011: DTrace ustack helper
• mdb_v8 was actually a way station to an even bolder dream: a DTrace ustack helper for node.js
• A ustack helper is a bit of code that accompanies a binary and assists DTrace in probe context to resolve stack frames to their higher-level names
• Once completed, allows user-level stack traces to be associated with in-kernel events — like profiling events
• Can use the DTrace profile provider to determine how a node.js program is consuming CPU via stack sampling
13. December 2011: Flame graphs
• Poring through stack traces can make hot functions difficult to visualize
• Joyent’s Brendan Gregg developed flame graphs, which allow us to easily visualize thousands of sampled stacks
14. January 2012: Bunyan
• Logging was becoming more and more of a problem for us — especially as we were developing distributed systems in node.js
• Joyent’s Trent Mick developed node-bunyan, a simple and fast JSON logging library for node.js
• Provides standardized, JSON, line-based log output that can be easily processed with JSON tools, e.g.:

    {"name":"moray","hostname":"d1cfb6c7-c975-4ed8-a689-fb18f94b6bfc",
     "pid":8393,"component":"manatee","path":"/manatee/sdc/election",
     "level":20,"db":{"available":2,"max":15,"size":2,"waiting":0},
     "options":{"async":false,"read":true},"msg":"pg: entered",
     "time":"2013-12-03T02:54:24.565Z","v":0}
• Also includes a command-line tool, bunyan, for displaying Bunyan logs
15. February 2012: npm shrinkwrap
• npm allows for fine-grained semver control over package dependencies, but we found that nested dependencies could result in non-replicable installs
• “npm shrinkwrap” generates a file that shrinkwraps all nested dependencies into npm-shrinkwrap.json, thereby locking down all nested versions
• Guarantees that all installs will have the same semver versions of dependencies
• Doesn’t necessarily guarantee identical installs, however; for this, one needs private npm repositories
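The generated file pins every level of the dependency tree. An illustrative fragment of the classic npm-shrinkwrap.json shape (package names and versions here are hypothetical):

```json
{
  "name": "example-service",
  "version": "1.0.0",
  "dependencies": {
    "restify": {
      "version": "2.6.0",
      "dependencies": {
        "bunyan": {
          "version": "0.22.0"
        }
      }
    }
  }
}
```

Because the nested "dependencies" objects are recorded recursively, a fresh `npm install` resolves to the same semver versions at every depth.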
16. April 2012: node-vasync
• There are a number of modules that deal with some of the mechanics of asynchronous control flow…
• But we found we needed one that emphasized debuggability in particular
• node-vasync captures a number of popular flow patterns and allows state to be inspected via MDB
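The debuggability idea can be sketched in a few lines. This is the pattern only, not vasync's actual API (vasync.pipeline takes a `{funcs: [...]}` argument and richer state); the point is that every step's status lives on a plain object that `::jsprint` could inspect in a core dump taken mid-flight:

```javascript
// Sketch of a vasync-style series pipeline (hypothetical API): run async
// functions in order, recording each operation's status on a state
// object so a debugger can see exactly where a hung pipeline stopped.
function pipeline(funcs, callback) {
  var state = {
    operations: funcs.map(function (f) {
      return { func: f, status: 'waiting' };
    })
  };
  var i = 0;
  function next(err) {
    if (err || i === state.operations.length)
      return callback(err, state);
    var op = state.operations[i++];
    op.status = 'pending';
    op.func(function (err2) {
      op.status = err2 ? 'failed' : 'done';
      next(err2);
    });
  }
  next();
  return state; // inspectable while operations are still in flight
}
```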
17. May 2012: ::findjsobjects
• Building on Dave Pacheco’s mdb_v8, we implemented a debugger command that iterates over all of memory in a core dump, looking for JavaScript objects
• Entirely brute force, but allows one to take a swing at a nasty node.js issue: semantic memory leaks
    > ::findjsobjects
        OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS
      95709ac1      195      3 Object: socket, type, handle
      957093f9       66      9 Object: uid, windowsVerbatimArguments, stdio, …
      95f13181      130      5 <anonymous> (as exports.StringDecoder): …
      8432ff55      222      3 Buffer: length, offset, parent
      843304dd       91      9 Object: refreservation, creation, name, type, …
      8432cc55       99      9 Object: time, msg, level, hostname, pid, action, …
      95f08545       66     14 ChildProcess: _closesNeeded, stdio, …
      8432f2e1      546      2 Array
      9570cafd       47     24 Object: <sliced string>, <sliced string>, …
      8432be95      415      3 Array
      8432fb09       67     19 Socket: errorEmitted, _bytesDispatched, …
18. May 2012: ::findjsobjects -p
• Searching by property name allows one to find particular objects in the JavaScript heap, e.g.:

    > ::findjsobjects -p ip4addr | ::findjsobjects | ::jsprint -a
    8432b109: {
        ip4addr: 9aee115d: "10.88.88.200",
        VLAN: 9aee1199: "0",
        Host Interface: 9aee1185: "e1000g0",
        Link Status: 9aee1175: "up",
        MAC Address: 9aee113d: "02:08:20:47:93:82",
    }
    …

• While designed for postmortem debugging, this allows mdb_v8 to be used for in situ debugging in development
• Also guides one to a best practice: towards unique property names (which we have historically done in the operating system via structure prefixing)
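The prefixing practice borrowed from the operating system takes only a line or two to show. The constructor and prefix below are hypothetical, purely for illustration:

```javascript
// Hypothetical constructor showing structure-prefix-style naming: a
// "::findjsobjects -p mb_name" search can only match MorayBucket
// instances, because the "mb_" prefix is unique to this type, whereas
// searching for a generic name like "name" would match half the heap.
function MorayBucket(name) {
  this.mb_name = name;
  this.mb_records = [];
}
```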
19. July 2012: node-fast
• While HTTP makes it very easy to put together a distributed system, parsing and connection management can become prohibitively expensive
• In building Manta, we found that we needed something lighter/faster; Joyent’s Mark Cavage built node-fast
• Only what you need: fully async/duplex/persistent connections, simple on-wire protocol (JSON), etc.
• None of what you don’t want: no IDL madness, no object model, no binary translation madness, etc.
• Deliberately light and limited — HTTP is still the right answer until it isn’t
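The "simple on-wire protocol (JSON)" idea can be sketched as newline-delimited JSON with a message id, so many requests can be in flight on one persistent socket. This framing is hypothetical, not node-fast's actual wire format:

```javascript
// Sketch of JSON-over-a-duplex-socket framing (hypothetical, not
// node-fast's real protocol): one JSON object per newline, tagged with
// a msgid so responses can be matched to outstanding requests.
function encodeMessage(msgid, data) {
  return JSON.stringify({ msgid: msgid, data: data }) + '\n';
}

function decodeMessages(buf) {
  // Split a received chunk into complete messages plus any trailing
  // partial message, which the caller keeps for the next chunk.
  var parts = buf.split('\n');
  var rest = parts.pop();
  return { messages: parts.map(JSON.parse), rest: rest };
}
```

Compared with HTTP, there is no header parsing and no per-request connection setup, which is the cost the slide calls prohibitive at scale.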
20. October 2012: Bunyan + DTrace
• With all of our services using Bunyan, we could enable dynamic logging by adding DTrace USDT probes
• Can use the raw DTrace probes:

    # dtrace -qn log-debug'{printf("%s\n", copyinstr(arg0))}' -x strsize=8k
    {"name":"wf-moray-backend","hostname":"414ffb35-adee-47b7-bdf4-d21cb039386c",
     "pid":10952,"component":"MorayClient","host":"10.99.99.17","port":2020,
     "req_id":"bddb180f-1770-edcf-8df2-b3a81d97e9b1","level":20,
     "bucket":"wf_runners","key":"414ffb35-adee-47b7-bdf4-d21cb039386c",
     "value":{"active_at":"2013-12-03T07:22:25.125Z","idle":false},
     "msg":"putObject: entered","time":"2013-12-03T07:22:25.135Z","v":0}
    ...

• Added the json() subroutine to DTrace to make this easier to process
• Can also use “bunyan -p” and avoid the lower-level DTrace details entirely
21. May 2013: --abort-on-uncaught-exception
• Crash dumps are great — but aborting after an uncaught exception makes it very difficult to determine the true origin of the exception
• Dave Pacheco implemented a V8 patch to induce a process abort (and a core dump) on an uncaught exception
• This allows us to use postmortem debugging to debug our everyday logic errors
• Available starting in 0.10.x — we use it wherever we have it!
22. July 2013: Thoth
• One of the most important systems we have built in node is Manta, our object store featuring in situ compute
• Manta is an excellent platform for building data-based services — especially for large data objects
• We built manta-thoth, a platform for core and crash dump analysis that allows us to debug core dumps without moving them
• Thoth has become critically important for us to track and automatically debug production node.js services
23. December 2013: Dump analysis on Linux
• Postmortem debugging has been a (the) tremendous breakthrough for node.js in production…
• ...but despite all of node’s postmortem support being open source, it has been limited to SmartOS
• Some have toyed with porting MDB to Linux; this is in principle possible, but will be rough sledding
• Joyent’s TJ Fontaine (of node core fame) observed what we had done with dump analysis on Manta and had a simpler idea…
• What about making Linux dumps consumable on SmartOS — and therefore Manta?
24. December 2013: Linux support in libproc
• Over the course of a multi-day engineering hackathon, TJ and Joyent’s Max Bruning added support for Linux crash dumps in SmartOS’s libproc
• Fortunately, because of the way the postmortem work was done by Dave Pacheco, it Just Works
• Do this yourself: https://gist.github.com/tjfontaine/de104fe058300a51f7cf
• For Linux users: put your Linux dumps to Manta, and you can finally debug those pesky leaks and crashes!
• Use --abort-on-uncaught-exception and you can use Manta and postmortem debugging to debug more quotidian programming errors!
25. Node.js in production!
• For us at Joyent, the tooling that we have built into node.js has resulted in what we believe to be the best dynamic environment for production use
• Yes, even when compared to much older platforms like Java and Erlang...
• There is still work to be done, especially around add-on development (see TJ’s shim work!) and potentially better bundling of objects…
• We will continue to emphasize production deployment and use in our stewardship of node.js!
26. Thank you
• @dapsays, the Patron Saint of node.js in production, for DTrace support, MDB support, node-vasync, Manta, etc.
• @mcavage for node-restify, node-fast, Manta, etc.
• @trentmick for node-bunyan
• @chrisandrews for node-dtrace-provider
• @brendangregg for flame graphs
• @tjfontaine for bringing postmortem debugging to an entirely new audience with Linux support for libproc!