Transport modularity within Salt now allows the use of various networks and transports instead of being tied to a single messaging library. Learn about the evolution of transport modularity in SaltStack and what this means for the future of orchestration and management at any scale. Review the scale and performance benefit of concurrency in SaltStack. And discuss some examples of concurrent processing in the Salt Master.
Saltconf 2016: Salt stack transport and concurrency
1.
2. Salt Transport Modularity and Concurrency for Performance and Scale
Thomas Jackson
Staff Site Reliability Engineer
LinkedIn
3. 3
Agenda
• for item in ('transport', 'concurrency'):
• History
• Problems
• Options
• Solution
4. Transport in Salt
4
Salt Transport: a history
• In the beginning Salt was primarily a remote execution engine
• Send jobs from Master to N minions (defined by some target)
• In the beginning there was
5. 5
"ZeroMQ (also spelled ØMQ, 0MQ or ZMQ)
is a high-performance asynchronous
messaging library, aimed at use in
distributed or concurrent applications.”
- Wikipedia (https://en.wikipedia.org/wiki/ZeroMQ)
6. 6
We took a normal TCP socket, injected it with a mix of radioactive
isotopes stolen from a secret Soviet atomic research project,
bombarded it with 1950-era cosmic rays, and put it into the hands
of a drug-addled comic book author with a badly-disguised fetish
for bulging muscles clad in spandex. Yes, ZeroMQ sockets are the
world-saving superheroes of the networking world.
- http://zguide.zeromq.org/page:all#How-It-Began
7. 7
Salt Transport: a history
How ZMQ PUB/SUB looks
Server
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:12345")
socket.send("Message")
Client
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.setsockopt(zmq.SUBSCRIBE, "")  # subscribe to all messages
socket.connect("tcp://localhost:12345")
print socket.recv()
8. 8
Salt Transport: a history
How ZMQ REQ/REP looks
Server
context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:12345")
message = socket.recv()
socket.send("got message")
Client
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:12345")
socket.send("Hello")
message = socket.recv()
9. Request lifecycle
9
Salt Transport: a history
Master Minion
1. Job publish
2. Sign-in (optional – potentially reused or cached)
3. Pillar Fetch
4. SLS/file fetch (optional)
5. Return
10. Initial ZeroMQ implementation
10
Salt Transport: a history
• Master-initiated messages
• Using the pub/sub socket pair in zmq
• All broadcast messages from the master to the minion
• Minion-initiated messages
• Using the req/rep socket pair in zmq
• All messages initiated by the minion, such as:
• Sign-in
• Job return
• Module sync
• Pillar
• Etc.
11. Initial problems
11
Salt Transport: a history
• Message loss
• Broadcasts were filtered client-side
• Added zmq filtering: https://github.com/saltstack/salt/pull/13285
• Etc.
13. Larger problems
13
Salt Transport: a history
• Huge ZMQ publisher memory leak (https://github.com/zeromq/libzmq/issues/954)
• Workaround: Process manager in salt
• No concept of client state
• When messages arrive, there is no way to tell whether the client is still connected, which leads to auth storms
• Workaround: Exponential backoff on the minion side
• No sync "connect" (https://github.com/saltstack/salt/pull/21570)
• Workaround: fire event and wait for it to return (or timeout to expire)
• Some users have issues with the LGPL license
• Workaround: n/a
14.
15. 15
The Reliable Asynchronous Event Transport, or
RAET, is an alternative transport medium developed
specifically with Salt in mind. It has been developed to
allow queuing to happen up on the application layer
and comes with socket layer encryption. It also
abstracts a great deal of control over the socket layer
and makes it easy to bubble up errors and exceptions.
- docs.saltstack.com
Salt Transport: previous attempt
16. RAET
16
Salt Transport: previous attempt
• The good
• No ZMQ!
• The bad
• Effectively a re-implementation of the daemons (separate files, etc.)
• Unable to run zmq and RAET simultaneously (initially, hydra was added later – which just runs both daemons at once)
• The different
• Changed the model from "minions always connect" to "minions are listening", meaning each minion now has an open socket that can be attacked
18. What do we really need
18
Salt Transport: back to basics
• Salt is a platform, not a specific transport, so transports need to be modular
• Some requirements:
• Simple interface to implement (such that other modules can be written)
• Test coverage (including pre-canned tests for new modules)
• Support N transports simultaneously (for ramps, and complex infra)
• Clear contract of security/privacy requirements of various methods
22. TCP channel
22
Salt Transport: Channels!
• Wire protocol: msgpack({'head': SOMEHEADER, 'body': SOMEBODY})
• Main advantage over ZMQ? Better failure modes
• Faster failure detection (if minion isn’t connected to the master, you don’t have to wait for the timeouts)
• True link-status (no more auth storms!)
• Basically, we have sockets again!
• https://docs.saltstack.com/en/develop/topics/transports/tcp.html
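Because msgpack is an iteratively parsed serialization, frames can be written back-to-back on the wire and a streaming unpacker recovers the boundaries with no length prefix. A minimal sketch of the framing idea, using stdlib json with an explicit 4-byte length prefix as a stand-in (json, unlike msgpack, can't be parsed incrementally, and msgpack is a third-party package); the header/body values are invented for illustration:

```python
import json
import struct

# Sketch of a {'head': ..., 'body': ...} wire protocol. The real TCP channel
# uses msgpack, whose streaming Unpacker makes the length prefix unnecessary;
# this stdlib stand-in adds a 4-byte big-endian prefix to frame messages.

def pack_frame(head, body):
    payload = json.dumps({'head': head, 'body': body}).encode('utf-8')
    return struct.pack('!I', len(payload)) + payload

def unpack_frames(buf):
    """Yield every complete {'head', 'body'} frame in a byte buffer."""
    offset = 0
    while offset + 4 <= len(buf):
        (length,) = struct.unpack_from('!I', buf, offset)
        offset += 4
        frame = buf[offset:offset + length]
        offset += length
        yield json.loads(frame.decode('utf-8'))

# Two frames written back-to-back on the "wire"
wire = pack_frame('auth', {'cmd': 'get_token'}) + pack_frame('job', {'fun': 'test.ping'})
frames = list(unpack_frames(wire))
```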
23. TCP: How does it look?
23
Salt Transport: Channels!
async_channel = salt.transport.client.AsyncReqChannel.factory(minion_opts)
ret = yield async_channel.send(msg)
24. TCP: How accurate?
24
Salt Transport: Channels!
• ZeroMQ
• Total jobs: 1000
• Completed jobs: 171
• Hit rate: 17.1%
• TCP
• Total jobs: 1000
• Completed jobs: 1000
• Hit rate: 100%
25. TCP: How does it perform
25
Salt Transport: Channels!
• 15 byte message
• ZeroMQ*
• Average time: 0.00295809405715
• QPS: 2246.952241147
• TCP
• Average time: 0.0023341544863
• QPS: 2580.04452801
26. TCP: How does it perform
26
Salt Transport: Channels!
• 1053 byte message
• ZeroMQ*
• Average time: 0.00278297542184
• QPS: 2489.300394919
• TCP
• Average time: 0.00251070397869
• QPS: 2602.4855051
28. The General Problem
28
Concurrency
We have lots of things to do, some of which are blocking calls to remote things which
are “slow”. It is more efficient (and overall “faster”) to work on something else while we
wait for that “slow” call.
30. Current state of concurrency in Salt
30
Concurrency
• Master-side:
• The master creates N MWorkers to process N requests in parallel
• Interfaces with non-blocking code as well, using `while True:` loops for timeouts, etc.
• Minion-side:
• Threads used in MultiMaster for managing the multiple master connections
31. Problems
31
Concurrency
• No unified approach (multiprocessing, threading, nonblocking “loops” -- all in use)
• Slow and/or blocking operations hold process/thread while waiting
• No consistent use of non-blocking libraries, so the code is a mix of loops and
blocking calls
• Limited scalability (each approach scales differently)
32. Common solutions in Python
32
Concurrency
• Threading
• Multiprocessing
• User-space “threads”: Coroutines / stackless threads
33. 33
Concurrency
Threading
• Some isolation between threads
• Pre-emptive scheduling
import threading
import requests

def handle_request():
    ret = requests.get('http://slowthing/')
    # do something else

threads = []
for x in xrange(0, NUM_REQUESTS):
    t = threading.Thread(target=handle_request)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
34. 34
Concurrency
Multiprocessing
• Complete isolation
• Pre-emptive scheduling
import multiprocessing
import requests

def handle():
    ret = requests.get('http://slowthing/')
    # do something else

processes = []
for x in xrange(0, NUM_REQUESTS):
    p = multiprocessing.Process(target=handle)
    p.start()
    processes.append(p)
for p in processes:
    p.join()
35. • User-space “threads”: Coroutines / stackless threads
35
Concurrency
• Some libraries you may have heard of
• gevent
• Stackless python
• Greenlet
• Twisted
• Tornado
• How are these implemented
• Green threads
• callbacks
• coroutines
36. Why Coroutines?
36
Concurrency
• Coroutines have been in use in python for a while (tornado)
• The new asyncio in python3 (tulip) is coroutines
(https://docs.python.org/3/library/asyncio.html)
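For comparison, the same coroutine style in Python 3's asyncio looks like this. A hedged sketch, not Salt code: the function names and hosts are invented, and asyncio.sleep stands in for a slow network call.

```python
import asyncio

async def fetch_pillar(minion_id):
    # Stand-in for a slow remote call; yields control while "waiting"
    await asyncio.sleep(0.01)
    return {'minion': minion_id, 'pillar': {}}

async def main():
    # Run three "slow" fetches concurrently; total wall time ~= one fetch
    return await asyncio.gather(*(fetch_pillar(m) for m in ('web1', 'web2', 'db1')))

results = asyncio.run(main())
```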
37. 37
Coroutines are computer program components
that generalize subroutines for nonpreemptive
multitasking, by allowing multiple entry points
for suspending and resuming execution at
certain locations.
- https://en.wikipedia.org/wiki/Coroutine
Concurrency
39. 39
Concurrency
Coroutines– what is this magic?
def some_complex_handle():
    while True:
        input = yield
        out1 = do_something(input)
        yield None
        out2 = do_something2(out1)
        yield None
        return do_something3(out2)
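A naive generator like the one above still needs something to drive it. A minimal, runnable sketch of that "magic" (task and scheduler names are invented for illustration): a round-robin loop resumes each generator at its yield points, interleaving the tasks without threads.

```python
# Each task suspends at every yield; the scheduler decides who runs next.

def task(name, steps):
    for i in range(steps):
        yield (name, i)   # suspend here; the scheduler resumes us later

def run_round_robin(tasks):
    order = []
    while tasks:
        gen = tasks.pop(0)
        try:
            order.append(next(gen))   # resume until the next yield
            tasks.append(gen)         # not finished: back of the queue
        except StopIteration:
            pass                      # task complete
    return order

order = run_round_robin([task('a', 2), task('b', 2)])
# interleaved: [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```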
40. 40
Concurrency
Tornado coroutines
• Some isolation between coroutines
• Explicit yield
• Light “threads”
import tornado.gen
import tornado.httpclient
import tornado.ioloop

@tornado.gen.coroutine
def handle_request():
    ret = yield tornado.httpclient.AsyncHTTPClient().fetch('http://slow/')
    # do something else

loop = tornado.ioloop.IOLoop.current()
loop.spawn_callback(handle_request)
loop.start()
41. Coroutines– futures
41
Concurrency
• Futures are just objects that represent a thing that will complete in the future
• This allows methods to return immediately, but finish the task in the future
• This allows the callers to yield execution until the futures they depend on complete
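The stdlib's concurrent.futures.Future shows the idea in miniature. A hedged sketch (start_slow_call is invented; in real code a worker thread or IO loop would fulfill the future later):

```python
from concurrent.futures import Future

def start_slow_call():
    fut = Future()
    # A real producer would return fut immediately and call set_result
    # from a worker later; we fulfill it inline to stay self-contained.
    fut.set_result({'retcode': 0})
    return fut

results = []
fut = start_slow_call()
# On an already-completed future, the callback fires immediately
fut.add_done_callback(lambda f: results.append(f.result()))
value = fut.result()   # would block until the result is set
```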
42. 42
Concurrency
Coroutines– with futures
• Yield execution, and get returns
• Method looks fairly normal
• Stack traces in here have context
• Easy chaining of futures
@tornado.gen.coroutine
def some_complex_handle(request):
    a = yield is_authd(request)
    if not a:
        return False
    ret = yield do_request(request)
    yield save1(ret), save2(ret)
    return ret
43. Tornado in Salt
43
Concurrency
• What is tornado?
• Python web framework and asynchronous networking library
• Why Tornado and not asyncio?
• Free python 2.x compatibility!
• A fairly comprehensive set of libraries for it (http, locks, queues, etc.)
44. Back to the transport interfaces
44
Concurrency
• AsyncReqChannel
• send: return a future
• crypted_transfer_decode_dictentry: return a future
ret = yield channel.send(load, timeout=timeout)
45. Now what?
45
Concurrency
• Now that we have a real concurrency model, what have we done with it?
• MultiMinion in a single process (coroutine per connection)
• Easily implement concurrent networking within Salt
• TCP transport
• IPC
48. Race conditions
48
Concurrency problems
• Weird data problems in the reactor: https://github.com/saltstack/salt/issues/23373
• The underlying problem: objects injected into modules (__salt__, etc.) were plain dicts, which aren't threadsafe (or coroutine-safe!)
• The solution? `ContextDict`
49. Copy-on-write thread/coroutine specific dict
49
ContextDict
• Works just like a dict
• Exposes a clone() method, which creates a `ChildContextDict`, a thread/coroutine-local copy
• With tornado's StackContext, a context manager swaps the parent's backing dict for the child's
cd = ContextDict(foo='bar')
print cd['foo']  # will be bar
with tornado.stack_context.StackContext(cd.clone):
    print cd['foo']  # will be bar
    cd['foo'] = 'baz'
    print cd['foo']  # will be baz
print cd['foo']  # will be bar
More examples: https://github.com/saltstack/salt/blob/develop/tests/unit/context_test.py
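To make the copy-on-write behavior concrete, here is a heavily simplified, thread-only stand-in (not Salt's actual ContextDict, which also tracks coroutines via tornado's StackContext): reads fall through to a shared dict unless the current thread has entered its own overlay.

```python
import threading

class MiniContextDict:
    """Simplified sketch: per-thread copy-on-write overlay over a shared dict."""

    def __init__(self, **data):
        self._global = dict(data)
        self._local = threading.local()

    def _backing(self):
        overlay = getattr(self._local, 'overlay', None)
        return self._global if overlay is None else overlay

    def __getitem__(self, key):
        return self._backing()[key]

    def __setitem__(self, key, value):
        self._backing()[key] = value

    def clone(self):
        """Context manager giving this thread a copy-on-write overlay."""
        cd = self

        class _Child(object):
            def __enter__(self):
                cd._local.overlay = dict(cd._global)  # copy on entry

            def __exit__(self, *exc):
                cd._local.overlay = None              # restore shared view

        return _Child()

cd = MiniContextDict(foo='bar')
with cd.clone():
    cd['foo'] = 'baz'
    inside = cd['foo']     # 'baz', visible only inside this context
outside = cd['foo']        # 'bar' again
```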
51. Layers!
51
Concurrency problems
• Don’t forget, concurrency at all layers– including your DC-wide state execution
• For example: automated highstate enforcement of your whole DC
• Does it matter if all DB hosts update at once?
• Does it matter if all web servers update at once?
• Does it matter if all edge boxes update at once?
53. Things on my “list”
53
Future Awesomeness
• Transport
• failover groups
• even better HA (https://github.com/saltstack/salt/issues/25700 -- get involved in the conversation)
• Concurrency
• async ext_pillar
• Partially concurrent state execution (prefetch, etc.)?
• Coroutine-based:
• Reactor
• Engines
• Beacons
• Thorium
Transport != concurrency, although transport uses concurrency
10K foot view:
Contexts have sockets
Sockets are message passing things that are “like” sockets, but are not sockets (they are really a socket and a bunch of contexts)
ZeroMQ attempts (and succeeds) in dramatically simplifying message passing, go zmq!
Notice, to switch message types we only had to change the socket type– simple!
Basically, this means we can break down communications in salt into two categories
Effectively two separate transport issues to solve, so two socket pairs– great done
Initially zmq was awesome, as with anything we ran into a variety of weird little issues
Message loss: retries, various new versions of zmq to fix cases that dropped messages
Broadcasts: ran out of B/W for medium sized job publishes, fixed by implementing zmq’s filtering (zmq saved the day!)
But at this point, these are just bugs– nothing that’s a deal breaker
At this point, the problems it has had aren’t really zmq’s fault… so we are okay right?
Memory leaks: connecting and disconnecting on TCP causes ~600 bytes to be leaked on the master! Still unfixed to this day!
Client state: publishes have to wait timeout (even if the minion isn’t connected) AND auth storms!
So at this point, we are running into a variety of issues which we are attempting to hack around that are either getting little response, or are contrary to the design.
Basically at our scale the abstraction layer is costing us too much
At our scale (and with our availability/perf requirements) we need another transport option
SaltStack had been working on a replacement, which I’m sure you have all heard of-- RAET
NOTE: RAET in salt was being used for both transport (RAET) and concurrency (ioflo)
All that being said, RAET (the transport) isn’t bad– its just too specific (and not modular), salt isn’t this specific about anything else-- so why transport?
New systems might require new transports (QUIC, serial ports, USB, over text message??, who knows!)
There had already been some work to consolidate the transport into “channel” classes before, so might as well finish that– then make them pluggable
So, 2 types of channels: req and pub
Master:
Prefork
so that you can bind before forking– to split the FD across multiple processes (to work around python’s GIL limitations)
Process_manager, in case you need to make additional processes of your own (instead of just coroutines on the ioloop)
Post_fork
called in each process after fork, this sets up the handlers etc
Minion:
send– send load
crypted_transfer_decode_dictentry– send `load` encrypted only to the master (e.g. not with the shared symmetric key)
This means that as far as “Salt” is concerned, there is a thing (channel) I can pass something to which will get it to wherever I asked.
And of course, the system couldn’t be considered modular unless there were at least two modules
Since msgpack is an iterably parsed serialization, we can simply write the serialized payload to the wire.
Crypto: still using aes that the zeromq stuff uses
People asked about performance, which TBH I didn’t really think about putting in this presentation– because I was more worried about …accuracy
This is a simple benchmark of sending 1k {‘cmd’: ‘get_token’} as quickly as possible to a master
Note: ZMQ drops a LARGE number of messages– this is due to internal queues in ZMQ filling up– so, even if tcp was slower (which it isn’t) we’d still want it
I am of course obligated to show some metrics. This is a simple benchmark of sending {‘cmd’: ‘get_token’} repeatedly from a master
Note, quick benchmark– mostly to show that it is roughly equivalent. In practice
* ZeroMQ ReqClient is apparently VERY CPU heavy (probably a bug)– it uses ~5 client processes to get this number– whereas TCP uses just one
Same as previous benchmark, we just added ~1k additional bytes to the payload
Especially important for large fast modern CPUs that have to talk to things that are slow/far-away.
Concurrency in python is more fun— because of the GIL, but still helpful because stuff is slooooow
Sorry, not a funny picture :/ but you’ve probably seen it before at some conference
Since stuff is so far, there is no reason to leave the CPU just waiting, we can do something while we wait.
Salt attempts to accommodate this…
Basically-- doing only one thing at a time severely limits your performance and scalability.
So lets go back to what our options are
Lets do some examples
Fairly clunky code, but it works.
Linux pthreads– requires a decent amount of memory, and has some hard limits based on your OS
Walk through how this runs:
Creates a thread per request
Waits for requests to finish
Thread closes
Note: still subject to the GIL
Fairly clunky code, but it works. (Note: serialization (pickle)!!)
Linux processes– requires a decent amount of memory, and has some hard limits based on your OS (pids)
Walk through how this runs:
Creates a process per request
Waits for requests to finish
Process closes
Note: no GIL!
Green threads: All the pre-emptive yields require some amount of monkey-patching, making it… difficult for a plugin based system (like Salt)
Callbacks: mess!
Coroutines-- yes
But we don’t even have to make this decision, python already did!
Quick aside re: ioflo-- basically a naive implementation of coroutines to achieve the required concurrency for the flo-based model it has; serious scaling problems, limited usage, etc. Details can be messy-- talk after :)
So what exactly is a coroutine? Let's break that down. Preemptive vs. non-preemptive scheduling: implicit vs. explicit yield
Basically coroutines are explicitly yielded tasks
lets talk a little about what coroutines are with some examples
Great examples on www.dabeaz.com/coroutines/ -- I’ll try to explain it in a shorter way, but I highly recommend reading dabeaz’s page.
To make it clearer, lets copy/paste an example of a naïve implementation in python using generators
So, something like this lets us "schedule" tasks, meaning we can interleave execution of these things, even if they are all blocking operations
What would be even better– is if we could resume execution when whatever we are waiting on is completed
Note: the return within a generator is new in python 3.x, so tornado (and trollius) use an exception of a specific type (tornado.gen.Return)
Cleaner code, easy isolation, lighter concurrency (effectively just a stack)
As of 2015.8– you get tornado!
So, from the client– you say “send load with timeout” and we return a future that will fulfill that contract (either send or timeout). So from the client this is SUPER clean
But, of course– we haven’t got this far without breaking anything ;)
Like anything else concurrency isn’t free
The implementation here is a RequestContext (based on tornado's stack_context). This RequestContext will do all of the bookkeeping of which coroutine/thread is currently executing-- and will switch between the values for each one.
With this I made the loader threadsafe (yay!) and it is easy to re-use if you need a concurrent copy-on-write structure
What happens if you have automated highstate enforcement across your proxies??
Limit concurrency of this particular part of your states– but not the rest
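One way to limit concurrency for just one part of a run, sketched in modern Python (not Salt code; host names and delays are invented): a semaphore gates the risky step on DB hosts while web and edge boxes stay fully concurrent.

```python
import asyncio

async def apply_state(host, db_gate):
    if host.startswith('db'):
        async with db_gate:            # at most 1 DB host updates at a time
            await asyncio.sleep(0.01)  # stand-in for the risky update
    else:
        await asyncio.sleep(0.01)      # web/edge boxes run unthrottled
    return host

async def main():
    db_gate = asyncio.Semaphore(1)
    hosts = ['db1', 'db2', 'web1', 'web2', 'edge1']
    return await asyncio.gather(*(apply_state(h, db_gate) for h in hosts))

done = asyncio.run(main())
```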