4.–12. It’s a Trap!
Symptom: Overly Long Round-Trip Time for a Request + Response
A big request (256 KB) goes out over TCP (MSS 1448 bytes), yet the RTT is overly long. Only for specific request sizes? Only on specific OSs?
Solution 1: Set SO_RCVBUF. The problem pops up again!
Solution 2: Set TCP_NODELAY. YES! ... but CPU skyrockets!
A well-understood bad interaction!
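As a minimal sketch of "Solution 2" from the slide, here is how an application might disable Nagle's algorithm on a TCP connection. The host/port and the helper name are illustrative, not from the deck; the slide's caveat stands — this trades RTT for CPU.

```python
import socket

def connect_no_nagle(host, port):
    """Open a TCP connection with Nagle's algorithm disabled (TCP_NODELAY):
    small segments go out immediately instead of waiting for an ACK of
    earlier unacknowledged segments."""
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Setting the option per-connection is the common approach; it affects only this socket, not the whole host.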
14.–17. Challenges with TCP
Nagle: “Don’t send ‘small’ segments if unacknowledged segments exist.”
+ Delayed ACKs: Don’t acknowledge data immediately. Wait a small period of time (200 ms) for responses to be generated and piggyback the response with the acknowledgement.
= Temporary Deadlock: Waiting on an acknowledgement to send any waiting small segment. But the acknowledgement is delayed waiting on more data or a timeout.
Solutions?
18.–28. Little Experiment
Question: Does the size of a send matter that much?
Setup: a BIG request (32 MB), sent in chunks, with a 1 KB response, over loopback (TCP MSS 16344 bytes).

Chunk Size   RTT (msec)
1500         32
4096         16
8192         12

Smaller chunks also mean dramatically higher CPU.
Take Away(s)
“Small” messages are evil? Chunks smaller than the MSS are evil?
... no, or not quite ...
OS pagesize (4096 bytes) matters! Why? Kernel boundary crossings matter!
What about sendfile(2) and FileChannel.transferTo()?
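The RTT and CPU numbers above are the deck's measurements; this small sketch only illustrates the mechanism behind them — how the number of kernel boundary crossings scales with chunk size for the 32 MB request.

```python
def send_calls(total_bytes, chunk_size):
    """Number of send(2) calls -- i.e. kernel boundary crossings -- needed
    to push total_bytes through a socket in chunk_size pieces."""
    return -(-total_bytes // chunk_size)  # ceiling division

total = 32 * 1024 * 1024  # the 32 MB request from the experiment
for chunk in (1500, 4096, 8192):
    print(chunk, send_calls(total, chunk))
```

Going from 1500-byte to 8192-byte chunks cuts the crossings by more than 5x, which lines up with the observed RTT and CPU trend.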
30.–35. Challenges with UDP
Not a Stream: message boundaries matter! (kernel boundary crossings)
No Nagle: small messages are not batched.
Not Reliable: loss recovery is the app’s responsibility.
Causes of Loss: ‣receiver buffer overrun ‣network congestion (neither is strictly the app’s fault)
No Flow Control: potential to overrun a receiver.
No Congestion Control: potential impact to all competing traffic!! (unconstrained flow)
38.–41. Network Utilization & Datagrams
Utilization: “the percentage of traffic that is data” = Data / (Data + Control).

No. of 200 Byte App Messages   Utilization (%)
1                              87.7
5                              97.3
20                             99.3

* IP Header = 20 bytes, UDP Header = 8 bytes, no response
Plus fewer interrupts! Batching?
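The table's numbers follow directly from the definition above; this sketch reproduces them under the footnote's assumptions (20-byte IP header, 8-byte UDP header, n messages batched into a single datagram, no response).

```python
IP_HEADER, UDP_HEADER = 20, 8  # bytes, per the slide's footnote

def utilization(n_messages, msg_size=200):
    """Utilization = data / (data + control), for n_messages batched
    into one UDP datagram (so one IP + UDP header total)."""
    data = n_messages * msg_size
    return 100.0 * data / (data + IP_HEADER + UDP_HEADER)

for n in (1, 5, 20):
    print(n, round(utilization(n), 1))  # -> 87.7, 97.3, 99.3
```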
43.–51. Application-Level Batching?
Application-Specific Knowledge: applications sometimes know when to send small and when to batch.* 
+ Performance Limitations & Tradeoffs: Nagle, Delayed ACKs, Chunk Sizes, UDP Network Util, etc.
= Batching by the Application: applications can optimize and make the tradeoffs necessary at the time they are needed.
* HTTP (headers + body), etc.
Addressing
‣Request/Response idiosyncrasies
‣Send-side optimizations
55.–63. Batching setsockopt()s
TCP_CORK
‣Linux only
‣Only send when MSS full, when unCORKed, or ...
‣... after 200 msec
‣unCORKing requires a kernel boundary crossing
‣Intended to work with TCP_NODELAY
TCP_NOPUSH
‣BSD (some) only
‣Only send when SO_SNDBUF full
‣Mostly broken on Darwin
When to Batch? When to Flush?
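A minimal sketch of the cork/uncork pattern, assuming Linux (Python exposes `socket.TCP_CORK` there; the context-manager helper is illustrative, not from the deck). Note the slide's caveat applies: each uncork is itself a setsockopt() kernel boundary crossing.

```python
import socket
from contextlib import contextmanager

@contextmanager
def corked(sock):
    """Hold partial segments in the kernel until unCORKed (Linux TCP_CORK),
    so several small writes leave as full-MSS segments."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        yield sock
    finally:
        # unCORK: flush whatever is buffered (another boundary crossing)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```

Typical use is the HTTP-style case from earlier: `with corked(sock): sock.sendall(headers); sock.sendall(body)` — both writes go out together on uncork.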
67.–80. Flush? Batch?
Batch when...
1. Application logic
2. More data is likely to follow
3. Unlikely to get data out before the next one
Flush when...
1. Application logic
2. More data is unlikely to follow
3. Timeout (200 msec?)
4. Likely to get data out before the next one
An Automatic Transmission for Batching
1. Always default to flushing
2. Batch when mean time between sends < mean time to send (EWMA?)
3. Flush on timeout as a safety measure
Question: Can you batch too much? YES! Large UDP datagrams (fragmentation) + a non-trivial loss probability.
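The "automatic transmission" rule above can be sketched as a small decision class. This is an illustration under assumptions not stated in the deck: the class name and the EWMA smoothing factor `alpha` are hypothetical choices.

```python
class AutoBatcher:
    """Default to flushing; batch only while the EWMA of time between
    sends drops below the EWMA of time to send on the socket."""
    def __init__(self, alpha=0.2):
        self.alpha = alpha   # assumed smoothing factor
        self.mtbs = None     # mean time between sends
        self.mtts = None     # mean time to send

    def _ewma(self, old, sample):
        return sample if old is None else (1 - self.alpha) * old + self.alpha * sample

    def record(self, time_between, time_to_send):
        self.mtbs = self._ewma(self.mtbs, time_between)
        self.mtts = self._ewma(self.mtts, time_to_send)

    def should_batch(self):
        if self.mtbs is None:
            return False  # rule 1: always default to flushing
        return self.mtbs < self.mtts  # rule 2
```

Rule 3 (flush on timeout) would sit in the send loop itself, as a safety measure independent of this decision.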
84.–92. A Batching Architecture
Blocking sends feed the socket(s) through “smart” batching, which pulls off all waiting data when possible (automatically batching when MTBS < MTTS).
MTBS: Mean Time Between Sends
MTTS: Mean Time To Send (on socket)
Advantages
‣Non-contended send threads
‣Decoupled API and socket sends
‣Single writer principle for sockets
‣Built-in back pressure (bounded ring buffer)
‣Easy to add (async) send notification
‣Easy to add rate limiting
Can be re-used for other batching tasks (like file I/O, DB writes, and pipelined requests)!
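A minimal sketch of the "smart batching" drain loop. Assumptions not in the deck: a bounded `queue.Queue` stands in for the ring buffer, and `sock_send` is any callable (e.g. `sock.sendall`).

```python
import queue

def sender_loop(q, sock_send, stop):
    """Single-writer drain loop: each pass pulls off ALL waiting messages,
    so batches form automatically whenever producers outpace the socket
    (MTBS < MTTS). The bounded queue provides the back pressure."""
    while not stop.is_set():
        try:
            batch = [q.get(timeout=0.1)]
        except queue.Empty:
            continue
        while True:  # drain everything that arrived while we were busy
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        sock_send(b"".join(batch))  # one kernel boundary crossing per batch
```

Note there is no explicit batch-size tuning: under light load every message is flushed alone; under heavy load the batch grows by itself.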
97.–106. Multi-Message Send/Receive
sendmmsg(2)
‣Linux 3.x only
‣Send multiple datagrams with a single call
‣Fits nicely with the batching architecture
Complements gather send (sendmsg, writev), which you can do in the same call!
recvmmsg(2)
‣Linux 3.x only
‣Receive multiple datagrams with a single call
‣So, so, sooo SMOKIN’ HOT!
Scatter recv (recvmsg, readv) is usually not worth the trouble.
Advantages
‣Reduced kernel boundary crossings
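Python's socket module exposes sendmsg(2)/recvmsg(2) but not sendmmsg(2)/recvmmsg(2), so batching multiple datagrams per call needs C (or ctypes). The gather-send half the slide mentions can still be sketched: several application buffers leave as ONE datagram with one kernel boundary crossing. The helper name is illustrative.

```python
import socket

def gather_send(sock, addr, *buffers):
    """Gather send (writev-style): the buffers are concatenated by the
    kernel into a single UDP datagram -- one boundary crossing instead
    of one per buffer."""
    return sock.sendmsg(buffers, [], 0, addr)
```

With sendmmsg(2) proper, each entry of the mmsghdr array carries its own iovec, which is why the gather send "can be done in the same call".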
120.–129. PGM Router Assist
Pragmatic General Multicast (PGM), IETF RFC 3208.
A sender S multicasts down a forwarding tree to receivers R. Loss here, on one upstream link, affects both downstream links; there is no loss on the other subtrees.
NAK path: NAKs traverse hop-by-hop back up the forwarding tree, saving state in each router for the retransmits to follow.
Retransmit path: the retransmit only needs to traverse the lossy link once for both downstream links.
Advantages
‣NAKs follow the reverse path
‣Retransmissions are localized
‣Optimization of bandwidth!
To get the most out of the hardware at our disposal today takes a deep understanding of the entire software and hardware stack. Nowhere is this more evident than in current CPU architectures. However, great gains are also ripe for the taking when dealing with networks and the TCP/IP stack. In this talk, we will discuss some techniques, well known or not, for getting the most out of the TCP/IP stack of any modern OS. We'll discuss:
(1) how application-level batching can be leveraged to avoid common TCP pitfalls,
(2) how UDP datagram size influences CPU and network efficiency,
(3) why the new OS system calls like sendmmsg/recvmmsg are so hot, and
(4) how easy it is to leverage asynchronous calls for fun and profit.