SlideShare a Scribd company logo
1 of 49
Download to read offline
Tuning TCP and NGINX on 
EC2
Who are we? 
Chartbeat measures and monetizes attention on the web. Working with 80% of 
the top US news sites and global media sites in 50 countries, Chartbeat brings 
together editors and advertisers to identify in real time the active time an 
audience consumes articles, videos, paid content, and display advertising. 
2 | TCP/NGINX Tuning on EC2
● Founded in 2009 
● Hosted on AWS , 400-500 servers 
depending on time of day 
● Around 180k - 220k req/sec 
● 6 - 9 million concurrents 
chartbeat.com/totaltotal 
2 | TCP/NGINX Tuning on EC2
Who am I? 
● Sr Web Operations Engineer 
● Previously worked at 
○ Bitly 
○ TheStreet.com 
○ Promotions.com 
2 | TCP/NGINX Tuning on EC2
Traffic Characteristics 
Every 15 seconds 
213 byte, request size 
43 byte, response size 
2 | TCP/NGINX Tuning on EC2
Problem 
● Reports of slowness from some customers 
● Taking 3 seconds to send data 
2 | TCP/NGINX Tuning on EC2 
Default Retransmission Timeout 
RFC 1122: Section 4.2.3.1 
The following values SHOULD be used to initialize the 
estimation parameters for a new connection: 
(a) RTT = 0 seconds. 
(b) RTO = 3 seconds. (The smoothed variance is to be 
initialized to the value that will result in this RTO).
2 | TCP/NGINX Tuning on EC2 
flickr: wallyg
2 | TCP/NGINX Tuning on EC2 
flickr: oregondot
Now what? 
TCPDump + Wireshark confirms retransmissions 
2 | TCP/NGINX Tuning on EC2
DON’T GRAPH ALL THE THINGS 
● Graph only relevant metrics 
○ you’ll end up with a ton of red herrings 
2 | TCP/NGINX Tuning on EC2
Sources of info 
● ss -s 
○ summary of socket statistics 
TCP: 10678 (estab 2503, closed 8167, orphaned 0, synrecv 0, timewait 8167/0), 
ports 0 
● netstat -s 
"tcp_active_connections_openings", 
"tcp_connections_aborted_due_to_timeout", 
"tcp_data_loss_events", 
"tcp_failed_connection_attempts", 
"tcp_other_tcp_timeouts", 
"tcp_passive_connection_openings", 
"tcp_segments_retransmited", 
"tcp_segments_send_out", 
"tcp_syns_to_listen_sockets_dropped", 
"tcp_times_the_listen_queue_of_a_socket_overflowed", 
● 
2 | TCP/NGINX Tuning on EC2
2 | TCP/NGINX Tuning on EC2 
TCP/IP 
Illustrated 
Volume 1 
Second Ed.
2 | TCP/NGINX Tuning on EC2 
Logster + Graphite 
https://github.com/etsy/logster 
Tails logs, generates metrics and 
outputs to Graphite or Ganglia
FINDINGS 
2 | TCP/NGINX Tuning on EC2
Sources of info 
● netstat -s 
"tcp_active_connections_openings", 
"tcp_connections_aborted_due_to_timeout", 
"tcp_data_loss_events", 
"tcp_failed_connection_attempts", 
"tcp_other_tcp_timeouts", 
"tcp_passive_connection_openings", 
"tcp_segments_retransmited", 
"tcp_segments_send_out", 
"tcp_syns_to_listen_sockets_dropped", 
"tcp_times_the_listen_queue_of_a_socket_overflowed", 
2 | TCP/NGINX Tuning on EC2 
Values > 1, can’t be 
good 
Confirmed what we 
suspected 
WHUT
2 | TCP/NGINX Tuning on EC2
net.ipv4.tcp_max_syn_backlog 
Systems Performance 
Enterprise and the Cloud by Brendan Gregg, pg 492 
2 | TCP/NGINX Tuning on EC2 
net.core.somaxconn 
Nginx: listen backlog=####
Insane Defaults 
● net.core.netdev_max_backlog = 1000 
○ Per CPU backlog? 
○ Network Frames 
● net.ipv4.tcp_max_syn_backlog = 128 
● net.core.somaxconn = 128 
● nginx listen backlog = 511 ?!? 
○ Silently truncated to somaxconn value 
2 | TCP/NGINX Tuning on EC2
New Values 
● net.core.netdev_max_backlog = 16384 
● net.ipv4.tcp_max_syn_backlog = 65536 
● net.core.somaxconn = 16384 
● nginx listen backlog = 16384 
○ should be <= somaxconn 
2 | TCP/NGINX Tuning on EC2
2 | TCP/NGINX Tuning on EC2 
Results
Further settings explored 
net.ipv4.tcp_slow_start_after_idle 
net.ipv4.tcp_max_tw_buckets 
net.ipv4.tcp_rmem/wrem 
net.ipv4.tcp_fin_timeout 
net.ipv4.tcp_mem 
2 | TCP/NGINX Tuning on EC2
net.ipv4.tcp_slow_start_after_idle 
Set to 0 to ensure connections don’t go back to 
default window size after being idle too long. 
Example: HTTP KeepAlive 
2 | TCP/NGINX Tuning on EC2
net.ipv4.tcp_max_tw_buckets 
Max number of sockets in TIME_WAIT. We 
actually set this very high, since before we 
moved instances behind an ELB it was normal 
to have 200k+ sockets in TIME_WAIT state. 
Exceeding this leads to sockets being torn 
down until under limit 
2 | TCP/NGINX Tuning on EC2
net.ipv4.tcp_rmem/wrem 
Format: min default max ( in bytes) 
The kernel will autotune the number of bytes to 
use for each socket based on these settings. It 
will start at default and work between the 
min and max 
2 | TCP/NGINX Tuning on EC2
net.ipv4.tcp_fin_timeout 
The time a connection should spend in 
FIN_WAIT_2 state. Default is 60 seconds, 
lowering this will free memory more quickly and 
transition the socket to TIME_WAIT. 
This will NOT reduce the time a socket is in 
TIME_WAIT which is set to 2 * MSL (max 
segment lifetime) 
2 | TCP/NGINX Tuning on EC2
net.ipv4.tcp_fin_timeout continued... 
MSL is hardcoded in the kernel at 60 seconds! 
https://github. 
com/torvalds/linux/blob/master/include/net/tcp. 
h#L115 
#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy 
TIME-WAIT * state, about 60 seconds */ 
2 | TCP/NGINX Tuning on EC2
net.ipv4.tcp_mem 
Format: low pressure max (in pages!) 
Below low, Kernel won’t put pressure on 
sockets to reduce mem usage. Once pressure 
hits, sockets reduce memory until low is hit. If 
max hit, no new sockets. 
2 | TCP/NGINX Tuning on EC2
2 | TCP/NGINX Tuning on EC2
2 | TCP/NGINX Tuning on EC2
net.ipv4.tcp_tw_recycle (DANGEROUS) 
● Clients behind NAT/Stateful FW will get 
dropped 
● *99.99999999% of time should never be 
enabled 
* Probably 100% but there may be a valid case out there 
2 | TCP/NGINX Tuning on EC2
net.ipv4.tcp_tw_reuse 
● Makes a safer attempt at freeing sockets in 
TIME_WAIT state. 
2 | TCP/NGINX Tuning on EC2
Recycle vs Reuse Deep Dive 
http://bit.ly/tcp-time-wait 
2 | TCP/NGINX Tuning on EC2
One last thing… 
TCP Congestion Window - initcwnd (initial) 
2 | TCP/NGINX Tuning on EC2 
Starting in Kernel 2.6.39 , set to 10 
Previous default was 3! 
http://research.google.com/pubs/pub36640.html 
Older Kernel? 
$ ip route change default via 192.168.1.1 dev eth0 proto static initcwnd 10
2 | TCP/NGINX Tuning on EC2 
NGINX
listen statement 
● backlog 
○ limited by net.core.somaxconn 
● defer 
○ TCP_DEFER_ACCEPT - Wait till we receive data 
packet before passing socket to server. Completing 
TCP Handshake won’t trigger an accept() 
2 | TCP/NGINX Tuning on EC2
server block 
● sendfile 
○ Saves context switching from userspace on 
read/write. 
○ “zero copy” , happens in kernel space 
● tcp_nopush 
○ TCP_CORK 
○ allows application to control building of packet, e.g 
pack a packet with full HTTP response 
● tcp_nodelay 
○ Nagle’s Algorithm 
○ Only affects keep-alive connections 
● multi_accept 
○ Accept all connections on listen queue at once 
(careful, can overwhelm workers) 
2 | TCP/NGINX Tuning on EC2
Nagle’s Algorithm (tcp_nopush) 
Small payload + need for low latency? 
Disable 
2 | TCP/NGINX Tuning on EC2
HTTP Keep-Alive 
● Enabled once behind ELB 
● Given small payload and 15 seconds between 
data, waste of resources for us to enable 
exposed directly to net 
2 | TCP/NGINX Tuning on EC2
HTTP Keep-Alive Cont.. 
● Also enable on upstream proxies 
○ Available since 1.1.4 
○ *cough* had to upgrade Nginx and Fix memory leak 
dealing with libevent and keepalives before we could 
get this fully setup 
2 | TCP/NGINX Tuning on EC2
2 | TCP/NGINX Tuning on EC2 
ELB
Cross-Zone load balancing 
Ensures requests to 
each ELB in each AZ 
go to ALL instances 
in ALL AZs 
2 | TCP/NGINX Tuning on EC2
Idle Connection Timeout 
● Defaults to 60 seconds 
● Finally tunable via API. 
● Tweak if doing anything long lived , e.g. 
Websockets, or ensure you are sending 
“pings” 
2 | TCP/NGINX Tuning on EC2
Connection draining 
“Graceful” removal of node from ELB, will 
ensure existing connections can finish instead 
of a hard cutoff (old behavior) 
2 | TCP/NGINX Tuning on EC2
Metrics to monitor 
● SurgeQueueLength (Not Good) 
A count of the total number of requests that are 
pending submission to a registered instance. 
● SpilloverCount (BAD) 
A count of the total number of requests that 
were rejected due to the queue being full. 
2 | TCP/NGINX Tuning on EC2
Conclusions 
● The internet is full of lies 
● With enough traffic, tweaking system and 
application defaults are a necessary 
● Find trusted sources (Me? Maybe?) for 
settings and test in staging environments 
● Measure impact and understand what metrics 
may be impacted by your tweaks 
● Don’t get lost in all the sysctl settings 
● TCP is complicated 
2 | TCP/NGINX Tuning on EC2
2 | TCP/NGINX Tuning on EC2 
FIN 
FIN_WAIT_1 
FIN_WAIT_2 
TIME_WAIT
Resources and References 
https://www.kernel. 
org/doc/Documentation/networking/ip-sysctl.txt 
2 | TCP/NGINX Tuning on EC2 
man tcp(7)
Additional reading 
http://engineering.chartbeat.com 
Full story about experiences with our 
architecture and material discussed in 
slides 
2 | TCP/NGINX Tuning on EC2
Questions / Comments? 
2 | TCP/NGINX Tuning on EC2 
@Lintzston 
justin@chartbeat.com

More Related Content

What's hot

Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Hadoop -NameNode HAの仕組み-
Hadoop -NameNode HAの仕組み-Hadoop -NameNode HAの仕組み-
Hadoop -NameNode HAの仕組み-Yuki Gonda
 
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)NTT DATA Technology & Innovation
 
これからLDAPを始めるなら 「389-ds」を使ってみよう
これからLDAPを始めるなら 「389-ds」を使ってみようこれからLDAPを始めるなら 「389-ds」を使ってみよう
これからLDAPを始めるなら 「389-ds」を使ってみようNobuyuki Sasaki
 
Apache Bigtopによるオープンなビッグデータ処理基盤の構築(オープンデベロッパーズカンファレンス 2021 Online 発表資料)
Apache Bigtopによるオープンなビッグデータ処理基盤の構築(オープンデベロッパーズカンファレンス 2021 Online 発表資料)Apache Bigtopによるオープンなビッグデータ処理基盤の構築(オープンデベロッパーズカンファレンス 2021 Online 発表資料)
Apache Bigtopによるオープンなビッグデータ処理基盤の構築(オープンデベロッパーズカンファレンス 2021 Online 発表資料)NTT DATA Technology & Innovation
 
アーキテクチャから理解するPostgreSQLのレプリケーション
アーキテクチャから理解するPostgreSQLのレプリケーションアーキテクチャから理解するPostgreSQLのレプリケーション
アーキテクチャから理解するPostgreSQLのレプリケーションMasahiko Sawada
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughThomas Graf
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
地理分散DBについて
地理分散DBについて地理分散DBについて
地理分散DBについてKumazaki Hiroki
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Pacemaker 操作方法メモ
Pacemaker 操作方法メモPacemaker 操作方法メモ
Pacemaker 操作方法メモMasayuki Ozawa
 
祝!PostgreSQLレプリケーション10周年!徹底紹介!!
祝!PostgreSQLレプリケーション10周年!徹底紹介!!祝!PostgreSQLレプリケーション10周年!徹底紹介!!
祝!PostgreSQLレプリケーション10周年!徹底紹介!!NTT DATA Technology & Innovation
 
Kubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャー
Kubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャーKubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャー
Kubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャーToru Makabe
 
PostgreSQL - C言語によるユーザ定義関数の作り方
PostgreSQL - C言語によるユーザ定義関数の作り方PostgreSQL - C言語によるユーザ定義関数の作り方
PostgreSQL - C言語によるユーザ定義関数の作り方Satoshi Nagayasu
 
[Cloud OnAir] Bigtable に迫る!基本機能も含めユースケースまで丸ごと紹介 2018年8月30日 放送
[Cloud OnAir] Bigtable に迫る!基本機能も含めユースケースまで丸ごと紹介 2018年8月30日 放送[Cloud OnAir] Bigtable に迫る!基本機能も含めユースケースまで丸ごと紹介 2018年8月30日 放送
[Cloud OnAir] Bigtable に迫る!基本機能も含めユースケースまで丸ごと紹介 2018年8月30日 放送Google Cloud Platform - Japan
 
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Yoshiyasu SAEKI
 
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)NTT DATA Technology & Innovation
 
OpenStackでも重要な役割を果たすPacemakerを知ろう!
OpenStackでも重要な役割を果たすPacemakerを知ろう!OpenStackでも重要な役割を果たすPacemakerを知ろう!
OpenStackでも重要な役割を果たすPacemakerを知ろう!ksk_ha
 

What's hot (20)

Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Hadoop -NameNode HAの仕組み-
Hadoop -NameNode HAの仕組み-Hadoop -NameNode HAの仕組み-
Hadoop -NameNode HAの仕組み-
 
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
 
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajpAt least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
 
これからLDAPを始めるなら 「389-ds」を使ってみよう
これからLDAPを始めるなら 「389-ds」を使ってみようこれからLDAPを始めるなら 「389-ds」を使ってみよう
これからLDAPを始めるなら 「389-ds」を使ってみよう
 
Apache Bigtopによるオープンなビッグデータ処理基盤の構築(オープンデベロッパーズカンファレンス 2021 Online 発表資料)
Apache Bigtopによるオープンなビッグデータ処理基盤の構築(オープンデベロッパーズカンファレンス 2021 Online 発表資料)Apache Bigtopによるオープンなビッグデータ処理基盤の構築(オープンデベロッパーズカンファレンス 2021 Online 発表資料)
Apache Bigtopによるオープンなビッグデータ処理基盤の構築(オープンデベロッパーズカンファレンス 2021 Online 発表資料)
 
アーキテクチャから理解するPostgreSQLのレプリケーション
アーキテクチャから理解するPostgreSQLのレプリケーションアーキテクチャから理解するPostgreSQLのレプリケーション
アーキテクチャから理解するPostgreSQLのレプリケーション
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
地理分散DBについて
地理分散DBについて地理分散DBについて
地理分散DBについて
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Pacemaker 操作方法メモ
Pacemaker 操作方法メモPacemaker 操作方法メモ
Pacemaker 操作方法メモ
 
Hadoop入門
Hadoop入門Hadoop入門
Hadoop入門
 
祝!PostgreSQLレプリケーション10周年!徹底紹介!!
祝!PostgreSQLレプリケーション10周年!徹底紹介!!祝!PostgreSQLレプリケーション10周年!徹底紹介!!
祝!PostgreSQLレプリケーション10周年!徹底紹介!!
 
Kubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャー
Kubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャーKubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャー
Kubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャー
 
PostgreSQL - C言語によるユーザ定義関数の作り方
PostgreSQL - C言語によるユーザ定義関数の作り方PostgreSQL - C言語によるユーザ定義関数の作り方
PostgreSQL - C言語によるユーザ定義関数の作り方
 
[Cloud OnAir] Bigtable に迫る!基本機能も含めユースケースまで丸ごと紹介 2018年8月30日 放送
[Cloud OnAir] Bigtable に迫る!基本機能も含めユースケースまで丸ごと紹介 2018年8月30日 放送[Cloud OnAir] Bigtable に迫る!基本機能も含めユースケースまで丸ごと紹介 2018年8月30日 放送
[Cloud OnAir] Bigtable に迫る!基本機能も含めユースケースまで丸ごと紹介 2018年8月30日 放送
 
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
 
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
 
OpenStackでも重要な役割を果たすPacemakerを知ろう!
OpenStackでも重要な役割を果たすPacemakerを知ろう!OpenStackでも重要な役割を果たすPacemakerを知ろう!
OpenStackでも重要な役割を果たすPacemakerを知ろう!
 

Viewers also liked

Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 InstancesBrendan Gregg
 
NGINX Installation and Tuning
NGINX Installation and TuningNGINX Installation and Tuning
NGINX Installation and TuningNGINX, Inc.
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014Amazon Web Services
 
Lcu14 Lightning Talk- NGINX
Lcu14 Lightning Talk- NGINXLcu14 Lightning Talk- NGINX
Lcu14 Lightning Talk- NGINXLinaro
 
Learn nginx in 90mins
Learn nginx in 90minsLearn nginx in 90mins
Learn nginx in 90minsLarry Cai
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking WalkthroughThomas Graf
 
How to secure your web applications with NGINX
How to secure your web applications with NGINXHow to secure your web applications with NGINX
How to secure your web applications with NGINXWallarm
 
The 3 Models in the NGINX Microservices Reference Architecture
The 3 Models in the NGINX Microservices Reference ArchitectureThe 3 Models in the NGINX Microservices Reference Architecture
The 3 Models in the NGINX Microservices Reference ArchitectureNGINX, Inc.
 
AWS Black Belt Techシリーズ 2015 Amazon Elastic Block Store (EBS)
AWS Black Belt Techシリーズ 2015 Amazon Elastic Block Store (EBS)AWS Black Belt Techシリーズ 2015 Amazon Elastic Block Store (EBS)
AWS Black Belt Techシリーズ 2015 Amazon Elastic Block Store (EBS)Amazon Web Services Japan
 
Nginx Internals
Nginx InternalsNginx Internals
Nginx InternalsJoshua Zhu
 
(SDD423) Elastic Load Balancing Deep Dive and Best Practices | AWS re:Invent ...
(SDD423) Elastic Load Balancing Deep Dive and Best Practices | AWS re:Invent ...(SDD423) Elastic Load Balancing Deep Dive and Best Practices | AWS re:Invent ...
(SDD423) Elastic Load Balancing Deep Dive and Best Practices | AWS re:Invent ...Amazon Web Services
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Marian Marinov, 1H Ltd.
Marian Marinov, 1H Ltd.Marian Marinov, 1H Ltd.
Marian Marinov, 1H Ltd.Ontico
 
Monitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toMonitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toDatadog
 
Nginx monitoring with graphite
Nginx monitoring with graphiteNginx monitoring with graphite
Nginx monitoring with graphitedamaex17
 
Naxsi, an open source WAF for Nginx
Naxsi, an open source WAF  for NginxNaxsi, an open source WAF  for Nginx
Naxsi, an open source WAF for NginxPositive Hack Days
 
Devops training in Hyderabad
Devops training in HyderabadDevops training in Hyderabad
Devops training in HyderabadDevops Trainer
 
Fluentd and docker monitoring
Fluentd and docker monitoringFluentd and docker monitoring
Fluentd and docker monitoringVinay Krishna
 
How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...Jos Boumans
 

Viewers also liked (20)

Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 Instances
 
NGINX Installation and Tuning
NGINX Installation and TuningNGINX Installation and Tuning
NGINX Installation and Tuning
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
Lcu14 Lightning Talk- NGINX
Lcu14 Lightning Talk- NGINXLcu14 Lightning Talk- NGINX
Lcu14 Lightning Talk- NGINX
 
Learn nginx in 90mins
Learn nginx in 90minsLearn nginx in 90mins
Learn nginx in 90mins
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
 
How to secure your web applications with NGINX
How to secure your web applications with NGINXHow to secure your web applications with NGINX
How to secure your web applications with NGINX
 
The 3 Models in the NGINX Microservices Reference Architecture
The 3 Models in the NGINX Microservices Reference ArchitectureThe 3 Models in the NGINX Microservices Reference Architecture
The 3 Models in the NGINX Microservices Reference Architecture
 
AWS Black Belt Techシリーズ 2015 Amazon Elastic Block Store (EBS)
AWS Black Belt Techシリーズ 2015 Amazon Elastic Block Store (EBS)AWS Black Belt Techシリーズ 2015 Amazon Elastic Block Store (EBS)
AWS Black Belt Techシリーズ 2015 Amazon Elastic Block Store (EBS)
 
Nginx Internals
Nginx InternalsNginx Internals
Nginx Internals
 
(SDD423) Elastic Load Balancing Deep Dive and Best Practices | AWS re:Invent ...
(SDD423) Elastic Load Balancing Deep Dive and Best Practices | AWS re:Invent ...(SDD423) Elastic Load Balancing Deep Dive and Best Practices | AWS re:Invent ...
(SDD423) Elastic Load Balancing Deep Dive and Best Practices | AWS re:Invent ...
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Marian Marinov, 1H Ltd.
Marian Marinov, 1H Ltd.Marian Marinov, 1H Ltd.
Marian Marinov, 1H Ltd.
 
Monitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toMonitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-to
 
Nginx monitoring with graphite
Nginx monitoring with graphiteNginx monitoring with graphite
Nginx monitoring with graphite
 
Naxsi, an open source WAF for Nginx
Naxsi, an open source WAF  for NginxNaxsi, an open source WAF  for Nginx
Naxsi, an open source WAF for Nginx
 
Devops training in Hyderabad
Devops training in HyderabadDevops training in Hyderabad
Devops training in Hyderabad
 
Fluentd and docker monitoring
Fluentd and docker monitoringFluentd and docker monitoring
Fluentd and docker monitoring
 
Tuning 17 march
Tuning 17 marchTuning 17 march
Tuning 17 march
 
How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...
 

Similar to TCP and NGINX Tuning on EC2

Reconsider TCPdump for Modern Troubleshooting
Reconsider TCPdump for Modern TroubleshootingReconsider TCPdump for Modern Troubleshooting
Reconsider TCPdump for Modern TroubleshootingAvi Networks
 
Abandon Decades-Old TCPdump for Modern Troubleshooting
Abandon Decades-Old TCPdump for Modern TroubleshootingAbandon Decades-Old TCPdump for Modern Troubleshooting
Abandon Decades-Old TCPdump for Modern TroubleshootingAvi Networks
 
Real-time in the real world: DIRT in production
Real-time in the real world: DIRT in productionReal-time in the real world: DIRT in production
Real-time in the real world: DIRT in productionbcantrill
 
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterAnne Nicolas
 
Make container without_docker_7
Make container without_docker_7Make container without_docker_7
Make container without_docker_7Sam Kim
 
Programming TCP for responsiveness
Programming TCP for responsivenessProgramming TCP for responsiveness
Programming TCP for responsivenessKazuho Oku
 
IRJET- Modeling a New Startup Algorithm for TCP New Reno
IRJET- Modeling a New Startup Algorithm for TCP New RenoIRJET- Modeling a New Startup Algorithm for TCP New Reno
IRJET- Modeling a New Startup Algorithm for TCP New RenoIRJET Journal
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...Chester Chen
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchYutaka Yasuda
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownScyllaDB
 
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecasesLF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecasesLF_OpenvSwitch
 
ECS19 - Ingo Gegenwarth - Running Exchange in large environment
ECS19 - Ingo Gegenwarth -  Running Exchangein large environmentECS19 - Ingo Gegenwarth -  Running Exchangein large environment
ECS19 - Ingo Gegenwarth - Running Exchange in large environmentEuropean Collaboration Summit
 
Linux HTTPS/TCP/IP Stack for the Fast and Secure Web
Linux HTTPS/TCP/IP Stack for the Fast and Secure WebLinux HTTPS/TCP/IP Stack for the Fast and Secure Web
Linux HTTPS/TCP/IP Stack for the Fast and Secure WebAll Things Open
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Andriy Berestovskyy
 
High perf-networking
High perf-networkingHigh perf-networking
High perf-networkingmtimjones
 
TCP and Mobile Networks Turbulent Relationship
TCP and Mobile Networks Turbulent RelationshipTCP and Mobile Networks Turbulent Relationship
TCP and Mobile Networks Turbulent RelationshipNatasha Rooney
 
Programming TCP for responsiveness
Programming TCP for responsivenessProgramming TCP for responsiveness
Programming TCP for responsivenessKazuho Oku
 

Similar to TCP and NGINX Tuning on EC2 (20)

Reconsider TCPdump for Modern Troubleshooting
Reconsider TCPdump for Modern TroubleshootingReconsider TCPdump for Modern Troubleshooting
Reconsider TCPdump for Modern Troubleshooting
 
Abandon Decades-Old TCPdump for Modern Troubleshooting
Abandon Decades-Old TCPdump for Modern TroubleshootingAbandon Decades-Old TCPdump for Modern Troubleshooting
Abandon Decades-Old TCPdump for Modern Troubleshooting
 
Real-time in the real world: DIRT in production
Real-time in the real world: DIRT in productionReal-time in the real world: DIRT in production
Real-time in the real world: DIRT in production
 
Transaction TCP
Transaction TCPTransaction TCP
Transaction TCP
 
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
 
Make container without_docker_7
Make container without_docker_7Make container without_docker_7
Make container without_docker_7
 
Programming TCP for responsiveness
Programming TCP for responsivenessProgramming TCP for responsiveness
Programming TCP for responsiveness
 
IRJET- Modeling a New Startup Algorithm for TCP New Reno
IRJET- Modeling a New Startup Algorithm for TCP New RenoIRJET- Modeling a New Startup Algorithm for TCP New Reno
IRJET- Modeling a New Startup Algorithm for TCP New Reno
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow Switch
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance Showdown
 
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecasesLF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
 
XS Boston 2008 Network Topology
XS Boston 2008 Network TopologyXS Boston 2008 Network Topology
XS Boston 2008 Network Topology
 
ECS19 - Ingo Gegenwarth - Running Exchange in large environment
ECS19 - Ingo Gegenwarth -  Running Exchangein large environmentECS19 - Ingo Gegenwarth -  Running Exchangein large environment
ECS19 - Ingo Gegenwarth - Running Exchange in large environment
 
Linux HTTPS/TCP/IP Stack for the Fast and Secure Web
Linux HTTPS/TCP/IP Stack for the Fast and Secure WebLinux HTTPS/TCP/IP Stack for the Fast and Secure Web
Linux HTTPS/TCP/IP Stack for the Fast and Secure Web
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 
High perf-networking
High perf-networkingHigh perf-networking
High perf-networking
 
TCP and Mobile Networks Turbulent Relationship
TCP and Mobile Networks Turbulent RelationshipTCP and Mobile Networks Turbulent Relationship
TCP and Mobile Networks Turbulent Relationship
 
Programming TCP for responsiveness
Programming TCP for responsivenessProgramming TCP for responsiveness
Programming TCP for responsiveness
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 

More from Chartbeat

Chartbeat: Discovery & Engagement in a Mobile-first World
Chartbeat: Discovery & Engagement in a Mobile-first WorldChartbeat: Discovery & Engagement in a Mobile-first World
Chartbeat: Discovery & Engagement in a Mobile-first WorldChartbeat
 
Audience Building in the Age of Platforms
Audience Building in the Age of PlatformsAudience Building in the Age of Platforms
Audience Building in the Age of PlatformsChartbeat
 
Data in the Newsroom: Content Solutions for Advertising Challenges
Data in the Newsroom: Content Solutions for Advertising ChallengesData in the Newsroom: Content Solutions for Advertising Challenges
Data in the Newsroom: Content Solutions for Advertising ChallengesChartbeat
 
ONA 2015 – Can You Have Their Attention Please?
ONA 2015 – Can You Have Their Attention Please?ONA 2015 – Can You Have Their Attention Please?
ONA 2015 – Can You Have Their Attention Please?Chartbeat
 
Insights from Around the World
Insights from Around the WorldInsights from Around the World
Insights from Around the WorldChartbeat
 
A Data State of the Union: Can We Make Quality Pay Online?
A Data State of the Union: Can We Make Quality Pay Online?A Data State of the Union: Can We Make Quality Pay Online?
A Data State of the Union: Can We Make Quality Pay Online?Chartbeat
 
Understanding Your Traffic Sources
Understanding Your Traffic Sources Understanding Your Traffic Sources
Understanding Your Traffic Sources Chartbeat
 
An Introduction to Video Analytics
An Introduction to Video Analytics An Introduction to Video Analytics
An Introduction to Video Analytics Chartbeat
 
Top Trends in Online Journalism for 2014
Top Trends in Online Journalism for 2014Top Trends in Online Journalism for 2014
Top Trends in Online Journalism for 2014Chartbeat
 
Say Hello to the New Chartbeat Publishing
Say Hello to the New Chartbeat Publishing Say Hello to the New Chartbeat Publishing
Say Hello to the New Chartbeat Publishing Chartbeat
 
Building a Loyal and Returning Audience with the New Chartbeat Publishing
Building a Loyal and Returning Audience with the New Chartbeat PublishingBuilding a Loyal and Returning Audience with the New Chartbeat Publishing
Building a Loyal and Returning Audience with the New Chartbeat PublishingChartbeat
 
Tom Germeau: Mobile Apps: Finding the balance between performance and flexibi...
Tom Germeau: Mobile Apps: Finding the balance between performance and flexibi...Tom Germeau: Mobile Apps: Finding the balance between performance and flexibi...
Tom Germeau: Mobile Apps: Finding the balance between performance and flexibi...Chartbeat
 

More from Chartbeat (12)

Chartbeat: Discovery & Engagement in a Mobile-first World
Chartbeat: Discovery & Engagement in a Mobile-first WorldChartbeat: Discovery & Engagement in a Mobile-first World
Chartbeat: Discovery & Engagement in a Mobile-first World
 
Audience Building in the Age of Platforms
Audience Building in the Age of PlatformsAudience Building in the Age of Platforms
Audience Building in the Age of Platforms
 
Data in the Newsroom: Content Solutions for Advertising Challenges
Data in the Newsroom: Content Solutions for Advertising ChallengesData in the Newsroom: Content Solutions for Advertising Challenges
Data in the Newsroom: Content Solutions for Advertising Challenges
 
ONA 2015 – Can You Have Their Attention Please?
ONA 2015 – Can You Have Their Attention Please?ONA 2015 – Can You Have Their Attention Please?
ONA 2015 – Can You Have Their Attention Please?
 
Insights from Around the World
Insights from Around the WorldInsights from Around the World
Insights from Around the World
 
A Data State of the Union: Can We Make Quality Pay Online?
A Data State of the Union: Can We Make Quality Pay Online?A Data State of the Union: Can We Make Quality Pay Online?
A Data State of the Union: Can We Make Quality Pay Online?
 
Understanding Your Traffic Sources
Understanding Your Traffic Sources Understanding Your Traffic Sources
Understanding Your Traffic Sources
 
An Introduction to Video Analytics
An Introduction to Video Analytics An Introduction to Video Analytics
An Introduction to Video Analytics
 
Top Trends in Online Journalism for 2014
Top Trends in Online Journalism for 2014Top Trends in Online Journalism for 2014
Top Trends in Online Journalism for 2014
 
Say Hello to the New Chartbeat Publishing
Say Hello to the New Chartbeat Publishing Say Hello to the New Chartbeat Publishing
Say Hello to the New Chartbeat Publishing
 
Building a Loyal and Returning Audience with the New Chartbeat Publishing
Building a Loyal and Returning Audience with the New Chartbeat PublishingBuilding a Loyal and Returning Audience with the New Chartbeat Publishing
Building a Loyal and Returning Audience with the New Chartbeat Publishing
 
Tom Germeau: Mobile Apps: Finding the balance between performance and flexibi...
Tom Germeau: Mobile Apps: Finding the balance between performance and flexibi...Tom Germeau: Mobile Apps: Finding the balance between performance and flexibi...
Tom Germeau: Mobile Apps: Finding the balance between performance and flexibi...
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

TCP and NGINX Tuning on EC2

  • 1. Tuning TCP and NGINX on EC2
  • 2. Who are we? Chartbeat measures and monetizes attention on the web. Working with 80% of the top US news sites and global media sites in 50 countries, Chartbeat brings together editors and advertisers to identify in real time the active time an audience consumes articles, videos, paid content, and display advertising. 2 | TCP/NGINX Tuning on EC2
  • 3. ● Founded in 2009 ● Hosted on AWS , 400-500 servers depending on time of day ● Around 180k - 220k req/sec ● 6 - 9 million concurrents chartbeat.com/totaltotal 2 | TCP/NGINX Tuning on EC2
  • 4. Who am I? ● Sr Web Operations Engineer ● Previously worked at ○ Bitly ○ TheStreet.com ○ Promotions.com 2 | TCP/NGINX Tuning on EC2
  • 5. Traffic Characteristics Every 15 seconds 213 byte, request size 43 byte, response size 2 | TCP/NGINX Tuning on EC2
  • 6. Problem ● Reports of slowness from some customers ● Taking 3 seconds to send data 2 | TCP/NGINX Tuning on EC2 Default Retransmission Timeout RFC 1122: Section 4.2.3.1 The following values SHOULD be used to initialize the estimation parameters for a new connection: (a) RTT = 0 seconds. (b) RTO = 3 seconds. (The smoothed variance is to be initialized to the value that will result in this RTO).
  • 7. 2 | TCP/NGINX Tuning on EC2 flickr: wallyg
  • 8. 2 | TCP/NGINX Tuning on EC2 flickr: oregondot
  • 9. Now what? TCPDump + Wireshark confirms retransmissions 2 | TCP/NGINX Tuning on EC2
  • 10. DON’T GRAPH ALL THE THINGS ● Graph only relevant metrics ○ you’ll end up with a ton of red herrings 2 | TCP/NGINX Tuning on EC2
  • 11. Sources of info ● ss -s ○ summary of socket statistics TCP: 10678 (estab 2503, closed 8167, orphaned 0, synrecv 0, timewait 8167/0), ports 0 ● netstat -s "tcp_active_connections_openings", "tcp_connections_aborted_due_to_timeout", "tcp_data_loss_events", "tcp_failed_connection_attempts", "tcp_other_tcp_timeouts", "tcp_passive_connection_openings", "tcp_segments_retransmited", "tcp_segments_send_out", "tcp_syns_to_listen_sockets_dropped", "tcp_times_the_listen_queue_of_a_socket_overflowed", ● 2 | TCP/NGINX Tuning on EC2
  • 12. 2 | TCP/NGINX Tuning on EC2 TCP/IP Illustrated Volume 1 Second Ed.
  • 13. 2 | TCP/NGINX Tuning on EC2 Logster + Graphite https://github.com/etsy/logster Tails logs, generates metrics and outputs to Graphite or Ganglia
  • 14. FINDINGS 2 | TCP/NGINX Tuning on EC2
  • 15. Sources of info ● netstat -s "tcp_active_connections_openings", "tcp_connections_aborted_due_to_timeout", "tcp_data_loss_events", "tcp_failed_connection_attempts", "tcp_other_tcp_timeouts", "tcp_passive_connection_openings", "tcp_segments_retransmited", "tcp_segments_send_out", "tcp_syns_to_listen_sockets_dropped", "tcp_times_the_listen_queue_of_a_socket_overflowed", 2 | TCP/NGINX Tuning on EC2 Values > 1, can’t be good Confirmed what we suspected WHUT
  • 16. 2 | TCP/NGINX Tuning on EC2
  • 17. net.ipv4.tcp_max_syn_backlog Systems Performance Enterprise and the Cloud by Brendan Gregg, pg 492 2 | TCP/NGINX Tuning on EC2 net.core.somaxconn Nginx: listen backlog=####
  • 18. Insane Defaults ● net.core.netdev_max_backlog = 1000 ○ Per CPU backlog? ○ Network Frames ● net.ipv4.tcp_max_syn_backlog = 128 ● net.core.somaxconn = 128 ● nginx listen backlog = 511 ?!? ○ Silently truncated to somaxconn value 2 | TCP/NGINX Tuning on EC2
  • 19. New Values ● net.core.netdev_max_backlog = 16384 ● net.ipv4.tcp_max_syn_backlog = 65536 ● net.core.somaxconn = 16384 ● nginx listen backlog = 16384 ○ should be <= somaxconn 2 | TCP/NGINX Tuning on EC2
  • 20. 2 | TCP/NGINX Tuning on EC2 Results
  • 21. Further settings explored net.ipv4.tcp_slow_start_after_idle net.ipv4.tcp_max_tw_buckets net.ipv4.tcp_rmem/wrem net.ipv4.tcp_fin_timeout net.ipv4.tcp_mem 2 | TCP/NGINX Tuning on EC2
  • 22. net.ipv4.tcp_slow_start_after_idle Set to 0 to ensure connections don’t go back to default window size after being idle too long. Example: HTTP KeepAlive 2 | TCP/NGINX Tuning on EC2
  • 23. net.ipv4.tcp_max_tw_buckets Max number of sockets in TIME_WAIT. We actually set this very high, since before we moved instances behind an ELB it was normal to have 200k+ sockets in TIME_WAIT state. Exceeding this leads to sockets being torn down until under limit 2 | TCP/NGINX Tuning on EC2
  • 24. net.ipv4.tcp_rmem/wrem Format: min default max ( in bytes) The kernel will autotune the number of bytes to use for each socket based on these settings. It will start at default and work between the min and max 2 | TCP/NGINX Tuning on EC2
  • 25. net.ipv4.tcp_fin_timeout The time a connection should spend in FIN_WAIT_2 state. Default is 60 seconds, lowering this will free memory more quickly and transition the socket to TIME_WAIT. This will NOT reduce the time a socket is in TIME_WAIT which is set to 2 * MSL (max segment lifetime) 2 | TCP/NGINX Tuning on EC2
  • 26. net.ipv4.tcp_fin_timeout continued... MSL is hardcoded in the kernel at 60 seconds! https://github. com/torvalds/linux/blob/master/include/net/tcp. h#L115 #define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT * state, about 60 seconds */ 2 | TCP/NGINX Tuning on EC2
  • 27. net.ipv4.tcp_mem Format: low pressure max (in pages!) Below low, Kernel won’t put pressure on sockets to reduce mem usage. Once pressure hits, sockets reduce memory until low is hit. If max hit, no new sockets. 2 | TCP/NGINX Tuning on EC2
  • 28. 2 | TCP/NGINX Tuning on EC2
  • 29. 2 | TCP/NGINX Tuning on EC2
  • 30. net.ipv4.tcp_tw_recycle (DANGEROUS) ● Clients behind NAT/Stateful FW will get dropped ● *99.99999999% of time should never be enabled * Probably 100% but there may be a valid case out there 2 | TCP/NGINX Tuning on EC2
  • 31. net.ipv4.tcp_tw_reuse ● Makes a safer attempt at freeing sockets in TIME_WAIT state. 2 | TCP/NGINX Tuning on EC2
  • 32. Recycle vs Reuse Deep Dive http://bit.ly/tcp-time-wait 2 | TCP/NGINX Tuning on EC2
  • 33. One last thing… TCP Congestion Window - initcwnd (initial) 2 | TCP/NGINX Tuning on EC2 Starting in Kernel 2.6.39 , set to 10 Previous default was 3! http://research.google.com/pubs/pub36640.html Older Kernel? $ ip route change default via 192.168.1.1 dev eth0 proto static initcwnd 10
  • 34. 2 | TCP/NGINX Tuning on EC2 NGINX
  • 35. listen statement ● backlog ○ limited by net.core.somaxconn ● defer ○ TCP_DEFER_ACCEPT - Wait till we receive data packet before passing socket to server. Completing TCP Handshake won’t trigger an accept() 2 | TCP/NGINX Tuning on EC2
  • 36. server block ● sendfile ○ Saves context switching from userspace on read/write. ○ “zero copy” , happens in kernel space ● tcp_nopush ○ TCP_CORK ○ allows application to control building of packet, e.g pack a packet with full HTTP response ● tcp_nodelay ○ Nagle’s Algorithm ○ Only affects keep-alive connections ● multi_accept ○ Accept all connections on listen queue at once (careful, can overwhelm workers) 2 | TCP/NGINX Tuning on EC2
  • 37. Nagle’s Algorithm (tcp_nopush) Small payload + need for low latency? Disable 2 | TCP/NGINX Tuning on EC2
  • 38. HTTP Keep-Alive ● Enabled once behind ELB ● Given small payload and 15 seconds between data, waste of resources for us to enable exposed directly to net 2 | TCP/NGINX Tuning on EC2
  • 39. HTTP Keep-Alive Cont.. ● Also enable on upstream proxies ○ Available since 1.1.4 ○ *cough* had to upgrade Nginx and Fix memory leak dealing with libevent and keepalives before we could get this fully setup 2 | TCP/NGINX Tuning on EC2
  • 40. 2 | TCP/NGINX Tuning on EC2 ELB
  • 41. Cross-Zone load balancing Ensures requests to each ELB in each AZ go to ALL instances in ALL AZs 2 | TCP/NGINX Tuning on EC2
  • 42. Idle Connection Timeout ● Defaults to 60 seconds ● Finally tunable via API. ● Tweak if doing anything long lived , e.g. Websockets, or ensure you are sending “pings” 2 | TCP/NGINX Tuning on EC2
  • 43. Connection draining “Graceful” removal of node from ELB, will ensure existing connections can finish instead of a hard cutoff (old behavior) 2 | TCP/NGINX Tuning on EC2
  • 44. Metrics to monitor ● SurgeQueueLength (Not Good) A count of the total number of requests that are pending submission to a registered instance. ● SpilloverCount (BAD) A count of the total number of requests that were rejected due to the queue being full. 2 | TCP/NGINX Tuning on EC2
  • 45. Conclusions ● The internet is full of lies ● With enough traffic, tweaking system and application defaults are a necessary ● Find trusted sources (Me? Maybe?) for settings and test in staging environments ● Measure impact and understand what metrics may be impacted by your tweaks ● Don’t get lost in all the sysctl settings ● TCP is complicated 2 | TCP/NGINX Tuning on EC2
  • 46. 2 | TCP/NGINX Tuning on EC2 FIN FIN_WAIT_1 FIN_WAIT_2 TIME_WAIT
  • 47. Resources and References https://www.kernel. org/doc/Documentation/networking/ip-sysctl.txt 2 | TCP/NGINX Tuning on EC2 man tcp(7)
  • 48. Additional reading http://engineering.chartbeat.com Full story about experiences with our architecture and material discussed in slides 2 | TCP/NGINX Tuning on EC2
  • 49. Questions / Comments? 2 | TCP/NGINX Tuning on EC2 @Lintzston justin@chartbeat.com

Editor's Notes

  1. Record traffic during US Election and World Cup, USA vs Germany 10+ million concurrents. Presidential election at the time was 2x our normal traffic
  2. High packet rate, low bandwidth. 43 bytes is small empty image we can send. Need this for error handling purposes on frontend side. Can’t send empty response
  3. Reports from users about slowness in sending “pings” to our servers. Slow clients, slowness doesn’t really affect our numbers too much as long as its arriving < 5 seconds. Asked for some numbers, seeing pings taking around 3 seconds . That number sets off some alarms
  4. These two numbers should raise alarms when you are doing any troubleshooting with TCP connections. 3 seconds is the default timeout for re-trying a connection, will backoff and re-try after 6 seconds, so 9 seconds total for a connection
  5. Maybe on client side? How do we know it’s on us? At the time our Pingdom monitoring didn’t show anything unusual, later learned this is definitely not enough
  6. Especially if you don’t have a good baseline for some metrics, you will end up chasing oddities in graphs that may be completely irrelevant
  7. Only graphed relevant info from netstat -s, there is a ton of metrics that may be useful for debugging other issues, but we started with these since they appeared to be most related with the issue at hand. For example, “fast retransmit” , while relevant, wouldnt indicate a delay of 3 seconds, since it bypasses the timeout. Push these to ganglia/graphite
  8. Had to enable logging, discard after hour, space contraints. Rotation impacts performance, switch to ext4 on log volume helped, no compress
  9. Confirmed issues, didn’t give us a source. But some symptoms to look into.
  10. Two queues, first for half established connections , can make large to help with SYN floods, although given todays flood attacks, probably not much help Second queue for established connections for your app to pluck off max_syn_backlog = system wide net.core.somaxconn = per process
  11. Still not 100% sure what this controls, from looking at Kernel source , it appears to be this Nginx backlog originally wasn’t documented in documentation, I had to find this in the source code and from googling
  12. Didnt know about nginx listen backlog at first. Initially changed first three values, saw a slight decrease in timeouts and listen queue overflows ,took a bunch of reading till I learned about the fact that each application has to set its own backlog queue and even further research to find what nginxs default value was
  13. Kernel will reuse sockets in TIME_WAIT when it can, a socket in a TIME_WAIT state actually doesn’t take up any resources
  14. Tweak if dealing with sending/receiving large amounts of data to improve throughput We changed but our throughput is fairly low per server so didn’t see any measurable impact
  15. Internet gets this wrong a lot. TIME_WAIT takes up no memory
  16. everyone on the internet gets this wrong! If you really want to change TIME_WAIT time , see ip_conntrack_tcp_timeout_time_wait in ip_conntrack module
  17. The pressure relates to the rmem and wmem settings we set earlier.
  18. Definitions wrong, harmful settings recommended, even seen this in a lot of books when searching books.google.com for settings
  19. If you are reading any blog/book that recommends enabling this, run far away
  20. Amazing read into why recycle is bad, and why TIME_WAIT exists
  21. Allows for more data in flight, if you are serving up larger content, you will see nice improvements here
  22. Defer , saves on resources where handshake occurs but no data is sent or data is delayed. Leaves Nginx free to deal with connections already sending a data payload
  23. we set both to on. Small payloads tcp_nopush = application controls things tcp_nodelay = seamless to developer, just happens multi_accept, we have off, given our constant stream of connections, it can overwhelm downstream
  24. Lower CPU utilization as well
  25. Previous behavior, if request hit ELB in AZ us-east-1d, would only get routed to instances behind there. This change really smoothed out distribution for us
  26. Indicates capacity issues
  27. It’s easy to get carried away and tune too many things (premature optimization) or settings which may have little to no effect for you.
  28. Dont trust random blogs, filled with terrible information. Sysctl settings defined wrong or extremely vague