Zombie routes
Paweł Małachowski, 2020.09.29
@pawmal80
Agenda
1. BGP withdrawals and zombie routes
2. Real life cases
3. Detection and debugging
4. Zombie risk mitigation
whoami(1)
 Atende Software
 redGuardian DDoS mitigation
 my previous talks: DPDK, DPI/regexp, DUT perftesting, BGP hijacks
https://www.slideshare.net/atendesoftware/presentations
 Previously
 Netia S.A.
 ATM S.A.
 local hosting and ISP companies, community network
 Roles: system engineer, IT operations lead, business analyst
BGP withdrawals and zombie routes
BGP zombie / ghost route
 „an active routing table entry for a prefix that has been withdrawn
by its origin network”
source: https://labs.ripe.net/Members/romain_fontugne/bgp-zombies (2019)
see also: „BGP Zombies: an Analysis of Beacons Stuck Routes” (2019),
https://www.iij-ii.co.jp/en/members/romain/pdf/romain_pam2019.pdf
 not a new phenomenon
 Ghost Route Hunter (2003): https://www.sixxs.net/tools/grh/what/
 „An overview of the global IPv6 routing table” (2005):
https://meetings.ripe.net/ripe-50/presentations/ripe50-plenary-tue-ipv6-routing.pdf
 may take hours/days to „expire”
BGP zombie / ghost route
 Who cares?
It was withdrawn anyway!
 Unless we are talking about
 a partial withdrawal: some ingress traffic takes a different path than you expect, does not converge, or even loops
 a more-specific route whose zombie sits in Tier1/Tier2/NSP/IXP infrastructure, causing a partial or complete outage
More-specific prefix usage examples
 Traffic engineering
 Announce 10.0.0.0/23 into global table
 Announce 10.0.0.0/24 to some IXP peers to override their local prefs
 Customer delegation
 ISP1 announces 10.1.0.0/16 PA block
 ISP1 delegates 10.1.2.0/24 to customer
 Customer runs own BGP, announces 10.1.2.0/24 via ISP1, ISP2 and IXP
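Why a stuck more-specific hurts more than a stuck covering route can be sketched with a longest-prefix-match lookup. This is a minimal illustration, assuming a toy routing table; the `best_route` helper and the next-hop descriptions are hypothetical, and only the two prefixes come from the traffic engineering example above.

```python
import ipaddress

def best_route(table, destination):
    """Return (prefix, next_hop) of the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if dest in net:
            matches.append((net.prefixlen, prefix, next_hop))
    if not matches:
        return None
    # The longest prefix wins, regardless of how fresh or stale the entry is.
    _, prefix, next_hop = max(matches, key=lambda m: m[0])
    return prefix, next_hop

# Hypothetical table of a router that still holds a zombie /24.
table = {
    "10.0.0.0/23": "transit (covering route, still announced)",
    "10.0.0.0/24": "IXP peer (zombie: withdrawn by origin, still installed)",
}

print(best_route(table, "10.0.0.7"))
print(best_route(table, "10.0.1.7"))
```

As long as the zombie /24 stays installed, every destination inside it follows the stale entry, even though a valid covering /23 exists.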
Real life cases
2016 (TPNET-OTI loop)
 Orange PL (5617) – Opentransit (5511)
 Zombie AS path: 5511 1299 24724 57811 201029 x
 Looking glass:
 TPNET sees (zombie) more specific via OTI
 OTI has less specific via TPNET
 I gave up after a 20-minute outage and re-announced the more specific to save „x”
 Withdrawn later with no issues
2016 (Interoute/AS8928 hijack)
1. Warsaw: PLIX, THINX, NASK
2. Interoute: Prague, Paris, Madrid
3. NTT Madrid
4. Telia: Madrid, Hamburg
5. Warsaw: TPNET
6. Customer
2016 (Interoute/AS8928 hijack)
• zombie /24 route via NTT at the former Interoute/Madrid hijacked a significant part of ingress traffic
• luckily, no loop; the trace reaches the customer in Warsaw
• lasted many hours, finally „fixed” by announce/withdraw flaps
2018 (Telia loop)
Massive outage after „1299 3356 …” path withdrawal
2018 (Telia loop)
• 1299 announces zombie route
• hijacks and loops large portion of ingress traffic
• we reproduced this problem with another, non-production prefix
• ~two days of disaster!
• „Routeprocessor Switchover in one of our backbone router in Chicago
solved the issue”
2020 (TATA-Level3 loop)
Router: gin-n0v-tcore1
Site: US, New York, N0V
Command: traceroute inet4 x as-number-lookup
traceroute to x (x), 30 hops max, 52 byte packets
1 if-ae-7-5.tcore1.nto-newyork.as6453.net (63.243.128.141) 2.990 ms 1.545 ms 1.369 ms
MPLS Label=415563 CoS=0 TTL=1 S=1
2 if-ae-9-2.tcore1.n75-newyork.as6453.net (63.243.128.122) 1.653 ms 1.704 ms 1.439 ms
3 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 3.038 ms 1.118 ms 3.086 ms
4 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 82.672 ms 81.989 ms 82.221 ms
5 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 82.072 ms 81.949 ms 81.731 ms
6 if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 87.154 ms if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 87.064 ms 87.038 ms
MPLS Label=486720 CoS=0 TTL=1 S=1
7 if-ae-30-2.tcore1.pvu-paris.as6453.net (80.231.153.89) 86.645 ms if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 87.036 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 87.412 ms
MPLS Label=345609 CoS=0 TTL=1 S=1
8 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 87.357 ms 87.522 ms 86.774 ms
MPLS Label=525823 CoS=0 TTL=1 S=1
9 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 87.089 ms 86.984 ms 87.120 ms
MPLS Label=558832 CoS=0 TTL=1 S=1
10 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 86.711 ms 86.872 ms 87.689 ms
MPLS Label=300093 CoS=0 TTL=1 S=1
11 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 86.838 ms 86.749 ms 86.667 ms
12 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 87.039 ms 86.777 ms 108.465 ms
13 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 167.903 ms 167.436 ms 167.919 ms
14 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 167.316 ms 167.016 ms 167.156 ms
15 if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 172.082 ms 172.347 ms if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 172.688 ms
MPLS Label=486720 CoS=0 TTL=1 S=1
16 if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 172.403 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 177.623 ms 172.588 ms
MPLS Label=345609 CoS=0 TTL=1 S=1
17 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 173.956 ms 176.402 ms 172.581 ms
MPLS Label=525823 CoS=0 TTL=1 S=1
18 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 172.784 ms 172.592 ms 172.921 ms
MPLS Label=558832 CoS=0 TTL=1 S=1
19 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 172.660 ms 172.503 ms 172.937 ms
MPLS Label=300093 CoS=0 TTL=1 S=1
20 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 172.258 ms 172.540 ms 171.995 ms
21 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 183.732 ms 171.950 ms 172.068 ms
22 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 252.748 ms 252.855 ms 252.719 ms
23 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 253.215 ms 253.049 ms 252.474 ms
24 if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 258.598 ms if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 258.467 ms 257.584 ms
MPLS Label=486720 CoS=0 TTL=1 S=1
25 if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 257.906 ms 257.857 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 258.308 ms
MPLS Label=345609 CoS=0 TTL=1 S=1
26 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 257.546 ms 257.812 ms 268.691 ms
MPLS Label=525823 CoS=0 TTL=1 S=1
27 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 261.149 ms 257.873 ms 258.124 ms
MPLS Label=558832 CoS=0 TTL=1 S=1
28 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 257.746 ms 257.491 ms 258.035 ms
MPLS Label=300093 CoS=0 TTL=1 S=1
29 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 257.737 ms 258.226 ms 257.614 ms
30 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 257.587 ms 259.322 ms 258.347 ms
2020 (TATA-Level3 loop)
…
20 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 172.258 ms 172.540 ms 171.995 ms
21 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 183.732 ms 171.950 ms 172.068 ms
22 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 252.748 ms 252.855 ms 252.719 ms
23 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 253.215 ms 253.049 ms 252.474 ms
24 if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 258.598 ms if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 258.467 ms 257.584 ms
MPLS Label=486720 CoS=0 TTL=1 S=1
25 if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 257.906 ms 257.857 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 258.308 ms
MPLS Label=345609 CoS=0 TTL=1 S=1
26 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 257.546 ms 257.812 ms 268.691 ms
MPLS Label=525823 CoS=0 TTL=1 S=1
27 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 261.149 ms 257.873 ms 258.124 ms
MPLS Label=558832 CoS=0 TTL=1 S=1
28 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 257.746 ms 257.491 ms 258.035 ms
MPLS Label=300093 CoS=0 TTL=1 S=1
29 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 257.737 ms 258.226 ms 257.614 ms
…
2020 (TATA-Level3 loop)
1. TATA/US „sees” the more specific via Level3/US
2. Level3/US does not have this zombie route and uses „cold potato” routing to reach Level3/Frankfurt
3. Level3 passes packets to TATA in Frankfurt (less specific route; the destination is TATA's customer in Poland)
4. once passed to TATA, the „zombie more specific via Level3” kicks in: traffic goes to TATA/US, where it is passed to Level3/US once again…
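Spotting such a loop in a long trace can be automated. The sketch below is a minimal example, assuming the trace has already been reduced to one address per hop (here the first responder of hops 1 to 12 from the TATA looking glass output); the `find_loop` helper is hypothetical and not part of any tool mentioned in this talk.

```python
def find_loop(hops):
    """Return (first_index, repeat_index) of the first repeated hop, or None."""
    first_seen = {}
    for index, address in enumerate(hops):
        if address in first_seen:
            return first_seen[address], index
        first_seen[address] = index
    return None

# First responder of hops 1..12 from the traceroute shown earlier.
hops = [
    "63.243.128.141", "63.243.128.122", "4.68.39.49", "4.69.163.86",
    "195.219.50.49", "195.219.87.17", "80.231.153.89", "80.231.153.50",
    "80.231.154.143", "80.231.130.106", "66.110.96.5", "4.68.39.49",
]

loop = find_loop(hops)
if loop is not None:
    first, repeat = loop
    print(f"hop {first + 1} repeats at hop {repeat + 1}: loop of {repeat - first} hops")
```

For this trace the Level3 New York edge (4.68.39.49) is seen at hop 3 and again at hop 12, so each lap of the loop costs nine hops and roughly 85 ms.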
2020 (Level3 loop and zombie resurrection)
• First outage directly after withdrawal
• Finally BGP converges
• However, a few hours later the zombie route resurrects in the AS3356 core and causes another 1 h outage
2020 Aug (well known Centurylink/Level3-related outage)
NANOG mailing list threads:
 „Centurylink having a bad morning?”
 „[outages] Major Level3 (CenturyLink) Issues”
https://mailman.nanog.org/pipermail/nanog/2020-August/thread.html
https://mailman.nanog.org/pipermail/nanog/2020-September/thread.html
https://puck.nether.net/pipermail/outages/2020-August/013204.html
2020 Aug (well known Centurylink/Level3-related outage)
Analysis:
 https://blog.thousandeyes.com/centurylink-level-3-outage-analysis/
„Level 3 continues to advertise stale routes despite services withdrawing routes”
 https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
 https://radar.qrator.net/blog/another-centurylink-bgp-incident
Detection and debugging
Detection & debugging
 Complete outage
 should be easy to spot
 Partial outage, suboptimal routing
 traces from the outer world
 BGP tables: Tier1s, NSP, ISP, IXP, HE.net, Qrator Radar and NLNOG Ring looking glasses / route servers
 BGP updates log
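Checking a BGP updates log for zombies boils down to asking which peers still hold the prefix some time after the origin withdrew it. The following is a minimal sketch, assuming the log has already been parsed into simple records; the `Update` record, the peer names and the timestamps are hypothetical and do not follow any particular collector format.

```python
from collections import namedtuple

# kind is "A" for an announcement and "W" for a withdrawal.
Update = namedtuple("Update", "time peer prefix kind")

def zombie_peers(updates, prefix, check_time):
    """Peers whose latest update for prefix up to check_time is an announcement."""
    last_kind = {}
    for u in sorted(updates, key=lambda u: u.time):
        if u.prefix == prefix and u.time <= check_time:
            last_kind[u.peer] = u.kind
    return sorted(peer for peer, kind in last_kind.items() if kind == "A")

# Hypothetical log: the origin withdraws the prefix at t=100.
log = [
    Update(0, "peer-a", "10.0.0.0/24", "A"),
    Update(0, "peer-b", "10.0.0.0/24", "A"),
    Update(0, "peer-c", "10.0.0.0/24", "A"),
    Update(100, "peer-a", "10.0.0.0/24", "W"),
    Update(100, "peer-b", "10.0.0.0/24", "W"),
    Update(500, "peer-b", "10.0.0.0/24", "A"),  # route resurrects later
]

print(zombie_peers(log, "10.0.0.0/24", check_time=300))
print(zombie_peers(log, "10.0.0.0/24", check_time=600))
```

Running the check at several points in time also catches the resurrection pattern, where a peer that had withdrawn the route starts announcing it again hours later.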
Toolbox: traces
 http://ping.pe/
 simple and quick
 https://mtr.sh/
 fancy
 https://www.globaltraceroute.com/
 RIPE Atlas probes
 wide range of locations, very slow
ping.pe
mtr.sh
Toolbox: looking glasses
 http://lg.ring.nlnog.net/
 https://lg.he.net/
 https://radar.qrator.net/
 https://www.pch.net/tools/looking_glass/
NLNOG Ring Looking glass
BGP maps: HE vs. Qrator Radar
Toolbox: BGP updates
 PCH
 https://www.pch.net/resources/Routing_Data/IPv4_daily_snapshots/
 https://www.pch.net/resources/Raw_Routing_Data/
 RIPE
 https://stat.ripe.net/
 https://stat.ripe.net/special/bgplay (history)
 https://ris-live.ripe.net/ (live BGP stream)
 https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/ris-raw-data
RIPE RIS Live
RIPE BGPlay
Zombie risk mitigation
Zombie risk mitigation
 Fix all Tier1 routers 
 Gradual more specific withdrawal
 stage 1: withdraw from distant locations and transits
 stage 2: withdraw from local/national peerings
 Selective more specific announcements
 by continent/peer
 no transit, just peerings
 bonus: faster convergence!
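The gradual withdrawal can be written down as an ordered plan before touching any router. This is only a sketch, assuming each BGP session has been tagged by hand with a scope; the session names, the scope labels and the `withdrawal_stages` helper are hypothetical.

```python
# Stage 1: distant locations and transits; stage 2: local/national peerings.
STAGE_BY_SCOPE = {"transit": 1, "distant": 1, "local": 2}

def withdrawal_stages(sessions):
    """Group (name, scope) sessions into ordered lists, one list per stage."""
    stages = {}
    for name, scope in sessions:
        stages.setdefault(STAGE_BY_SCOPE[scope], []).append(name)
    return [stages[stage] for stage in sorted(stages)]

# Hypothetical session inventory.
sessions = [
    ("transit-1", "transit"),
    ("local-ixp-peers", "local"),
    ("remote-ixp-peers", "distant"),
]

for number, names in enumerate(withdrawal_stages(sessions), start=1):
    print(f"stage {number}: withdraw the more specific from {', '.join(names)}")
```

Between stages, the detection tools from the previous section can confirm that the more specific has really disappeared before the next group is withdrawn.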
Selective announcements / traffic steering
 Use the communities, Luke!
 Features
 excellent customer BGP communities (NTT, Telia, GTT, DE-CIX)
 good enough
 ~nothing (HE)
 secret
 Transition
 transparent
 partial clear/override
 full clear
 overlap risk! (EC/LC still not widely adopted)
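One way to read the overlap risk: a standard BGP community is only 32 bits (a 16-bit ASN plus a 16-bit value), so providers reuse private-range values, and a community meant for one network may also mean something at the next one if it is passed on transparently. Large communities (three 32-bit fields) avoid this, but are the "LC" the slide calls not widely adopted. The sketch below illustrates both points; the community values and their meanings are invented for illustration and do not describe any real provider.

```python
def fits_standard_community(asn, value):
    """A standard community holds a 16-bit ASN and a 16-bit value."""
    return 0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF

def overlapping(provider_a, provider_b):
    """Community values that carry a meaning at both providers."""
    return sorted(set(provider_a) & set(provider_b))

# Hypothetical action communities of two upstreams.
provider_a = {
    (65000, 1299): "do not announce to AS1299",
    (65001, 0): "prepend once to all peers",
}
provider_b = {
    (65000, 1299): "prepend once to AS1299",
    (65010, 3356): "do not announce to AS3356",
}

print(fits_standard_community(3356, 70))
print(fits_standard_community(201029, 1))  # 4-byte ASN does not fit
print(overlapping(provider_a, provider_b))
```

A check like this over the published community lists of your upstreams shows which tags need a "full clear" or override at the boundary.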
Example: add GTT leak to the mix (via RETN)
Note: covers all RETN, Telia, GTT and TATA customers (not visible here)
Example: leak to Telia (via Level3)
Note: leaks to all Level3 customers (incl. RETN) and Telia customers
Per customer announcement tailoring (BIRD filter syntax)
# Assumes `pop` and the community constants (level3_yes_telia, retn_yes_gtt, ...)
# are defined elsewhere in the BIRD configuration.
# bgp_path.last is the origin AS of the route, i.e. the customer.
case bgp_path.last {
  # ASx Customer Foo (uses: Level3, Telia)
  x:
    if pop = "PLIX" then bgp_community.add(level3_yes_telia);
    if pop = "THINX" then bgp_community.add(retn_yes_telia);
    if pop = "LINX" then {…}
  # ASy Customer Bar (uses: GTT, Cogent)
  y:
    if pop = "PLIX" then bgp_community.add(level3_yes_cogent);
    if pop = "THINX" then bgp_community.add(retn_yes_gtt);
    if pop = "LINX" then {…}
  # ASz Customer Baz...
}
docs: https://bird.network.cz/?get_doc&v=20&f=bird-5.html#ss5.4
Summary
 Still not well understood
 BGP update queueing, races/reordering, losses?
 BGP optimizers/stabilizers, broken damping?
 In $vendors we trust
 Avoid more-specifics in global table
 Monitor your reachability/visibility
e–Q&A
@redguardianeu

BGP zombie routes

  • 1. Zombie routes Paweł Małachowski, 2020.09.29 @pawmal80
  • 2. Agenda
      1. BGP withdrawals and zombie routes
      2. Real life cases
      3. Detection and debugging
      4. Zombie risk mitigation
  • 3. whoami(1)
      - Atende Software
          - redGuardian DDoS mitigation
          - my previous talks: DPDK, DPI/regexp, DUT perftesting, BGP hijacks
            https://www.slideshare.net/atendesoftware/presentations
      - Previously
          - Netia S.A.
          - ATM S.A.
          - local hosting and ISP companies, community network
      - Roles: system engineer, IT operations lead, business analyst
  • 4. BGP withdrawals and zombie routes
  • 5. BGP zombie / ghost route
      - "an active routing table entry for a prefix that has been withdrawn by its origin network"
        source: https://labs.ripe.net/Members/romain_fontugne/bgp-zombies (2019)
        see also: "BGP Zombies: an Analysis of Beacons Stuck Routes" (2019), https://www.iij-ii.co.jp/en/members/romain/pdf/romain_pam2019.pdf
      - not a new phenomenon
          - Ghost Route Hunter (2003): https://www.sixxs.net/tools/grh/what/
          - "An overview of the global IPv6 routing table" (2005): https://meetings.ripe.net/ripe-50/presentations/ripe50-plenary-tue-ipv6-routing.pdf
      - may take hours/days to "expire"
  • 6. BGP zombie / ghost route
      - Who cares? It was withdrawn anyway!
      - Unless we are talking about
          - a partial withdrawal, where some ingress traffic takes a different path than you expect, does not converge, or even loops
          - a more-specific route, where the zombie sits in Tier1/Tier2/NSP/IXP infrastructure and causes a partial or complete outage
  • 7. More-specific prefix usage examples
      - Traffic engineering
          - Announce 10.0.0.0/23 into global table
          - Announce 10.0.0.0/24 to some IXP peers to override their local prefs
      - Customer delegation
          - ISP1 announces 10.0.0.0/16 PA block
          - ISP1 delegates 10.1.2.0/24 to customer
          - Customer runs own BGP, announces 10.1.2.0/24 via ISP1, ISP2 and IXP
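Why a leftover more-specific matters can be shown with a minimal longest-prefix-match sketch. It reuses the 10.0.0.0/23 and 10.0.0.0/24 prefixes from the traffic engineering example; the `rib` dictionary, the `best_route` helper and the next-hop labels are hypothetical and only illustrate the selection rule, not any router's implementation.

```python
import ipaddress


def best_route(rib, destination):
    """Return (prefix, next_hop) of the longest matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    matches = [(prefix, hop) for prefix, hop in rib.items() if address in prefix]
    if not matches:
        return None
    # Longest-prefix match: the most specific covering route always wins.
    return max(matches, key=lambda match: match[0].prefixlen)


# Hypothetical table of a remote network that never processed the /24 withdrawal.
rib = {
    ipaddress.ip_network("10.0.0.0/23"): "current path",
    ipaddress.ip_network("10.0.0.0/24"): "stale path (zombie)",
}

with_zombie = best_route(rib, "10.0.0.10")[1]
other_half = best_route(rib, "10.0.1.10")[1]

# Once the stale /24 is finally removed, the covering /23 takes over.
del rib[ipaddress.ip_network("10.0.0.0/24")]
after_cleanup = best_route(rib, "10.0.0.10")[1]

print(with_zombie, other_half, after_cleanup, sep=" | ")
```

As long as the stale /24 stays in the table, every packet for that half of the /23 follows it, however good the covering route is.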
  • 9. 2016 (TPNET-OTI loop)
      - Orange PL (5617) – Opentransit (5511)
      - Zombie AS path: 5511 1299 24724 57811 201029 x
      - Looking glass:
          - TPNET sees (zombie) more specific via OTI
          - OTI has less specific via TPNET
      - I gave up after a 20-minute outage and re-announced the more specific to save "x"
      - Withdrawn later with no issues
  • 10. 2016 (Interoute/AS8928 hijack)
      1. Warsaw: PLIX, THINX, NASK
      2. Interoute: Prague, Paris, Madrid
      3. NTT Madrid
      4. Telia: Madrid, Hamburg
      5. Warsaw: TPNET
      6. Customer
  • 11. 2016 (Interoute/AS8928 hijack)
      - a zombie /24 route via NTT at the former Interoute/Madrid hijacked a significant part of ingress traffic
      - luckily, no loop; the trace reaches the customer in Warsaw
      - lasted many hours; finally "fixed" by announce/withdraw flaps
  • 12. 2018 (Telia loop)
      Massive outage after "1299 3356 …" path withdrawal
  • 14. 2018 (Telia loop)
      - 1299 announces zombie route
      - hijacks and loops large portion of ingress traffic
      - we reproduced this problem with another, non-production prefix
      - ~two days of disaster!
      - "Routeprocessor Switchover in one of our backbone router in Chicago solved the issue"
  • 15. 2020 (TATA-Level3 loop)
      Router: gin-n0v-tcore1
      Site: US, New York, N0V
      Command: traceroute inet4 x as-number-lookup
      traceroute to x (x), 30 hops max, 52 byte packets
       1 if-ae-7-5.tcore1.nto-newyork.as6453.net (63.243.128.141) 2.990 ms 1.545 ms 1.369 ms
           MPLS Label=415563 CoS=0 TTL=1 S=1
       2 if-ae-9-2.tcore1.n75-newyork.as6453.net (63.243.128.122) 1.653 ms 1.704 ms 1.439 ms
       3 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 3.038 ms 1.118 ms 3.086 ms
       4 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 82.672 ms 81.989 ms 82.221 ms
       5 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 82.072 ms 81.949 ms 81.731 ms
       6 if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 87.154 ms if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 87.064 ms 87.038 ms
           MPLS Label=486720 CoS=0 TTL=1 S=1
       7 if-ae-30-2.tcore1.pvu-paris.as6453.net (80.231.153.89) 86.645 ms if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 87.036 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 87.412 ms
           MPLS Label=345609 CoS=0 TTL=1 S=1
       8 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 87.357 ms 87.522 ms 86.774 ms
           MPLS Label=525823 CoS=0 TTL=1 S=1
       9 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 87.089 ms 86.984 ms 87.120 ms
           MPLS Label=558832 CoS=0 TTL=1 S=1
      10 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 86.711 ms 86.872 ms 87.689 ms
           MPLS Label=300093 CoS=0 TTL=1 S=1
      11 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 86.838 ms 86.749 ms 86.667 ms
      12 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 87.039 ms 86.777 ms 108.465 ms
      13 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 167.903 ms 167.436 ms 167.919 ms
      14 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 167.316 ms 167.016 ms 167.156 ms
      15 if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 172.082 ms 172.347 ms if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 172.688 ms
           MPLS Label=486720 CoS=0 TTL=1 S=1
      16 if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 172.403 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 177.623 ms 172.588 ms
           MPLS Label=345609 CoS=0 TTL=1 S=1
      17 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 173.956 ms 176.402 ms 172.581 ms
           MPLS Label=525823 CoS=0 TTL=1 S=1
      18 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 172.784 ms 172.592 ms 172.921 ms
           MPLS Label=558832 CoS=0 TTL=1 S=1
      19 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 172.660 ms 172.503 ms 172.937 ms
           MPLS Label=300093 CoS=0 TTL=1 S=1
      20 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 172.258 ms 172.540 ms 171.995 ms
      21 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 183.732 ms 171.950 ms 172.068 ms
      22 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 252.748 ms 252.855 ms 252.719 ms
      23 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 253.215 ms 253.049 ms 252.474 ms
      24 if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 258.598 ms if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 258.467 ms 257.584 ms
           MPLS Label=486720 CoS=0 TTL=1 S=1
      25 if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 257.906 ms 257.857 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 258.308 ms
           MPLS Label=345609 CoS=0 TTL=1 S=1
      26 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 257.546 ms 257.812 ms 268.691 ms
           MPLS Label=525823 CoS=0 TTL=1 S=1
      27 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 261.149 ms 257.873 ms 258.124 ms
           MPLS Label=558832 CoS=0 TTL=1 S=1
      28 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 257.746 ms 257.491 ms 258.035 ms
           MPLS Label=300093 CoS=0 TTL=1 S=1
      29 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 257.737 ms 258.226 ms 257.614 ms
      30 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 257.587 ms 259.322 ms 258.347 ms
  • 16. 2020 (TATA-Level3 loop)
      …
      20 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 172.258 ms 172.540 ms 171.995 ms
      21 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 183.732 ms 171.950 ms 172.068 ms
      22 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 252.748 ms 252.855 ms 252.719 ms
      23 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 253.215 ms 253.049 ms 252.474 ms
      24 if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 258.598 ms if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 258.467 ms 257.584 ms
           MPLS Label=486720 CoS=0 TTL=1 S=1
      25 if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 257.906 ms 257.857 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 258.308 ms
           MPLS Label=345609 CoS=0 TTL=1 S=1
      26 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 257.546 ms 257.812 ms 268.691 ms
           MPLS Label=525823 CoS=0 TTL=1 S=1
      27 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 261.149 ms 257.873 ms 258.124 ms
           MPLS Label=558832 CoS=0 TTL=1 S=1
      28 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 257.746 ms 257.491 ms 258.035 ms
           MPLS Label=300093 CoS=0 TTL=1 S=1
      29 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 257.737 ms 258.226 ms 257.614 ms
      …
  • 17. 2020 (TATA-Level3 loop)
      1. TATA/US "sees" the more specific via Level3/US
      2. Level3/US does not have this zombie route and uses "cold potato" routing to reach Level3/Frankfurt
      3. Level3 passes packets to TATA in Frankfurt (less specific route; the destination is TATA's customer in Poland)
      4. once passed to TATA, the "zombie more specific via Level3" kicks in: traffic goes to TATA/US, where it is passed to Level3/US once again…
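A loop like the one described above can be spotted mechanically in a trace. The following is a minimal sketch, assuming the trace has already been reduced to one responding address per hop; the `find_loop` helper is hypothetical, and the sample list holds the first responding address of hops 1 to 12 from the TATA looking glass output.

```python
def find_loop(hops):
    """Return (first_hop, repeat_hop, address) for the first address seen twice.

    hops is a list of IP address strings, index 0 being hop 1; None marks a
    hop that did not answer. Returns None when no address repeats.
    """
    seen = {}
    for number, address in enumerate(hops, start=1):
        if address is None:
            continue  # silent hop, nothing to compare
        if address in seen:
            return seen[address], number, address
        seen[address] = number
    return None


# First responding address of hops 1..12 in the trace from TATA New York.
trace = [
    "63.243.128.141", "63.243.128.122", "4.68.39.49", "4.69.163.86",
    "195.219.50.49", "195.219.87.17", "80.231.153.89", "80.231.153.50",
    "80.231.154.143", "80.231.130.106", "66.110.96.5", "4.68.39.49",
]

loop = find_loop(trace)
print(loop)
```

Here the Level3 New York edge answers at hop 3 and again at hop 12, so one trip around the loop costs nine hops. With per-hop load balancing the intermediate addresses may differ between rounds, which is why the sketch looks for any repeated address rather than an exactly repeating sequence.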
  • 18. 2020 (Level3 loop and zombie resurrection)
      - First outage directly after withdrawal
      - Finally BGP converges
      - However, a few hours later the zombie route resurrects in the AS3356 core and causes another 1h outage
  • 19. 2020 (Level3 loop and zombie resurrection)
  • 20. 2020 Aug (well known Centurylink/Level3-related outage)
      NANOG mailing list threads:
      - "Centurylink having a bad morning?"
      - "[outages] Major Level3 (CenturyLink) Issues"
      https://mailman.nanog.org/pipermail/nanog/2020-August/thread.html
      https://mailman.nanog.org/pipermail/nanog/2020-September/thread.html
      https://puck.nether.net/pipermail/outages/2020-August/013204.html
  • 21. 2020 Aug (well known Centurylink/Level3-related outage)
      Analysis:
      - https://blog.thousandeyes.com/centurylink-level-3-outage-analysis/
        "Level 3 continues to advertise stale routes despite services withdrawing routes"
      - https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
      - https://radar.qrator.net/blog/another-centurylink-bgp-incident
  • 23. Detection & debugging
      - Complete outage
          - should be easy to spot
      - Partial outage, suboptimal routing
          - traces from the outer world
          - BGP tables: Tier1s, NSP, ISP, IXP, HE.net, Qrator Radar and NLNOG Ring looking glasses / route servers
          - BGP updates log
  • 24. Toolbox: traces
      - http://ping.pe/
      - simple and quick
      - https://mtr.sh/
      - fancy
      - https://www.globaltraceroute.com/
      - RIPE Atlas probes
      - wide range of locations, very slow
  • 27. Toolbox: looking glasses
      - http://lg.ring.nlnog.net/
      - https://lg.he.net/
      - https://radar.qrator.net/
      - https://www.pch.net/tools/looking_glass/
  • 29. BGP maps: HE vs. Qrator Radar
  • 30. Toolbox: BGP updates
      - PCH
          - https://www.pch.net/resources/Routing_Data/IPv4_daily_snapshots/
          - https://www.pch.net/resources/Raw_Routing_Data/
      - RIPE
          - https://stat.ripe.net/
          - https://stat.ripe.net/special/bgplay (history)
          - https://ris-live.ripe.net/ (live BGP stream)
          - https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/ris-raw-data
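An update log can be turned into a zombie check by replaying it per collector peer. The sketch below assumes messages were already fetched and parsed into dictionaries shaped like the RIS Live `data` object (`timestamp`, `peer`, `path`, `announcements` with `prefixes`, `withdrawals`); that layout should be verified against the RIS Live documentation. The `stuck_peers` helper, the `grace_seconds` threshold and the sample data (documentation prefixes and private-use ASNs) are hypothetical.

```python
def stuck_peers(messages, prefix, withdrawn_at, grace_seconds=3600):
    """Map each peer still announcing prefix after the grace period to its AS path.

    A peer counts as stuck when its last update for the prefix, seen up to
    withdrawn_at + grace_seconds, was an announcement and not a withdrawal.
    """
    deadline = withdrawn_at + grace_seconds
    last_state = {}  # peer -> AS path of the live announcement, or None if withdrawn
    for message in sorted(messages, key=lambda m: m["timestamp"]):
        if message["timestamp"] > deadline:
            break
        announced = any(
            prefix in announcement.get("prefixes", [])
            for announcement in message.get("announcements", [])
        )
        if announced:
            last_state[message["peer"]] = message.get("path", [])
        elif prefix in message.get("withdrawals", []):
            last_state[message["peer"]] = None
    return {peer: path for peer, path in last_state.items() if path is not None}


PREFIX = "198.51.100.0/24"
messages = [
    {"timestamp": 100, "peer": "192.0.2.1", "path": [64501, 64500],
     "announcements": [{"next_hop": "192.0.2.1", "prefixes": [PREFIX]}]},
    {"timestamp": 100, "peer": "192.0.2.2", "path": [64502, 64500],
     "announcements": [{"next_hop": "192.0.2.2", "prefixes": [PREFIX]}]},
    # The origin withdraws at t=1000; only the first peer propagates it.
    {"timestamp": 1010, "peer": "192.0.2.1", "withdrawals": [PREFIX]},
]

stuck = stuck_peers(messages, PREFIX, withdrawn_at=1000)
print(stuck)
```

The AS path kept for each stuck peer shows which networks still carry the stale route, a useful starting point when deciding whom to contact.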
  • 34. Zombie risk mitigation
      - Fix all Tier1 routers
      - Gradual more specific withdrawal
          - stage 1: withdraw from distant locations and transits
          - stage 2: withdraw from local/national peerings
      - Selective more specific announcements
          - by continent/peer
          - no transit, just peerings
          - bonus: faster convergence!
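The two-stage withdrawal can be planned from a simple session inventory. This is only a sketch: the session names, the `kind` and `scope` labels and the `withdrawal_stages` helper are hypothetical, and the actual withdrawal would still be done in the router or route server configuration.

```python
def withdrawal_stages(sessions):
    """Split BGP sessions into ordered stages for withdrawing a more specific.

    Stage 1 holds transits and distant locations, stage 2 the remaining
    local/national peerings.
    """
    stage1 = [
        s["name"] for s in sessions
        if s["kind"] == "transit" or s["scope"] == "distant"
    ]
    stage2 = [s["name"] for s in sessions if s["name"] not in stage1]
    return [stage1, stage2]


# Hypothetical inventory of sessions that carry the more specific.
sessions = [
    {"name": "transit-a", "kind": "transit", "scope": "distant"},
    {"name": "ixp-remote", "kind": "peering", "scope": "distant"},
    {"name": "ixp-local", "kind": "peering", "scope": "local"},
]

plan = withdrawal_stages(sessions)
for number, names in enumerate(plan, start=1):
    print(f"stage {number}: withdraw from {', '.join(names)}")
```

Checking reachability and looking glasses between the stages shows whether a zombie has appeared before the local peerings are withdrawn.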
  • 35. Selective announcements / traffic steering
      - Use the communities, Luke!
      - Features
          - excellent customer BGP communities (NTT, Telia, GTT, DE-CIX)
          - good enough
          - ~nothing (HE)
          - secret
      - Transition
          - transparent
          - partial clear/override
          - full clear
          - overlap risk! (extended/large communities still not widely adopted)
  • 36. Example: add GTT leak to the mix (via RETN) Note: covers all RETN, Telia, GTT and TATA customers (not visible here)
  • 37. Example: leak to Telia (via Level3) Note: leaks to all Level3 customers (incl. RETN) and Telia customers
  • 38. Per customer announcement tailoring (BIRD filter syntax)
      case bgp_path.last {
        # ASx Customer Foo (uses: Level3, Telia)
        x: if pop = "PLIX" then bgp_community.add(level3_yes_telia);
           if pop = "THINX" then bgp_community.add(retn_yes_telia);
           if pop = "LINX" then {…}
        # ASy Customer Bar (uses: GTT, Cogent)
        y: if pop = "PLIX" then bgp_community.add(level3_yes_cogent);
           if pop = "THINX" then bgp_community.add(retn_yes_gtt);
           if pop = "LINX" then {…}
        # ASz Customer Baz...
      }
      docs: https://bird.network.cz/?get_doc&v=20&f=bird-5.html#ss5.4
  • 39. Summary
      - Still not well understood
          - BGP update queueing, races/reordering, losses?
          - BGP optimizers/stabilizers, broken damping?
      - In $vendors we trust
      - Avoid more-specifics in global table
      - Monitor your reachability/visibility