SlideShare a Scribd company logo
1 of 101
Download to read offline
SALT + NETBOX + VYOS
= NETWORK AUTOMATION
+ ROUTING SECURITY
MAREK ISALSKI — FAELIX
faelix.link/vuknof3
VyOS SaltStack YAML Netbox
BGP OSPF FRR RPKI IRR XDP
bgpq3 UTRS RTBH NetFlow
MAREK ISALSKI — FAELIX
faelix.link/vuknof3
Who?
Marek and Lou and Laura
A bunch of @NetworkMoose
Giants
Who?
Marek and Lou and Laura
A bunch of @NetworkMoose
Giants:
FRR, Routinator 3000, VyOS+contributors, Linux,
netfilter/iptables, Salt+contributors, RIPE+RIRs,
Team Cymru, DigitalOcean+Netbox-contributors…
The Story So Far…
2019-04: UKNOF43 MikroTik IPv6 routing vuln talk
Hinted about AS41495's future plans
SCANNING IPv6
ADDRESS SPACE…
AND THE REMOTE VULNERABILITIES IT UNCOVERS
https://faelix.link/uknof43
The Story So Far…
2019-04: UKNOF43 MikroTik IPv6 routing vuln talk
Hinted about AS41495's future plans
2019-10: NetMcr40 post-migration tech talk
Mentioned shortcomings and follow-on actions
2020-04: UKNOF46 talk on operational experience
…postponed till next time?
The Story So Far…
2019-04: UKNOF43 MikroTik IPv6 routing vuln talk
Hinted about AS41495's future plans
2019-10: NetMcr40 post-migration tech talk
Mentioned shortcomings and follow-on actions
2020-04: UKNOF46 talk on operational experience
…postponed till next time?
no time like the present!
Where?
This story all took place in:
Reynolds House (Equinix MA2)
Williams House (Equinix MA1)
…and then "laps of honour" in:
AQL DC2
Telehouse West and North
Interxion LON2
Why?
Why?
actually involved
MX routers, but cautionary tale
applies to Cisco 7600/6500 and
MikroTik CCR too!
Why?
RIPE NCC Update 2019-10-02
Today we allocated the last of our contiguous /22
IPv4 address blocks. We still have approximately
one million addresses available, in the form of
/23s and /24s, and we will continue making /22-
equivalent allocations made up of these smaller
blocks. Once we can no longer allocate the
equivalent of a /22, we will announce that we have
reached run-out. We expect this to occur in
November 2019.
Why?
1M routes
in 2021?
Why?
DFZ
Why?
Why?
How? DFZ
How?
DFZ
OSPF
Area 0
How?
DFZ 800k routes800k routes
default! default!
HARDWARE
How?
4x 10G SFP+ ports (must not be RJ45, cannot be
QSFP+ with SFP+ breakout)
At least 2x 1G RJ45 ports
Low CPU power draw
Prefer GHz over cores (single-thread performance)
~16 Gb RAM
Ideally 1U
Ideally dual PSU and IPMI
How?
SM Chassis: Front I/O, Redundant 400W PSUs
SM X11SCM-F (2x RJ45 Intel® I210-AT + OOB/IPMI)
Xeon E3-1265Lv5 (4c/8t, 2.9GHz, 45W)
16GB DDR4 2666 ECC, 2x 240Gb SSDs
Intel X710-DA4 (4x SFP+ NIC)
How?
How?
SOFTWARE
VyOS
Linux
FRR (Quagga fork) for routing protocols
iproute2, iptables, ipset, etc
VyOS
Linux
FRR (Quagga fork) for routing protocols
iproute2, iptables, ipset, etc
VyOS 1.3: planned XDP (eXpress Data Path) support
Like DPDK / FDIO / VPP (but less tap/vif mess)
Up towards 20Mpps per CPU core…!
SaltStack
Big investment in Salt @faelix
Use it for bare metal (virtualisation clusters)
Use it for our ISP services and NFVs
Use it for full-/part-managed customer VPSs
…why not use it for routers?
VyOS includes salt-minion (agent)
Introducing: hphr
"Halophile Router" (salt-loving)
Uses Netbox for single source-of-truth
Uses SLS (YAML) for anything not possible in Netbox
Generates a VyOS router configuration file
Loads, compares, commits, saves to config.boot
Released open-source: github.com/faelix/hphr
Addressing
Connections
Sub-Interfaces
OSPF, BGP, etc?
Netbox ≈ "physical infra"
Using SaltStack's "pillar" in two ways:
Netbox for DCIM / IPAM
SLS (YAML-syntax) files for OSPF, BGP, etc
Peering/Transit Interface
P/T Interface: BCP38
P/T Interface: BCP38
P/T Interface: BCP38
bgpq3
P/T Interface: BCP38
bgpq3
BCP38 Implementation
Throws away entire VyOS firewall stack
Native commands are off-limits fragility
Currently using iptables
Will transition to nft it's the future, offload, etc
Eventually move this into XDP
Along with routing decision more PPS
P/T Interface: NetFlow
BGP Peering
BGP Peering
bgpq3
BGP Peering
bgpq3 AS-LLNW | wc
50436 252179 2056972
BGP Peering
“aggregate”
bgpq3 -A AS-LLNW | wc
18425 106080 792306
BGP Peering
bgpq3 -A AS-LLNW | wc
18425 106080 792306
wc /config/config.boot
558568
“aggregate”
Long
Bootup
Times!
BGP Peering
BGP Peering
BGP Peering
RPKI via RTRR
BGP via PeeringDB (v2)
BGP via PeeringDB (v2)
peering
by ASN
copy-paste
is dead
BGP via PeeringDB (v2)
peeringdb
API via salt
BGP via PeeringDB (v2)
defaults
overrides
BGP Upstream
BGP Upstream
BGP Upstream
BGP Upstream
RPKI via RTRR
RPKI
Routinator 3000
GOTCHAS
BGP C/P Protection
BGP C/P Protection
bgpd state -> unresponsive : no response
yet to ping sent 90 seconds ago
BGP C/P Protection
/usr/lib/frr/watchfrr.sh restart bgpd
BGP C/P Protection
847177 bgp routes removed from the rib
BGP C/P Protection
847177 bgp routes removed from the rib
bgpd restarting, we're gonna be ok?!
[NARRATOR VOICE]
THEY WERE NOT "OK"
bly$ show ip bgp summary
% BGP instance not found
bly$
Architecture of VyOS
protocols {
bgp 41495 {
………
}
}
router bgp 41495
………"VyOS"
/config/config.boot
FRR running-config
Architecture of VyOS
protocols {
bgp 41495 {
………
}
}
router bgp 41495
………"VyOS"
/config/config.boot
FRR running-config
https://phabricator.vyos.net/
T1514
So: Reboot the Router
https://phabricator.vyos.net/T1514
FRR's running-config not "refreshable" if a
daemon crashes, can only reboot router to make
VyOS send new running-config
or: delete protocols bgp 41495, commit,
load /config/config.boot, commit
Rebooting a router is annoying, but we're in a
maintenance window, the other router is ok, so we're
gonna be ok.
[NARRATOR VOICE]
THEY WERE NOT "OK"
FRR Converges BGP Fast
Router booted up
Began configuring itself
Brought interfaces up
Brought BGP instance up
Cogent transit session established
Peering sessions on LINX established
…then …?
FRR Converges BGP Fast
Router booted up
Began configuring itself
Brought interfaces up
Brought BGP instance up
Cogent transit session established
Peering sessions on LINX established
…then VyOS applied prefix-lists and route-maps
FRR Converges BGP Fast
Router booted up
Began configuring itself
Brought interfaces up
Brought BGP instance up
Cogent transit session established
Peering sessions on LINX established
…then VyOS applied prefix-lists and route-maps :-(
FRR Converges BGP Fast
Router booted up
Began configuring itself
Brought interfaces up
Brought BGP instance up
Cogent transit session established
Peering sessions on LINX established
…then VyOS applied prefix-lists and route-maps :-(
ANNOUNCING 800K ROUTES
TO LINX MAN (BY ACCIDENT)
ANNOUNCING 800K ROUTES
TO LINX MAN (BY ACCIDENT)
PROVIDING TRANSIT
TO TIER-1 ASNs Achievement
Unlocked :(
ANNOUNCING 800K ROUTES
TO LINX MAN (BY ACCIDENT)
T1698 + T1148 + T944
BUGS!
Achievement
Unlocked :(
FRR Impedance Mismatch
router bgp 41495
neighbor 195.66.244.6 remote-as 6939
neighbor 195.66.244.6 soft-reconfiguration inbound
neighbor 195.66.244.6 maximum-prefix 200000
neighbor 195.66.244.6 prefix-list hphr-DFZ-IPv4 in
neighbor 195.66.244.6 prefix-list auto-AS-FAELIX out
neighbor 195.66.244.6 route-map AS6939-LINX-in-IPv4 in
neighbor 195.66.244.6 route-map TRANSIT-out-IPv4 out
FRR Impedance Mismatch
router bgp 41495
neighbor 195.66.244.6 remote-as 6939
neighbor 195.66.244.6 soft-reconfiguration inbound
neighbor 195.66.244.6 maximum-prefix 200000
neighbor 195.66.244.6 prefix-list hphr-DFZ-IPv4 in
neighbor 195.66.244.6 prefix-list auto-AS-FAELIX out
neighbor 195.66.244.6 route-map AS6939-LINX-in-IPv4 in
neighbor 195.66.244.6 route-map TRANSIT-out-IPv4 out
neighbor shutdown
no shutdown
FRR Impedance Mismatch
"Since FRR lacks a command for creating peers in down state, no other
workaround is possible." Bug: T1148
router bgp 41495
neighbor 195.66.244.6 remote-as 6939
neighbor 195.66.244.6 soft-reconfiguration inbound
neighbor 195.66.244.6 maximum-prefix 200000
neighbor 195.66.244.6 prefix-list hphr-DFZ-IPv4 in
neighbor 195.66.244.6 prefix-list auto-AS-FAELIX out
neighbor 195.66.244.6 route-map AS6939-LINX-in-IPv4 in
neighbor 195.66.244.6 route-map TRANSIT-out-IPv4 out
FRR Impedance Mismatch
"Since FRR lacks a command for creating peers in down state, no other
workaround is possible." Bug: T1148
router bgp 41495
no bgp default ipv4-unicast
neighbor 195.66.244.6 remote-as 6939
neighbor 195.66.244.6 soft-reconfiguration inbound
neighbor 195.66.244.6 maximum-prefix 200000
address-family ipv4
neighbor 195.66.244.6 prefix-list hphr-DFZ-IPv4 in
neighbor 195.66.244.6 prefix-list auto-AS-FAELIX out
Well, actually…
BGP C/P Protection
BGP C/P Protection
added to hphr
INTERLUDE
“ IS BGP SAFE YET ”
isbgpsafeyet.com
Pride comes…
…before a segfault
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7f43747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x113) [0x7f4374798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x712e5) [0x7f43747b92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f43735c2890]92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x65) [0x5576bbe42415]92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x5042) [0x7f436fa1d042]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x60) [0x7f43747c6b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7f43747965c8]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(main+0x2ff) [0x5576bbdecb4f]run+0xd8) [0x7f43747965c8]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f4373229b45]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(+0x3cb6c) [0x5576bbdeeb6c]_main+0xf5) [0x7f4373229b45]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:509#01242415); aborting...
…before a segfault
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7f43747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x113) [0x7f4374798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x712e5) [0x7f43747b92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f43735c2890]92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x65) [0x5576bbe42415]92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x5042) [0x7f436fa1d042]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x60) [0x7f43747c6b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7f43747965c8]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(main+0x2ff) [0x5576bbdecb4f]run+0xd8) [0x7f43747965c8]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f4373229b45]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(+0x3cb6c) [0x5576bbdeeb6c]_main+0xf5) [0x7f4373229b45]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:509#01242415); aborting...
… bgpd_sync_callback from bgpd/bgp_rpki.c
RPKI KICKED OUR BUTT!
…before a segfault
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7f43747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x113) [0x7f4374798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x712e5) [0x7f43747b92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f43735c2890]92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x65) [0x5576bbe42415]92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x5042) [0x7f436fa1d042]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x60) [0x7f43747c6b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7f43747965c8]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(main+0x2ff) [0x5576bbdecb4f]run+0xd8) [0x7f43747965c8]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f4373229b45]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(+0x3cb6c) [0x5576bbdeeb6c]_main+0xf5) [0x7f4373229b45]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:509#01242415); aborting...
… bgpd_sync_callback from bgpd/bgp_rpki.c
Fixed inFRRouting7.2.1
RPKI KICKED OUR BUTT!
…before a segfault
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7f43747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x113) [0x7f4374798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x712e5) [0x7f43747b92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f43735c2890]92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x65) [0x5576bbe42415]92e5]74798833]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x5042) [0x7f436fa1d042]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x60) [0x7f43747c6b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7f43747965c8]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(main+0x2ff) [0x5576bbdecb4f]run+0xd8) [0x7f43747965c8]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f4373229b45]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(+0x3cb6c) [0x5576bbdeeb6c]_main+0xf5) [0x7f4373229b45]b00]]3747983d7]
Apr 17 08:22:42 aebi bgpd[1238]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:509#01242415); aborting...
… bgpd_sync_callback from bgpd/bgp_rpki.c
Fixed inFRRouting7.2.1
Shout out to @wolf480pl@mstdn.io for
brainstorming this with me to be sure!
RPKI KICKED OUR BUTT!
BACK TO
SCHEDULED CONTENT
…before an OOM
Sep 3 04:15:21 bly bgpd[1228]: [EC 100663313] SLOW THREAD: task bgpd_sync_callback (7fe7131dcd60) ran for 51696ms (cpu time 51688ms)
Sep 3 04:16:00 bly watchfrr[1167]: [EC 268435457] bgpd state -> unresponsive : no response yet to ping sent 90 seconds ago
Sep 3 04:16:00 bly watchfrr[1167]: [EC 100663303] Forked background command [pid 4836]: /usr/lib/frr/watchfrr.sh restart bgpd
Sep 3 04:16:12 bly bgpd[1228]: [EC 100663313] SLOW THREAD: task bgpd_sync_callback (7fe7131dcd60) ran for 50939ms (cpu time 50929ms)
Sep 3 04:16:12 bly bgpd[1228]: Terminating on signal
Sep 3 04:16:20 bly watchfrr[1167]: Warning: restart bgpd child process 4836 still running after 20 seconds, sending signal 15
Sep 3 04:16:20 bly watchfrr[1167]: restart bgpd process 4836 terminated due to signal 15
Sep 3 04:16:31 bly zebra[1224]: [EC 4043309117] Client 'vnc' encountered an error and is shutting down.
Sep 3 04:16:31 bly zebra[1224]: [EC 4043309117] Client 'bgp' encountered an error and is shutting down.
Sep 3 04:16:31 bly watchfrr[1167]: [EC 268435457] bgpd state -> down : unexpected read error: Connection reset by peer
Sep 3 04:16:31 bly zebra[1224]: client 30 disconnected. 0 vnc routes removed from the rib
Sep 3 04:16:31 bly zebra[1224]: zebra/zebra_ptm.c:1345 failed to find process pid registration
Sep 3 04:16:31 bly zebra[1224]: client 20 disconnected. 906341 bgp routes removed from the rib
Sep 3 04:17:21 bly watchfrr[1167]: [EC 100663303] Forked background command [pid 5576]: /usr/lib/frr/watchfrr.sh restart bgpd
Sep 3 04:17:21 bly zebra[1224]: client 20 says hello and bids fair to announce only bgp routes vrf=0
Sep 3 04:17:21 bly zebra[1224]: client 32 says hello and bids fair to announce only vnc routes vrf=0
Sep 3 04:17:21 bly watchfrr[1167]: bgpd state -> up : connect succeeded
restart bgpd process 4836 terminated due to signal 15
client 20 disconnected. 906341 bgp routes removed from the RIB
…BACK TO
SCHEDULED CONTENT…?
…before a segfault
Sep 3 09:32:16 aebi watchfrr[1302]: [EC 268435457] ospf6d state -> down : read
returned EOF
Sep 3 09:32:16 aebi zebra[1332]: [EC 4043309121] Client 'ospf6' encountered an
error and is shutting down.
Sep 3 09:32:16 aebi zebra[1332]: client 52 disconnected. 47 ospf6 routes removed
from the rib
Sep 3 09:32:21 aebi watchfrr[1302]: [EC 100663303] Forked background command [pid
11593]: /usr/lib/frr/watchfrr.sh restart ospf6d
Client 'ospf6' encountered an error and is shutting down.
client 52 disconnected. 47 ospf6 routes removed from the rib
…before a segfault
Sep 3 09:32:16 aebi watchfrr[1302]: [EC 268435457] ospf6d state -> down : read
returned EOF
Sep 3 09:32:16 aebi zebra[1332]: [EC 4043309121] Client 'ospf6' encountered an
error and is shutting down.
Sep 3 09:32:16 aebi zebra[1332]: client 52 disconnected. 47 ospf6 routes removed
from the rib
Sep 3 09:32:21 aebi watchfrr[1302]: [EC 100663303] Forked background command [pid
11593]: /usr/lib/frr/watchfrr.sh restart ospf6d
Client 'ospf6' encountered an error and is shutting down.
client 52 disconnected. 47 ospf6 routes removed from the rib
Fixed inFRRouting
7.5
FRR
#6735 + #6086
THIS STORY ISN'T OVER…
XXXTODO
Add features to FRR, provide VyOS workarounds, or
make Salt states for router reboot maintenance
Speed-up VyOS' prefix-list insertion into FRR T2425
VyOS to refresh config into FRR's daemons T2175
Fragility of VyOS firewalling when replaced with
barebones iptables from hphr
NIC parameter and bare metal router sysctl tuning
XDP (hopefully we get time to follow on from @atoonk)
What We Achieved
Two very long nights of maintenance (plus London)
Full table BGP on VyOS converge time in seconds
Routing on MikroTiks converges near-instantly
BCP38 (customers cannot spoof source address)
IRR filtering* (only accept where route/route6 object)
RPKI (will not accept invalid routes from P/T)
Templated configuration (repeatable, automated)
Single source of truth (the docs become the config)
VYOS & RPKI
AT THE
BGP AS EDGE
E: marek @ faelix . net
T: @maznu
E: lou @ faelix . net
T: @gremlinlou

More Related Content

What's hot

第53回WIT研究会におけるリアルタイム映像配信 -技術編-
第53回WIT研究会におけるリアルタイム映像配信 -技術編-第53回WIT研究会におけるリアルタイム映像配信 -技術編-
第53回WIT研究会におけるリアルタイム映像配信 -技術編-
Toshimitsu YAMAGUCHI
 

What's hot (16)

Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
ACI MultiPod Config Guide
ACI MultiPod Config GuideACI MultiPod Config Guide
ACI MultiPod Config Guide
 
SF Kubernetes Meetup Lightning Talk
SF Kubernetes Meetup Lightning TalkSF Kubernetes Meetup Lightning Talk
SF Kubernetes Meetup Lightning Talk
 
Cisco CCNA-Router on Stick
Cisco CCNA-Router on StickCisco CCNA-Router on Stick
Cisco CCNA-Router on Stick
 
ACI MultiPod 구성
ACI MultiPod 구성ACI MultiPod 구성
ACI MultiPod 구성
 
Cisco CCNA- NAT Configuration
Cisco CCNA- NAT ConfigurationCisco CCNA- NAT Configuration
Cisco CCNA- NAT Configuration
 
DMVPN configuration - Configuring Cisco dynamic Multipoint VPN - HUB, SPOKES,...
DMVPN configuration - Configuring Cisco dynamic Multipoint VPN - HUB, SPOKES,...DMVPN configuration - Configuring Cisco dynamic Multipoint VPN - HUB, SPOKES,...
DMVPN configuration - Configuring Cisco dynamic Multipoint VPN - HUB, SPOKES,...
 
Kubernetes networking-made-easy-with-open-v switch
Kubernetes networking-made-easy-with-open-v switchKubernetes networking-made-easy-with-open-v switch
Kubernetes networking-made-easy-with-open-v switch
 
OpenStack DVR_What is DVR?
OpenStack DVR_What is DVR?OpenStack DVR_What is DVR?
OpenStack DVR_What is DVR?
 
Indicaciones nota 4
Indicaciones nota 4Indicaciones nota 4
Indicaciones nota 4
 
How to Configure NetFlow v5 & v9 on Cisco Routers
How to Configure NetFlow v5 & v9 on Cisco RoutersHow to Configure NetFlow v5 & v9 on Cisco Routers
How to Configure NetFlow v5 & v9 on Cisco Routers
 
Basic BGP Configuration
Basic BGP ConfigurationBasic BGP Configuration
Basic BGP Configuration
 
Best Current Operational Practices - Dos, Don’ts and lessons learned
Best Current Operational Practices - Dos, Don’ts and lessons learnedBest Current Operational Practices - Dos, Don’ts and lessons learned
Best Current Operational Practices - Dos, Don’ts and lessons learned
 
Nat
NatNat
Nat
 
ACI MultiFabric 소개
ACI MultiFabric 소개ACI MultiFabric 소개
ACI MultiFabric 소개
 
第53回WIT研究会におけるリアルタイム映像配信 -技術編-
第53回WIT研究会におけるリアルタイム映像配信 -技術編-第53回WIT研究会におけるリアルタイム映像配信 -技術編-
第53回WIT研究会におけるリアルタイム映像配信 -技術編-
 

Similar to VYOS & RPKI at the BGP as edge

Avb pov 2017 v2
Avb pov 2017 v2Avb pov 2017 v2
Avb pov 2017 v2
Jeff Green
 

Similar to VYOS & RPKI at the BGP as edge (20)

Dynamische Routingprotokolle Aufzucht und Pflege - OSPF
Dynamische Routingprotokolle Aufzucht und Pflege - OSPFDynamische Routingprotokolle Aufzucht und Pflege - OSPF
Dynamische Routingprotokolle Aufzucht und Pflege - OSPF
 
Krzysztof Mazepa - IOS XR - IP Fast Convergence
Krzysztof Mazepa - IOS XR - IP Fast ConvergenceKrzysztof Mazepa - IOS XR - IP Fast Convergence
Krzysztof Mazepa - IOS XR - IP Fast Convergence
 
Getting started with IPv6
Getting started with IPv6Getting started with IPv6
Getting started with IPv6
 
6th floorsharingsession ep 1 - networking - arp v 1.0
6th floorsharingsession ep 1 - networking - arp v 1.06th floorsharingsession ep 1 - networking - arp v 1.0
6th floorsharingsession ep 1 - networking - arp v 1.0
 
Service Provider Networks and Frame Relay
Service Provider Networks and Frame RelayService Provider Networks and Frame Relay
Service Provider Networks and Frame Relay
 
Cfgmgmtcamp 2023 — eBPF Superpowers
Cfgmgmtcamp 2023 — eBPF SuperpowersCfgmgmtcamp 2023 — eBPF Superpowers
Cfgmgmtcamp 2023 — eBPF Superpowers
 
Cloud Traffic Engineer – Google Espresso Project by Shaowen Ma
Cloud Traffic Engineer – Google Espresso Project  by Shaowen MaCloud Traffic Engineer – Google Espresso Project  by Shaowen Ma
Cloud Traffic Engineer – Google Espresso Project by Shaowen Ma
 
Operationalizing BGP in the SDDC
Operationalizing BGP in the SDDCOperationalizing BGP in the SDDC
Operationalizing BGP in the SDDC
 
cisco-ws-c3560cx-12pd-s-datasheet.pdf
cisco-ws-c3560cx-12pd-s-datasheet.pdfcisco-ws-c3560cx-12pd-s-datasheet.pdf
cisco-ws-c3560cx-12pd-s-datasheet.pdf
 
Lan Network with Redundancy.ppt
Lan Network with Redundancy.pptLan Network with Redundancy.ppt
Lan Network with Redundancy.ppt
 
Lan Network with Redundancy
Lan Network with RedundancyLan Network with Redundancy
Lan Network with Redundancy
 
cisco-n3k-c3172tq-xl-datasheet.pdf
cisco-n3k-c3172tq-xl-datasheet.pdfcisco-n3k-c3172tq-xl-datasheet.pdf
cisco-n3k-c3172tq-xl-datasheet.pdf
 
cisco-n3k-c3172tq-32t-datasheet.pdf
cisco-n3k-c3172tq-32t-datasheet.pdfcisco-n3k-c3172tq-32t-datasheet.pdf
cisco-n3k-c3172tq-32t-datasheet.pdf
 
PhNOG 2019: RPKI Deployment Update
PhNOG 2019: RPKI Deployment UpdatePhNOG 2019: RPKI Deployment Update
PhNOG 2019: RPKI Deployment Update
 
cisco-ws-c4500x-16sfp+-datasheet.pdf
cisco-ws-c4500x-16sfp+-datasheet.pdfcisco-ws-c4500x-16sfp+-datasheet.pdf
cisco-ws-c4500x-16sfp+-datasheet.pdf
 
Testing PPT
Testing PPTTesting PPT
Testing PPT
 
Avb pov 2017 v2
Avb pov 2017 v2Avb pov 2017 v2
Avb pov 2017 v2
 
Cisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPICisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPI
 
07.bgp
07.bgp07.bgp
07.bgp
 
PLNOG 6: Robert Raszuk, Nana Ogawa - FIB table saving technique (with simple ...
PLNOG 6: Robert Raszuk, Nana Ogawa - FIB table saving technique (with simple ...PLNOG 6: Robert Raszuk, Nana Ogawa - FIB table saving technique (with simple ...
PLNOG 6: Robert Raszuk, Nana Ogawa - FIB table saving technique (with simple ...
 

More from Faelix Ltd

How we found a firewall vendor bug using Teleport as a bastion jump host
How we found a firewall vendor bug using Teleport as a bastion jump hostHow we found a firewall vendor bug using Teleport as a bastion jump host
How we found a firewall vendor bug using Teleport as a bastion jump host
Faelix Ltd
 

More from Faelix Ltd (6)

Bastion jump hosts with Teleport
Bastion jump hosts with TeleportBastion jump hosts with Teleport
Bastion jump hosts with Teleport
 
How we found a firewall vendor bug using Teleport as a bastion jump host
How we found a firewall vendor bug using Teleport as a bastion jump hostHow we found a firewall vendor bug using Teleport as a bastion jump host
How we found a firewall vendor bug using Teleport as a bastion jump host
 
The Story of CVE-2018-19299 - finding and reporting bugs in Mikrotik RouterOS v6
The Story of CVE-2018-19299 - finding and reporting bugs in Mikrotik RouterOS v6The Story of CVE-2018-19299 - finding and reporting bugs in Mikrotik RouterOS v6
The Story of CVE-2018-19299 - finding and reporting bugs in Mikrotik RouterOS v6
 
Keeping your rack cool with one "/IP route rule"
Keeping your rack cool with one "/IP route rule"Keeping your rack cool with one "/IP route rule"
Keeping your rack cool with one "/IP route rule"
 
MikroTik & RouterOS
MikroTik & RouterOSMikroTik & RouterOS
MikroTik & RouterOS
 
SDN, CMDB, NMS ...CRM! How we're putting the customer at the centre of our ne...
SDN, CMDB, NMS ...CRM! How we're putting the customer at the centre of our ne...SDN, CMDB, NMS ...CRM! How we're putting the customer at the centre of our ne...
SDN, CMDB, NMS ...CRM! How we're putting the customer at the centre of our ne...
 

Recently uploaded

Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
JOHNBEBONYAP1
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Monica Sydney
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
Asmae Rabhi
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
ayvbos
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
ydyuyu
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
ayvbos
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Monica Sydney
 
PowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxPowerDirector Explination Process...pptx
PowerDirector Explination Process...pptx
galaxypingy
 

Recently uploaded (20)

Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 
PowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxPowerDirector Explination Process...pptx
PowerDirector Explination Process...pptx
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolino
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 

VYOS & RPKI at the BGP as edge

  • 1. SALT + NETBOX + VYOS = NETWORK AUTOMATION + ROUTING SECURITY MAREK ISALSKI — FAELIX faelix.link/vuknof3
  • 2. VyOS SaltStack YAML Netbox BGP OSPF FRR RPKI IRR XDP bgpq3 UTRS RTBH NetFlow MAREK ISALSKI — FAELIX faelix.link/vuknof3
  • 3. Who? Marek and Lou and Laura A bunch of @NetworkMoose Giants
  • 4. Who? Marek and Lou and Laura A bunch of @NetworkMoose Giants: FRR, Routinator 3000, VyOS+contributors, Linux, netfilter/iptables, Salt+contributors, RIPE+RIRs, Team Cymru, DigitalOcean+Netbox-contributors…
  • 5. The Story So Far… 2019-04: UKNOF43 MikroTik IPv6 routing vuln talk Hinted about AS41495's future plans
  • 6. SCANNING IPv6 ADDRESS SPACE… AND THE REMOTE VULNERABILITIES IT UNCOVERS https://faelix.link/uknof43
  • 7. The Story So Far… 2019-04: UKNOF43 MikroTik IPv6 routing vuln talk Hinted about AS41495's future plans 2019-10: NetMcr40 post-migration tech talk Mentioned shortcomings and follow-on actions 2020-04: UKNOF46 talk on operational experience …postponed till next time?
  • 8. The Story So Far… 2019-04: UKNOF43 MikroTik IPv6 routing vuln talk Hinted about AS41495's future plans 2019-10: NetMcr40 post-migration tech talk Mentioned shortcomings and follow-on actions 2020-04: UKNOF46 talk on operational experience …postponed till next time? no time like the present!
  • 9. Where? This story all took place in: Reynolds House (Equinix MA2) Williams House (Equinix MA1) …and then "laps of honour" in: AQL DC2 Telehouse West and North Interxion LON2
  • 10. Why?
  • 11. Why? actually involved MX routers, but cautionary tale applies to Cisco 7600/6500 and MikroTik CCR too!
  • 12. Why? RIPE NCC Update 2019-10-02 Today we allocated the last of our contiguous /22 IPv4 address blocks. We still have approximately one million addresses available, in the form of /23s and /24s, and we will continue making /22- equivalent allocations made up of these smaller blocks. Once we can no longer allocate the equivalent of a /22, we will announce that we have reached run-out. We expect this to occur in November 2019.
  • 15. Why?
  • 16. Why?
  • 19. OSPF Area 0 How? DFZ 800k routes800k routes default! default!
  • 21. How? 4x 10G SFP+ ports (must not be RJ45, cannot be QSFP+ with SFP+ breakout) At least 2x 1G RJ45 ports Low CPU power draw Prefer GHz over cores (single-thread performance) ~16 Gb RAM Ideally 1U Ideally dual PSU and IPMI
  • 22. How? SM Chassis: Front I/O, Redundant 400W PSUs SM X11SCM-F (2x RJ45 Intel® I210-AT + OOB/IPMI) Xeon E3-1265Lv5 (4c/8t, 2.9GHz, 45W) 16GB DDR4 2666 ECC, 2x 240Gb SSDs Intel X710-DA4 (4x SFP+ NIC)
  • 23. How?
  • 24. How?
  • 26. VyOS Linux FRR (Quagga fork) for routing protocols iproute2, iptables, ipset, etc
  • 27. VyOS Linux FRR (Quagga fork) for routing protocols iproute2, iptables, ipset, etc VyOS 1.3: planned XDP (eXpress Data Path) support Like DPDK / FDIO / VPP (but less tap/vif mess) Up towards 20Mpps per CPU core…!
  • 28. SaltStack Big investment in Salt @faelix Use it for bare metal (virtualisation clusters) Use it for our ISP services and NFVs Use it for full-/part-managed customer VPSs …why not use it for routers? VyOS includes salt-minion (agent)
  • 29. Introducing: hphr "Halophile Router" (salt-loving) Uses Netbox for single source-of-truth Uses SLS (YAML) for anything not possible in Netbox Generates a VyOS router configuration file Loads, compares, commits, saves to config.boot Released open-source: github.com/faelix/hphr
  • 34. Netbox ≈ "physical infra" Using SaltStack's "pillar" in two ways: Netbox for DCIM / IPAM SLS (YAML-syntax) files for OSPF, BGP, etc
  • 40. BCP38 Implementation Throws away entire VyOS firewall stack Native commands are off-limits fragility Currently using iptables Will transition to nft it's the future, offload, etc Eventually move this into XDP Along with routing decision more PPS
  • 44. BGP Peering bgpq3 AS-LLNW | wc 50436 252179 2056972
  • 45. BGP Peering “aggregate” bgpq3 -A AS-LLNW | wc 18425 106080 792306
  • 46. BGP Peering bgpq3 -A AS-LLNW | wc 18425 106080 792306 wc /config/config.boot 558568 “aggregate” Long Bootup Times!
  • 51. BGP via PeeringDB (v2) peering by ASN copy-paste is dead
  • 52. BGP via PeeringDB (v2) peeringdb API via salt
  • 53. BGP via PeeringDB (v2) defaults overrides
  • 61. BGP C/P Protection bgpd state -> unresponsive : no response yet to ping sent 90 seconds ago
  • 63. BGP C/P Protection 847177 bgp routes removed from the rib
  • 64. BGP C/P Protection 847177 bgp routes removed from the rib bgpd restarting, we're gonna be ok?!
  • 66. bly$ show ip bgp summary % BGP instance not found bly$
  • 67. Architecture of VyOS protocols { bgp 41495 { ……… } } router bgp 41495 ………"VyOS" /config/config.boot FRR running-config
  • 68. Architecture of VyOS protocols { bgp 41495 { ……… } } router bgp 41495 ………"VyOS" /config/config.boot FRR running-config https://phabricator.vyos.net/ T1514
  • 69. So: Reboot the Router https://phabricator.vyos.net/T1514 FRR's running-config not "refreshable" if a daemon crashes, can only reboot router to make VyOS send new running-config or: delete protocols bgp 41495, commit, load /config/config.boot, commit Rebooting a router is annoying, but we're in a maintenance window, the other router is ok, so we're gonna be ok.
  • 71.
  • 72.
  • 73. FRR Converges BGP Fast Router booted up Began configuring itself Brought interfaces up Brought BGP instance up Cogent transit session established Peering sessions on LINX established …then …?
  • 74. FRR Converges BGP Fast Router booted up Began configuring itself Brought interfaces up Brought BGP instance up Cogent transit session established Peering sessions on LINX established …then VyOS applied prefix-lists and route-maps
  • 75. FRR Converges BGP Fast Router booted up Began configuring itself Brought interfaces up Brought BGP instance up Cogent transit session established Peering sessions on LINX established …then VyOS applied prefix-lists and route-maps :-(
  • 76. FRR Converges BGP Fast Router booted up Began configuring itself Brought interfaces up Brought BGP instance up Cogent transit session established Peering sessions on LINX established …then VyOS applied prefix-lists and route-maps :-(
  • 77. ANNOUNCING 800K ROUTES TO LINX MAN (BY ACCIDENT)
  • 78. ANNOUNCING 800K ROUTES TO LINX MAN (BY ACCIDENT) PROVIDING TRANSIT TO TIER-1 ASNs Achievement Unlocked :(
  • 79. ANNOUNCING 800K ROUTES TO LINX MAN (BY ACCIDENT) T1698 + T1148 + T944 BUGS! Achievement Unlocked :(
  • 80. FRR Impedance Mismatch router bgp 41495 neighbor 195.66.244.6 remote-as 6939 neighbor 195.66.244.6 soft-reconfiguration inbound neighbor 195.66.244.6 maximum-prefix 200000 neighbor 195.66.244.6 prefix-list hphr-DFZ-IPv4 in neighbor 195.66.244.6 prefix-list auto-AS-FAELIX out neighbor 195.66.244.6 route-map AS6939-LINX-in-IPv4 in neighbor 195.66.244.6 route-map TRANSIT-out-IPv4 out
  • 81. FRR Impedance Mismatch router bgp 41495 neighbor 195.66.244.6 remote-as 6939 neighbor 195.66.244.6 soft-reconfiguration inbound neighbor 195.66.244.6 maximum-prefix 200000 neighbor 195.66.244.6 prefix-list hphr-DFZ-IPv4 in neighbor 195.66.244.6 prefix-list auto-AS-FAELIX out neighbor 195.66.244.6 route-map AS6939-LINX-in-IPv4 in neighbor 195.66.244.6 route-map TRANSIT-out-IPv4 out neighbor shutdown no shutdown
  • 82. FRR Impedance Mismatch "Since FRR lacks a command for creating peers in down state, no other workaround is possible." Bug: T1148 router bgp 41495 neighbor 195.66.244.6 remote-as 6939 neighbor 195.66.244.6 soft-reconfiguration inbound neighbor 195.66.244.6 maximum-prefix 200000 neighbor 195.66.244.6 prefix-list hphr-DFZ-IPv4 in neighbor 195.66.244.6 prefix-list auto-AS-FAELIX out neighbor 195.66.244.6 route-map AS6939-LINX-in-IPv4 in neighbor 195.66.244.6 route-map TRANSIT-out-IPv4 out
  • 83. FRR Impedance Mismatch "Since FRR lacks a command for creating peers in down state, no other workaround is possible." Bug: T1148 router bgp 41495 no bgp default ipv4-unicast neighbor 195.66.244.6 remote-as 6939 neighbor 195.66.244.6 soft-reconfiguration inbound neighbor 195.66.244.6 maximum-prefix 200000 address-family ipv4 neighbor 195.66.244.6 prefix-list hphr-DFZ-IPv4 in neighbor 195.66.244.6 prefix-list auto-AS-FAELIX out Well, actually…
  • 86. INTERLUDE “ IS BGP SAFE YET ”
  • 89. …before a segfault Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7f43747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x113) [0x7f4374798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x712e5) [0x7f43747b92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f43735c2890]92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x65) [0x5576bbe42415]92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x5042) [0x7f436fa1d042]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x60) [0x7f43747c6b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7f43747965c8]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(main+0x2ff) [0x5576bbdecb4f]run+0xd8) [0x7f43747965c8]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f4373229b45]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(+0x3cb6c) [0x5576bbdeeb6c]_main+0xf5) [0x7f4373229b45]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:509#01242415); aborting...
  • 90. …before a segfault Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7f43747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x113) [0x7f4374798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x712e5) [0x7f43747b92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f43735c2890]92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x65) [0x5576bbe42415]92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x5042) [0x7f436fa1d042]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x60) [0x7f43747c6b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7f43747965c8]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(main+0x2ff) [0x5576bbdecb4f]run+0xd8) [0x7f43747965c8]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f4373229b45]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(+0x3cb6c) [0x5576bbdeeb6c]_main+0xf5) [0x7f4373229b45]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:509#01242415); aborting... … bgpd_sync_callback from bgpd/bgp_rpki.c RPKI KICKED OUR BUTT!
  • 91. …before a segfault Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7f43747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x113) [0x7f4374798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x712e5) [0x7f43747b92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f43735c2890]92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x65) [0x5576bbe42415]92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x5042) [0x7f436fa1d042]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x60) [0x7f43747c6b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7f43747965c8]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(main+0x2ff) [0x5576bbdecb4f]run+0xd8) [0x7f43747965c8]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f4373229b45]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(+0x3cb6c) [0x5576bbdeeb6c]_main+0xf5) [0x7f4373229b45]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:509#01242415); aborting... … bgpd_sync_callback from bgpd/bgp_rpki.c Fixed inFRRouting7.2.1 RPKI KICKED OUR BUTT!
  • 92. …before a segfault Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7f43747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x113) [0x7f4374798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x712e5) [0x7f43747b92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f43735c2890]92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x65) [0x5576bbe42415]92e5]74798833]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x5042) [0x7f436fa1d042]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x60) [0x7f43747c6b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7f43747965c8]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(main+0x2ff) [0x5576bbdecb4f]run+0xd8) [0x7f43747965c8]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f4373229b45]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: /usr/lib/frr/bgpd(+0x3cb6c) [0x5576bbdeeb6c]_main+0xf5) [0x7f4373229b45]b00]]3747983d7] Apr 17 08:22:42 aebi bgpd[1238]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:509#01242415); aborting... … bgpd_sync_callback from bgpd/bgp_rpki.c Fixed inFRRouting7.2.1 Shout out to @wolf480pl@mstdn.io for brainstorming this with me to be sure! RPKI KICKED OUR BUTT!
  • 94. …before an OOM Sep 3 04:15:21 bly bgpd[1228]: [EC 100663313] SLOW THREAD: task bgpd_sync_callback (7fe7131dcd60) ran for 51696ms (cpu time 51688ms) Sep 3 04:16:00 bly watchfrr[1167]: [EC 268435457] bgpd state -> unresponsive : no response yet to ping sent 90 seconds ago Sep 3 04:16:00 bly watchfrr[1167]: [EC 100663303] Forked background command [pid 4836]: /usr/lib/frr/watchfrr.sh restart bgpd Sep 3 04:16:12 bly bgpd[1228]: [EC 100663313] SLOW THREAD: task bgpd_sync_callback (7fe7131dcd60) ran for 50939ms (cpu time 50929ms) Sep 3 04:16:12 bly bgpd[1228]: Terminating on signal Sep 3 04:16:20 bly watchfrr[1167]: Warning: restart bgpd child process 4836 still running after 20 seconds, sending signal 15 Sep 3 04:16:20 bly watchfrr[1167]: restart bgpd process 4836 terminated due to signal 15 Sep 3 04:16:31 bly zebra[1224]: [EC 4043309117] Client 'vnc' encountered an error and is shutting down. Sep 3 04:16:31 bly zebra[1224]: [EC 4043309117] Client 'bgp' encountered an error and is shutting down. Sep 3 04:16:31 bly watchfrr[1167]: [EC 268435457] bgpd state -> down : unexpected read error: Connection reset by peer Sep 3 04:16:31 bly zebra[1224]: client 30 disconnected. 0 vnc routes removed from the rib Sep 3 04:16:31 bly zebra[1224]: zebra/zebra_ptm.c:1345 failed to find process pid registration Sep 3 04:16:31 bly zebra[1224]: client 20 disconnected. 906341 bgp routes removed from the rib Sep 3 04:17:21 bly watchfrr[1167]: [EC 100663303] Forked background command [pid 5576]: /usr/lib/frr/watchfrr.sh restart bgpd Sep 3 04:17:21 bly zebra[1224]: client 20 says hello and bids fair to announce only bgp routes vrf=0 Sep 3 04:17:21 bly zebra[1224]: client 32 says hello and bids fair to announce only vnc routes vrf=0 Sep 3 04:17:21 bly watchfrr[1167]: bgpd state -> up : connect succeeded restart bgpd process 4836 terminated due to signal 15 client 20 disconnected. 906341 bgp routes removed from the RIB
  • 96. …before a segfault Sep 3 09:32:16 aebi watchfrr[1302]: [EC 268435457] ospf6d state -> down : read returned EOF Sep 3 09:32:16 aebi zebra[1332]: [EC 4043309121] Client 'ospf6' encountered an error and is shutting down. Sep 3 09:32:16 aebi zebra[1332]: client 52 disconnected. 47 ospf6 routes removed from the rib Sep 3 09:32:21 aebi watchfrr[1302]: [EC 100663303] Forked background command [pid 11593]: /usr/lib/frr/watchfrr.sh restart ospf6d Client 'ospf6' encountered an error and is shutting down. client 52 disconnected. 47 ospf6 routes removed from the rib
  • 97. …before a segfault Sep 3 09:32:16 aebi watchfrr[1302]: [EC 268435457] ospf6d state -> down : read returned EOF Sep 3 09:32:16 aebi zebra[1332]: [EC 4043309121] Client 'ospf6' encountered an error and is shutting down. Sep 3 09:32:16 aebi zebra[1332]: client 52 disconnected. 47 ospf6 routes removed from the rib Sep 3 09:32:21 aebi watchfrr[1302]: [EC 100663303] Forked background command [pid 11593]: /usr/lib/frr/watchfrr.sh restart ospf6d Client 'ospf6' encountered an error and is shutting down. client 52 disconnected. 47 ospf6 routes removed from the rib Fixed inFRRouting 7.5 FRR #6735 + #6086
  • 98. THIS STORY ISN'T OVER…
  • 99. XXXTODO Add features to FRR, provide VyOS workarounds, or make Salt states for router reboot maintenance Speed-up VyOS' prefix-list insertion into FRR T2425 VyOS to refresh config into FRR's daemons T2175 Fragility of VyOS firewalling when replaced with barebones iptables from hphr NIC parameter and bare metal router sysctl tuning XDP (hopefully we get time to follow on from @atoonk)
  • 100. What We Achieved Two very long nights of maintenance (plus London) Full table BGP on VyOS converge time in seconds Routing on MikroTiks converges near-instantly BCP38 (customers cannot spoof source address) IRR filtering* (only accept where route/route6 object) RPKI (will not accept invalid routes from P/T) Templated configuration (repeatable, automated) Single source of truth (the docs become the config)
  • 101. VYOS & RPKI AT THE BGP AS EDGE E: marek @ faelix . net T: @maznu E: lou @ faelix . net T: @gremlinlou