LAN, WAN, SAN upgrades: hyperconverged vs traditional vs cloud
1. LAN, WAN, SAN upgrades.
Hyperconverged vs traditional vs cloud etc
Experiences of two merged colleges in Wales
Simon Palmer, Head of IT development, Coleg Sir Gar
2. About me
Interest in:
• communities - like networkshop
• Member of the TTP thettp.org
• ITSYSMAN (FE IT Systems Managers, Wales)
• And conf: https://gregynogconference.wordpress.com/
• Linux, OSS, ethernet, IoT, wifi, IDM/AM, SAML, 802.1x etc
• Follow UKNOF, UKNOT and lots of jiscmail lists.
3. Coleg Sir Gar and Coleg Ceredigion
•1,000 staff
•10,000 learners including
•14 -16 GCSE school links
•16-19 FE
•19+ HE
•Work Based Learning and apprenticeships
4. Coleg Sir Gar and Coleg Ceredigion
•5 Coleg Sir Gar campuses
•“Coleg” = “college”, “Sir” = like “shire”, “Gar” is “sea”
•2 Coleg Ceredigion campuses
•Aberystwyth and Cardigan (Welsh west coast towns)
•Coleg Sir Gar merged with UWTSD
•Coleg Ceredigion merged with UWTSD
•Coleg Sir Gar merged with Coleg Ceredigion
5. Coleg Sir Gar
•Development team:
•Me
•Sysadmin & php, linux
•2 x web developers (php, .net, java, sql, linux)
•Graphics/CSS/websites/content
•Support team:
•7 techs - recruiting now on www.fejobs.com
6. Things we support
•7 Campuses
•2200 Windows 10 (LTSB)
•300 Macs OSX
•1000 Chromebooks
•300 Windows 10 Laptops
•200 iPads
•270 Wifi Access points (around 2.5k concurrent) (Aruba instant)
•91 “stacks” of switches (ExtremeNetworks x440 10G)
•ExtremeNetworks X670/x690 core, dual home switches and 20 VMWare/XEN hosts
•200 VMs (80% Linux, 20% Windows)
7. Things we support
•Mobile Device Management (MDM): MobileIron (previously AirWatch)
•Windows management with Zenworks
•OSX management with Jamf
•Finance, Personnel, Student Records platforms
•email, file/print, licensing, mobile phones, Moodle, Google G Suite, Office 365, all software etc.
•Building Management BMS, Solar panel monitoring etc
8. Requirements
•Improve bandwidth for teaching & learning
•Google Drive Stream, and Onedrive/Sharepoint (vs USB!)
•Reduce single points of failure (No SAN at remote sites)
•Simplify - reduce maintenance time, OOH maintenance.
•Replace old kit with supported, secure, faster etc.
•Improve backup and restore time.
•Consolidate our HA and DR strategy to try and reduce cost, and improve RPO/RTO etc
•Allow for future SIP (VoIP) trunks
9. Security Certifications
•ISO 27001
•Cyber Essentials Plus recently
•Improved change management (local Gitlab)
•Pen testing end user devices only (not servers/network)
•2 weeks to patch!
10. Project 1: Compute and SAN
Replace EOL kit:
•2 x HP C7000 Blade enclosures (4 active, 4 backup blades)
•256G, vSphere 6.5, AMD 2 x 16 core
•Flex 10, Brocade 8Gb Fibre Channel (FC)
•10 servers on 4 other sites (6-10 years old)
•2 x servers per site, running Xen hypervisor (not VMware)
•DR computing environment (4 x DL385s)
•2 x HP 3Par FC SANs 40TB
•2 x 9 year old FC Hitachi HDS 2300 SANs 18TB
•WAN…
•Almost everything really!!
11. Compute/SAN Project Options:
•Move all compute/storage to AWS/Azure/GCP
•Replace like for like
•Hyperconverged!?
•Simplify?
•Openstack, Ovirt, Red Hat, etc.
12. Disaster Recovery vs/& Business Continuity
•DC1 and DC2 at site 1
•DC1: 3Par Fibre Channel, HP C7000’s, 10G Flex10, 4 x BL465 G8, 256G
•DC2: 3Par Fibre Channel, HP C7000’s, 10G Flex10, 4 x BL465 G8, 256G
•DC3 at site 2
•Dell Compellent, 4 x HP DL385 servers, 10G ethernet
•Veeam replication and backup
•2 x Infortrend 10G iSCSI SANs for Veeam and rsync backups
13. Project 2: WAN
•10 years ago, 100M fibre (OpenReach ethernet) circuit ring
•5 years ago, moved to PSBA MPLS 100M/1000M circuits (reduced cost)
•Now considering a 10G WAN: MPLS is too expensive, so back to fibre!
•Except Aberystwyth and Cardigan, which are too far away for fibre links
14. PSBA (Public Sector Network in Wales)
•PSBA = “Public Sector Broadband Aggregation”
•Awarded to BT in 2017? (Used to be run by Logicalis)
•Health, Councils, FE, HE, Police, etc
•All Cisco based
•Local backhauls have traditionally been 1G, more recently 10G
•CSG was DDoS’d in 2014: 14-17Gbps, which broke lots of West Wales
•(We caught Daniel Kelley, due for sentencing soon…)
15. Project 3: Janet connection upgrade
•PSBA CPE equipment EOL from Cisco
•1G primary, 1G backup
•PSBA said 4Gbps over a 10G bearer (backhauls being 10G)
•Web filtering (iBoss)
•Firewall (PfSense)
•Want to move to BGP (currently OSPF and HSRP)
•Thinking about using quagga on PfSense vs Extreme Networks BGP
•HSRP means both connections flap when attacked.
•HSRP has a single point of failure in our design (our link)
16. Why so much bandwidth?!
•Cloud! Google Drive Stream, Onedrive, Sharepoint.
•(We’ve been holding back)
•Updates! (Out of hours patching - 20Gbps at HQ)
•Adobe patching!
•Accidental patching, mismanaged/no QOS.
•Locally: Imaging, software distribution, patching etc
•(Our record for imaging is 1000 Windows 10 devices/day).
•Even some Android/IOS updates are now 1GB+
•Wifi usage is 1/3 of total traffic, pictures, video uploads. And
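A rough back-of-envelope sketch of why a single update round eats a link. It takes the 1GB update size from this slide and the 200 iPads from the "Things we support" slide; the assumption (mine, not measured) is an ideal shared link with no protocol overhead, caching or contention.

```python
def transfer_seconds(size_gb: float, devices: int, link_gbps: float) -> float:
    """Seconds to deliver size_gb gigabytes to every device over one shared link."""
    total_gigabits = size_gb * 8 * devices  # 1 byte = 8 bits
    return total_gigabits / link_gbps

# Assumed scenario: every iPad pulls the same 1GB update at once.
for link in (1, 10):
    secs = transfer_seconds(size_gb=1.0, devices=200, link_gbps=link)
    print(f"{link}G link: {secs / 60:.1f} minutes of a fully saturated link")
```

Real figures would be worse on a link that is also carrying teaching traffic, and better with local caching.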
17. So, decisions:
•Weighed up costs:
•No need for servers at remote sites IF the WAN is “resilient” 10G/20G
•10G MPLS too expensive (because of NTE costs), so move back to Openreach 10G P2P
•So we will build a “ring” around Carmarthenshire
•AWS/Azure/GCP pricing worked out quite expensive… (scary)
•Some SAN/HCI vendor pricing was even more scary!
•£120K - £500+K
18. Telecoms
•All VOIP, except PRI/ISDN30 inbound links
•DDoS in 2013-2014 reduced our confidence in cloud SIP
•We found the attacker’s home IP by matching netflow, HTTP and full logging of DNS queries around attack times, including using the EDNS “Client Subnet” option and PowerDNS.
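The matching idea can be sketched as a simple time-window correlation: look for client IPs that show up in the logs shortly before every attack. This is an illustration only, not the tooling actually used. The function name, the 5 minute window, the timestamps and the IPs (taken from documentation address ranges) are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def suspects(attack_starts, records, window=timedelta(minutes=5)):
    """Return client IPs seen in the window before every attack start.

    records: iterable of (timestamp, client_ip, source) tuples, where source
    is just a label such as "dns", "http" or "netflow".
    """
    hits = defaultdict(set)
    for ts, ip, _source in records:
        for i, start in enumerate(attack_starts):
            if start - window <= ts <= start:
                hits[ip].add(i)
    # Keep only IPs that appeared before all of the attacks.
    return sorted(ip for ip, seen in hits.items() if len(seen) == len(attack_starts))

# Hypothetical attack times and log lines.
attacks = [datetime(2014, 1, 1, 10, 0), datetime(2014, 1, 2, 14, 30)]
logs = [
    (datetime(2014, 1, 1, 9, 57), "203.0.113.7", "dns"),
    (datetime(2014, 1, 1, 9, 58), "198.51.100.20", "http"),
    (datetime(2014, 1, 2, 14, 28), "203.0.113.7", "http"),
    (datetime(2014, 1, 2, 12, 0), "198.51.100.20", "netflow"),
]
print(suspects(attacks, logs))
```

The more attacks there are to intersect, the fewer innocent IPs survive the filter, which is why full logging around attack times matters.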
19. Pricing Openreach circuits
•Really transparent pricing (£5.5k install, £5.5k annual)
•Portal gives a good shopping cart idea of what’s available
•Resilience, Optical Spectrum Access (OSA) etc
•More cost effective than we expected
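A minimal sketch of the total cost arithmetic, assuming the £5.5k install and £5.5k annual figures apply per circuit. The circuit count and contract length below are made-up inputs for illustration, not numbers from the project.

```python
def circuit_cost(circuits: int, years: int,
                 install: float = 5500.0, annual: float = 5500.0) -> float:
    """Total cost in pounds: one-off install plus annual rental, per circuit."""
    return circuits * (install + annual * years)

# Hypothetical ring of 5 circuits kept for 5 years.
total = circuit_cost(circuits=5, years=5)
print(f"£{total:,.0f}")
```

Swapping in different counts and terms gives a quick comparison against the £120K - £500+K vendor quotes mentioned earlier.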
20. Connecting dual 10G circuits at each site
•Extreme X670’s (48 SFP+) at all sites, 10G stacks (SM&MM)
•Improve with X690’s at HQ and DR, (re-use X670’s at CC)
•Gives us 4 x QSFP28’s (breakouts to 16 x 25Gbps) per switch
•Or 4 x 100G ports/switch
•2 switches at each site
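The port arithmetic behind the breakout option, as a small sketch. A QSFP28 port carries 4 lanes of 25Gbps, so each port is either 1 x 100G or 4 x 25G; the function name is just for illustration.

```python
LANES_PER_QSFP28 = 4   # a QSFP28 port carries 4 lanes
LANE_GBPS = 25         # each lane runs at 25Gbps

def breakout(qsfp28_ports: int) -> tuple[int, int]:
    """Return (number of 25G breakout ports, total Gbps) for one switch."""
    ports_25g = qsfp28_ports * LANES_PER_QSFP28
    return ports_25g, ports_25g * LANE_GBPS

ports, gbps = breakout(4)
print(f"{ports} x 25G, or {gbps // 100} x 100G, per switch")
```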
21. Where to put a cluster quorum?
•Reduce “split brain” risk as much as possible
•Does network team talk to storage team?
•Quorum on cloud?
•Quorum on 3rd site?
•Quorum at DR site?
•Quorum at HQ seems to make sense
•HQ has 50+% of people
•HQ has business critical services locally
•Still some SPOF areas, ducts, BT exchange etc.
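Why the quorum location matters can be sketched with a strict-majority rule: a partition may carry on only if it can reach more than half of all votes. The three-vote layout below (one node at HQ, one at DR, witness at HQ) is an assumed example, not necessarily the final design.

```python
def has_quorum(votes_reachable: int, total_votes: int) -> bool:
    """True if this partition holds a strict majority of all votes."""
    return 2 * votes_reachable > total_votes

# Assumed layout: HQ node + witness at HQ = 2 votes, DR node = 1 vote.
TOTAL_VOTES = 3
hq_side = 2
dr_side = 1

# If the HQ-DR link is cut, only one side may stay active.
print("HQ keeps running:", has_quorum(hq_side, TOTAL_VOTES))
print("DR keeps running:", has_quorum(dr_side, TOTAL_VOTES))
```

With an even split (2 of 4 votes on each side) neither partition has a majority, so everything stops; placing the tie-breaking vote where most people and services are keeps that side alive.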
25. UDP QoS by UDP/physical port
•WOL
•PXE
•TFTP
•DHCP
•VOIP
•Prioritise all UDP from VM host!
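The classification logic can be sketched as a lookup on UDP destination port, with a blanket rule for traffic arriving on a VM host's physical port. This is only the decision logic, not switch configuration. The port numbers are the commonly used defaults (WOL is often sent to UDP 9, SIP signalling to 5060) and are assumptions about this network; RTP media ports vary by phone system and are left out.

```python
# Commonly used default UDP ports; assumed, not taken from the real config.
UDP_PRIORITY_PORTS = {
    9: "WOL",
    67: "DHCP",
    68: "DHCP",
    69: "TFTP",        # initial request only; data moves to ephemeral ports
    4011: "PXE",
    5060: "VOIP (SIP)",
}

def classify(udp_port: int, from_vm_host: bool = False):
    """Return the priority class for a UDP packet, or None for best effort."""
    if from_vm_host:
        # Physical-port rule: everything from a VM host uplink is prioritised.
        return "VM host"
    return UDP_PRIORITY_PORTS.get(udp_port)

print(classify(69))
print(classify(12345))
print(classify(12345, from_vm_host=True))
```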
26. MPLS vs circuits vs fibre
•MPLS cons (-)
•Expensive Network Termination Equipment (NTE) (£20K+)
•MPLS pros (+)
•Cloud type architecture
•Cheaper for long distance links than fibre
32. Janet network connection upgrade
•2 sites with internet breakout
•Web filtering (iBoss)
•Firewall (PfSense)
•Want to move to BGP (OSPF currently)
•Thinking about using quagga on PfSense vs Extreme Networks BGP