2. Agenda
Hadoop and Related Network
Yahoo! JAPAN’s Hadoop Network Transition
Network Related Problems and Solutions
Network Related Problems
Network Requirements of The Latest Cluster
Adopted IP CLOS Network for Solving Problems
Yahoo! JAPAN’s IP CLOS Network
Architecture
Performance Tests
New Problems
Future Plan
4. Hadoop and Related Network
Hadoop has various communication events
Heartbeat
Reports (Job/Block/Resource)
Block Data Transfer
“HDFS Architecture”. Apache Hadoop. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html. (10/06/2016).
“Google I/O 2011: App Engine MapReduce”. (05/11/2011). Retrieved https://www.youtube.com/watch?v=EIxelKcyCC0. (10/06/2016).
6. Hadoop and Related Network
Hadoop has various communication events
Heartbeat
Reports (Job/Block/Resource)
Block Data Transfer
North/South
7. Hadoop and Related Network
Hadoop has various communication events
Heartbeat
Reports (Job/Block/Resource)
Block Data Transfer
East/West
8. Hadoop and Related Network
Hadoop has various communication events
Heartbeat
Reports (Job/Block/Resource)
Block Data Transfer
High
Low
9. Hadoop and Related Network
“Introduction to Facebook’s data center fabric”. (11/14/2014). Retrieved https://www.youtube.com/watch?v=mLEawo6OzFM. (10/06/2016).
10. Hadoop and Related Network
Oversubscription
commonly expressed as a ratio of the amount of desired bandwidth required
versus bandwidth available
Uplink: 10Gbps
1Gbps NIC × 40 Nodes = 40Gbps
Oversubscription = 40 : 10 = 4 : 1
“Hadoop Operations by Eric Sammer (O’Reilly). Copyright 2012 Eric Sammer, 978-1-449-32705-7.”
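The ratio on this slide can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original deck; the function name `oversubscription` and its parameter names are chosen here for illustration only.

```python
from fractions import Fraction


def oversubscription(nodes: int, nic_gbps: int, uplink_gbps: int) -> Fraction:
    """Return demanded bandwidth divided by available uplink bandwidth."""
    demanded_gbps = nodes * nic_gbps  # every node sending at NIC line rate
    return Fraction(demanded_gbps, uplink_gbps)  # Fraction reduces 40/10 to 4/1


# Figures from the slide: 40 nodes with 1Gbps NICs behind a 10Gbps uplink.
ratio = oversubscription(nodes=40, nic_gbps=1, uplink_gbps=10)
print(f"{ratio.numerator} : {ratio.denominator}")  # 4 : 1
```

A ratio of 1 : 1 would mean the uplink can carry every node transmitting at full NIC speed at the same time.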
36. Network Related Problems
Effect of switch failure in the Stack Architecture
Load on the switch due to BUM Traffic
Limitations for the DataNode Decommission
Limitations for the Scale-out
37. Network Related Problems
Effect of switch failure in the Stack Architecture
One of the switches forming the Stack failed
The failure affected the other switches in the same Stack
Communication was interrupted among 90 nodes (5 racks)
This caused insufficient computing resources and stopped processing
38. Network Related Problems
Load on the switch due to BUM Traffic
L2 Fabric with 4400 nodes
ARP traffic from servers increases the load on the core switch CPU
Tuning of the ARP cache entry timeout
The problem is the large network address space
39. Network Related Problems
Limitations for the DataNode Decommission
Consideration of the impact on jobs
Limiting the number of nodes for Decommissioning
40. Network Related Problems
Limitations for the Scale-out
Stack Architecture
Up to ~10 switches
L2 Fabric Architecture
Depending on the number of chassis
44. Adopted IP CLOS Network For Solving Problems
Google, Facebook, Amazon, Yahoo…
Over-the-top (OTT) companies have adopted this DC network architecture
“Introducing data center fabric, the next-generation Facebook data center network”. Facebook Code. https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/. (10/06/2016).
45. Adopted IP CLOS Network For Solving Problems
Improved scalability
Improved high availability
Copes with the increase in East-West traffic
Reduction in operating cost
50. Architecture
Why was this architecture adopted?
Reduction in items to be managed
IP addresses, cables, interfaces, BGP neighbors…
Overcomes physical constraints, such as the one-floor limit
Reduction in cost
51. Architecture
ECMP
BGP runs between Spine and Leaf
(Diagram labels: Internet, Core Router, Spine, Leaf, Layer3, Layer2, BGP)
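As an illustration of how ECMP spreads traffic across equal-cost paths, the sketch below hashes a flow's 5-tuple to choose one uplink. It is a simplified model and not the behavior of any particular switch: the hash function (SHA-256), the spine names, the count of four uplinks, and the port numbers are all assumptions made for this example. Real devices hash in hardware over vendor-specific fields.

```python
import hashlib


def pick_uplink(src_ip, dst_ip, src_port, dst_port, protocol, uplinks):
    """Choose an uplink for a flow by hashing its 5-tuple (illustrative model)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an integer, then map to an uplink.
    index = int.from_bytes(digest[:4], "big") % len(uplinks)
    return uplinks[index]


# Hypothetical spine names; the count of four is an assumption for this sketch.
spines = ["spine1", "spine2", "spine3", "spine4"]
flow = ("192.168.0.10", "192.168.0.100", 40000, 40001, "tcp")  # arbitrary ports

first = pick_uplink(*flow, spines)
again = pick_uplink(*flow, spines)
print(first == again)  # True: packets of one flow always take the same path
```

Because the choice depends only on the flow's 5-tuple, packets of a single flow stay in order on one path, while different flows are distributed across the available uplinks.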
59. Architecture
Effect of switch failure in the Stack Architecture
Load on the switch due to BUM Traffic
Limitations for the DataNode Decommission
Limitations for the Scale-out
60. Architecture
Effect of switch failure in the Stack Architecture
Load on the switch due to BUM Traffic
Limitations for the DataNode Decommission
Limitations for the Scale-out
✔
✔
✔
66. New Problems
Delay in data transfer
One of the 4 uplinks generated error packets
That single uplink delayed data transfer
Slow
67. New Problems
Delay in data transfer
One of the 4 uplinks generated error packets
That single uplink delayed data transfer
“org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror”
Slow
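One way to see which DataNodes are affected by this kind of delay is to count how often the warning above appears in each node's log. The sketch below is a hypothetical helper, not a tool from the deck: only the message text comes from the slide, while the host names and sample lines are invented for illustration and real log lines carry additional fields.

```python
from collections import Counter

# Message text quoted on the slide.
SLOW_MARKER = "Slow BlockReceiver write packet to mirror"


def count_slow_writes(lines_by_host):
    """Count log lines containing the slow-write warning, per host."""
    counts = Counter()
    for host, lines in lines_by_host.items():
        counts[host] = sum(1 for line in lines if SLOW_MARKER in line)
    return counts


# Hypothetical sample input; host names and the unrelated line are made up.
sample = {
    "datanode-a": [
        "org.apache.hadoop.hdfs.server.datanode.DataNode: " + SLOW_MARKER,
        "org.apache.hadoop.hdfs.server.datanode.DataNode: " + SLOW_MARKER,
    ],
    "datanode-b": ["unrelated log line"],
}

counts = count_slow_writes(sample)
for host, n in counts.most_common():
    print(host, n)
```

Hosts with unusually high counts are candidates for checking whether their traffic crosses the uplink that is producing errors.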
69. New Problems
A server's IP address changes when it moves to a different rack
Each rack has its own network address
Access control uses IP addresses
ACLs must be updated whenever a server is relocated
192.168.0.0/26 192.168.0.64/26
192.168.0.10 192.168.0.100
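The addresses on this slide show why relocation changes a server's IP: each rack owns a separate /26 subnet. The following sketch uses Python's standard `ipaddress` module to map an address to its rack. The rack labels `rack-a` and `rack-b` and the function `rack_of` are hypothetical names introduced for this example.

```python
import ipaddress

# Subnets taken from the slide; the rack labels are hypothetical.
RACK_SUBNETS = {
    "rack-a": ipaddress.ip_network("192.168.0.0/26"),
    "rack-b": ipaddress.ip_network("192.168.0.64/26"),
}


def rack_of(address):
    """Return the rack whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for rack, subnet in RACK_SUBNETS.items():
        if ip in subnet:
            return rack
    return None


print(rack_of("192.168.0.10"))   # rack-a
print(rack_of("192.168.0.100"))  # rack-b
```

An ACL entry that permits only 192.168.0.10 no longer matches once the server is renumbered to 192.168.0.100, which is why each relocation requires an ACL update.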