2016/10/27
Kai Fukazawa, Yahoo Japan Corporation
Network for the Large-scale
Hadoop cluster at Yahoo! JAPAN
Agenda
Hadoop and Related Network
Yahoo! JAPAN’s Hadoop Network Transition
Network Related Problems and Solutions
 ■ Network Related Problems
 ■ Network Requirements of The Latest Cluster
 ■ Adopted IP CLOS Network for Solving Problems
Yahoo! JAPAN’s IP CLOS Network
 ■ Architecture
 ■ Performance Tests
 ■ New Problems
Future Plan
Hadoop and Related Network
■ Hadoop has various communication events
 ▪ Heartbeat
 ▪ Reports (Job/Block/Resource)
 ▪ Block Data Transfer
“HDFS Architecture”. Apache Hadoop. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html. (10/06/2016).
“Google I/O 2011: App Engine MapReduce”. (05/11/2011). Retrieved https://www.youtube.com/watch?v=EIxelKcyCC0. (10/06/2016).
■ Traffic direction of the communication events: North/South, East/West
■ Traffic volume of the communication events: High, Low
“Introduction to Facebook’s data center fabric”. (11/14/2014). Retrieved https://www.youtube.com/watch?v=mLEawo6OzFM. (10/06/2016).
■ Oversubscription
 ▪ Commonly expressed as the ratio of the bandwidth required to the bandwidth available
 ▪ Example: 40Nodes x 1Gbps NIC = 40Gbps of demand behind a 10Gbps uplink
 ▪ Oversubscription = 40 : 10 = 4 : 1
“Hadoop Operations by Eric Sammer (O’Reilly). Copyright 2012 Eric Sammer, 978-1-449-32705-7.”
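The 4 : 1 figure above can be reproduced with a few lines of arithmetic. This is a minimal sketch and not part of the original slides: the function names `oversubscription` and `format_ratio` are made up for illustration, and it assumes every node can drive its NIC at full line rate at the same time.

```python
from fractions import Fraction


def oversubscription(nodes: int, nic_gbps: int, uplink_gbps: int) -> Fraction:
    """Ratio of the bandwidth the servers can demand to the bandwidth the uplink offers."""
    return Fraction(nodes * nic_gbps, uplink_gbps)


def format_ratio(ratio: Fraction) -> str:
    """Render a ratio in the 'N : 1' notation used on the slide."""
    return f"{float(ratio):g} : 1"


if __name__ == "__main__":
    # Slide example: 40 nodes with 1Gbps NICs behind a 10Gbps uplink.
    print(format_ratio(oversubscription(40, 1, 10)))
```

A result of 1 : 1 would mean the uplink can carry the traffic of all servers at once; the larger the first number, the more the servers have to share the uplink.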
Yahoo! JAPAN’s
Hadoop Network Transition
[Chart: cluster volume in PB (axis 0 to 80) for Cluster1 (Jun. 2011), Cluster2 (Jan. 2013), Cluster3 (Apr. 2014), Cluster4 (Dec. 2015), Cluster5 (Jun. 2016)]
Cluster1
Stack Architecture
4 Switches/Stack
Up to ~10 switches
Nodes/Rack 90Nodes
Server NIC 1Gbps
UpLink 20Gbps
Oversubscription 4.5 : 1
Cluster2
Spanning Tree Protocol
Nodes/Rack 40Nodes
Server NIC 1Gbps
UpLink 10Gbps
Oversubscription 4 : 1
[Diagram label: Blocking]
Cluster3
L2 Fabric/Channel
Nodes/Rack 40Nodes
Server NIC 1Gbps
UpLink 20Gbps
Oversubscription 2 : 1
Cluster4
L2 Fabric/Channel
Nodes/Rack 16Nodes
Server NIC 10Gbps
UpLink 80Gbps
Oversubscription 2 : 1
Yahoo! JAPAN’s Hadoop Network Transition
Release   Volume   #Nodes/Switch  NIC     Oversubscription
Cluster1  3PByte   90             1Gbps   4.5:1
Cluster2  20PByte  40             1Gbps   4:1
Cluster3  38PByte  40             1Gbps   2:1
Cluster4  58PByte  16             10Gbps  2:1
Cluster5  75PByte  ?              ?Gbps   ?:?
Network Related Problems
And Solutions
Network Related Problems
■ Effect of switch failure in the Stack Architecture
■ Load on the switch due to BUM Traffic
■ Limitations for the DataNode Decommission
■ Limitations for the Scale-out
Effect of switch failure in the Stack Architecture
■ One of the switches forming the Stack failed
■ The failure affected the other switches in the same Stack
■ Communication was interrupted among 90 nodes (5 racks)
■ Computing resources became insufficient and processing stopped
Load on the switch due to BUM Traffic
[Diagram: L2 Fabric connecting 4400Nodes]
■ BUM = broadcast, unknown unicast, and multicast traffic
■ ARP traffic from the servers increases the load on the core switch CPU
■ Mitigation so far: tuning the ARP cache entry timeout
■ The underlying problem is the large network address space
Limitations for the DataNode Decommission
■ The impact on running jobs has to be considered
■ The number of nodes decommissioned at one time has to be limited
Limitations for the Scale-out
■ Stack Architecture
 ▪ Up to ~10 switches
■ L2 Fabric Architecture
 ▪ Depends on the number of chassis
Network Requirements of The Latest Cluster
Requirements
120~200 Racks
Scale-out possible up to 10000 Nodes
100~200Gbps UpLink/Rack
10Gbps NIC/Server
20Nodes/Rack
DataCenter located in the US
How to solve these problems?
We adopted IP CLOS Network!
Adopted IP CLOS Network For Solving Problems
■ Over-the-top companies (Google, Facebook, Amazon, Yahoo…) have adopted this DC network architecture
“Introducing data center fabric, the next-generation Facebook data center network”. Facebook Code. https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/. (10/06/2016).
Improved scalability
Improved high availability
Copes with the increase in East-West traffic
Reduction in operating cost
Yahoo! JAPAN’s
IP CLOS Network
Architecture
BoxSwitch Architecture
■ No limitation on Scale-out
■ Requires many switches
[Diagram: Spine, Leaf, and ToR switch layers]
[Diagram: Internet, Core Router, Spine, and Leaf, with the Layer3/Layer2 boundary marked]
■ Why was this architecture adopted?
 ▪ Fewer items to manage (IP addresses, cables, interfaces, BGP neighbors…)
 ▪ Overcomes physical constraints, such as a one-floor limit
 ▪ Reduction in cost
■ ECMP (equal-cost multi-path) is used
■ BGP is used between Spine and Leaf
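ECMP spreads flows over the equal-cost paths between Leaf and Spine. One common approach is to hash a flow's 5-tuple so that every packet of that flow takes the same path. The sketch below only illustrates that idea: the function name `pick_uplink`, the use of SHA-256, and the default of four uplinks are assumptions for illustration, and real switches use their own hardware hash functions.

```python
import hashlib


def pick_uplink(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                protocol: str, uplink_count: int = 4) -> int:
    """Map a flow's 5-tuple to one of `uplink_count` equal-cost uplinks.

    The same 5-tuple always maps to the same uplink, so packets within
    a flow are not reordered across paths.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % uplink_count


if __name__ == "__main__":
    # Hypothetical flows between two DataNodes; only the source port differs.
    for port in (40000, 40001, 40002, 40003):
        print(port, pick_uplink("10.0.0.10", "10.0.1.20", port, 50010, "tcp"))
```

Because the choice is made per flow rather than per packet, a small number of very large flows can still load the uplinks unevenly.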
■ Between Spine and Leaf: /31
■ Rack: /26, /27
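The prefix lengths above determine how many addresses each segment offers. The following is a small sketch using Python's standard `ipaddress` module; the function name `usable_addresses` and the 192.168.0.0 base address are placeholders, and it assumes /31 links are used as point-to-point links in the sense of RFC 3021, where both addresses are usable.

```python
import ipaddress


def usable_addresses(prefix_length: int) -> int:
    """Number of addresses that can be assigned to interfaces in an IPv4 subnet."""
    network = ipaddress.ip_network(f"192.168.0.0/{prefix_length}")
    if prefix_length >= 31:
        # Point-to-point (/31) and host (/32) prefixes have no separate
        # network and broadcast addresses to subtract.
        return network.num_addresses
    return network.num_addresses - 2


if __name__ == "__main__":
    for prefix in (31, 27, 26):
        print(f"/{prefix}: {usable_addresses(prefix)} usable addresses")
```

A /31 gives exactly the two addresses a Spine-Leaf link needs, while a /27 (30 usable) or /26 (62 usable) leaves room for a rack of servers plus a gateway address.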
■ Resolved the “BUM Traffic problem” (each rack is its own /26 or /27 subnet, so broadcast traffic such as ARP stays within the rack)
■ Leaf Uplink: 40Gbps x 4 = 160Gbps (uplinks ① to ④)
■ Rack: 20Nodes x 10Gbps NIC = 200Gbps
■ Oversubscription: 200 : 160 = 1.25 : 1
■ Resolved the “Limitations for the DataNode Decommission”
■ Improved High Availability
■ Effect of switch failure in the Stack Architecture
■ Load on the switch due to BUM Traffic
■ Limitations for the DataNode Decommission
■ Limitations for the Scale-out
[Three of the four problems above are marked ✔ on the slide]
Yahoo! JAPAN’s Hadoop Network Transition
Release   Volume   #Nodes/Switch  NIC     Oversubscription
Cluster1  3PByte   90             1Gbps   4.5:1
Cluster2  20PByte  40             1Gbps   4:1
Cluster3  38PByte  40             1Gbps   2:1
Cluster4  58PByte  16             10Gbps  2:1
Cluster5  75PByte  20             10Gbps  1.25:1
Performance Tests (5TB Terasort)
Performance Tests (40TB DistCp)
■ 16Nodes/Rack
■ 8Gbps/Node
■ About 30Gbps x 4 = 120Gbps
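The DistCp figures can be cross-checked with simple arithmetic. This sketch assumes that 8Gbps/Node is the per-node throughput during the test, that about 30Gbps is the throughput seen on each of the four uplinks, and that each uplink is one of the 40Gbps Leaf uplinks described under Architecture; the constant and function names are hypothetical.

```python
NODES_PER_RACK = 16
GBPS_PER_NODE = 8
UPLINKS = 4
OBSERVED_GBPS_PER_UPLINK = 30   # "about 30Gbps" on the slide
UPLINK_CAPACITY_GBPS = 40       # assumption: the 40Gbps Leaf uplinks


def total_gbps(count: int, per_unit_gbps: int) -> int:
    """Aggregate bandwidth of `count` identical nodes or links."""
    return count * per_unit_gbps


node_side = total_gbps(NODES_PER_RACK, GBPS_PER_NODE)        # traffic offered by the rack
uplink_side = total_gbps(UPLINKS, OBSERVED_GBPS_PER_UPLINK)  # traffic seen on the uplinks
utilization = uplink_side / total_gbps(UPLINKS, UPLINK_CAPACITY_GBPS)

if __name__ == "__main__":
    print(f"rack offers {node_side}Gbps, uplinks carry {uplink_side}Gbps")
    print(f"uplink utilization: {utilization:.0%}")
```

Under these assumptions the two sides roughly agree, and the uplinks are busy but not saturated.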
New Problems
■ Delay in data transfer
 ▪ Error packets were generated on one of the four uplinks
 ▪ That one uplink delayed the data transfer
 ▪ DataNode log: “org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror”
■ The IP address changes when a server moves to another rack
 ▪ Each rack has its own network address
 ▪ Example: 192.168.0.10 in rack 192.168.0.0/26 becomes 192.168.0.100 in rack 192.168.0.64/26
■ Access control uses IP addresses
 ▪ The ACL must be updated after every relocation
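The relocation problem can be illustrated with the two rack subnets from the slide. This is a sketch only: the rack names, the `rack_of` and `is_allowed` helpers, and the idea of an ACL as a plain set of addresses are assumptions for illustration, not the actual access-control system.

```python
import ipaddress
from typing import Optional

# Rack subnets from the slide; the rack names are hypothetical.
RACK_SUBNETS = {
    "rack-a": ipaddress.ip_network("192.168.0.0/26"),
    "rack-b": ipaddress.ip_network("192.168.0.64/26"),
}


def rack_of(address: str) -> Optional[str]:
    """Return the rack whose subnet contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for rack, subnet in RACK_SUBNETS.items():
        if ip in subnet:
            return rack
    return None


def is_allowed(address: str, acl: set) -> bool:
    """IP-based access control: only addresses listed in the ACL pass."""
    return address in acl


if __name__ == "__main__":
    acl = {"192.168.0.10"}                      # server registered while in rack-a
    print(rack_of("192.168.0.10"), is_allowed("192.168.0.10", acl))
    # After the move the server gets an address from the other rack's subnet.
    print(rack_of("192.168.0.100"), is_allowed("192.168.0.100", acl))
```

The moved server is rejected until the ACL is updated with its new address, which is the operational cost the slide describes.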
Future Plan
■ Detecting error packet failures before they affect the data transfer
 ▪ [Diagram: a link reporting Error! is shut down automatically (Auto Shutdown)]
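One way to picture such a check is a periodic comparison of interface error counters. The sketch below is hypothetical: the slides do not say how detection or shutdown is implemented, so the function name `links_to_shut_down`, the counter dictionaries, and the threshold are all assumptions.

```python
def links_to_shut_down(previous: dict, current: dict, max_new_errors: int = 0) -> list:
    """Return the uplinks whose error counter grew by more than the threshold.

    `previous` and `current` map an uplink name to its cumulative error-packet
    count at two consecutive polls (hypothetical data source).
    """
    flagged = []
    for link, count in current.items():
        new_errors = count - previous.get(link, 0)
        if new_errors > max_new_errors:
            flagged.append(link)
    return sorted(flagged)


if __name__ == "__main__":
    previous = {"uplink1": 0, "uplink2": 5, "uplink3": 0, "uplink4": 0}
    current = {"uplink1": 0, "uplink2": 5, "uplink3": 12, "uplink4": 0}
    print(links_to_shut_down(previous, current))
```

With ECMP across the uplinks, shutting down a flagged link would move its flows to the remaining uplinks at the cost of reduced bandwidth.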
■ Use Erasure Coding
 ▪ Original raw data is striped in 64kB units
 ▪ Raw data: D1 D2 D3 D4 D5 D6
 ▪ Parity: P1 P2 P3
 ▪ [Diagram: D1 to D6 and P1 to P3 laid out separately, with a Read path across them]
 ▪ Low Data Locality
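The striping step and its effect on locality can be sketched as follows, assuming the layout drawn on the slide: six data units, three parity units, and 64kB cells. The names are hypothetical, and the parity computation itself (HDFS erasure coding uses Reed-Solomon codes for this) is deliberately left out.

```python
CELL_SIZE = 64 * 1024   # 64kB striping cell, as on the slide
DATA_UNITS = 6          # D1..D6
PARITY_UNITS = 3        # P1..P3


def stripe(data: bytes, cell_size: int = CELL_SIZE,
           data_units: int = DATA_UNITS) -> list:
    """Distribute consecutive cells of `data` round-robin over the data units."""
    units = [bytearray() for _ in range(data_units)]
    for index, offset in enumerate(range(0, len(data), cell_size)):
        units[index % data_units] += data[offset:offset + cell_size]
    return [bytes(unit) for unit in units]


def units_touched(offset: int, length: int, cell_size: int = CELL_SIZE,
                  data_units: int = DATA_UNITS) -> list:
    """Indices of the data units that hold the byte range (length must be > 0)."""
    first_cell = offset // cell_size
    last_cell = (offset + length - 1) // cell_size
    return sorted({cell % data_units for cell in range(first_cell, last_cell + 1)})


def storage_overhead(data_units: int = DATA_UNITS,
                     parity_units: int = PARITY_UNITS) -> float:
    """Stored bytes per byte of raw data."""
    return (data_units + parity_units) / data_units


if __name__ == "__main__":
    print(storage_overhead())                 # compare with 3.0 for triple replication
    print(units_touched(0, 6 * CELL_SIZE))    # a contiguous read spans every data unit
```

Since a contiguous read spans several data units, and those units are placed apart from each other, most of the data has to cross the network. This is the low data locality noted on the slide.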
■ Interconnecting various platforms
 ▪ [Diagram label: BOTTLENECK]
■ Isolation of computing and storage
 ▪ [Diagram legend: Storage Machine, Computing Machine]
Thank You for Listening!
Appendix
JANOG38
http://www.janog.gr.jp/meeting/janog38/program/clos

More Related Content

What's hot

Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksDataWorks Summit/Hadoop Summit
 
Next Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache StormNext Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache StormDataWorks Summit
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleYifeng Jiang
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHortonworks
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNDataWorks Summit/Hadoop Summit
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3DataWorks Summit
 

What's hot (20)

Machine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFiMachine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFi
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
 
Next Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache StormNext Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache Storm
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and TesseractScalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARN
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
#HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course #HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course
 

Viewers also liked

From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...DataWorks Summit/Hadoop Summit
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...DataWorks Summit/Hadoop Summit
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEDataWorks Summit/Hadoop Summit
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...DataWorks Summit/Hadoop Summit
 
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...DataWorks Summit/Hadoop Summit
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemDataWorks Summit/Hadoop Summit
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...DataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
 
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBaseComparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
 
Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.
 
The truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on HadoopThe truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on Hadoop
 
The real world use of Big Data to change business
The real world use of Big Data to change businessThe real world use of Big Data to change business
The real world use of Big Data to change business
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
 
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for ScaleRebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
SEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile gamesSEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile games
 
Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
 
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
 
Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
 
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
 

Similar to Network for the Large-scale Hadoop cluster at Yahoo! JAPAN

Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux DeviceAdding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux DeviceSamsung Open Source Group
 
Io t hurdles_i_pv6_slides_doin
Io t hurdles_i_pv6_slides_doinIo t hurdles_i_pv6_slides_doin
Io t hurdles_i_pv6_slides_doinJonny Doin
 
Jorgenson Loki
Jorgenson LokiJorgenson Loki
Jorgenson LokiCarl Ford
 
ARIN 34 IPv6 IAB/IETF Activities Report
ARIN 34 IPv6 IAB/IETF Activities ReportARIN 34 IPv6 IAB/IETF Activities Report
ARIN 34 IPv6 IAB/IETF Activities ReportARIN
 
Emerging Networking Technologies for Industrial Applications
Emerging Networking Technologies for Industrial ApplicationsEmerging Networking Technologies for Industrial Applications
Emerging Networking Technologies for Industrial ApplicationsPrasant Misra
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownScyllaDB
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...PROIDEA
 
DPDK summit 2015: It's kind of fun to do the impossible with DPDK
DPDK summit 2015: It's kind of fun  to do the impossible with DPDKDPDK summit 2015: It's kind of fun  to do the impossible with DPDK
DPDK summit 2015: It's kind of fun to do the impossible with DPDKLagopus SDN/OpenFlow switch
 
DPDK Summit 2015 - NTT - Yoshihiro Nakajima
DPDK Summit 2015 - NTT - Yoshihiro NakajimaDPDK Summit 2015 - NTT - Yoshihiro Nakajima
DPDK Summit 2015 - NTT - Yoshihiro NakajimaJim St. Leger
 
Ieee Transition Of I Pv4 To I Pv6 Network Applications
Ieee Transition Of I Pv4 To I Pv6 Network ApplicationsIeee Transition Of I Pv4 To I Pv6 Network Applications
Ieee Transition Of I Pv4 To I Pv6 Network Applicationsguest0215f3
 
Riverbed Within Local Gov
Riverbed Within Local GovRiverbed Within Local Gov
Riverbed Within Local Govmichaelking
 
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...Igalia
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 
LINE's Infrastructure Platform: How It Scales Massive Services and Maintains ...
LINE's Infrastructure Platform: How It Scales Massive Services and Maintains ...LINE's Infrastructure Platform: How It Scales Massive Services and Maintains ...
LINE's Infrastructure Platform: How It Scales Massive Services and Maintains ...LINE Corporation
 
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...Ceph Community
 
Internet Protocol Version 6 By Suvo 2002
Internet Protocol Version 6 By Suvo 2002Internet Protocol Version 6 By Suvo 2002
Internet Protocol Version 6 By Suvo 2002suvobgd
 
Update on IPv6 activity in CERNET2
Update on IPv6 activity in CERNET2Update on IPv6 activity in CERNET2
Update on IPv6 activity in CERNET2APNIC
 
Converged IO for HP ProLiant Gen8
Converged IO for HP ProLiant Gen8Converged IO for HP ProLiant Gen8
Converged IO for HP ProLiant Gen8IT Brand Pulse
 

Network for the Large-scale Hadoop cluster at Yahoo! JAPAN

Editor's Notes

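  1. I am Fukazawa from Yahoo! JAPAN, and I will be presenting "Network for the Large-scale Hadoop cluster at Yahoo! JAPAN".
  2. Today's agenda is as follows. First, I will briefly explain the relationship between Hadoop and the network. After that, I will cover the networks Yahoo! JAPAN has adopted for Hadoop and the network-related problems we ran into, and then explain the IP CLOS network we actually deployed as the solution, using Yahoo! JAPAN's own configuration as the example.
  3. First, I will briefly talk about how Hadoop relates to the network.
  4. Hadoop exchanges many kinds of data. For example, slave-node components such as the DataNode send heartbeats for liveness monitoring to master components such as the NameNode. There are also reports such as job and block reports, and block data transfers caused by data replication and rebalancing.
  5. Block data transfer in particular generates a large amount of traffic. In HDFS this means block replication and rebalancing; in MapReduce it means the shuffle phase.
  6. In addition, Hadoop has not only North/South (vertical) traffic, like conventional access from users to servers,
  7. but also machine-to-machine communication caused by data replication and similar operations. As the number of machines grows, communication crosses rack boundaries, which produces East/West (horizontal) traffic.
  8. In Hadoop, East/West traffic is heavier than North/South traffic.
  9. This is from a Facebook blog, which states that machine-to-machine traffic, rather than machine-to-user traffic, is increasing. From this we can say that paying attention to inter-rack traffic is important, and not only for Hadoop.
  10. I have said that inter-rack communication matters regardless of Hadoop. For inter-rack communication you need to be aware of oversubscription. Oversubscription is the ratio of the bandwidth that is demanded to the bandwidth that is actually available. Consider a rack holding 40 servers with 1 Gbps NICs: if the rack switch uplink is 10 Gbps, the ratio is 40:10, that is, an oversubscription of 4:1.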
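  11. Next, I will talk about the networks Yahoo! JAPAN has used for Hadoop so far.
  12. First, please look at this graph. It plots the size of each of Yahoo! JAPAN's Hadoop clusters to date. The horizontal axis is the release date of the cluster and the vertical axis is the cluster size. Since the first cluster in 2011, cluster size has grown as shown. The network adopted for Hadoop has changed along with the clusters. From here I will walk through that network transition.
  13. First is the network configuration of the oldest cluster, Cluster 1. In this network, multiple rack switches form a Stack. That is, several switches are presented virtually as a single switch, which is connected to the core switch.
  14. Not all switches are in one Stack; each Stack is made up of four switches.
  15. Each Stack has 90 servers with 1 Gbps NICs attached. In this configuration, when traffic goes through the upstream core switch to a rack in another Stack,
  16. it takes a path like this.
  17. The uplink is 20 Gbps, so calculating the oversubscription from the number of servers and the uplink bandwidth
  18. gives 4.5:1.
  19. The problem with the Stack configuration is that scale-out is limited: a Stack can contain only about 10 switches.
  20. The network for the second cluster, Cluster 2, is a standard configuration using Spanning Tree Protocol.
  21. In the Hadoop cluster on this network, each rack holds 40 servers with 1 Gbps NICs. In this configuration, inter-rack traffic
  22. uses only one of the two uplinks because of Spanning Tree Protocol, and takes a path like this.
  23. Also, because the uplink in this network is 10 Gbps,
  24. the oversubscription is 4:1.
  25. In this configuration, one of the uplinks is blocked to prevent loops, so the available bandwidth is not fully used.
  26. This is the network configuration of the third cluster we released. It uses L2 Fabric and Channel, a technique that presents multiple physical ports as a single logical port. Unlike the previous configuration, which used only one uplink, this one uses both uplinks.
  27. In this configuration each rack holds 40 servers with 1 Gbps NICs.
  28. As for the inter-rack path, the two uplinks are configured as a Channel as I just mentioned, so both uplinks are used.
  29. The rack switch uplink bandwidth is 20 Gbps.
  30. The oversubscription is therefore 2:1, which is better than in the previous Spanning Tree Protocol configuration.
  31. Finally, Cluster 4 has the same network configuration as Cluster 3.
  32. Inter-rack communication works the same way, but here each rack holds 16 servers with 10 Gbps NICs.
  33. In this configuration the uplink is 80 Gbps, so an oversubscription of 2:1 is maintained even with 10 Gbps servers.
  34. This table summarizes the clusters and their networks so far. The clusters have grown with each generation, yet the oversubscription has improved.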
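  35. Next, I will talk about the failures and problems that occurred on the Yahoo! JAPAN Hadoop networks I have described, and how we solved them.
  36. These are the network-related failures and problems we have seen while operating Hadoop at Yahoo! JAPAN. I will go through them from the top.
  37. The first is a failure on the Stack network used by the first cluster. One of the switches in a Stack became unhealthy, which also affected the other switches in the same Stack, and we lost network reachability to 90 servers. As a result, compute resources ran short and processing stopped.
  38. Next is the load on switches caused by BUM traffic. BUM traffic means broadcast, unknown unicast, and multicast traffic. At the time, more than 4,000 nodes from Cluster 3 and Cluster 4 sat in the same network address range, and each of them generated ARP traffic at short intervals. These ARP broadcasts from servers in the network loaded the upstream core switches and drove up their CPU usage. We reduced the load on the switches by lengthening the ARP entry retention time on the servers.
  39. Next is the limitation on DataNode decommissioning. Taking the uplink bandwidth and running jobs into account, we decommissioned only a limited number of DataNodes at a time. As you all know, decommissioning causes data transfer because replicas have to be re-placed. At our company, decommissioning several racks at a time is part of normal operations, for example for power maintenance. In such cases we did not decommission every node at once but worked in batches.
  40. The scale-out limit is the problem that physical and other constraints cap how far a cluster can scale out. As I mentioned, a Stack configuration tops out at about 10 switches. Designs such as L2 Fabric also depend on the number of ports on the chassis.
  41. With these problems in the background, around spring of last year the Hadoop team gave the network team the following requirements. The scale is 120 to 200 racks: a network that has no problem with a cluster in the 10,000-node class. In addition, the location is not Japan but a data center in the United States.
  42. To solve these problems, Yahoo! JAPAN
  43. chose an IP CLOS network.
  44. An IP CLOS network is a network architecture adopted by the world's top technology companies.
  45. An IP CLOS network has the following characteristics. First, it improves scalability and fault tolerance, and it can handle growing East-West traffic. As for the last item, reduced operating cost: the operational load falls because fault tolerance is better, and because it uses common routing protocols such as BGP and OSPF, operations no longer depend on a particular switch vendor. I will explain each characteristic in detail together with the configuration.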
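  46. From here I will explain Yahoo! JAPAN's IP CLOS network.
  47. First, one form of IP CLOS network is this three-tier design built from box switches. In this design you can scale out as far as you need by adding switches called Spine and Leaf.
  48. The network we currently use for Hadoop is not the three-tier box-switch design I just showed but a two-tier Spine-Leaf design using chassis switches.
  49. The Spine/Leaf portion from the previous design is contained inside the chassis.
  50. We chose this design because the three-tier box-switch design would have meant more IP addresses and cables to manage, and because there were physical constraints such as being limited to a single floor. This was Yahoo! JAPAN's first CLOS network build, so we wanted to keep management cost as low as possible. Another factor was that chassis switches had become less expensive.
  51. In this network, routes between Spine and Leaf are advertised with BGP, a common network routing protocol. Traffic between Spine and Leaf uses ECMP (equal-cost multipath), so all paths are used rather than only some of them.
  52. Each Spine-Leaf link has an IP address on the interface at each end, so we assign a /31 network address to each link. Each rack has a /26 or /27 network address.
  53. Because each rack now has its own network address, the L2 domain became smaller, the impact of BUM traffic shrank, and the BUM traffic problem I described earlier was resolved.
  54. The uplink bandwidth from each rack switch is 4 x 40 Gbps, for a total of 160 Gbps.
  55. In this Hadoop cluster each rack holds up to 20 servers with 10 Gbps NICs,
  56. so the oversubscription is 1.25:1.
  57. As a result, when decommissioning DataNodes we no longer have to consider, from a network standpoint, the impact on running jobs.
  58. In addition, with four uplinks there is more redundancy, so fault tolerance has also improved. As I mentioned, this Hadoop cluster is built in the United States, and the extra redundancy reduces the need for an immediate response to network failures. That is a big advantage in a US data center where we do not have staff on site 24 hours a day, 365 days a year.
  59. As I have described, by adopting the IP CLOS network, the problems we had been having
  60. were solved as shown here. As for the scale-out limit, a limit still exists this time because of other constraints such as the floor, as I mentioned.
  61. Adding the Hadoop cluster on the IP CLOS network to the earlier table gives this. The oversubscription has improved significantly.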
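  62. This is the network traffic when we ran a 5 TB TeraSort on the actual Hadoop cluster built on the IP CLOS network. The left graph is input traffic as seen from the network device and the right graph is output traffic. As the graphs show, both input and output traffic are spread almost evenly across the four uplinks.
  63. These are the results of running DistCp to check whether we could actually use the full performance.
  64. Each rack had 16 nodes, and each server was sending 8 Gbps of traffic.
  65. We were able to push about 120 Gbps out of a single rack switch.
  66. However, moving to the IP CLOS network also brought new problems. The first is data transfer delay. At one point users reported that data puts and MapReduce jobs were slow.
  67. When we investigated, we found a log message called SlowBlockReceiver being output on a particular rack. This log is output when, for example, data transfer is slow.
  68. The cause was that one of the four uplinks was producing error packets, which delayed data transfer to that rack. In other words, having more links also means having more points of failure.
  69. There is also an operational caveat: because each server rack has its own network address, a server's IP address changes when it is moved to another rack. At our company ACLs are configured per IP address, so relocating a server also requires changing the ACLs.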
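  70. Finally, our future plans.
  71. First is responding to network failures before they affect data transfer. When something like the rise in error packets on one uplink that I just described happens, we want to detect it:
  72. that is, detect the rising error counters and automatically take action such as shutting down the interface before there is any impact. This is feasible because the four redundant uplinks reduce the risk of a total outage when about one link is taken down.
  73. Next is the adoption of Erasure Coding. We have started using Erasure Coding on the Hadoop cluster on the IP CLOS network.
  74. Erasure Coding splits a block's data into six parts
  75. and creates three parity parts from them. So nine pieces of data are generated for each piece of original data.
  76. Erasure Coding then places these nine pieces, as a rule, all on different racks.
  77. So when a particular node wants to read the data,
  78. inter-rack communication occurs like this.
  79. As you can see, data locality is lower, so reading a single block causes more inter-rack data transfer than with normal replication. Inter-rack network traffic therefore becomes an important consideration.
  80. Next, we want to host not only Hadoop but a variety of platforms on the CLOS network, so that platforms can interconnect without worrying about network bandwidth. Today each platform sits under a different core switch, which makes the network a bottleneck; we want to solve this by moving all platforms onto the CLOS network.
  81. Lastly, instead of deploying servers that balance CPU and storage as we have done so far, we aim to stop depending on data locality and to deploy compute-specialized machines and storage-specialized machines separately. That would let us use resources more efficiently: if processing capacity is short we buy servers with many CPUs, and if storage capacity is short we buy servers with a lot of storage.
  82. That concludes my presentation. Thank you for your attention.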
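
The oversubscription figures quoted in these notes can be reproduced with a short calculation. The following is a minimal Python sketch and is not part of the original talk: the function and dictionary names are hypothetical, and the server counts, NIC speeds, and uplink bandwidths are the ones stated in notes 15 to 18, 21 to 24, 27 to 30, 32 to 33, and 54 to 56.

```python
from fractions import Fraction

def oversubscription(servers: int, nic_gbps: int, uplink_gbps: int) -> Fraction:
    """Aggregate server NIC bandwidth divided by uplink bandwidth."""
    return Fraction(servers * nic_gbps, uplink_gbps)

def as_ratio(value: Fraction) -> str:
    """Format a ratio the way the notes write it, e.g. '4.5:1'."""
    return f"{float(value):g}:1"

# (servers per rack or stack, NIC speed in Gbps, uplink in Gbps),
# taken from the notes; the labels are illustrative.
CLUSTERS = {
    "Cluster 1 (Stack)": (90, 1, 20),
    "Cluster 2 (Spanning Tree)": (40, 1, 10),
    "Cluster 3 (L2 Fabric/Channel)": (40, 1, 20),
    "Cluster 4 (L2 Fabric/Channel)": (16, 10, 80),
    "IP CLOS cluster": (20, 10, 4 * 40),
}

for name, (servers, nic, uplink) in CLUSTERS.items():
    print(f"{name}: {as_ratio(oversubscription(servers, nic, uplink))}")
```

Using `Fraction` keeps the ratios exact, so 200 Gbps of server bandwidth over 160 Gbps of uplink comes out as 5/4 rather than a rounded float.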