Deconstructing Big Data Architecture
Server and Network Resource Planning for Big Data Systems
“How to eat an elephant – one byte at a time”
CP Li 李俊邦
Enterprise Technologist
Enterprise Solutions & Alliances, Greater China
Dell
Agenda
1.  The different server roles
    1.  Manager
    2.  Name Nodes
    3.  Edge Nodes
    4.  Data Nodes
2.  Hadoop Cluster design
3.  Etu+Dell
4.  Futures / Roadmap
5.  Questions?
Server Roles - Manager
•  Graphical installation interface / management console
•  Usually installed on an Edge Node
•  Common options
–  Cloudera Manager
–  Apache Ambari
Server Roles – Name Nodes
•  Store the HDFS metadata
•  Job Manager for YARN data-processing framework
•  Primary
–  Receives heartbeats from the data nodes
–  Every 10th heartbeat is a block report, from which the Name Node generates metadata
•  Standby
–  Checks in every hour to mirror metadata / block map
–  Not a hot spare: requires manual failover
•  High Availability (HA) can be added in some distributions
–  Results in a dedicated HA node that acts as a witness to the Name Node cluster
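The primary Name Node's bookkeeping described on this slide can be pictured with a toy model. The sketch below is not HDFS code: the class and method names are hypothetical, and it only encodes the slide's statement that every 10th heartbeat from a data node carries a block report from which the block map is rebuilt. In a real cluster the heartbeat and block-report intervals are configuration settings.

```python
from collections import defaultdict

BLOCK_REPORT_EVERY = 10  # per the slide: every 10th heartbeat is a block report


class ToyNameNode:
    """Hypothetical, simplified model of a primary Name Node (not HDFS code)."""

    def __init__(self):
        self.heartbeats = defaultdict(int)  # data node -> heartbeats seen so far
        self.block_map = defaultdict(set)   # block id -> data nodes holding it

    def heartbeat(self, node, blocks_on_node):
        """Record one heartbeat; return True if it was treated as a block report."""
        self.heartbeats[node] += 1
        if self.heartbeats[node] % BLOCK_REPORT_EVERY != 0:
            return False
        # Block report: drop the node's old entries, then re-add what it reports.
        for holders in self.block_map.values():
            holders.discard(node)
        for block in blocks_on_node:
            self.block_map[block].add(node)
        return True


nn = ToyNameNode()
reports = [nn.heartbeat("dn1", {"blk_1", "blk_2"}) for _ in range(20)]
print(sum(reports))                   # heartbeats that were block reports
print(sorted(nn.block_map["blk_1"]))  # data nodes known to hold blk_1
```

With 20 heartbeats from one data node, only the 10th and 20th update the block map; the others just confirm the node is alive.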
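Server Roles - Edge Nodes
•  Main gateway for data entering and leaving the Hadoop cluster
•  Scalable
•  The only multi-homed nodes (attached to more than one network segment) in the Hadoop cluster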
[Diagram: PowerEdge R730 servers act as the Name Node, Standby Name Node, Edge Node(s), and HA Node; PowerEdge R730XD servers act as the Data Nodes. Network labels: Corporate Network, Data Network.]
Server Roles - Data Node
•  Primary storage location for HDFS data
•  Runs the data processing assigned by YARN resource management
•  Key attributes
–  Memory
›  64GB as standard
›  Additional services (Impala/Spark) need more memory
–  Many local disks (JBOD / non-RAID mode)
›  SFF (2.5”) for performance-based workloads
›  LFF (3.5”) for capacity-centric workloads
–  CPUs – legacy recommendation of 1:1 core:spindle ratio
›  SSDs, faster HDDs (10K+), and in-memory workloads make this less of an issue
›  10- and 12-core CPUs are the best-practice default
Hadoop Cluster Design
Hadoop Cluster Design – Hardware Considerations
Hadoop Cluster Deployment – Installation Best Practices
•  Use pre-built, assembled, and cabled racks from the vendor
•  Use automated deployment tools (e.g., Open Crowbar)
•  Purchase nodes in standard-size groups, not single-node increments, for easy capacity growth and ordering
–  Common increments are ½ or full rack for easy deployment and sizing
•  For each type of hardware, purchase spare components to keep on site for easy, rapid repair
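The advice to buy in standard-size groups amounts to rounding a node requirement up to the next purchase increment. A minimal sketch follows, assuming a hypothetical half rack of 10 data nodes; the slide gives no node count per rack, so that figure and the function name are illustrative only.

```python
import math


def nodes_to_order(required_nodes: int, increment: int) -> int:
    """Round a node requirement up to a whole number of purchase increments."""
    if required_nodes < 0 or increment <= 0:
        raise ValueError("required_nodes must be >= 0 and increment must be > 0")
    return math.ceil(required_nodes / increment) * increment


# Hypothetical increment size: a half rack of 10 data nodes.
HALF_RACK = 10
print(nodes_to_order(23, HALF_RACK))  # 30
```

A sizing exercise that calls for 23 nodes would therefore be ordered as three half racks, leaving headroom for growth.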
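Core Hadoop Use Cases
•  Archiving
–  High disk-to-CPU ratio
–  Low memory usage
–  Examples: regulatory requirements, long-term archiving
•  Data processing
–  High disk-to-CPU ratio
–  Medium memory usage
–  Examples: DW offload, ETL offload, EDH, quality analysis, IT log analysis
•  Analytics
–  High core count
–  High memory usage
–  Examples: market analysis, fraud prevention, network analysis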
Common Hadoop Use Case to Ecosystem Tool Mapping
Hadoop Use Case to Ratio Mapping
CPU (cores) : Memory (GB) : Disk (count), per Data Node
•  Archiving: 1:2:1
•  Data processing: 1:4:1
•  Analytics: 2:8:1
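These ratios can be turned into a rough per-node sizing by scaling them by the number of disks in a data node. The sketch below is one way to do that; the function name and the 12-disk example are assumptions, not figures from the deck.

```python
# CPU cores : memory (GB) : disks, per data node, as listed on the slide.
RATIOS = {
    "archiving": (1, 2, 1),
    "data processing": (1, 4, 1),
    "analytics": (2, 8, 1),
}


def size_data_node(use_case: str, disks: int) -> dict:
    """Scale the slide's ratio so that it matches the given disk count."""
    cores, memory_gb, ratio_disks = RATIOS[use_case]
    if disks <= 0 or disks % ratio_disks != 0:
        raise ValueError("disks must be a positive multiple of the ratio's disk term")
    factor = disks // ratio_disks
    return {"cores": cores * factor, "memory_gb": memory_gb * factor, "disks": disks}


# Hypothetical data node with 12 drives.
for use_case in RATIOS:
    print(use_case, size_data_node(use_case, disks=12))
```

For a 12-disk node this gives 12 cores and 24GB for archiving, 12 cores and 48GB for data processing, and 24 cores and 96GB for analytics.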
Node Considerations
[Figure: Dell PowerEdge R730 (three servers shown) and Dell PowerEdge R730xd]
HDFS Capacity
•  HDFS protects data by replicating it between nodes. The default Replication Factor is 3, but it is configurable.
•  HDFS Raw Capacity = Number of Compute Nodes x Number of Drives per Node x Capacity of Each Drive
•  HDFS Usable Capacity = HDFS Raw Capacity / Replication Factor
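The two capacity formulas are simple enough to check with a few lines of code. The sketch below applies them to a hypothetical cluster of 12 nodes with 12 drives of 4TB each; those figures and the function names are assumptions chosen for illustration.

```python
def hdfs_raw_capacity_tb(nodes: int, drives_per_node: int, drive_tb: float) -> float:
    """Raw capacity = nodes x drives per node x capacity of each drive."""
    return nodes * drives_per_node * drive_tb


def hdfs_usable_capacity_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Usable capacity = raw capacity / replication factor (default 3)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return raw_tb / replication_factor


# Hypothetical cluster: 12 nodes, 12 drives per node, 4TB per drive.
raw = hdfs_raw_capacity_tb(nodes=12, drives_per_node=12, drive_tb=4.0)
usable = hdfs_usable_capacity_tb(raw)
print(raw, usable)  # 576.0 192.0
```

Note that this is an upper bound on usable space: it leaves out room for temporary data and operating-system overhead, which the slide's formula also does not account for.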
Big Data Networking Best Practices
•  Traditional Ethernet is used because it is affordable and already prevalent.
•  Early drafts of the solution used 1GbE networking, but as costs have fallen 10GbE has become the much more efficient choice.
•  Multiple ports are teamed for both redundancy and throughput. LACP and software bonding are the most common methods.
•  IPv4 is most widely used. IPv6 has limited support at the OS and Hadoop level.
Attributes of a Good Switch for Big Data
•  Non-blocking backplane
•  Deep per-port packet buffers (shared buffers do not work well). During the sort/shuffle phases of map/reduce operations, network traffic is so chaotic that it can saturate any and all shared buffers, degrading network performance for multiple hosts.
•  Good choices:
–  1GbE
›  S55
›  S60
–  10GbE
›  S4810
›  S5000
–  40GbE
›  Z9000
›  Z9500
›  S6000
Dell Hadoop Solution Logical Diagram
Scale-out Aggregation Layer
Dell Points of Integration
•  VLT / VRRP is a very affordable way to team switches at both the ToR and aggregation tiers. This makes the Dell Networking Force10 switches a great choice.
•  Active Fabric Manager
–  Speeds up the creation and administration of the required VLT / VRRP configuration on the switches.
–  Helps with capacity planning as customers scale
Big Data Networking Futures
•  40GbE onboard LOMs will begin to be used for high-volume clusters. Right now the cost:benefit ratio isn’t there.
•  As HPC and Big Data converge, we’ll start to see the use of InfiniBand (IB) for node-to-node connectivity.
•  In-memory (Spark / Impala) workloads are moving the bottleneck that used to exist at the disk to the processor and network. Expect customers to look to increase core counts and network speed to overcome this.
Etu+Dell = complete Hadoop/Big Data solution provider
•  Best of breed Cloudera partners: Etu
•  Analytic software solutions for Big Data
•  Dell Professional Services for Big Data
•  Dell PowerEdge 13G servers
•  Dell Networking solutions
•  Installation and configuration service
•  Complete end-to-end implementation
–  Phases: Discover, Plan, Implement, Investigate
–  Steps: 1. Integrate, 2. Store, 3. Analyze, 4. Act
Solution architecture
•  Toad Data Point: desktop; integrate, cleanse
•  Dell Boomi: cloud; integrate, correlate
•  Toad Intelligence Central: data aggregation and virtualization
•  Dell Statistica Big Data: desktop; crawl, save
•  Dell STATISTICA
[Diagram labels: Customer data, Order data, Events, Stock market data, Marketing campaigns, Social Media, Advanced Analytics, Analytical output]
Futures
•  Speed improvements in Map/Reduce
•  More in-memory workloads
–  Possible move to Spark to replace Map/Reduce
•  Virtualized Hadoop
–  VMware Big Data Extensions
–  OpenStack Sahara
–  Microsoft HDInsight (Hortonworks)
Dell In-Memory Appliance for Cloudera Enterprise
Configurations at a glance
•  Starter Configuration: 8 Node Cluster
–  PowerEdge R720: 4 Infrastructure Nodes with ProSupport
–  PowerEdge R720XD: 4 Data Nodes with ProSupport
–  Cloudera Enterprise
–  Force10 S4810P
–  Force10 S55
–  Dell Rack 42U
–  ~176TB (disk raw space)
•  Mid-Size Configuration: 16 Node Cluster
–  PowerEdge R720: 4 Infrastructure Nodes with ProSupport
–  PowerEdge R720XD: 12 Data Nodes with ProSupport
–  Cloudera Enterprise
–  Force10 S4810P
–  Force10 S55
–  Dell Rack 42U
–  ~528TB (disk raw space)
•  Small Enterprise Configuration: 24 Node Cluster
–  PowerEdge R720: 4 Infrastructure Nodes with ProSupport
–  PowerEdge R720XD: 20 Data Nodes with ProSupport
–  Cloudera Enterprise
–  Force10 S4810P
–  Force10 S55
–  Dell Rack 42U
–  ~880TB (disk raw space)
•  Expansion Unit: PowerEdge R720XD, 4 Data Nodes with ProSupport, Cloudera Enterprise; scales in blocks
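The three configurations can be cross-checked against the HDFS capacity formula from earlier in the deck. The sketch below uses only the data-node counts and raw capacities listed on this slide, plus the default replication factor of 3; the variable and function names are illustrative, and the usable figures are derived estimates, not numbers published by Dell.

```python
# Data-node count and approximate raw disk space (TB), as listed on the slide.
CONFIGS = {
    "Starter": (4, 176),
    "Mid-Size": (12, 528),
    "Small Enterprise": (20, 880),
}
REPLICATION_FACTOR = 3  # HDFS default cited earlier in the deck


def raw_tb_per_data_node(name: str) -> float:
    """Raw disk space divided evenly across the configuration's data nodes."""
    data_nodes, raw_tb = CONFIGS[name]
    return raw_tb / data_nodes


for name, (data_nodes, raw_tb) in CONFIGS.items():
    usable_tb = raw_tb / REPLICATION_FACTOR
    print(f"{name}: {raw_tb_per_data_node(name):.0f}TB raw per data node, "
          f"about {usable_tb:.0f}TB usable at replication factor {REPLICATION_FACTOR}")
```

All three configurations work out to the same raw capacity per data node, which is consistent with the appliance scaling in fixed blocks of four data nodes.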
Track B-3 Deconstructing Big Data Architecture - Server and Network Resource Planning for Big Data Systems

More Related Content

What's hot

Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red_Hat_Storage
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project ExperienceEDB
 
Red Hat Storage Day Atlanta - Why Software Defined Storage Matters
Red Hat Storage Day Atlanta - Why Software Defined Storage MattersRed Hat Storage Day Atlanta - Why Software Defined Storage Matters
Red Hat Storage Day Atlanta - Why Software Defined Storage MattersRed_Hat_Storage
 
10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16 10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16 Kangaroot
 
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...Red_Hat_Storage
 
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...Insight Technology, Inc.
 
9/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'169/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'16Kangaroot
 
IBM Power9 Features and Specifications
IBM Power9 Features and SpecificationsIBM Power9 Features and Specifications
IBM Power9 Features and Specificationsinside-BigData.com
 
Red Hat Storage Day Atlanta - Persistent Storage for Linux Containers
Red Hat Storage Day Atlanta - Persistent Storage for Linux Containers Red Hat Storage Day Atlanta - Persistent Storage for Linux Containers
Red Hat Storage Day Atlanta - Persistent Storage for Linux Containers Red_Hat_Storage
 
2017-02-21 AFCEA West Building Continuous Integration & Deployment (CI/CD) Pi...
2017-02-21 AFCEA West Building Continuous Integration & Deployment (CI/CD) Pi...2017-02-21 AFCEA West Building Continuous Integration & Deployment (CI/CD) Pi...
2017-02-21 AFCEA West Building Continuous Integration & Deployment (CI/CD) Pi...Shawn Wells
 
Red hat on_power-ibm _lop_day_2015
Red hat on_power-ibm _lop_day_2015Red hat on_power-ibm _lop_day_2015
Red hat on_power-ibm _lop_day_2015cmilsted
 
Red Hat Storage Day LA - Persistent Storage for Linux Containers
Red Hat Storage Day LA - Persistent Storage for Linux Containers Red Hat Storage Day LA - Persistent Storage for Linux Containers
Red Hat Storage Day LA - Persistent Storage for Linux Containers Red_Hat_Storage
 
Red Hat Storage Day LA - Performance and Sizing Software Defined Storage
Red Hat Storage Day LA - Performance and Sizing Software Defined Storage Red Hat Storage Day LA - Performance and Sizing Software Defined Storage
Red Hat Storage Day LA - Performance and Sizing Software Defined Storage Red_Hat_Storage
 
Not all open source is the same
Not all open source is the sameNot all open source is the same
Not all open source is the sameEDB
 
Best Practices & Lessons Learned from Deployment of PostgreSQL
 Best Practices & Lessons Learned from Deployment of PostgreSQL Best Practices & Lessons Learned from Deployment of PostgreSQL
Best Practices & Lessons Learned from Deployment of PostgreSQLEDB
 
Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix BarbeiraBackup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix BarbeiraCeph Community
 
OpenPOWER Foundation Overview
OpenPOWER Foundation OverviewOpenPOWER Foundation Overview
OpenPOWER Foundation OverviewNVIDIA Taiwan
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBlueData, Inc.
 
Why Software-Defined Storage Matters
Why Software-Defined Storage MattersWhy Software-Defined Storage Matters
Why Software-Defined Storage MattersRed_Hat_Storage
 
How to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverHow to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverEDB
 

What's hot (20)

Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project Experience
 
Red Hat Storage Day Atlanta - Why Software Defined Storage Matters
Red Hat Storage Day Atlanta - Why Software Defined Storage MattersRed Hat Storage Day Atlanta - Why Software Defined Storage Matters
Red Hat Storage Day Atlanta - Why Software Defined Storage Matters
 
10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16 10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16
 
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
 
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
 
9/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'169/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'16
 
IBM Power9 Features and Specifications
IBM Power9 Features and SpecificationsIBM Power9 Features and Specifications
IBM Power9 Features and Specifications
 
Red Hat Storage Day Atlanta - Persistent Storage for Linux Containers
Red Hat Storage Day Atlanta - Persistent Storage for Linux Containers Red Hat Storage Day Atlanta - Persistent Storage for Linux Containers
Red Hat Storage Day Atlanta - Persistent Storage for Linux Containers
 
2017-02-21 AFCEA West Building Continuous Integration & Deployment (CI/CD) Pi...
2017-02-21 AFCEA West Building Continuous Integration & Deployment (CI/CD) Pi...2017-02-21 AFCEA West Building Continuous Integration & Deployment (CI/CD) Pi...
2017-02-21 AFCEA West Building Continuous Integration & Deployment (CI/CD) Pi...
 
Red hat on_power-ibm _lop_day_2015
Red hat on_power-ibm _lop_day_2015Red hat on_power-ibm _lop_day_2015
Red hat on_power-ibm _lop_day_2015
 
Red Hat Storage Day LA - Persistent Storage for Linux Containers
Red Hat Storage Day LA - Persistent Storage for Linux Containers Red Hat Storage Day LA - Persistent Storage for Linux Containers
Red Hat Storage Day LA - Persistent Storage for Linux Containers
 
Red Hat Storage Day LA - Performance and Sizing Software Defined Storage
Red Hat Storage Day LA - Performance and Sizing Software Defined Storage Red Hat Storage Day LA - Performance and Sizing Software Defined Storage
Red Hat Storage Day LA - Performance and Sizing Software Defined Storage
 
Not all open source is the same
Not all open source is the sameNot all open source is the same
Not all open source is the same
 
Best Practices & Lessons Learned from Deployment of PostgreSQL
 Best Practices & Lessons Learned from Deployment of PostgreSQL Best Practices & Lessons Learned from Deployment of PostgreSQL
Best Practices & Lessons Learned from Deployment of PostgreSQL
 
Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix BarbeiraBackup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
 
OpenPOWER Foundation Overview
OpenPOWER Foundation OverviewOpenPOWER Foundation Overview
OpenPOWER Foundation Overview
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containers
 
Why Software-Defined Storage Matters
Why Software-Defined Storage MattersWhy Software-Defined Storage Matters
Why Software-Defined Storage Matters
 
How to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverHow to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL server
 

Viewers also liked

豆瓣数据架构实践
豆瓣数据架构实践豆瓣数据架构实践
豆瓣数据架构实践Xupeng Yun
 
Track C-2 洞見未來 - Tableau 創造大數據新價值
Track C-2 洞見未來 - Tableau 創造大數據新價值Track C-2 洞見未來 - Tableau 創造大數據新價值
Track C-2 洞見未來 - Tableau 創造大數據新價值Etu Solution
 
Qualitative Research in Segmentation
Qualitative Research in SegmentationQualitative Research in Segmentation
Qualitative Research in SegmentationSusan Abbott
 
Track B-1 建構新世代的智慧數據平台
Track B-1 建構新世代的智慧數據平台Track B-1 建構新世代的智慧數據平台
Track B-1 建構新世代的智慧數據平台Etu Solution
 
Track A-3 Enterprise Data Lake in Action - 搭建「活」的企業 Big Data 生態架構
Track A-3 Enterprise Data Lake in Action - 搭建「活」的企業 Big Data 生態架構Track A-3 Enterprise Data Lake in Action - 搭建「活」的企業 Big Data 生態架構
Track A-3 Enterprise Data Lake in Action - 搭建「活」的企業 Big Data 生態架構Etu Solution
 
The Women's March Conversation
The Women's March Conversation The Women's March Conversation
The Women's March Conversation Susan Abbott
 
Trinity BDM - 橋接傳統與未來
Trinity BDM - 橋接傳統與未來Trinity BDM - 橋接傳統與未來
Trinity BDM - 橋接傳統與未來Etu Solution
 
Track C-1 大數據時代的產品 ─ 創新與洞察決策
Track C-1 大數據時代的產品 ─ 創新與洞察決策Track C-1 大數據時代的產品 ─ 創新與洞察決策
Track C-1 大數據時代的產品 ─ 創新與洞察決策Etu Solution
 
Track C-3 Let's Play Marketing - 瘋創意 玩推薦 就該這樣搞行銷
Track C-3 Let's Play Marketing - 瘋創意 玩推薦 就該這樣搞行銷Track C-3 Let's Play Marketing - 瘋創意 玩推薦 就該這樣搞行銷
Track C-3 Let's Play Marketing - 瘋創意 玩推薦 就該這樣搞行銷Etu Solution
 
CUSTOMER ANALYTICS & SEGMENTATION FOR CUSTOMER CENTRIC ORGANIZATION & MARKETI...
CUSTOMER ANALYTICS & SEGMENTATION FOR CUSTOMER CENTRIC ORGANIZATION & MARKETI...CUSTOMER ANALYTICS & SEGMENTATION FOR CUSTOMER CENTRIC ORGANIZATION & MARKETI...
CUSTOMER ANALYTICS & SEGMENTATION FOR CUSTOMER CENTRIC ORGANIZATION & MARKETI...Data Science Thailand
 
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界Etu Solution
 
投客所好:互聯內外,啟動投信藍海數據戰
投客所好:互聯內外,啟動投信藍海數據戰投客所好:互聯內外,啟動投信藍海數據戰
投客所好:互聯內外,啟動投信藍海數據戰Etu Solution
 
Presentation Churn Management
Presentation Churn ManagementPresentation Churn Management
Presentation Churn Managementfarhanmajeed
 
猜你喜歡:虛實並進,贏在全通路
猜你喜歡:虛實並進,贏在全通路猜你喜歡:虛實並進,贏在全通路
猜你喜歡:虛實並進,贏在全通路Etu Solution
 
終歸:分群消費者x多元商機的實現
終歸:分群消費者x多元商機的實現終歸:分群消費者x多元商機的實現
終歸:分群消費者x多元商機的實現Etu Solution
 
Implementing a Segmentation Strategy
Implementing a Segmentation StrategyImplementing a Segmentation Strategy
Implementing a Segmentation StrategySusan Abbott
 
Data without Boundaries - 圍繞第一方數據,找到商業驅動力
Data without Boundaries - 圍繞第一方數據,找到商業驅動力Data without Boundaries - 圍繞第一方數據,找到商業驅動力
Data without Boundaries - 圍繞第一方數據,找到商業驅動力Etu Solution
 
致詞歡迎:Big Data 無所不在,Data Technology 無 C 不歡
致詞歡迎:Big Data 無所不在,Data Technology 無 C 不歡致詞歡迎:Big Data 無所不在,Data Technology 無 C 不歡
致詞歡迎:Big Data 無所不在,Data Technology 無 C 不歡Etu Solution
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pubChao Zhu
 

Viewers also liked (20)

豆瓣数据架构实践
豆瓣数据架构实践豆瓣数据架构实践
豆瓣数据架构实践
 
Track C-2 洞見未來 - Tableau 創造大數據新價值
Track C-2 洞見未來 - Tableau 創造大數據新價值Track C-2 洞見未來 - Tableau 創造大數據新價值
Track C-2 洞見未來 - Tableau 創造大數據新價值
 
Qualitative Research in Segmentation
Qualitative Research in SegmentationQualitative Research in Segmentation
Qualitative Research in Segmentation
 
Track B-1 建構新世代的智慧數據平台
Track B-1 建構新世代的智慧數據平台Track B-1 建構新世代的智慧數據平台
Track B-1 建構新世代的智慧數據平台
 
Track A-3 Enterprise Data Lake in Action - 搭建「活」的企業 Big Data 生態架構
Track A-3 Enterprise Data Lake in Action - 搭建「活」的企業 Big Data 生態架構Track A-3 Enterprise Data Lake in Action - 搭建「活」的企業 Big Data 生態架構
Track A-3 Enterprise Data Lake in Action - 搭建「活」的企業 Big Data 生態架構
 
The Women's March Conversation
The Women's March Conversation The Women's March Conversation
The Women's March Conversation
 
Trinity BDM - 橋接傳統與未來
Trinity BDM - 橋接傳統與未來Trinity BDM - 橋接傳統與未來
Trinity BDM - 橋接傳統與未來
 
Track C-1 大數據時代的產品 ─ 創新與洞察決策
Track C-1 大數據時代的產品 ─ 創新與洞察決策Track C-1 大數據時代的產品 ─ 創新與洞察決策
Track C-1 大數據時代的產品 ─ 創新與洞察決策
 
Data Science Thailand Meetup#11
Data Science Thailand Meetup#11Data Science Thailand Meetup#11
Data Science Thailand Meetup#11
 
Track C-3 Let's Play Marketing - 瘋創意 玩推薦 就該這樣搞行銷
Track C-3 Let's Play Marketing - 瘋創意 玩推薦 就該這樣搞行銷Track C-3 Let's Play Marketing - 瘋創意 玩推薦 就該這樣搞行銷
Track C-3 Let's Play Marketing - 瘋創意 玩推薦 就該這樣搞行銷
 
CUSTOMER ANALYTICS & SEGMENTATION FOR CUSTOMER CENTRIC ORGANIZATION & MARKETI...
CUSTOMER ANALYTICS & SEGMENTATION FOR CUSTOMER CENTRIC ORGANIZATION & MARKETI...CUSTOMER ANALYTICS & SEGMENTATION FOR CUSTOMER CENTRIC ORGANIZATION & MARKETI...
CUSTOMER ANALYTICS & SEGMENTATION FOR CUSTOMER CENTRIC ORGANIZATION & MARKETI...
 
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
 
投客所好:互聯內外,啟動投信藍海數據戰
投客所好:互聯內外,啟動投信藍海數據戰投客所好:互聯內外,啟動投信藍海數據戰
投客所好:互聯內外,啟動投信藍海數據戰
 
Presentation Churn Management
Presentation Churn ManagementPresentation Churn Management
Presentation Churn Management
 
猜你喜歡:虛實並進,贏在全通路
猜你喜歡:虛實並進,贏在全通路猜你喜歡:虛實並進,贏在全通路
猜你喜歡:虛實並進,贏在全通路
 
終歸:分群消費者x多元商機的實現
終歸:分群消費者x多元商機的實現終歸:分群消費者x多元商機的實現
終歸:分群消費者x多元商機的實現
 
Implementing a Segmentation Strategy
Implementing a Segmentation StrategyImplementing a Segmentation Strategy
Implementing a Segmentation Strategy
 
Data without Boundaries - 圍繞第一方數據,找到商業驅動力
Data without Boundaries - 圍繞第一方數據,找到商業驅動力Data without Boundaries - 圍繞第一方數據,找到商業驅動力
Data without Boundaries - 圍繞第一方數據,找到商業驅動力
 
致詞歡迎:Big Data 無所不在,Data Technology 無 C 不歡
致詞歡迎:Big Data 無所不在,Data Technology 無 C 不歡致詞歡迎:Big Data 無所不在,Data Technology 無 C 不歡
致詞歡迎:Big Data 無所不在,Data Technology 無 C 不歡
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub
 

Similar to Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃

Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopMike Pittaro
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Community
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHortonworks
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutionssolarisyougood
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAlluxio, Inc.
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesKamesh Pemmaraju
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Red_Hat_Storage
 
Whd master deck_final
Whd master deck_final Whd master deck_final
Whd master deck_final Juergen Domnik
 
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 Improving Apache Spark by Taking Advantage of Disaggregated Architecture Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated ArchitectureDatabricks
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2hdhappy001
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deckKeithETD_CTO
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3xKinAnx
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineDataWorks Summit
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephantsOvidiu Dimulescu
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
 

Similar to Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃 (20)

Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for Hadoop
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
 
Whd master deck_final
Whd master deck_final Whd master deck_final
Whd master deck_final
 
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 Improving Apache Spark by Taking Advantage of Disaggregated Architecture Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deck
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmine
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 

Track B-3: Deconstructing Big Data Architecture - Server and Network Resource Planning for Big Data Systems

  • 1. Deconstructing Big Data Architecture: Server and Network Resource Planning for Big Data Systems
      “How to eat an elephant – one byte at a time”
      CP Li (李俊邦), Enterprise Technologist, Enterprise Solutions & Alliances, Greater China, Dell
  • 2. Agenda
      1.  Server roles
          1.  Manager
          2.  Name Nodes
          3.  Edge Nodes
          4.  Data Nodes
      2.  Hadoop cluster design
      3.  Etu+Dell
      4.  Futures / Roadmap
      5.  Questions?
  • 3. Server Roles - Manager
      •  Graphical installation interface / management console
      •  Usually installed on the Edge Node
      •  Common options
         –  Cloudera Manager
         –  Apache Ambari
  • 4. Server Roles – Name Nodes
      •  Store the HDFS metadata
      •  Job Manager for the YARN data-processing framework
      •  Primary
         –  Receives heartbeats from the data nodes
         –  Every 10th heartbeat is a block report, from which it generates metadata
      •  Standby
         –  Checks in every hour to mirror the metadata / block map
         –  Not a hot spare; requires manual failover
      •  High Availability (HA) can be added in some distributions
         –  Results in a dedicated HA node that acts as a witness to the Name Node cluster
  • 5. Server Roles - Edge Nodes
      •  Main gateway for data entering and leaving the Hadoop cluster
      •  Scalable
      •  The only multi-homed nodes in the Hadoop cluster
      [Diagram: PowerEdge R730 servers as Name Node, Standby Name Node, Edge Node(s), and HA Node; PowerEdge R730XD servers as Data Nodes; network labels: Corporate Network, Data Network]
  • 6. Server Roles - Data Node
      •  Primary storage location for HDFS
      •  Runs the data processing assigned by YARN resource management
      •  Key attributes
         –  Memory
            ›  64GB as the standard configuration
            ›  Additional services (Impala/Spark) need more memory
         –  Many local disks (JBOD / non-RAID mode)
            ›  SFF (2.5”) for performance-based workloads
            ›  LFF (3.5”) for capacity-centric workloads
         –  CPUs – legacy recommendation of a 1:1 core:spindle ratio
            ›  SSDs, faster HDDs (10K+), and in-memory workloads make this less of an issue
            ›  10- and 12-core CPUs are the best-practice default
  • 8. Hadoop Cluster Design – Hardware Considerations
  • 9. Hadoop Cluster Deployment – Installation Best Practices
      •  Use pre-built, assembled, and cabled racks from the vendor
      •  Use automated deployment tools (e.g., Open Crowbar)
      •  Purchase nodes in standard-size groups, not single-node increments, for easy capacity growth and ordering
         –  Common increments are a half or full rack, for easy deployment and sizing
      •  For each type of hardware, purchase spare components to keep on site for easy, rapid repair
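  • 10. Core Hadoop Use Cases
      •  Archive
         –  High disk-to-CPU ratio, low memory usage
         –  Regulatory requirements, long-term archiving
      •  Data processing
         –  High disk-to-CPU ratio, medium memory usage
         –  DW offload, ETL offload, EDH, quality analysis, IT log analysis
      •  Analytics
         –  High core count, high memory usage
         –  Market analysis, fraud prevention, network analysis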
  • 11. Common Hadoop Use Case to Ecosystem Tool Mapping
  • 12. Hadoop Use Case to Ratio Mapping
      Ratio per Data Node, as CPU (cores) : Memory (GB) : Disk (count)
      •  Archive: 1:2:1
      •  Data processing: 1:4:1
      •  Analytics: 2:8:1
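As a rough illustration of how these ratios might be applied, the sketch below scales each ratio to a chosen number of data disks per node. It reads the ratio literally (cores : GB of memory : disks); the helper name `size_data_node` and the 12-disk example are assumptions made for illustration and do not come from the slides.

```python
# Ratios from the slide: CPU cores : memory (GB) : disks, per Data Node.
RATIOS = {
    "archive": (1, 2, 1),
    "data_processing": (1, 4, 1),
    "analytics": (2, 8, 1),
}


def size_data_node(use_case, disks):
    """Scale the use-case ratio so its disk term equals `disks`.

    Hypothetical helper: a literal reading of the ratio, not a vendor sizing tool.
    """
    cores, memory_gb, disk_unit = RATIOS[use_case]
    factor = disks / disk_unit
    return {
        "cores": int(cores * factor),
        "memory_gb": int(memory_gb * factor),
        "disks": disks,
    }


# Example: a chassis with 12 data disks (an assumed figure).
for name in RATIOS:
    print(name, size_data_node(name, disks=12))
```

Real configurations would then be rounded to the CPU and DIMM sizes that are actually orderable.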
  • 13. Node Considerations
      [Figure labels: Dell PowerEdge R730 (×3), Dell PowerEdge R730xd]
  • 15. HDFS Capacity
      •  HDFS protects information by replicating data between nodes. The default Replication Factor is 3, but it is configurable.
      •  HDFS Raw Capacity = Number of Compute Nodes x Number of Drives x Capacity of Drives
      •  HDFS Usable Capacity = HDFS Raw Capacity / Replication Factor
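The two capacity formulas can be sketched in a few lines of Python. The node count, drive count, and drive size in the example are assumed values chosen only to show the arithmetic; like the slide's formula, the sketch does not account for space set aside for the OS or intermediate data.

```python
def hdfs_raw_capacity_tb(nodes, drives_per_node, drive_tb):
    """Raw capacity = number of nodes x drives per node x capacity per drive."""
    return nodes * drives_per_node * drive_tb


def hdfs_usable_capacity_tb(raw_tb, replication_factor=3):
    """Usable capacity = raw capacity / replication factor (HDFS default is 3)."""
    return raw_tb / replication_factor


# Assumed example: 10 data nodes, each with 12 drives of 4 TB.
raw = hdfs_raw_capacity_tb(nodes=10, drives_per_node=12, drive_tb=4)
usable = hdfs_usable_capacity_tb(raw)
print(f"raw = {raw} TB, usable = {usable:.0f} TB")
```

Lowering the replication factor raises usable capacity but reduces the number of copies protecting each block.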
  • 16. Big Data Networking Best Practices
      •  Traditional Ethernet is used because it is affordable and already prevalent.
      •  Early drafts of the solution used 1GbE networking, but as costs have fallen, 10GbE is now the much more efficient choice.
      •  Multiple ports are teamed for both redundancy and throughput. LACP and software bonding are the most common methods.
      •  IPv4 is most widely used. IPv6 has limited support at the OS and Hadoop level.
  • 17. Attributes of a Good Switch for Big Data
      •  Non-blocking backplane
      •  Deep per-port packet buffers (shared buffers do not work well). During the sort/shuffle phases of map/reduce operations, network traffic is so chaotic that it can saturate any and all shared buffers, impacting multiple hosts' network performance.
      •  Good choices:
         –  1GbE
            ›  S55
            ›  S60
         –  10GbE
            ›  S4810
            ›  S5000
         –  40GbE
            ›  Z9000
            ›  Z9500
            ›  S6000
  • 18. Dell Hadoop Solution Logical Diagram
  • 20. Dell Points of Integration
      •  VLT / VRRP is a very affordable way to team switches at both the ToR and the aggregation tiers. This makes the Dell Networking Force10 switches a great choice.
      •  Active Fabric Manager
         –  Speeds up the creation and administration of the required VLT / VRRP configuration on the switches
         –  Helps with capacity planning as customers scale
  • 21. Big Data Networking Futures
      •  40GbE onboard LOMs will begin to be used for high-volume clusters. For now, the cost-benefit ratio isn't there yet.
      •  As HPC and Big Data converge, we'll start to see InfiniBand (IB) used for node-to-node connectivity.
      •  In-memory (Spark / Impala) workloads are reducing the bottlenecks that used to exist at the disk, moving them to the processor and network. Expect customers to look to increase core counts and network speed to overcome this.
  • 22. Etu+Dell = complete Hadoop/Big Data solution provider (@Dell_Enterprise, Enterprise Solutions)
      •  Best-of-breed Cloudera partners - Etu
      •  Analytic software solutions for Big Data
      •  Dell Professional Services for Big Data
      •  Dell PowerEdge 13G servers
      •  Dell Networking solutions
      •  Installation and configuration service
      •  Complete end-to-end implementation
      [Process labels: Discover, Plan, Implement, Investigate]
  • 23. Solution architecture
      [Diagram. Stages: 1. Integrate, 2. Store, 3. Analyze, 4. Act]
      •  Components
         –  Toad Data Point (desktop: integrate, cleanse)
         –  Dell Boomi (cloud: integrate, correlate)
         –  Toad Intelligence Central (data aggregation and virtualization)
         –  Dell STATISTICA
         –  Dell Statistica Big Data (desktop: crawl, save)
      •  Data source labels: customer data, order data, events, stock market data, marketing campaigns, social media
      •  Other labels: Advanced Analytics, Analytical output
  • 24. Futures
      •  Speed improvements in MapReduce
      •  More in-memory workloads
         –  Possible move to Spark to replace MapReduce
      •  Virtualized Hadoop
         –  VMware Big Data Extensions
         –  OpenStack Sahara
         –  Microsoft HDInsight (Hortonworks)
  • 25. Dell In-Memory Appliance for Cloudera Enterprise: Configurations at a glance
      •  Starter Configuration: 8-node cluster
         –  PowerEdge R720: 4 Infrastructure Nodes with ProSupport
         –  PowerEdge R720XD: 4 Data Nodes with ProSupport
         –  Cloudera Enterprise
         –  Force10 S4810P, Force10 S55
         –  Dell Rack 42U
         –  ~176TB (raw disk space)
      •  Mid-Size Configuration: 16-node cluster
         –  PowerEdge R720: 4 Infrastructure Nodes with ProSupport
         –  PowerEdge R720XD: 12 Data Nodes with ProSupport
         –  Cloudera Enterprise
         –  Force10 S4810P, Force10 S55
         –  Dell Rack 42U
         –  ~528TB (raw disk space)
      •  Small Enterprise Configuration: 24-node cluster
         –  PowerEdge R720: 4 Infrastructure Nodes with ProSupport
         –  PowerEdge R720XD: 20 Data Nodes with ProSupport
         –  Cloudera Enterprise
         –  Force10 S4810P, Force10 S55
         –  Dell Rack 42U
         –  ~880TB (raw disk space)
      •  Expansion Unit: PowerEdge R720XD, 4 Data Nodes with ProSupport, Cloudera Enterprise; scales in blocks
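As a quick cross-check of the raw-capacity figures on slide 25, the sketch below divides each configuration's raw space by its data node count and applies the default Replication Factor of 3 from slide 15. The `summarize` helper is a hypothetical name, and the usable figures are an estimate from the slide-15 formula, not numbers quoted by Dell.

```python
# (data nodes, raw disk space in TB) as listed on slide 25.
CONFIGS = {
    "Starter": (4, 176),
    "Mid-Size": (12, 528),
    "Small Enterprise": (20, 880),
}
REPLICATION_FACTOR = 3  # HDFS default stated on slide 15


def summarize(data_nodes, raw_tb, rf=REPLICATION_FACTOR):
    """Return (raw TB per data node, estimated usable TB); hypothetical helper."""
    return raw_tb / data_nodes, raw_tb / rf


for name, (nodes, raw_tb) in CONFIGS.items():
    per_node, usable = summarize(nodes, raw_tb)
    print(f"{name}: {per_node:.0f} TB raw per data node, ~{usable:.0f} TB usable")
```

All three configurations work out to the same raw capacity per data node, which is consistent with the appliance scaling in identical four-node blocks.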