Cloud Computing Systems
Lin Gu
Hong Kong University of Science and Technology
Sept. 14, 2011
How to effectively compute in a datacenter?
Is MapReduce the best answer to computation in the cloud?
What is the limitation of MapReduce?
How to provide general-purpose parallel processing in
DCs?
• MapReduce—parallel computing for Web-scale
data processing
• Fundamental component in Google’s
technological architecture
– Why didn’t Google use parallel Fortran, MPI, …?
• Followed by many technology firms
The MapReduce Approach
Program Execution on Web-Scale Data
MapReduce
Old ideas can be fabulous, too!
(= Lisp, “Lost In Silly Parentheses”?)
• Map and Fold
– Map: do something to all elements in a list
– Fold: aggregate elements of a list
• Used in functional programming languages
such as Lisp
• Map is a higher-order function: apply an op to all
elements in a list
– Result is a new list
• Parallelizable
[Figure: f applied to each element of the list in parallel, producing a new list]
MapReduce
(map (lambda (x) (* x x))
'(1 2 3 4 5))
→ '(1 4 9 16 25)
• Reduce is also a higher-order function
• Like “fold”: aggregate elements of a list
– Accumulator set to initial value
– Function applied to list element and the accumulator
– Result stored in the accumulator
– Repeated for every item in the list
– Result is the final value in the accumulator
[Figure: f folds each element into an accumulator, from an initial value to the final result]
(fold + 0 '(1 2 3 4 5))
→ 15
(fold * 1 '(1 2 3 4 5))
→ 120
The MapReduce Approach
Program Execution on Web-Scale Data
Massive parallel processing made simple
• Example: word count
• Map: parse a document and generate <word, 1> pairs
• Reduce: receive all pairs for a specific word, and count
(sum)
Map:
// D is a document
for each word w in D
  output <w, 1>
Reduce for key w:
  count = 0
  for each input item
    count = count + 1
  output <w, count>
The MapReduce Approach
Program Execution on Web-Scale Data
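A minimal, single-process Python sketch of this word-count logic (illustrative only: the helper names and the in-memory "shuffle" are mine, not part of any real MapReduce framework):

from collections import defaultdict

def map_doc(doc):
    # Map: emit a <word, 1> pair for every word in the document
    for word in doc.split():
        yield (word, 1)

def reduce_word(word, counts):
    # Reduce: sum all the counts received for one specific word
    return word, sum(counts)

def word_count(docs):
    groups = defaultdict(list)          # the "shuffle": group pairs by key
    for doc in docs:
        for word, one in map_doc(doc):
            groups[word].append(one)
    return dict(reduce_word(w, c) for w, c in groups.items())

print(word_count(["a rose is a rose"]))   # {'a': 2, 'rose': 2, 'is': 1}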
Design Context
• Big data, but simple dependence
– Relatively easy to partition data
• Supported by a distributed system
– Distributed OS services across thousands of
commodity PCs (e.g., GFS)
• First users are search oriented
– Crawl, index, search
Designed years ago, still working today, growing adoptions
Workflow
Single master, numerous worker threads
[Figure: one master node coordinating many worker threads]
Workflow
• 1. The MapReduce library in the user program first
splits the input files into M pieces of typically 16
megabytes to 64 megabytes (MB) per piece. It then
starts up many copies of the program on a cluster of
machines.
• 2. One of the copies of the program is the master. The
rest are workers that are assigned work by the master.
There are M map tasks and R reduce tasks to assign.
The master picks idle workers and assigns each one a
map task or a reduce task.
Workflow
• 3. A worker who is assigned a map task reads the
contents of the corresponding input split. It parses
key/value pairs out of the input data and passes each
pair to the user-defined Map function. The
intermediate key/value pairs produced by the Map
function are buffered in memory.
• 4. Periodically, the buffered pairs are written to local
disk, partitioned into R regions by the partitioning
function. The locations of these buffered pairs on the
local disk are passed back to the master, who is
responsible for forwarding these locations to the
reduce workers.
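The partitioning function in step 4 defaults, per the MapReduce paper, to hash(key) mod R; a one-line Python rendering (the function name is mine):

def partition(key, R):
    # Assign an intermediate key to one of R reduce regions; every pair
    # with the same key lands in the same region, hence the same reducer.
    return hash(key) % R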
Workflow
• 5. When a reduce worker is notified by the master about
these locations, it uses RPCs to read the buffered data
from the local disks of the map workers. When a reduce
worker has read all intermediate data, it sorts it by the
intermediate keys so that all occurrences of the same key
are grouped together.
• 6. The reduce worker iterates over the sorted
intermediate data and for each unique intermediate key
encountered, it passes the key and the corresponding set
of intermediate values to the Reduce function. The output
of the Reduce function is appended to a final output file
for this reduce partition.
• 7. When all map tasks and reduce tasks have been
completed, the MapReduce call returns to the user code.
Programming
• How to write a MapReduce program to
– Generate inverted indices? (a sketch follows this list)
– Sort?
• How to express more sophisticated
logic?
• What if some workers (slaves) or the
master fails?
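One common answer to the inverted-index question above (a hedged sketch, not from the slides: map emits <word, docID> and reduce collects the posting list):

from collections import defaultdict

def map_index(doc_id, text):
    # Map: emit <word, doc_id> once per distinct word in the document
    for word in set(text.split()):
        yield (word, doc_id)

def reduce_index(word, doc_ids):
    # Reduce: the posting list of documents containing the word
    return word, sorted(doc_ids)

docs = {1: "map and fold", 2: "map reduce"}
groups = defaultdict(list)               # shuffle: group pairs by word
for d, t in docs.items():
    for w, i in map_index(d, t):
        groups[w].append(i)
print(dict(reduce_index(w, ids) for w, ids in groups.items()))
# e.g. {'map': [1, 2], 'and': [1], 'fold': [1], 'reduce': [2]}

Sorting can similarly lean on the framework itself: with a range partitioner and the framework's sorted intermediate keys, concatenating the R output files yields globally sorted data.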
Workflow
Where is the communication-intensive part?
Initial data split
into 64MB blocks
Computed, results
locally stored
Master informed of
result locations
R reducers retrieve
data from mappers
Final output written
• Distributed, scalable storage for key-value pairs
• Example: Dynamo (Amazon)
• Another example may be P2P storage (e.g., Chord)
• Key-value store can be a general foundation for more
complex data structures
• But performance may suffer
Data Storage – Key-Value Store
Data Storage – Key-Value Store
Dynamo: a decentralized, scalable key-value
store
– Used in Amazon
– Uses consistent hashing to distribute data among nodes (see the sketch after this list)
– Replicated, versioned, load balanced
– Easy-to-use interface: put()/get()
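A minimal sketch of the consistent-hashing placement Dynamo builds on (illustrative only; real Dynamo adds virtual nodes and replicates each key to the N nodes that follow it on the ring):

import bisect
import hashlib

def ring_pos(key):
    # Hash a string onto a position on a 2^32-point ring
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes):
        # Each node sits on the ring at the hash of its name
        self.points = sorted((ring_pos(n), n) for n in nodes)

    def lookup(self, key):
        # A key belongs to the first node clockwise from its position, so
        # adding or removing a node only remaps the keys in one arc
        i = bisect.bisect(self.points, (ring_pos(key), ""))
        return self.points[i % len(self.points)][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))   # the node whose put()/get() serves this key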
• Networked block storage
– ND by Sun Microsystems
• Remote block storage over Internet
– Use S3 as a block device [Brantner]
• Block-level remote storage may become slow in
networks with long latencies
Data Storage – Network Block Device
• PC file systems
• Link together all clusters of a file (see the walk-through sketch below)
– Directory entry: filename, attributes, date/time,
starting cluster, file size
• Boot sector (superblock) : file system wide
information
• File allocation table, root directory, …
Data Storage – Traditional File Systems
Layout: | Boot sector | FAT 1 | FAT 2 (dup) | Root dir | Normal directories and files |
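To make "link together all clusters of a file" concrete, a toy walk of a FAT chain (the dict-based table and the end-of-chain marker are simplifications, not real FAT12/16/32 on-disk values):

EOC = -1   # assumed end-of-chain marker for this toy example

def file_clusters(fat, start_cluster):
    # The directory entry stores the starting cluster; each FAT entry
    # then names the next cluster of the file, forming a linked list.
    chain, c = [], start_cluster
    while c != EOC:
        chain.append(c)
        c = fat[c]
    return chain

fat = {2: 5, 5: 6, 6: EOC}      # cluster 2 -> 5 -> 6 -> end of file
print(file_clusters(fat, 2))    # [2, 5, 6]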
• NFS—Network File System [Sandberg]
– Designed by Sun Microsystems in the 1980s
• Transparent remote access to files stored
remotely
– XDR, RPC, VNode, VFS
– Mountable file system, synchronous behavior
• Stateless server
Data Storage – Network File System
NFS organization
[Figure: NFS client and server components]
Data Storage – Network File System
• A distributed file system at work (GFS)
• Single master and numerous slaves communicate with each other
• File data unit, “chunk”, is up to 64MB. Chunks are replicated.
• The “master” is a single point of failure and a scalability bottleneck, and the consistency model is difficult to use
Data Storage – Google File System (GFS)
Data Storage – Database
• Parallel database • Replication • Indexes and views • Structured schema
• Example schema:
CREATE TABLE Parts (
ID VARCHAR,
StockNumber INT,
Status VARCHAR
…
)
• Sample Parts rows (ID, StockNumber, Status):
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
• PNUTS – a relational database service designed and used by Yahoo!
MapReduce/Hadoop
• Around 2004, Google invented MapReduce to
parallelize computation of large data sets. It’s been a
key component in Google’s technology foundation
• Around 2008, Yahoo! developed the open-source
variant of MapReduce named Hadoop
• After 2008, MapReduce/Hadoop became a key
technology component in cloud computing
• In 2010, the U.S. granted the MapReduce patent to
Google
Timeline: MapReduce … Hadoop or variants
• MapReduce provides an easy-to-use framework for parallel
programming, but is it the most efficient and best solution to
program execution in datacenters?
• MapReduce has its discontents
– DeWitt and Stonebraker: “MapReduce: A major step backwards” –
MapReduce is far less sophisticated and efficient than parallel query
processing
• MapReduce is a parallel processing framework, not a database
system, nor a query language
– It is possible to use MapReduce to implement some of the parallel query
processing functions
– What are the real limitations?
• Inefficient for general programming (and not designed for that)
– Hard to handle data with complex dependence, frequent updates, etc.
– High overhead, bursty I/O, difficult to handle long streaming data
– Limited opportunity for optimization
MapReduce—Limitations
Critiques
MapReduce: A major step backwards
-- David J. DeWitt and Michael Stonebraker
(MapReduce) is
– A giant step backward in the programming paradigm for large-
scale data intensive applications
– A sub-optimal implementation, in that it uses brute force
instead of indexing
– Not novel at all
– Missing features
– Incompatible with all of the tools DBMS users have come to
depend on
• Inefficient for general programming (and not designed
for that)
– Hard to handle data with complex dependence, frequent
updates, etc.
– High overhead, bursty I/O
• Experience with developing a Hadoop-based distributed
compiler
– Workload: compile Linux kernel
– 4 machines available to Hadoop for parallel compiling
– Observation: parallel compiling on 4 nodes with Hadoop can
be even slower than sequential compiling on one node
MapReduce—Limitations
• Proprietary solution developed in an environment with
one prevailing application (web search)
– The assumptions introduce several important constraints in
data and logic
– Not a general-purpose parallel execution technology
• Design choices in MapReduce
– Optimizes for throughput rather than latency
– Optimizes for large data set rather than small data structures
– Optimizes for coarse-grained parallelism rather than fine-
grained
Re-thinking MapReduce
• A lightweight parallelization framework following the
MapReduce paradigm
– Implemented in C++
– More than just an efficient implementation of MapReduce
– Goal: a lightweight “parallelization” service that programs
can invoke during execution
• MRlite follows several principles
– Memory is media—avoid touching hard drives
– Static facility for dynamic utility—use and reuse threads
for map tasks (a rough analogue follows below)
MRlite: Lightweight Parallel Processing
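The "use and reuse threads" principle amounts to a long-lived worker pool; a rough Python analogue of the pattern (MRlite itself is C++, so this sketches the idea, not its implementation):

from concurrent.futures import ThreadPoolExecutor

# Create the pool once ("static facility") and reuse its threads for
# every map invocation ("dynamic utility"), avoiding per-task thread
# startup cost; intermediate data stays in memory, per the first principle.
pool = ThreadPoolExecutor(max_workers=8)

def parallel_map(fn, items):
    # Fan a map function out across the long-lived worker threads
    return list(pool.map(fn, items))

print(parallel_map(lambda x: x * x, range(5)))   # [0, 1, 4, 9, 16]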
MRlite: Towards Lightweight, Scalable, and General Parallel Processing
[Figure: MRlite architecture: application, MRlite client, MRlite master/scheduler, slaves, and distributed storage, connected by data flow and command flow]
• MRlite client: linked together with the app, the client library accepts calls from the app and submits jobs to the master
• MRlite master: accepts jobs from clients and schedules them to execute on slaves
• Slaves: distributed nodes that accept tasks from the master and execute them
• High-speed distributed storage: stores intermediate files
Computing Capability
Using MRlite, the parallel compilation job, mrcc, is 10
times faster than when running on Hadoop!
Z. Ma and L. Gu. The Limitation of MapReduce: a
Probing Case and a Lightweight Solution. CLOUD
COMPUTING 2010
Network activities under MapReduce/Hadoop workload
• Hadoop: open-source implementation of MapReduce
• Processing data with 3 servers (20 cores)
– 116.8GB input data
• Network activities captured with Xen virtual
machines
Inside MapReduce-Style Computation
Workflow
Where is the communication-intensive part?
Initial data split
into 64MB blocks
Computed, results
locally stored
Master informed of
result locations
R reducers retrieve
data from mappers
Final output written
• Packet reception under MapReduce/Hadoop workload
– Large data volume
– Bursty network traffic
• Generality—widely observed in MapReduce workloads
Packet reception
on a slave server
Inside MapReduce
Packet reception on the master server
Inside MapReduce
Packet transmission on the master server
Inside MapReduce
Major Components of a Datacenter
• Computing hardware (equipment racks)
• Power supply and distribution hardware
• Cooling hardware and cooling fluid
distribution hardware
• Network infrastructure
• IT Personnel and office equipment
Datacenter Networking
Growth Trends in Datacenters
• Load on network & servers continues to rapidly grow
– Rapid growth: a rough estimate of annual growth rate:
enterprise data centers: ~35%, Internet data centers: 50% -
100%
– Information access anywhere, anytime, from many devices
• Desktops, laptops, PDAs & smart phones, sensor
networks, proliferation of broadband
• Mainstream servers moving towards higher speed links
– 1-GbE to 10-GbE in 2008-2009
– 10-GbE to 40-GbE in 2010-2012
• High-speed datacenter-MAN/WAN connectivity
– High-speed datacenter syncing for disaster recovery
Datacenter Networking
• A large part of the total cost of the DC hardware
– Large routers and high-bandwidth switches are very
expensive
• Relatively unreliable – many components may fail.
• Many major operators and companies design their
own datacenter networking to save money and
improve reliability/scalability/performance.
– The topology is often known
– The number of nodes is limited
– The protocols used in the DC are known
• Security is simpler inside the data center, but
challenging at the border
• We can distribute applications to servers to distribute
load and minimize hot spots
Datacenter Networking
Networking components (examples)
• High Performance & High
Density Switches & Routers
– Scaling to 512 10GbE ports per
chassis
– No need for proprietary
protocols to scale
• Highly scalable DC
Border Routers
– 3.2 Tbps capacity in a single
chassis
– 10 Million routes, 1 Million in
hardware
– 2,000 BGP peers
– 2K L3 VPNs, 16K L2 VPNs
– High port density for GE and
10GE application connectivity
– Security
768 1-GE ports downstream, 64 10-GE ports upstream
Datacenter Networking
Common data center topology
[Figure: the Internet feeds the data center through three tiers]
• Core: Layer-3 routers
• Aggregation: Layer-2/3 switches
• Access: Layer-2 switches, connecting to the servers
Datacenter Networking
Data center network design goals
• High network bandwidth, low latency
• Reduce the need for large switches in the core
• Simplify the software, push complexity to the
edge of the network
• Improve reliability
• Reduce capital and operating cost
Datacenter Networking
Avoid this…
Data Center Networking
and simplify this…
Can we avoid using high-end switches?
• Expensive high-end switches to
scale up
• Single point of failure and
bandwidth bottleneck
– Experiences from real systems
• One answer: DCell
Interconnect
DCell Ideas
• #1: Use mini-switches to scale out
• #2: Leverage servers to be part of the routing
infrastructure
– Servers have multiple ports and need to forward
packets
• #3: Use recursion to scale and build complete
graph to increase capacity
Interconnect
One approach: switched network with
a hypercube interconnect
• Leaf switch: 40 1-Gbps ports + 2 10-Gbps ports.
– One switch per rack.
– Not replicated (if a switch fails, lose one rack of
capacity)
• Core switch: 10 10Gbps ports
– Form a hypercube
• Hypercube – the n-dimensional analogue of a cube
Data Center Networking
Hypercube properties
• Minimum hop count
• Even load distribution for all-to-all communication.
• Can route around switch/link failures.
• Simple routing:
– Outport = f(Dest xor NodeNum)
– No routing tables (see the sketch below)
Interconnect
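A hedged sketch of one routing function f consistent with Outport = f(Dest xor NodeNum): fix the lowest-numbered differing dimension on each hop (that tie-break is my assumption; any differing dimension works and still gives the minimum hop count):

def next_hop(node, dest):
    # XOR exposes the dimensions in which the two addresses differ;
    # crossing a link in any differing dimension gets one bit closer.
    diff = node ^ dest
    if diff == 0:
        return node                               # already at the destination
    out_dim = (diff & -diff).bit_length() - 1     # lowest differing dimension
    return node ^ (1 << out_dim)                  # neighbor across that link

# Route 0 -> 11 (binary 1011) in a dimension-4 hypercube
path, n = [0], 0
while n != 11:
    n = next_hop(n, 11)
    path.append(n)
print(path)   # [0, 1, 3, 11]: hop count = number of differing bits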
A 16-node (dimension 4) hypercube
[Figure: 16 nodes and their links, with each link labeled by the dimension (0–3) it crosses]
Interconnect
Interconnect
How many servers can be connected in this system?
81,920 servers with 1-Gbps bandwidth each
Core switch: 10 10-Gbps ports
Leaf switch: 40 1-Gbps ports + 2 10-Gbps ports
The Black Box
Data Center Networking
Shipping Container as Data Center Module
• Data Center Module
– Contains network gear, compute, storage, &
cooling
– Just plug in power, network, & chilled water
• Increased cooling efficiency
– Water & air flow
– Better air flow management
• Meet seasonal load requirements
Data Center Network
Unit of Data Center Growth
• One at a time:
– 1 system
– Racking & networking: 14 hrs ($1,330)
• Rack at a time:
– ~40 systems
– Install & networking: 0.75 hrs ($60)
• Container at a time:
– ~1,000 systems
– No packaging to remove
– No floor space required
– Power, network, & cooling only
– Weatherproof & easy to transport
• Data center construction takes 24+
months
Data Center Network
Multiple-Site Redundancy and Enhanced
Performance using load balancing
• Handling site failures
transparently
• Providing best site
selection per user
• Leveraging both DNS and
non-DNS methods for
multi-site redundancy
• Providing disaster
recovery and non-stop
operation
[Figure: a DNS-integrated load-balancing (LB) system directing users across three datacenters]
LB (load balancing) System
• The load balancing systems regulate global data center traffic
• Incorporates site health, load, user proximity, and service response for user
site selection
• Provides transparent site failover in case of disaster or service outage
Global Data Center Deployment Problems
Data Center Network
Challenges and Research Problems
Hardware
– High-performance, reliable, cost-effective
computing infrastructure
– Cooling, air cleaning, and energy efficiency
[Barroso] Clusters; [Fan] Power; [Andersen] FAWN; [Raghavendra] Power
Challenges and Research Problems
System software
– Operating systems
– Compilers
– Database
– Execution engines and containers
Ghemawat: GFS; Chang: Bigtable; DeCandia: Dynamo; Brantner: DB on S3; Cooper: PNUTS; Yu: DryadLINQ; Dean: MapReduce; Burrows: Chubby; Isard: Quincy
Challenges and Research Problems
Networking
– Interconnect and global network structuring
– Traffic engineering
Al-Fares: Commodity DC; Guo 2008: DCell; Guo 2009: BCube
Challenges and Research Problems
• Data and programming
– Data consistency mechanisms (e.g., replications)
– Fault tolerance
– Interfaces and semantics
• Software engineering
• User interface
• Application architecture
Pike: Sawzall; Olston: Pig Latin; Buyya: IT services
Resources
• [Al-Fares] Al-Fares, M., Loukissas, A., and Vahdat, A. A scalable, commodity data center
network architecture. In Proceedings of the ACM SIGCOMM 2008 Conference on Data
Communication (Seattle, WA, USA, August 17 - 22, 2008). SIGCOMM '08. 63-74.
http://baijia.info/showthread.php?tid=139
• [Andersen] David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee,
Lawrence Tan, Vijay Vasudevan. FAWN: A Fast Array of Wimpy Nodes. SOSP'09.
http://baijia.info/showthread.php?tid=179
• [Barroso] Luiz Barroso, Jeffrey Dean, Urs Hoelzle, "Web Search for a Planet: The Google
Cluster Architecture," IEEE Micro, vol. 23, no. 2, pp. 22-28, Mar./Apr. 2003
http://baijia.info/showthread.php?tid=133
• [Brantner] Brantner, M., Florescu, D., Graf, D., Kossmann, D., and Kraska, T. Building a
database on S3. In Proceedings of the 2008 ACM SIGMOD international Conference on
Management of Data (Vancouver, Canada, June 09 - 12, 2008). SIGMOD '08. 251-264.
http://baijia.info/showthread.php?tid=125
Resources
• [Burrows] Burrows, M. The Chubby lock service for loosely-coupled distributed systems.
In Proceedings of the 7th Symposium on Operating Systems Design and Implementation
(Seattle, Washington, November 06 - 08, 2006). 335-350.
http://baijia.info/showthread.php?tid=59
• [Buyya] Buyya, R. Chee Shin Yeo Venugopal, S. Market-Oriented Cloud Computing. The
10th IEEE International Conference on High Performance Computing and
Communications, 2008. HPCC '08. http://baijia.info/showthread.php?tid=248
• [Chang] Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M.,
Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: a distributed storage system for
structured data. In Proceedings of the 7th Symposium on Operating Systems Design and
Implementation (Seattle, Washington, November 06 - 08, 2006). 205-218.
http://baijia.info/showthread.php?tid=4
• [Cooper] Cooper, B. F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P.,
Jacobsen, H., Puz, N., Weaver, D., and Yerneni, R. PNUTS: Yahoo!'s hosted data serving
platform. Proc. VLDB Endow. 1, 2 (Aug. 2008), 1277-1288.
http://baijia.info/showthread.php?tid=126
Resources
• [Dean] Dean, J. and Ghemawat, S. 2004. MapReduce: simplified data processing on large
clusters. In Proceedings of the 6th Conference on Symposium on Opearting Systems
Design & Implementation - Volume 6 (San Francisco, CA, December 06 - 08, 2004).
http://baijia.info/showthread.php?tid=2
• [DeCandia] DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A.,
Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. 2007. Dynamo: amazon's
highly available key-value store. In Proceedings of Twenty-First ACM SIGOPS Symposium
on Operating Systems Principles (Stevenson, Washington, USA, October 14 - 17, 2007).
SOSP '07. ACM, New York, NY, 205-220. http://baijia.info/showthread.php?tid=120
• [Fan] Fan, X., Weber, W., and Barroso, L. A. Power provisioning for a warehouse-sized
computer. In Proceedings of the 34th Annual international Symposium on Computer
Architecture (San Diego, California, USA, June 09 - 13, 2007). ISCA '07. 13-23.
http://baijia.info/showthread.php?tid=144
Resources
• [Ghemawat] Ghemawat, S., Gobioff, H., and Leung, S. 2003. The Google file system. In
Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles
(Bolton Landing, NY, USA, October 19 - 22, 2003). SOSP '03. ACM, New York, NY, 29-43.
http://baijia.info/showthread.php?tid=1
• [Guo 2008] Chuanxiong Guo, Haitao Wu, Kun Tan, Lei Shi, Yongguang Zhang, and
Songwu Lu, DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers, in
ACM SIGCOMM 08. http://baijia.info/showthread.php?tid=142
• [Guo 2009] Chuanxiong Guo, Guohan Lu, Dan Li, Xuan Zhang, Haitao Wu, Yunfeng Shi,
Chen Tian, Yongguang Zhang, and Songwu Lu, BCube: A High Performance, Server-
centric Network Architecture for Modular Data Centers, in ACM SIGCOMM 09.
http://baijia.info/showthread.php?tid=141
• [Isard] Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar and
Andrew Goldberg. Quincy: Fair Scheduling for Distributed Computing Clusters. SOSP'09.
http://baijia.info/showthread.php?tid=203
Resources
• [Olston] Olston, C., Reed, B., Srivastava, U., Kumar, R., and Tomkins, A. 2008. Pig Latin: a
not-so-foreign language for data processing. In Proceedings of the 2008 ACM SIGMOD
international Conference on Management of Data (Vancouver, Canada, June 09 - 12,
2008). SIGMOD '08. 1099-1110. http://baijia.info/showthread.php?tid=124
• [Pike] Pike, R., Dorward, S., Griesemer, R., and Quinlan, S. 2005. Interpreting the data:
Parallel analysis with Sawzall. Sci. Program. 13, 4 (Oct. 2005), 277-298.
http://baijia.info/showthread.php?tid=60
• [Raghavendra] Ramya Raghavendra, Parthasarathy Ranganathan, Vanish Talwar, Zhikui
Wang, Xiaoyun Zhu. No "Power" Struggles: Coordinated Multi-level Power Management
for the Data Center. In Proceedings of the International Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS), Seattle, WA,
March 2008. http://baijia.info/showthread.php?tid=183
• [Yu] Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey.
DryadLINQ: A system for general-purpose distributed data-parallel computing using a
high-level language. In Proceedings of the 8th Symposium on Operating Systems Design
and Implementation (OSDI), December 8-10 2008. http://baijia.info/showthread.php?tid=5
Thank you!
Questions?
More Related Content

What's hot

Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reducePaladion Networks
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupCsaba Toth
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabadsreehari orienit
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
writing Hadoop Map Reduce programs
writing Hadoop Map Reduce programswriting Hadoop Map Reduce programs
writing Hadoop Map Reduce programsjani shaik
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part IMarin Dimitrov
 

What's hot (20)

Hadoop by sunitha
Hadoop by sunithaHadoop by sunitha
Hadoop by sunitha
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User Group
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
MapReduce and Hadoop
MapReduce and HadoopMapReduce and Hadoop
MapReduce and Hadoop
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
 
Hadoop map reduce v2
Hadoop map reduce v2Hadoop map reduce v2
Hadoop map reduce v2
 
try
trytry
try
 
writing Hadoop Map Reduce programs
writing Hadoop Map Reduce programswriting Hadoop Map Reduce programs
writing Hadoop Map Reduce programs
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part I
 

Viewers also liked (20)

Diamonds
DiamondsDiamonds
Diamonds
 
FPATH
FPATHFPATH
FPATH
 
E&PP Stats
E&PP StatsE&PP Stats
E&PP Stats
 
Damaaram album
Damaaram albumDamaaram album
Damaaram album
 
James bond
James bondJames bond
James bond
 
Open data citizen engagement
Open data citizen engagementOpen data citizen engagement
Open data citizen engagement
 
02english
02english02english
02english
 
Neuron
NeuronNeuron
Neuron
 
La unificación-de-italia
La unificación-de-italiaLa unificación-de-italia
La unificación-de-italia
 
Fenicios
FeniciosFenicios
Fenicios
 
Mesopotamia expo
Mesopotamia  expoMesopotamia  expo
Mesopotamia expo
 
Sarah monda adam smith character education pioneer
Sarah monda adam smith character education pioneerSarah monda adam smith character education pioneer
Sarah monda adam smith character education pioneer
 
Under construction sign
Under construction signUnder construction sign
Under construction sign
 
Karin Janson Empathy Map
Karin Janson Empathy MapKarin Janson Empathy Map
Karin Janson Empathy Map
 
Coca cola
Coca colaCoca cola
Coca cola
 
Demoppt
DemopptDemoppt
Demoppt
 
Distributor milano usa 2014
Distributor milano usa 2014Distributor milano usa 2014
Distributor milano usa 2014
 
Muckross Hockey Club - Kukri Kit
Muckross Hockey Club - Kukri KitMuckross Hockey Club - Kukri Kit
Muckross Hockey Club - Kukri Kit
 
Disney kuskova katherine
Disney kuskova katherineDisney kuskova katherine
Disney kuskova katherine
 
Ecg line
Ecg lineEcg line
Ecg line
 

Similar to Cloud Computing Systems MapReduce Limitations

Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewNisanth Simon
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map ReduceUrvashi Kataria
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2Fabio Fumarola
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
An Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBaseAn Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBaseLukas Vlcek
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 

Similar to Cloud Computing Systems MapReduce Limitations (20)

Hadoop
HadoopHadoop
Hadoop
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
hadoop
hadoophadoop
hadoop
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
 
Mapreduce Hadop.pptx
Mapreduce Hadop.pptxMapreduce Hadop.pptx
Mapreduce Hadop.pptx
 
big data ppt.ppt
big data ppt.pptbig data ppt.ppt
big data ppt.ppt
 
Hadoop
HadoopHadoop
Hadoop
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Hadoop
HadoopHadoop
Hadoop
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
An Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBaseAn Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBase
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 

Recently uploaded

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Recently uploaded (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Cloud Computing Systems MapReduce Limitations

  • 1. Cloud Computing Systems Lin Gu Hong Kong University of Science and Technology Sept. 14, 2011
  • 2. How to effectively compute in a datacenter? Is MapReduce the best answer to computation in the cloud? What is the limitation of MapReduce? How to provide general-purpose parallel processing in DCs?
  • 3. • MapReduce—parallel computing for Web-scale data processing • Fundamental component in Google’s technological architecture – Why didn’t Google use parallel Fortran, MPI, …? • Followed by many technology firms The MapReduce Approach Program Execution on Web-Scale DataProgram Execution on Web-Scale Data
  • 4. MapReduce Old ideas can be fabulous, too! ( = Lisp “Lost In Silly Parentheses”) ? • Map and Fold – Map: do something to all elements in a list – Fold: aggregate elements of a list • Used in functional programming languages such as Lisp
  • 5. • Map is a higher-order function: apply an op to all elements in a list – Result is a new list • Parallelizable f f f f f MapReduce (map (lambda (x) (* x x)) '(1 2 3 4 5)) → '(1 4 9 16 25)
  • 6. • Reduce is also a higher-order function • Like “fold”: aggregate elements of a list – Accumulator set to initial value – Function applied to list element and the accumulator – Result stored in the accumulator – Repeated for every item in the list – Result is the final value in the accumulator f f f f f final result Initial value (fold + 0 '(1 2 3 4 5)) → 15 (fold * 1 '(1 2 3 4 5)) → 120 The MapReduce Approach Program Execution on Web-Scale DataProgram Execution on Web-Scale Data
  • 7. Massive parallel processing made simple • Example: word count • Map: parse a document and generate <word, 1> pairs • Reduce: receive all pairs for a specific word, and count (sum) // D is a document for each word w in D output <w, 1> Map Reduce Reduce for key w: count = 0 for each input item count = count + 1 output <w, count> The MapReduce Approach Program Execution on Web-Scale DataProgram Execution on Web-Scale Data
  • 8. Design Context • Big data, but simple dependence – Relatively easy to partition data • Supported by a distributed system – Distributed OS services across thousands of commodity PCs (e.g., GFS) • First users are search oriented – Crawl, index, search Designed years ago, still working today, growing adoptions
  • 9. Single Master node Worker threads Worker threads Workflow Single master, numerous worker threads
  • 10. Workflow • 1. The MapReduce library in the user program first splits the input files into M pieces of typically 16 megabytes to 64 megabytes (MB) per piece. It then starts up many copies of the program on a cluster of machines. • 2. One of the copies of the program is the master. The rest are workers that are assigned work by the master. There are M map tasks and R reduce tasks to assign. The master picks idle workers and assigns each one a map task or a reduce task.
  • 11. Workflow • 3. A worker who is assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory. • 4. Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers.
  • 12. Workflow • 5. When a reduce worker is notified by the master about these locations, it uses RPCs to read the buffered data from the local disks of the map workers. When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. • 6. The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition. • 7. When all map tasks and reduce tasks have been completed, the MapReduce returns back to the user code.
  • 13. Programming • How to write a MapReduce programto –Generate inverted indices? –Sort? • How to express more sophisticated logic? • What if some workers (slaves) or the master fails?
  • 14. Workflow Where is the communication-intensive part? Initial data split into 64MB blocks Computed, results locally stored Master informed of result locations R reducers retrieve Data from mappers Final output written
  • 15. • Distributed, scalable storage for key-value pairs • Example: Dynamo (Amazon) • Another example may be P2P storage (e.g., Chord) • Key-value store can be a general foundation for more complex data structures • But performance may suffer Data Storage – Key-Value Store
  • 16. Data Storage – Key-Value Store Dynamo: a decentralized, scalable key-value store – Used in Amazon – Use consistent hashing to distributed data among nodes – Replicated, versioning, load balanced – Easy-to-use interface: put()/get()
  • 17. • Networked block storage – ND by SUN Microsystems • Remote block storage over Internet – Use S3 as a block device [Brantner] • Block-level remote storage may become slow in networks with long latencies Data Storage – Network Block Device
  • 18. • PC file systems • Link together all clusters of a file – Directory entry: filename, attributes, date/time, starting cluster, file size • Boot sector (superblock) : file system wide information • File allocation table, root directory, … Data Storage – Traditional File Systems Boot sector FAT 1 FAT 2 (dup) ROOT dir Normal directories and files
  • 19. • NFS—Network File System [Sandberg] – Designed by SUN Microsystems in the 1980’s • Transparent remote access to files stored remotely – XDR, RPC, VNode, VFS – Mountable file system, synchronous behavior • Stateless server Data Storage – Network File System
  • 20. NFS organization Client Server Data Storage – Network File System
  • 21. • A distributed file system at work (GFS) • Single master and numerous slaves communicate with each other • File data unit, “chunk”, is up to 64MB. Chunks are replicated. • “master” is a single point of failure and bottleneck of scalability, the consistency model is difficult to use Data Storage – Google File System (GFS)
  • 22. 22 E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) Parallel databaseParallel database ReplicationReplication Indexes and viewsIndexes and views Structured schemaStructured schema A 42342 E B 42521 W C 66354 W D 12352 E E 75656 C F 15677 E Data Storage – Database Designed and used by Yahoo! PNUTS – a relational database service
  • 23. MapReduce/Hadoop • Around 2004, Google invented MapReduce to parallelize computation of large data sets. It’s been a key component in Google’s technology foundation • Around 2008, Yahoo! developed the open-source variant of MapReduce named Hadoop • After 2008, MapReduce/Hadoop become a key technology component in cloud computing • In 2010, the U.S. conferred the MapReduce patent to Google MapReduce … Hadoop or variants …Hadoop
  • 24. • MapReduce provides an easy-to-use framework for parallel programming, but is it the most efficient and best solution to program execution in datacenters? • MapReduce has its discontents – DeWitt and Stonebraker: “MapReduce: A major step backwards” – MapReduce is far less sophisticated and efficient than parallel query processing • MapReduce is a parallel processing framework, not a database system, nor a query language – It is possible to use MapReduce to implement some of the parallel query processing functions – What are the real limitations? • Inefficient for general programming (and not designed for that) – Hard to handle data with complex dependence, frequent updates, etc. – High overhead, bursty I/O, difficult to handle long streaming data – Limited opportunity for optimization MapReduce—LimitationsMapReduce—Limitations
  • 25. Critiques MapReduce: A major step backwards -- David J. DeWitt and Michael Stonebraker (MapReduce) is – A giant step backward in the programming paradigm for large- scale data intensive applications – A sub-optimal implementation, in that it uses brute force instead of indexing – Not novel at all – Missing features – Incompatible with all of the tools DBMS users have come to depend on
  • 26. • Inefficient for general programming (and not designed for that) – Hard to handle data with complex dependence, frequent updates, etc. – High overhead, bursty I/O • Experience with developing a Hadoop-based distributed compiler – Workload: compile Linux kernel – 4 machines available to Hadoop for parallel compiling – Observation: parallel compiling on 4 nodes with Hadoop can be even slower than sequential compiling on one node MapReduce—LimitationsMapReduce—Limitations
  • 27. • Proprietary solution developed in an environment with one prevailing application (web search) – The assumptions introduce several important constraints in data and logic – Not a general-purpose parallel execution technology • Design choices in MapReduce – Optimizes for throughput rather than latency – Optimizes for large data set rather than small data structures – Optimizes for coarse-grained parallelism rather than fine- grained Re-thinking MapReduceRe-thinking MapReduce
  • 28. • A lightweight parallelization framework following the MapReduce paradigm – Implemented in C++ – More than just an efficient implementation of MapReduce – Goal: a lightweight “parallelization” service that programs can invoke during execution • MRlite follows several principles – Memory is media—avoid touching hard drives – Static facility for dynamic utility—use and reuse threads for map tasks MRlite: Lightweight Parallel ProcessingMRlite: Lightweight Parallel Processing
  • 29. MRlite : Towards Lightweight, Scalable, and General Parallel Processing MRlite clientMRlite client MRlite master scheduler MRlite master scheduler slaveslave slaveslave slaveslave slaveslave applicationapplication Data flow Command flow Linked together with the app, the MRlite client library accepts calls from app and submits jobs to the master Linked together with the app, the MRlite client library accepts calls from app and submits jobs to the master High speed distributed storage, stores intermediate files High speed distributed storage, stores intermediate files The MRlite master accepts jobs from clients and schedules them to execute on slaves The MRlite master accepts jobs from clients and schedules them to execute on slaves Distributed nodes accept tasks from master and execute them Distributed nodes accept tasks from master and execute them
  • 30. 30 Computing Capability Using MRlite, the parallel compilation jobs, mrcc, is 10Using MRlite, the parallel compilation jobs, mrcc, is 10 times faster than that running on Hadoop!times faster than that running on Hadoop! Z. Ma and L. Gu. The Limitation of MapReduce: a Probing Case and a Lightweight Solution. CLOUD COMPUTING 2010
  • 31. Network activities under MapReduce/Hadoop workload • Hadoop: open-source implementation of MapReduce • Processing data with 3 servers (20 cores) – 116.8GB input data • Network activities captured with Xen virtual machines Inside MapReduce-Style ComputationInside MapReduce-Style Computation
  • 32. Workflow Where is the communication-intensive part? Initial data split into 64MB blocks Computed, results locally stored Master informed of result locations R reducers retrieve Data from mappers Final output written
  • 33. • Packet reception under MapReduce/Hadoop workload – Large data volume – Bursty network traffic • Genrality—widely observed in MapReduce workloads Packet reception on a slave server Inside MapReduceInside MapReduce
  • 34. Packet reception on the master server Inside MapReduceInside MapReduce
  • 35. Packet transmission on the master server Inside MapReduceInside MapReduce
  • 36. Major Components of a Datacenter • Computing hardware (equipment racks) • Power supply and distribution hardware • Cooling hardware and cooling fluid distribution hardware • Network infrastructure • IT Personnel and office equipment Datacenter Networking
  • 37. Growth Trends in Datacenters • Load on network & servers continues to rapidly grow – Rapid growth: a rough estimate of annual growth rate: enterprise data centers: ~35%, Internet data centers: 50% - 100% – Information access anywhere, anytime, from many devices • Desktops, laptops, PDAs & smart phones, sensor networks, proliferation of broadband • Mainstream servers moving towards higher speed links – 1-GbE to10-GbE in 2008-2009 – 10-GbE to 40-GbE in 2010-2012 • High-speed datacenter-MAN/WAN connectivity – High-speed datacenter syncing for disaster recovery Datacenter Networking
  • 38. • A large part of the total cost of the DC hardware – Large routers and high-bandwidth switches are very expensive • Relatively unreliable – many components may fail. • Many major operators and companies design their own datacenter networking to save money and improve reliability/scalability/performance. – The topology is often known – The number of nodes is limited – The protocols used in the DC are known • Security is simpler inside the data center, but challenging at the border • We can distribute applications to servers to distribute load and minimize hot spots Datacenter Networking
  • 39. Networking components (examples) • High Performance & High Density Switches & Routers – Scaling to 512 10GbE ports per chassis – No need for proprietary protocols to scale • Highly scalable DC Border Routers – 3.2 Tbps capacity in a single chassis – 10 Million routes, 1 Million in hardware – 2,000 BGP peers – 2K L3 VPNs, 16K L2 VPNs – High port density for GE and 10GE application connectivity – Security 768 1-GE port Downstream 64 10-GE port Upstream Datacenter Networking
  • 40. Common data center topology Internet Servers Layer-2 switchAccess Data Center Layer-2/3 switchAggregation Layer-3 routerCore Datacenter Networking
  • 41. Data center network design goals • High network bandwidth, low latency • Reduce the need for large switches in the core • Simplify the software, push complexity to the edge of the network • Improve reliability • Reduce capital and operating cost Datacenter Networking
  • 42. Avoid this… Data Center Networking and simplify this…and simplify this…
  • 43. ?? Can we avoid using high-end switches? • Expensive high-end switches to scale up • Single point of failure and bandwidth bottleneck – Experiences from real systems • One answer: DCell 43 Interconnect
  • 44. DCell Ideas • #1: Use mini-switches to scale out • #2: Leverage servers to be part of the routing infrastructure – Servers have multiple ports and need to forward packets • #3: Use recursion to scale and build complete graph to increase capacity Interconnect
  • 45. One approach: switched network with a hypercube interconnect • Leaf switch: 40 1Gbps ports+2 10 Gbps ports. – One switch per rack. – Not replicated (if a switch fails, lose one rack of capacity) • Core switch: 10 10Gbps ports – Form a hypercube • Hypercube – high-dimensional rectangle Data Center Networking
  • 46. Hypercube properties • Minimum hop count • Even load distribution for all-to-all communication • Can route around switch/link failures • Simple routing: – Outport = f(Dest xor NodeNum) – No routing tables Interconnect
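The "no routing tables" bullet can be made concrete. Below is a minimal Python sketch assuming the common convention that f selects the lowest set bit of Dest xor NodeNum (the slide does not pin down f, so this choice is an assumption); each hop corrects one differing address bit, so the hop count equals the Hamming distance, i.e., the minimum.

# Table-free hypercube routing: forward along the dimension of the
# lowest differing address bit (assumed convention for f).
def hypercube_outport(node, dest):
    diff = node ^ dest                       # bits where addresses differ
    if diff == 0:
        return None                          # arrived at destination
    return (diff & -diff).bit_length() - 1   # index of the lowest set bit

node = 5                                     # route 5 -> 12, dimension 4
while (port := hypercube_outport(node, 12)) is not None:
    node ^= 1 << port                        # crossing port k flips bit k
    print("forward on dimension", port, "-> node", node)
# forward on dimension 0 -> node 4
# forward on dimension 3 -> node 12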
  • 47. A 16-node (dimension-4) hypercube (figure: nodes 0–15 with each link labeled by its dimension, 0–3) Interconnect
  • 48. Interconnect How many servers can be connected in this system? 81920 servers with 1Gbps bandwidth Core switch: 10Gbps port x 10 Leaf switch: 1Gbps port x 40 + 10Gbps port x 2.
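The stated port counts allow a quick consistency check of the figure; this is only a sketch, since the exact core-layer hypercube wiring behind 81,920 is not spelled out on the slide.

# 81,920 servers at 40 servers per leaf switch => 2,048 leaf switches.
servers_per_leaf = 40                        # 40 x 1-Gbps server ports
uplink_gbps = 2 * 10                         # 2 x 10-Gbps uplinks per leaf
print(81920 // servers_per_leaf)             # 2048 leaf switches (racks)
print(servers_per_leaf * 1 / uplink_gbps)    # 2.0 => 2:1 oversubscription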
  • 49. The Black Box Data Center Networking
  • 50. Shipping Container as Data Center Module • Data Center Module – Contains network gear, compute, storage, & cooling – Just plug in power, network, & chilled water • Increased cooling efficiency – Water & air flow – Better air flow management • Meet seasonal load requirements Data Center Network
  • 51. Unit of Data Center Growth • One at a time: – 1 system – Racking & networking: 14 hrs ($1,330) • Rack at a time: – ~40 systems – Install & networking: 0.75 hrs ($60) • Container at a time: – ~1,000 systems – No packaging to remove – No floor space required – Power, network, & cooling only – Weatherproof & easy to transport • Data center construction takes 24+ months Data Center Network
  • 52. Multiple-Site Redundancy and Enhanced Performance using Load Balancing • Handling site failures transparently • Providing best site selection per user • Leveraging both DNS and non-DNS methods for multi-site redundancy • Providing disaster recovery and non-stop operation (figure: global data center deployment—an LB system plus DNS directing users across multiple datacenters) • The load balancing system regulates global data center traffic • Incorporates site health, load, user proximity, and service response time for user site selection (see the sketch below) • Provides transparent site failover in case of disaster or service outage Data Center Network
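To illustrate how the four selection inputs above might combine, here is a Python sketch of a site-selection score; the field names, weights, and normalization are assumptions made for illustration, not the behavior of any particular LB product.

# Illustrative global-LB site selection over health, load, proximity,
# and response time (all names and weights are assumed).
def pick_site(sites, user_rtt_ms):
    # sites: name -> {'healthy': bool, 'load': 0..1, 'resp_ms': float}
    # user_rtt_ms: name -> measured round-trip time to the user
    healthy = {s: m for s, m in sites.items() if m['healthy']}
    if not healthy:
        raise RuntimeError("no healthy site: trigger disaster recovery")
    def score(s):                              # lower is better
        m = healthy[s]
        return (0.5 * m['load']                # current site load
                + 0.3 * user_rtt_ms[s] / 100   # user proximity
                + 0.2 * m['resp_ms'] / 100)    # service response time
    return min(healthy, key=score)

# Unhealthy sites are never candidates, so site failover is transparent.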
  • 53. Challenges and Research Problems • Hardware – High-performance, reliable, cost-effective computing infrastructure – Cooling, air cleaning, and energy efficiency [Barroso] Clusters [Fan] Power [Andersen] FAWN [Raghavendra] Power
  • 54. Challenges and Research Problems System software – Operating systems – Compilers – Database – Execution engines and containers Ghemawat: GFS Chang: Bigtable DeCandia: Dynamo Brantner: DB on S3 Cooper: PNUTS Yu: DryadLINQ Dean: MapReduce Burrows: Chubby Isard: Quincy
  • 55. Challenges and Research Problems Networking – Interconnect and global network structuring – Traffic engineering Al-Fares: Commodity DC Guo 2008: DCell Guo 2009: BCube
  • 56. Challenges and Research Problems • Data and programming – Data consistency mechanisms (e.g., replications) – Fault tolerance – Interfaces and semantics • Software engineering • User interface • Application architecture Pike: Sawzall Olston: Pig Latin Buyya: IT services
  • 57. Resources • [Al-Fares] Al-Fares, M., Loukissas, A., and Vahdat, A. A scalable, commodity data center network architecture. In Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication (Seattle, WA, USA, August 17 - 22, 2008). SIGCOMM '08. 63-74. http://baijia.info/showthread.php?tid=139 • [Andersen] Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., and Vasudevan, V. FAWN: A Fast Array of Wimpy Nodes. SOSP '09. http://baijia.info/showthread.php?tid=179 • [Barroso] Barroso, L. A., Dean, J., and Hölzle, U. Web Search for a Planet: The Google Cluster Architecture. IEEE Micro 23, 2 (Mar./Apr. 2003), 22-28. http://baijia.info/showthread.php?tid=133 • [Brantner] Brantner, M., Florescu, D., Graf, D., Kossmann, D., and Kraska, T. Building a database on S3. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (Vancouver, Canada, June 09 - 12, 2008). SIGMOD '08. 251-264. http://baijia.info/showthread.php?tid=125
  • 58. Resources • [Burrows] Burrows, M. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (Seattle, Washington, November 06 - 08, 2006). 335-350. http://baijia.info/showthread.php?tid=59 • [Buyya] Buyya, R., Yeo, C. S., and Venugopal, S. Market-Oriented Cloud Computing. The 10th IEEE International Conference on High Performance Computing and Communications, 2008. HPCC '08. http://baijia.info/showthread.php?tid=248 • [Chang] Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (Seattle, Washington, November 06 - 08, 2006). 205-218. http://baijia.info/showthread.php?tid=4 • [Cooper] Cooper, B. F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H., Puz, N., Weaver, D., and Yerneni, R. PNUTS: Yahoo!'s hosted data serving platform. Proc. VLDB Endow. 1, 2 (Aug. 2008), 1277-1288. http://baijia.info/showthread.php?tid=126
  • 59. Resources • [Dean] Dean, J. and Ghemawat, S. 2004. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6 (San Francisco, CA, December 06 - 08, 2004). http://baijia.info/showthread.php?tid=2 • [DeCandia] DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. 2007. Dynamo: Amazon's highly available key-value store. In Proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles (Stevenson, Washington, USA, October 14 - 17, 2007). SOSP '07. ACM, New York, NY, 205-220. http://baijia.info/showthread.php?tid=120 • [Fan] Fan, X., Weber, W., and Barroso, L. A. Power provisioning for a warehouse-sized computer. In Proceedings of the 34th Annual International Symposium on Computer Architecture (San Diego, California, USA, June 09 - 13, 2007). ISCA '07. 13-23. http://baijia.info/showthread.php?tid=144
  • 60. Resources • [Ghemawat] Ghemawat, S., Gobioff, H., and Leung, S. 2003. The Google file system. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (Bolton Landing, NY, USA, October 19 - 22, 2003). SOSP '03. ACM, New York, NY, 29-43. http://baijia.info/showthread.php?tid=1 • [Guo 2008] Chuanxiong Guo, Haitao Wu, Kun Tan, Lei Shi, Yongguang Zhang, and Songwu Lu. DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers. In ACM SIGCOMM '08. http://baijia.info/showthread.php?tid=142 • [Guo 2009] Chuanxiong Guo, Guohan Lu, Dan Li, Xuan Zhang, Haitao Wu, Yunfeng Shi, Chen Tian, Yongguang Zhang, and Songwu Lu. BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers. In ACM SIGCOMM '09. http://baijia.info/showthread.php?tid=141 • [Isard] Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg. Quincy: Fair Scheduling for Distributed Computing Clusters. SOSP '09. http://baijia.info/showthread.php?tid=203
  • 61. Resources • [Olston] Olston, C., Reed, B., Srivastava, U., Kumar, R., and Tomkins, A. 2008. Pig Latin: a not-so-foreign language for data processing. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (Vancouver, Canada, June 09 - 12, 2008). SIGMOD '08. 1099-1110. http://baijia.info/showthread.php?tid=124 • [Pike] Pike, R., Dorward, S., Griesemer, R., and Quinlan, S. 2005. Interpreting the data: parallel analysis with Sawzall. Sci. Program. 13, 4 (Oct. 2005), 277-298. http://baijia.info/showthread.php?tid=60 • [Raghavendra] Raghavendra, R., Ranganathan, P., Talwar, V., Wang, Z., and Zhu, X. No "Power" Struggles: Coordinated Multi-level Power Management for the Data Center. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Seattle, WA, March 2008. http://baijia.info/showthread.php?tid=183 • [Yu] Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, Ú., Gunda, P. K., and Currey, J. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI), December 8-10, 2008. http://baijia.info/showthread.php?tid=5
