An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
1. An Overview of Cloud Computing:
My Other Computer is a Data Center
Robert Grossman
Open Data Group &
University of Illinois at Chicago
IEEE New Technologies Conference
August 6, 2009
6. Are There Other Types of Clouds?
Large Data Cloud Services
ad targeting
7. One Definition
Clouds provide on-demand resources or
services over a network, often the Internet,
with the scale and reliability of a data center.
No standard definition.
Cloud architectures are not new.
What is new:
– Scale
– Ease of use
– Pricing model.
9. Elastic, Usage Based Pricing Is New
1 computer in a rack for 120 hours
costs the same as
120 computers in three racks for 1 hour
Elastic, usage based pricing turns capex into opex.
Clouds can be used to manage surges in computing needs.
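The equivalence on this slide is plain arithmetic: under usage-based pricing the bill depends only on machine-hours consumed. A minimal sketch, assuming a hypothetical flat rate of $0.10 per machine-hour (the rate and the function name are illustrative and not taken from any provider's price list):

```python
# Hypothetical flat rate; real providers publish their own prices.
RATE_PER_MACHINE_HOUR = 0.10  # dollars, assumed for illustration


def usage_cost(machines: int, hours: int, rate: float = RATE_PER_MACHINE_HOUR) -> float:
    """Cost under pure usage-based pricing: machine-hours times the rate."""
    return machines * hours * rate


one_for_120_hours = usage_cost(machines=1, hours=120)
many_for_one_hour = usage_cost(machines=120, hours=1)

# Both jobs consume 120 machine-hours, so the two bills are equal.
print(one_for_120_hours, many_for_one_hour)
```

The only difference between the two jobs is elapsed time, which is why elastic pricing suits surges in demand.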
10. Simplicity Offered By the Cloud is New
+ .. and you have a computer
ready to work.
A new programmer can develop a
program to process a container full of
data with less than a day of training
using MapReduce.
12. Varieties of Clouds
Architectural Model
– On-demand computing instances
vs large data cloud services
Payment Model
– Elastic, usage based pricing,
lease/own, …
Management Model
– Private vs Public; Single vs
Multiple Tenant; …
Programming Model
– Queue Service, MPI,
MapReduce, Distributed UDF
Three dimensions:
– Computing instances vs large data cloud services
– Private internal vs public external
– Elastic, usage-based pricing or not
All combinations occur.
13. Architectural Models:
How Do You Fill a Data Center?
Cloud Storage Services
Cloud Compute Services
(MapReduce & Generalizations)
Cloud Data Services
(BigTable, etc.)
Quasi-relational
Data Services
Figure labels: Apps; large data cloud services; on-demand computing instances.
14. Payment Models
Buying racks, containers and data centers
Leasing racks, containers and data centers
Utility based computing (pay as you go)
– Moves capex to opex
– Handles surge requirements (use 1000 servers for 1
hour vs 1 server for 1000 hours)
15. Management Models
Public, private and hybrid models
Single tenant vs multiple tenant (shared vs
non-shared hardware)
Owned vs leased
Manage yourself vs outsource management
All combinations are possible
16. Programming Models
Amazon’s Simple
Queue Service
MPI, sockets, FIFO
MapReduce
Distributed UDF
on-demand
computing
instances
large data
cloud services
DryadLINQ
Azure services
17. Part 3. Cloud Computing Industry
“Cloud computing has become the center of
investment and innovation.”
Nicholas Carr, 2009 IDC Directions
Cloud computing is
approaching the top of
the Gartner hype cycle.
18. IaaS, PaaS and SaaS Point of View
SaaS
PaaS
IaaS
Infrastructure as a Service
PRODUCT: Compute power, storage
and networking infrastructure over the
internet, provided as a virtual machine
image
USERS: Developers
Platform as a Service
PRODUCT: storage, compute and
other services to simplify application
development, especially of web
applications.
USERS: Application Developers
Software as a Service
PRODUCT: Finished
application available on
demand to end user
USERS: Software consumer
19. Building Data Centers
Sun’s Modular
Data Center (MD)
Formerly Project
Blackbox
Containers used by
Google, Microsoft
& others
Data center
consists of 10-60+
containers.
20. Data Center Operating Systems
Data center services include: VM management
services, business continuity services, security
services, power management services, etc.
workstation
VM 1 VM 5
…
VM 1 VM 50,000
…
Data Center Operating System
21. Berkeley View of Cloud Computing
Providers of Cloud Services
Consumers of Cloud Services
Providers of Software as a Service
Consumers of Software as a Service
Berkeley Report on cloud computing divides industry
into these layers & concentrates on public clouds.
Data Centers
22. Transition Taking Place
A handful of players are building multiple data
centers a year and improving with each one.
This includes Google, Microsoft, Yahoo, …
A data center today costs $200 M – $400+ M
Berkeley RAD Report points out the analogy with
the semiconductor industry: companies stopped
building their own fabs and started leasing
fabs from others as the cost of a fab approached $1B
23. Mindmeister Map of Cloud Computing
Dupont’s Mindmeister Map divides the industry:
– IaaS, PaaS, Management, Community
http://www.mindmeister.com/maps/show_public/15936058
25. Virtualization
Virtualization separates logical infrastructure
from the underlying physical resources to
decrease time to make changes, improve
flexibility, improve utilization and reduce costs.
Example: server virtualization. Use one
physical server to support multiple logical
virtual machines (VMs), which are sometimes
called logical partitions.
Technology pioneered by IBM in the 1960s to
better utilize mainframes.
26. Idea Dates Back to the 1960s
IBM Mainframe
IBM VM/370
CMS
App
Native (Full) Virtualization
Examples: VMware ESX
MVS
App
CMS
App
27. Two Types of Virtualization
Using the hypervisor, each guest OS sees its own
independent copy of the CPU, memory, IO, etc.
Native (Full) Virtualization
Examples: VMware ESX
– Apps
– Unmodified Guest OS 1, Unmodified Guest OS 2
– Hypervisor
– Physical Hardware
Para Virtualization
Examples: Xen
– Apps
– Modified Guest OS 1, Modified Guest OS 2
– Hypervisor
– Physical Hardware
28. Four Key Properties
1. Partitioning: run multiple VMs on one
physical server; one VM doesn’t know about
the others
2. Isolation: security isolation is at the hardware
level.
3. Encapsulation: entire state of the machine
can be copied to files and moved around
4. Hardware abstraction: provision and migrate
VM to another server
31. The Google Data Stack
The Google File System (2003)
MapReduce: Simplified Data Processing… (2004)
BigTable: A Distributed Storage System… (2006)
32. Map-Reduce Example
Input is file with one document per record
User specifies map function
– key = document URL
– value = terms that the document contains
map input:
(“doc cdickens”, “it was the best of times”)
map output:
“it”, 1
“was”, 1
“the”, 1
“best”, 1
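The map step for this word-count example can be sketched in a few lines of Python. This is an illustrative stand-alone function, not the API of Google's MapReduce or of Hadoop; the name `map_fn` and the whitespace tokenization are assumptions.

```python
from typing import Iterator, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Emit (term, 1) for every term in one document.

    key   -- document identifier (the URL in the slide's example)
    value -- the text of the document
    """
    for term in value.split():  # assumed tokenization: split on whitespace
        yield term, 1


pairs = list(map_fn("doc cdickens", "it was the best of times"))
print(pairs)
```

The function never looks at other documents, which is what lets a framework run many map calls in parallel.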
33. Example (cont’d)
MapReduce library gathers together all pairs
with the same key value (shuffle/sort phase)
The user-defined reduce function combines all
the values associated with the same key
reduce input:
key = “it”, values = 1, 1
key = “was”, values = 1, 1
key = “best”, values = 1
key = “worst”, values = 1
reduce output:
“it”, 2
“was”, 2
“best”, 1
“worst”, 1
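The shuffle/sort and reduce steps can be sketched the same way. This is a single-process illustration under stated assumptions: the function names are hypothetical, and the intermediate pairs are written out by hand to match the keys shown on the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key and order the keys, as a shuffle/sort phase would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """User-defined reduce: sum the counts collected for one key."""
    return key, sum(values)


# Intermediate (term, 1) pairs, assumed to match the keys on the slide.
intermediate = [("it", 1), ("was", 1), ("best", 1),
                ("it", 1), ("was", 1), ("worst", 1)]

counts = dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items())
print(counts)
```

In a real deployment the grouping happens across machines; only `reduce_fn` is supplied by the user.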
34. Generalization: Apply User Defined
Functions (UDF) to Files in Storage Cloud
map/shuffle reduce
UDF UDF
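One way to picture the generalization: the framework takes any user-defined function and applies it to every file in the storage cloud, in parallel. The sketch below is a toy, in-memory illustration and does not use Sector/Sphere's actual interface; `STORAGE`, `apply_udf` and `count_terms` are hypothetical names, and a thread pool stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Stand-in for files in a storage cloud: file name -> records (made-up data).
STORAGE: Dict[str, List[str]] = {
    "part-0": ["a b", "c"],
    "part-1": ["d e f"],
}


def apply_udf(udf: Callable[[List[str]], int],
              storage: Dict[str, List[str]]) -> Dict[str, int]:
    """Run the same user-defined function over every file, in parallel."""
    names = sorted(storage)
    with ThreadPoolExecutor() as pool:
        results = pool.map(udf, (storage[name] for name in names))
        return dict(zip(names, results))


def count_terms(records: List[str]) -> int:
    """Example UDF: count whitespace-separated terms in one file."""
    return sum(len(record.split()) for record in records)


per_file = apply_udf(count_terms, STORAGE)
print(per_file)
```

Map and reduce are then just two particular UDFs with a shuffle between them.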
37. Sector’s Layered Cloud Services
Storage Services
Table Services
Compute Services
Sector’s Stack
Applications
Sector’s Distributed File
System (SDFS)
Sphere’s UDF
Routing &
Transport Services
UDP-based Data Transport
Protocol (UDT)
38. Hadoop & Sector
                     Hadoop                    Sector
Storage Cloud        Block-based file system   File-based
Programming Model    MapReduce                 UDF & MapReduce
Protocol             TCP                       UDP-based protocol (UDT)
Replication          At time of writing        Periodically
Security             Not yet                   HIPAA capable
Language             Java                      C++
39. MalStone Benchmark
Benchmark developed by Open Cloud
Consortium for clouds supporting data
intensive computing.
Code to generate synthetic data required is
available from code.google.com/p/malgen
Stylized analytic computation that is easy to
implement in MapReduce and its
generalizations.
41. MalStone B Benchmark
MalStone B
Hadoop v0.18.3 799 min
Hadoop Streaming v0.18.3 142 min
Sector v1.19 44 min
# Nodes 20 nodes
# Records 10 Billion
Size of Dataset 1 TB
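To compare the three running times, it helps to express each as a multiple of the fastest one. The short calculation below uses only the figures on this slide; the variable names are illustrative.

```python
# Running times in minutes, copied from the MalStone B results.
minutes = {
    "Hadoop v0.18.3": 799,
    "Hadoop Streaming v0.18.3": 142,
    "Sector v1.19": 44,
}

baseline = minutes["Sector v1.19"]
# Ratio of each system's running time to Sector's running time.
ratios = {name: t / baseline for name, t in minutes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x Sector's running time")
```

On these numbers Hadoop takes roughly 18 times as long as Sector and Hadoop Streaming roughly 3 times as long, for the same 20 nodes and 10 billion records.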
42. Trading Functionality for Scalability
Databases vs Data Clouds
Scalability
– Databases: 100’s of TB
– Data Clouds: 100’s of PB
Functionality
– Databases: full SQL-based queries, including joins
– Data Clouds: optimized access to sorted tables (tables with single keys)
Optimized for
– Databases: safe writes
– Data Clouds: efficient reads
Consistency model
– Databases: ACID (Atomicity, Consistency, Isolation & Durability); database is always consistent
– Data Clouds: eventual consistency; updates eventually propagate through the system
Parallelism
– Databases: difficult because of ACID model; shared nothing is possible (Graywolf)
– Data Clouds: basic design incorporates parallelism over commodity components
Scale
– Databases: racks
– Data Clouds: data center
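The eventual-consistency entry can be illustrated with a toy model: a write is acknowledged by one replica and reaches the others later, so a read in between may return stale data. This is a deliberately simplified sketch, not the replication protocol of any real system; the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class EventuallyConsistentStore:
    """Writes land on replica 0 and are propagated to the others later."""

    def __init__(self, n_replicas: int) -> None:
        self.replicas: List[Dict[str, str]] = [{} for _ in range(n_replicas)]
        self.pending: List[Tuple[int, str, str]] = []  # (target, key, value)

    def write(self, key: str, value: str) -> None:
        self.replicas[0][key] = value  # acknowledged immediately
        for target in range(1, len(self.replicas)):
            self.pending.append((target, key, value))  # delivered later

    def read(self, replica: int, key: str) -> Optional[str]:
        return self.replicas[replica].get(key)

    def propagate(self) -> None:
        """Deliver queued updates (stands in for background replication)."""
        for target, key, value in self.pending:
            self.replicas[target][key] = value
        self.pending.clear()


store = EventuallyConsistentStore(n_replicas=3)
store.write("x", "1")
stale = store.read(2, "x")  # replica 2 has not received the update yet
store.propagate()
fresh = store.read(2, "x")  # after propagation every replica agrees
print(stale, fresh)
```

An ACID database would not acknowledge the write until no reader could observe the old value, which is part of what makes it harder to parallelize.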
43. Not Everyone Agrees
David J. DeWitt and Michael Stonebraker,
MapReduce: A Major Step Backwards,
Database Column, January 17, 2008
44. Part 6. Standards Efforts
Change of gauge at Ussuriisk (near Vladivostok) at the Chinese-Russian border
Train gauge in China is 1435 mm
Train gauge in Russia is 1520 mm
How can a cloud application move from one cloud storage service to another?
45. Standards Efforts for Clouds
Cloud Computing Interoperability Forum (CCIF)
Open Cloud Consortium (OCC)
Open Grid Forum (OGF)
Distributed Management Task Force (DMTF)
Storage Networking Industry Association (SNIA)
Plus several others…
46. www.opencloudconsortium.org
1. Supports the development of standards.
2. Supports reference implementations for
cloud computing, preferably open source.
3. Manages a testbed for cloud computing
called the Open Cloud Testbed.
4. Supports the development of benchmarks.
5. Sponsors workshops and other events related
to cloud computing.
47. Activities Currently Focused Around
Five Use Cases
1. Moving an existing cloud application from Cloud
1 to Cloud 2 without changing the application.
2. Providing surge capacity for an application on
Cloud 1 using any of the Clouds 2, 3, … (without
changing the application).
Cloud 1 Cloud 2
1. Migrate / port
2. Surge / burst
48. Large Data Cloud Use Cases
3. Moving a large data cloud application from
one large data cloud storage service to
another.
4. Moving a large data cloud application from
one large data cloud compute service to
another.
Large Data Cloud Storage Services
Large Data Cloud Compute Services
App 1 App 2
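What a storage standard would buy an application can be sketched as a shared interface with interchangeable backends. Everything below is hypothetical: `StorageService` is not an actual OCC or SNIA interface, and the two in-memory classes only stand in for different providers.

```python
from typing import Dict, Protocol


class StorageService(Protocol):
    """Hypothetical common interface; not an actual standard."""

    def put(self, name: str, data: bytes) -> None: ...
    def get(self, name: str) -> bytes: ...


class CloudAStorage:
    """Stand-in for one provider's storage service."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._objects[name] = data

    def get(self, name: str) -> bytes:
        return self._objects[name]


class CloudBStorage:
    """Stand-in for a second provider with a different internal layout."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytearray] = {}

    def put(self, name: str, data: bytes) -> None:
        self._blobs[name] = bytearray(data)

    def get(self, name: str) -> bytes:
        return bytes(self._blobs[name])


def application(storage: StorageService) -> bytes:
    """Application code written only against the shared interface."""
    storage.put("report", b"results")
    return storage.get("report")


# The same application runs unchanged on either backend.
print(application(CloudAStorage()), application(CloudBStorage()))
```

Without an agreed interface, the application itself has to change at the border, much like the train at the change of gauge.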
49. Inter-Cloud Use Case
5. Inter-cloud communication between two
HIPAA compliant clouds.
Cloud 1 Cloud 2
50. OCC Welcomes New Members
Companies and organizations are welcome to
join the Open Cloud Consortium (OCC)
www.opencloudconsortium.org/membership.html
Join one of our working groups
– Large Data Clouds Working Group
– Standard Cloud Performance Measurement
(SCPM) Working Group
– Information Sharing & Security Working Group
51. For More Information
Contact information:
Robert Grossman
rlg@opendatagroup.com
blog.rgrossman.com
Web sites
– www.opendatagroup.com
– www.ncdm.uic.edu
– www.opencloudconsortium.org