© 2014 IBM Corporation
Best Practices Building a
Multi-tenant Big Data Infrastructure
STAC Summit 2014 - NYC
Gord Sissons, gsissons@ca.ibm.com @GJSissons
Agenda
What do we mean by multi-tenancy?
Our evolving view - from HPC to HPA
Enter Big Data
Client example – multi-tenant Hadoop
New frameworks & Benchmarking Hadoop
Closing thoughts
Multi-tenancy is an overloaded term
It means different things to different people:
Virtualization
Multiple users, lines of business
Multiple application instances & versions
Multi-tenant datastores – security isolation
Multiple distributed frameworks
Multiple instances of the same framework
Our viewpoint is shaped by managing scaled-out cluster infrastructure for the financial services community
Our evolving view of multi-tenancy
From a shared infrastructure for risk analytics to born-in-the-cloud frameworks
HPC, HPA: IBM Platform Symphony
Low latency scheduling, dynamic resource sharing, ISV applications, extensive APIs, high-performance SOA
A high-performance, shared grid infrastructure for risk analytics
Batch: IBM Platform LSF
Multi-headed configurations, batch workloads
On a shared infrastructure, sharing resources according to policy – a broad set of workloads
Client requirements
Need for guaranteed service levels, notion of ownership
Time-variant, directed sharing policies
Dynamic, transparent service orchestration
Support for multiple concurrent applications
Agile flexing & resource reclaim
A simple value proposition to the business – sign on to a shared infrastructure and have guaranteed resource ownership, and a better quality of service than you could realize on dedicated infrastructure
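The requirements on this slide (guaranteed ownership, sharing of idle capacity, and reclaim) can be illustrated with a minimal sketch. This is not how IBM Platform Symphony implements its policies; the `Tenant` class, the `allocate` function, and the slot counts are hypothetical names and numbers chosen for illustration, and time-variant policies are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    owned: int       # slots guaranteed to this tenant (hypothetical "ownership")
    demand: int = 0  # slots the tenant currently wants


def allocate(tenants):
    """Serve each tenant up to its owned share, then lend idle slots to others."""
    # Step 1: every tenant gets its demand, capped at what it owns.
    alloc = {t.name: min(t.demand, t.owned) for t in tenants}
    # Step 2: whatever owners are not using is idle and can be lent out.
    idle = sum(t.owned for t in tenants) - sum(alloc.values())
    for t in tenants:
        borrowed = min(t.demand - alloc[t.name], idle)
        alloc[t.name] += borrowed
        idle -= borrowed
    return alloc


tenants = [Tenant("risk", owned=60, demand=10), Tenant("fraud", owned=40, demand=90)]
print(allocate(tenants))   # fraud borrows the slots risk is not using

tenants[0].demand = 60     # the owner's demand returns
print(allocate(tenants))   # recomputing hands the owned slots back to risk
```

Because the allocation is recomputed from ownership every time, reclaim needs no special case: a borrower only ever holds slots that their owner is not asking for.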
Enter Hadoop - much attention for new workloads
[Figure: MapReduce data flow. Shows a client (C), a master (M), input files divided into splits 0 to 5, three Map tasks (map phase), intermediate files, three Reduce tasks (reduce phase), and output files 0 to 2.]
• Data warehouse modernization
• Fraud analytics
• Audit & compliance
• Social media analytics
• 360 view of the customer
• Machine data analytics
• Text analytics
• Tick analytics
• Trade visibility
• Click-stream analytics
• Vehicle telematics
History repeating itself - Much as distributed systems dominate large-scale HPC, the same is becoming true in data management
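The map, intermediate, and reduce stages in the diagram on this slide can be sketched in a few lines. This is a single-process word count written for illustration only; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped):
    """Group intermediate pairs by key, the role the intermediate files play."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    """Run map over every split, group the results, then reduce."""
    return reduce_phase(shuffle(map_phase(s) for s in splits))


print(word_count(["big data", "big cluster"]))
```

In a real cluster each split is mapped on a different node and the grouping step moves data across the network, which is where scheduling and resource sharing between tenants start to matter.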
Our evolving view of multi-tenancy
Big Data: IBM Platform Symphony Advanced Edition
MapReduce, multitenancy, agile scheduling, Hadoop MapReduce
Advanced, high-performance MapReduce framework with Hadoop compatibility and multitenancy
Cluster Sprawl – The Elephant in the Room
• Diverse applications with different dependencies
• Different distributions, versions & tools
• Life cycle management challenges – dev, QA, test, production
• Big Data is more than just Hadoop – multiple projects and frameworks
Our evolving view of multi-tenancy
Big Data: IBM Platform Symphony Advanced Edition
Low latency MapReduce, multitenancy, agile scheduling, Hadoop MapReduce
Advanced, high-performance MapReduce framework with 100% Hadoop compatibility and sophisticated multitenancy
Application Frameworks: IBM Application Services Controller
Complex service orchestration, advanced services
“Born in the cloud” application frameworks
Customer example
US financial institution, approx. 9M customers
• Retail banking, credit cards, insurance, portfolio management, real estate, retirement planning & more
Began Hadoop journey in ~2010
• Deliver new services, reduce costs, off-load warehouse, provide timely data access to analysts & data scientists
Target application areas
• CRM, click-stream analytics, fraud alerting, actuarial underwriting, social data analytics, vehicle telematics / geo-spatial analytics
Rapid success, internal demand & security requirements drove the need for an architecture re-think in ~2012
• Deployed IBM Platform Symphony MapReduce + Elastic Storage (based on IBM GPFS), realizing a shared, multi-tenant analytics grid
Shared infrastructure – current state
[Figure: applications App #1 through App #n, each paired with its own user group (User Group #1 through User Group #n), running on one shared infrastructure.]
• Over two dozen lines of business sharing the production cluster
• 1 PB deployed, rapid growth trajectory; ~40% reduction in storage requirement
• Security isolation, guaranteed service levels, show-back accounting
• Significant performance & operational gains, higher infrastructure utilization
• Avoided the need for additional production clusters
InfoSphere BigInsights - Enterprise-grade Hadoop
Platform Symphony MapReduce – Multi-tenancy, high-performance, service level guarantees
IBM Elastic Storage (based on IBM GPFS) - HDFS compatible, POSIX, enterprise-features
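Show-back accounting on a shared cluster comes down to attributing consumed resources to each line of business. The sketch below is one simple way to do that, assuming usage is recorded as (line of business, slots, hours) tuples and priced at a flat rate; the record layout, the `show_back` function, the rate, and the sample figures are all hypothetical and do not come from the customer deployment.

```python
from collections import defaultdict


def show_back(usage_records, rate_per_slot_hour):
    """Sum slot-hours per line of business and price them at a flat rate."""
    slot_hours = defaultdict(float)
    for line_of_business, slots, hours in usage_records:
        slot_hours[line_of_business] += slots * hours
    return {lob: round(total * rate_per_slot_hour, 2)
            for lob, total in slot_hours.items()}


# Hypothetical usage records: (line of business, slots used, hours held).
records = [
    ("cards", 10, 2.0),
    ("insurance", 4, 1.5),
    ("cards", 5, 1.0),
]
print(show_back(records, rate_per_slot_hour=0.5))
```

The same per-tenant totals can also be compared against each tenant's guaranteed share to check whether service levels are being met.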
Planned cluster expansion – early 2015
Expanding the Hadoop infrastructure
Deploying Spark to support new applications
Big R deployment serving data scientists community
Pilot Hadoop-as-a-service on cloud
SQL-on-Hadoop deployment to serve demand from analysts
Hadoop-DS Benchmark – October 2014
• IBM-developed benchmark reflecting growing interest in SQL-on-Hadoop
• Showcases IBM’s Big SQL capability
• Big Data DS benchmark, based on TPC-DS
  • Fully complies with the TPC-DS schema requirement
  • Uses all 99 queries
  • Meets the multi-user requirement
  • Has been audited by a TPC-DS auditor, but as a non-TPC benchmark
• Select deviations from TPC-DS due to Hadoop limitations:
  • No data maintenance operations, referential integrity enforcement, or ACID property validation, as these are not feasible with HDFS
  • Additional statistics used
  • Metric adjustments
  • No price/performance measures included
• Not an official TPC benchmark result
Benchmarking SQL language compatibility
Key points
• With competing solutions, many queries needed to be re-written
• Owing to various restrictions, some queries could not be re-written or failed at run-time
• Re-writing queries in a benchmark scenario where results are known is one thing; doing this against real production databases is another
• Minimum 3.6x speed advantage across the common set of 46 queries
InfoSphere BigInsights runs all queries with 12 allowable modifications
Detailed presentation on SlideShare: http://www.slideshare.net/IBM_IM/hadoop-ds-benchmark-results
Audited by InfoSizing, certified TPC auditors – letter of attestation available
What about YARN?
Yet Another Resource Negotiator
Resource manager included in Hadoop 2.x and later
Decouples Hadoop workload & resource management
Introduces a general purpose application container
Enjoys broad industry support
By all means use it, but understand current limitations
• Missing flexible resource sharing policies, not yet widely deployed outside Hadoop contexts, limited application service orchestration capabilities
Closing thoughts
Be clear on what you mean by multi-tenancy
The right approach to building a shared infrastructure will depend on what you have
Consider the need for policy management and the ability to orchestrate services for a wide variety of distributed frameworks
http://ibm.com/platformcomputing
http://ibm.com/hadoop
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Recently uploaded (20)

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Building a Multi-Tenant Big Data Infrastructure

  • 1. © 2014 IBM Corporation. Best Practices Building a Multi-tenant Big Data Infrastructure. STAC Summit 2014 - NYC. Gord Sissons, gsissons@ca.ibm.com, @GJSissons
  • 2. Agenda: What do we mean by multi-tenancy?; Our evolving view - from HPC to HPA; Enter Big Data; Client example - multi-tenant Hadoop; New frameworks & Benchmarking Hadoop; Closing thoughts
  • 3. Multi-tenancy is an overloaded term that means different things to different people: virtualization; multiple users, lines-of-business; multiple application instances & versions; multi-tenant datastores - security isolation; multiple distributed frameworks; multiple instances of the same framework. Our viewpoint is shaped by managing scaled-out cluster infrastructure for the Financial Services Community.
  • 4. Our evolving view of multi-tenancy: from a shared infrastructure for risk analytics to born-in-the-cloud frameworks. HPC, HPA - IBM Platform Symphony (low latency scheduling, dynamic resource sharing, ISV applications, extensive APIs, high-performance SOA): a high-performance, shared grid infrastructure for risk analytics. Batch - IBM Platform LSF (multi-headed configurations, batch workloads): on a shared infrastructure, sharing resources according to policy - a broad set of workloads.
  • 5. Client requirements: need for guaranteed service levels, notion of ownership; time-variant, directed sharing policies; dynamic, transparent service orchestration; support for multiple concurrent applications; agile flexing & resource reclaim. A simple value proposition to the business: sign on to a shared infrastructure and have guaranteed resource ownership, and a better quality of service than you could realize on dedicated infrastructure.
  • 6. Enter Hadoop - much attention for new workloads. [Diagram: MapReduce data flow with a client (C) and a master (M). Input files (split 0 to split 5) feed the Map phase (three Map tasks), which writes intermediate files; the Reduce phase (three Reduce tasks) writes the output files (output 0 to output 2).] Workloads: data warehouse modernization; fraud analytics; audit & compliance; social media analytics; 360 view of the customer; machine data analytics; text analytics; tick analytics; trade visibility; click-stream analytics; vehicle telematics. History repeating itself: much as distributed systems dominate large-scale HPC, the same is becoming true in data management.
  • 7. Our evolving view of multi-tenancy: from a shared infrastructure for risk analytics to born-in-the-cloud frameworks. HPC, HPA - IBM Platform Symphony (low latency scheduling, dynamic resource sharing, ISV applications, extensive APIs, high-performance SOA): a high-performance, shared grid infrastructure for risk analytics. Batch - IBM Platform LSF (multi-headed configurations, batch workloads): on a shared infrastructure, sharing resources according to policy. Big Data - IBM Platform Symphony Advanced Edition (MapReduce, multitenancy, agile scheduling, Hadoop MapReduce): advanced, high-performance MapReduce framework with Hadoop compatibility and multitenancy.
  • 8. Cluster Sprawl - The Elephant in the Room: diverse applications with different dependencies; different distributions, versions & tools; life cycle management challenges - dev, QA, test, production; Big Data is more than just Hadoop - multiple projects and frameworks
  • 9. Our evolving view of multi-tenancy: from a shared infrastructure for risk analytics to born-in-the-cloud frameworks. HPC, HPA - IBM Platform Symphony (low latency scheduling, dynamic resource sharing, ISV applications, extensive APIs, high-performance SOA): a high-performance, shared grid infrastructure for risk analytics. Batch - IBM Platform LSF (multi-headed configurations, batch workloads): on a shared infrastructure, sharing resources according to policy. Big Data - IBM Platform Symphony Advanced Edition (low latency MapReduce, multitenancy, agile scheduling, Hadoop MapReduce): advanced, high-performance MapReduce framework with 100% Hadoop compatibility and sophisticated multitenancy. Application Frameworks - IBM Application Services Controller (complex service orchestration, advanced services): "born in the cloud" application frameworks.
  • 10. Customer example: US financial institution, approx. 9M customers (retail banking, credit cards, insurance, portfolio mgmt, real-estate, retirement planning & more). Began Hadoop journey in ~2010 to deliver new services, reduce costs, off-load warehouse, and provide timely data access to analysts & data scientists. Target application areas: CRM, click-stream analytics, fraud alerting, actuarial underwriting, social data analytics, vehicle telematics / geo-spatial analytics. Rapid success, internal demand & security requirements drove the need for an architecture re-think in ~2012: deployed IBM Platform Symphony MapReduce + Elastic Storage (based on IBM GPFS), realizing a shared, multi-tenant analytics grid.
  • 11. Shared infrastructure - current state. [Diagram: App #1 to App #n, each with its own user group (User Group #1 to User Group #n), running on a shared stack: InfoSphere BigInsights - enterprise-grade Hadoop; Platform Symphony MapReduce - multi-tenancy, high-performance, service level guarantees; IBM Elastic Storage (based on IBM GPFS) - HDFS compatible, POSIX, enterprise features.] Over two dozen lines of business sharing the production cluster; 1 PB deployed, rapid growth trajectory, ~40% reduction in storage requirement; security isolation, guaranteed service levels, show-back accounting; significant performance & operational gains, higher infrastructure utilization; avoided the need for additional production clusters.
  • 12. Planned cluster expansion - early 2015: expanding the Hadoop infrastructure; deploying Spark to support new applications; Big R deployment serving the data scientist community; pilot Hadoop-as-a-service on cloud; SQL-on-Hadoop deployment to serve demand from analysts
  • 13. Hadoop-DS Benchmark - October 2014. IBM-developed benchmark reflecting growing interest in SQL-on-Hadoop; showcases IBM's Big SQL capability. Big Data DS benchmark, based on TPC-DS: fully complies with the TPC-DS schema requirement; uses all 99 queries; meets the multi-user requirement; has been audited by a TPC-DS auditor, but as a non-TPC benchmark. Select deviations from TPC-DS due to Hadoop limitations: no data maintenance operations, referential integrity enforcement, or ACID property validation, as these are not feasible with HDFS; additional statistics used; metric adjustments; no price/performance measures included. Not an official TPC benchmark result.
  • 14. Benchmarking SQL language compatibility. InfoSphere BigInsights runs all queries with 12 allowable modifications. Key points: with competing solutions, many queries needed to be rewritten; owing to various restrictions, some queries could not be rewritten or failed at run-time; rewriting queries in a benchmark scenario where results are known is one thing, doing this against real production databases is another; minimum 3.6x speed advantage across the common set of 46 queries. Audited by InfoSizing, certified TPC auditors - letter of attestation available. Detailed presentation on SlideShare: http://www.slideshare.net/IBM_IM/hadoop-ds-benchmark-results
  • 15. © 2014 IBM Corporation15 Resource manager included in Hadoop 2.x and later Decouples Hadoop workload & resource management Introduces a general purpose application container Enjoys broad industry support By all means use it, but understand current limitations  Missing flexible resource sharing policies, not yet widely deployed outside Hadoop contexts, limited application service orchestration capabilities What about YARN? Yet Another Resource Negotiator
  • 16. Closing thoughts: be clear on what you mean by multi-tenancy; the right approach to building a shared infrastructure will depend on what you have; consider the need for policy management and the ability to orchestrate services for a wide variety of distributed frameworks. http://ibm.com/platformcomputing http://ibm.com/hadoop
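
Note on slide 6: the MapReduce data flow in that diagram (six input splits, a Map phase, intermediate files, and three Reduce tasks producing output 0 to output 2) can be sketched in a few lines of Python. This is an illustrative, single-process word count and not Hadoop or Platform Symphony code; the function names (map_task, partition, reduce_task, run_job), the byte-sum partitioner and the sample splits are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_task(split: str) -> List[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def partition(pairs: List[Tuple[str, int]],
              num_reducers: int) -> List[Dict[str, List[int]]]:
    """Group intermediate pairs by key and assign each key to one reducer.

    A byte-sum partitioner is used (an assumption for this sketch) because
    Python's built-in hash() for strings varies between runs.
    """
    buckets: List[Dict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for key, value in pairs:
        index = sum(key.encode("utf-8")) % num_reducers
        buckets[index][key].append(value)
    return buckets


def reduce_task(bucket: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in bucket.items()}


def run_job(splits: List[str], num_reducers: int = 3) -> List[Dict[str, int]]:
    """Run map over every split, partition the results, then reduce."""
    intermediate = [pair for split in splits for pair in map_task(split)]
    buckets = partition(intermediate, num_reducers)
    return [reduce_task(bucket) for bucket in buckets]


# Six hypothetical splits and three reducers, mirroring the diagram's shape.
splits = ["fraud tick trade", "tick trade", "trade",
          "fraud", "tick", "trade fraud"]
outputs = run_job(splits)

# Each key lands in exactly one reducer output, so merging is a plain union.
merged = {key: count for output in outputs for key, count in output.items()}
print(merged)
```

In a real cluster the map and reduce tasks run on different nodes and the intermediate files are moved over the network; the sketch only shows the shape of the computation that a scheduler has to place on shared resources.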