Empowering YOU with Democratized Data Access,
Data Science and Machine Learning
2
Acknowledgements and Disclaimers
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which
IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for
informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant.
While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without
warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this
presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or
representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of
IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have
achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to,
nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
© Copyright IBM Corporation 2017. All rights reserved.
— U.S. Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
IBM, the IBM logo, ibm.com, Big SQL are trademarks or registered trademarks of International Business Machines Corporation in the United
States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark
symbol (® or TM), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published.
Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
§ TPC Benchmark, TPC-DS, and QphDS are trademarks of Transaction Processing Performance Council
§ Cloudera, the Cloudera logo, Cloudera Impala are trademarks of Cloudera.
§ Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other
countries.
§ Other company, product, or service names may be trademarks or service marks of others.
3
Operationalizing machine learning and getting actionable insights from
disparate sources of data has been a huge challenge
Data Integration/Data Engineering Team
Data Science Team and Data Engineering Team
Application Development Team
Line of Business
Data still lives in Silos
IBM Db2
Operationalize Machine Learning Organization needs to act fast
ACT
NOW!
4
Amy
Chris
Data Engineer
Ryan
Data Scientist
Chris
Data Engineer
Nick
Application Developer
Amy needs to work with different teams who perform specific
tasks to execute the campaign
Product
details
Customer
details
Sales
campaign
Operationalize
Machine
Learning
5
With Big SQL, Amy’s team can save time on execution and
enhance productivity
Federation Spark Integration Application Integration
Ryan
Data Scientist
Nick
Application Developer
ONE TIME DEVELOPMENT
Product Details
Sales Campaign
Customer Details
IBM Big SQL
Chris
Data Engineer
6
Democratize Data Science and Machine Learning
Data Ingestion
Data Transformation/
Data Science/
Machine Learning
Data Visualization
Virtualize disparate data
sources like Hadoop, RDBMS,
and Object Stores (S3) to join
data in a single query
Manipulate data and
operationalize data science
models written in various
languages
Perform data discovery,
analyze, and visualize business
results in notebooks or other BI
tools
7
How are other companies benefiting by using Big SQL?
8
Want to modernize
your EDW without
long and costly
migration efforts
Offloading historical
data from Oracle,
Db2, Netezza
because reaching
capacity
Operationalize
machine learning
Need to query,
optimize and
integrate multiple
data sources from
one single endpoint
Slow query
performance for SQL
workloads
Require skill set to
migrate data from
RDBMS to
Hadoop / Hive
Do you have any of these challenges?
9
Big Fish Games – Uses Big SQL to combine disparate data to drive product
innovation through customer feedback with the use of analytics
“The ability to answer complicated questions with data from
disparate sources will allow our analysts to focus on
answering business questions without having to worry
about where the data lives or waiting on a project to
perform the data integration for them.” -- David Darden,
BI Engineering Manager, Big Fish Games
Business need:
• Understand which product features resonate the best with
the gaming community
• Increase cross sell and up-sell opportunities
Solution:
• Combines structured (customer) data from PureData System
for Analytics with semi-structured (game log) data in Hadoop
Beta Experience and Outcomes:
Puts users in charge of data analysis - Access to data without
technology getting in the way
Faster insights – fast data movement to Hadoop,
3X faster than Sqoop
Leverage existing skills - IBM Big SQL enables and leverages
existing SQL skill set
10
Southwest Power Pool – Uses Big SQL to federate and reuse applications
while taking first steps in establishing an enterprise data lake
Business need:
• Near term requirements: Offload less frequently used data
• Long term vision: Build a data lake infrastructure
Solution:
• Offloaded cold data to Hadoop and reused applications
• Combine cold data on Hadoop with hot data on Netezza
to derive insights
Business Benefits:
Application portability – SQL compatibility enables reuse
of applications with minimal business query modifications,
including support for Netezza functions
Federation – query capabilities between Netezza and
Hadoop
Leverage existing skills - IBM Big SQL enables and
leverages existing SQL skill set
Big SQL provides SQL query, federation, transactional,
performance, and security capabilities, which will combine
with streaming data, governance, and BI analytics in later
phases.
11
EY – Query data from different sources using Big SQL to help prevent fraud
Business need:
• Analyze data to quickly detect fraud threats
• Forensic data analytics is seen as a key capability to
invest in
Solution:
• EY is now able to detect potential threats before they
escalate
Business Benefits:
High performance helps queries to run in minutes, not
hours, helping clients rapidly identify and eliminate threats
Gathers data from multiple sources and applies real-
time analytics to identify hidden patterns and anomalies
Ability to process a wide variety of data types, from
journal entries and payment streams to email, news feeds
and social media.
EY is a global leader in assurance, tax, transaction and advisory services.
The insights and quality services EY delivers help build trust and confidence
in the capital markets and in economies the world over, and help to build a
better working world for EY’s people, clients and communities.
Transformation
EY provides its clients with comprehensive protection against fraud and security risks.
IBM Analytics helps EY rapidly detect potential threats before they escalate.
12
Vestas – Leverages complex queries and performance capabilities to turn
climate into capital with Big data using IBM Big SQL
“In our development strategy, we see growing our library in
the range of 18 to 24 petabytes of data. And while it’s fairly
easy to build that library, we needed to make sure that we
could gain knowledge from that data.”
— Lars Christian Christensen, vice president, Vestas Wind Systems
The transformation:
Successful analysis resulted in a 97% decrease in response
times for wind forecasting information, helping to pinpoint optimal
turbine placement, maximize power generation and reduce
energy costs.
Business Benefits:
High performance helps queries to run in minutes, not
hours, helping clients rapidly identify and eliminate threats
Complex query processing helps manage and analyze
weather and location data for calculating the right location
for turbines
Business need:
• Need to process complex queries on large volumes of data
• Analyze wind forecasting information for ideal placement of
turbines
Solution:
• Analyzed petabytes of data using complex queries on weather
and location data for turbine placements
• Successful placement of turbines leads to increased customer
ROI
13
"In a half-day workshop, we were able to show the
chemical company how big data analytics work and were
able to identify four new customers."
—Dr. Michael Kowolenko, Senior Research Scholar, Poole
College of Management, NC State University
The transformation: The Poole College of Management at NC
State University is developing the next generation of data-
driven decision makers, utilizing an IBM big data solution based
on PowerLinux technology. This system allows its students to
effectively manage and analyze large volumes of structured
and unstructured data from a variety of sources.
NC State University Poole College of Management – Helping businesses
uncover new opportunities with IBM on PowerLinux
Business Benefits:
Application portability – SQL compatibility enables reuse
of applications with minimal business query modifications,
including support for Netezza functions
Federation – query capabilities between Netezza and
Hadoop
Leverage existing skills - IBM Big SQL enables and
leverages existing SQL skill set
Business need:
• Make data-driven decisions on what new courses to add
• Enable students to get real-world experience on Big Data
Solution:
• Created a curriculum that enables students to apply Big
data analytics to real-world problems
• Help businesses identify new opportunities
14
Major North American Food Retailer implements HDP on IBM POWER
Business need:
• Gain a competitive advantage by retaining and analyzing
their store level loyalty program data
• Bring outsourced analytics back in-house
Solution:
• Consolidation of client transaction data into a Hortonworks
Data Platform on Linux on IBM Power Systems.
• SAP Customer Activity Repository (CAR) application,
powered by SAP HANA, connected to the data lake to
enable real-time insights.
Business Benefits:
• More efficient and flexible in-store experiences for their
clients to increase client loyalty and purchases.
Time to Value
HDP 2.6 running on a cluster of 9 IBM Power System servers
Full solution deployed by IBM lab services and an IBM Business
Partner in < 2 weeks
Trial to production in 2 months
15
Big SQL is the only SQL-on-Hadoop
solution to understand SQL syntax from
other vendors and products, including:
Oracle, IBM DB2 and Netezza.
For this reason, Big SQL is the ultimate
hybrid engine to optimize EDW workloads
on an open Hadoop platform
What is IBM Big SQL?
16
Federation
and
Spark
Performance
Enterprise
and
Security
SQL Compatibility
Relational
Databases
Leads performance
metrics on high
volumes of data and
concurrent streams
Automatic memory
management
Role and Column
level Security
Ranger Integration
NoSQL Object
Stores
Core Themes of Big SQL
17
Big SQL queries heterogeneous systems in a single query - only SQL-on-Hadoop that virtualizes more than 10
different data sources: RDBMS, NoSQL, HDFS or Object Store
Big SQL
Fluid Query (federation)
Oracle
SQL
Server
Teradata DB2
Netezza
(PDA) Informix
Microsoft
SQL Server
Hive HBase HDFS
Object Store
(S3)
WebHDFS
Big SQL allows query federation by virtualizing data sources and processing where data resides
Hortonworks Data Platform (HDP)
Data Virtualization
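To make the federation idea concrete, here is a minimal sketch of joining two separate sources in a single query. It is not Big SQL code: Python's built-in sqlite3 module and its ATTACH DATABASE statement stand in for the virtualization layer, and the database names (warehouse, lake), tables, and rows are hypothetical. In Big SQL the remote table would be exposed through a nickname instead.

```python
import sqlite3


def federated_join():
    """Join tables that live in two separately attached databases."""
    conn = sqlite3.connect(":memory:")
    # Each attached in-memory database plays the role of one data source.
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")  # stand-in for an RDBMS
    conn.execute("ATTACH DATABASE ':memory:' AS lake")       # stand-in for Hadoop data

    conn.execute("CREATE TABLE warehouse.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE lake.game_events (customer_id INTEGER, event TEXT)")
    conn.executemany(
        "INSERT INTO warehouse.customers VALUES (?, ?)",
        [(1, "Amy"), (2, "Ryan")],
    )
    conn.executemany(
        "INSERT INTO lake.game_events VALUES (?, ?)",
        [(1, "level_up"), (1, "purchase"), (2, "login")],
    )

    # One SQL statement spans both sources; the caller never copies data between them.
    rows = conn.execute(
        """
        SELECT c.name, COUNT(*) AS events
        FROM warehouse.customers AS c
        JOIN lake.game_events AS e ON e.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, events in federated_join():
        print(name, events)
```

The point of the sketch is the shape of the query: the application issues one statement against one connection, and the engine resolves where each table lives.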
18
§ Easy porting of enterprise applications
§ Ability to work seamlessly with Business Intelligence tools like Cognos to
gain insights
§ Big SQL integrates with Information Governance Catalog by enabling easy
shared imports to InfoSphere Metadata Asset Manager, which allows you to:
- Analyze assets
- Utilize assets in jobs
- Designate stewards for the assets
Oracle
SQL
DB2
SQL
Netezza
SQL
Big SQL
SQL syntax tolerance (ANSI SQL Compliant)
Cognos Analytics
InfoSphere Metadata Asset Manager
Big SQL is a synergetic SQL engine that offers SQL compatibility, portability and
collaborative ability to get composite analysis on data
Data Offloading and Analytics
19
BRANCH_A FINANCE
(security admin)BRANCH_B
Role Based Access Control
enables separation
of Duties / Audit
Row Level Security
Row and Column Level Security
Big SQL offers row- and column-level access control and role-based access control (RBAC), among other security settings
Data Security
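The following is a small sketch of how row-level and column-level rules combine. It borrows the role names BRANCH_A, BRANCH_B, and FINANCE from the slide, but the rules themselves, the sample rows, and the function name visible_rows are assumptions made for illustration. It does not show Big SQL syntax or any Big SQL API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str  # assumed roles: "BRANCH_A", "BRANCH_B", "FINANCE"


# Hypothetical sample data; "salary" is the sensitive column.
ROWS = [
    {"branch": "BRANCH_A", "customer": "C1", "salary": 50000},
    {"branch": "BRANCH_B", "customer": "C2", "salary": 62000},
]


def visible_rows(user, rows):
    """Apply an assumed row filter and column mask for the given user."""
    result = []
    for row in rows:
        # Row-level rule (assumed): branch roles see only their own branch,
        # while FINANCE sees every row.
        if user.role != "FINANCE" and row["branch"] != user.role:
            continue
        masked = dict(row)
        # Column-level rule (assumed): only FINANCE sees the salary value.
        if user.role != "FINANCE":
            masked["salary"] = None
        result.append(masked)
    return result


if __name__ == "__main__":
    print(visible_rows(User("chris", "BRANCH_A"), ROWS))
    print(visible_rows(User("amy", "FINANCE"), ROWS))
```

In a database engine these rules live with the table rather than in application code, so every query path sees the same filtered and masked view.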
20
PERFORMANCE
Big SQL 5.0 is 3.2x faster than Spark SQL 2.1
(4 Concurrent Streams) SNAPSHOT OF 100TB HADOOP-DS
I/O (vs Spark)
Big SQL reads 12x less data
Big SQL writes 30x less data
COMPRESSION
60%
SPACE SAVED
WITH PARQUET
AVERAGE CPU USAGE
76.4%
MAX I/O THROUGHPUT
READ 4.4 GB/SEC
WRITE 2.8 GB/SEC
WORKING QUERIES
Big SQL’s Performance at a Glance
Leads performance metrics on high volumes of data and concurrent streams
21
Right Tool for the Right Job
Not Mutually Exclusive. Hive, Big SQL & Spark SQL can co-exist and complement each other in a cluster
Big SQL
Federation
Complex Queries
High Concurrency
Enterprise ready
Application portability
All open source file formats
Spark SQL
Machine learning
Data exploration
Simpler SQL
Hive
In-memory cache
Geospatial analytics
ACID capabilities
Fast ingest
Ideal tool for Data Scientists
and discovery
Ideal tool for BI Data Analysts
and production workloads
Ideal tool for simple BI Data Analysts
and production workloads
22
Summary – Get more for Less with Big SQL
§ Big SQL is really a powerful runtime that makes access to Hive Tables fast and secure.
§ Big SQL supports ANSI SQL 2003, 2008, and even parts of SQL 2011!
§ Though SQL for HBase can be achieved with projects like Apache Phoenix, Big SQL provides
seamless access to both HBase and Hive tables with the ability to join them too!
§ Big SQL can start working with existing Hive and HBase tables in Hadoop
§ With Big SQL Nicknames, you can provide seamless access to remote data sources to allow
users to see and experiment with data – without the time and cost associated with building
ingest processes.
§ All of this capability, provided with a single driver, one connection, a single robust and
consistent ANSI compliant SQL dialect – with unified security managed for all object types.
Big SQL will save you Time and Money
23
Act Now - Available until Dec 31, 2017!
24
Innovation Pervasive in the Design
Power Systems S822LC for Big Data
Not Just Another Intel Server
NVIDIA:
Tesla K80 GPU Accelerator
Linux by Red Hat:
Red Hat 7.2 Linux OS
Mellanox: InfiniBand/Ethernet
Connectivity in and out of server
HGST: Optional NVMe Adapters
Alpha Data with Xilinx FPGA:
Optional CAPI Accelerator
Broadcom: Optional PCIe Adapters
QLogic: Optional Fibre Channel PCIe
Samsung: SSDs & NVMe
Hynix, Samsung, Micron: DDR4
IBM: POWER8 CPU
25
IBM Power S822LC for Big Data and Hortonworks Combine to Deliver Leadership in
Hadoop Environments
• Hive/Tez Performance results are based on IBM Internal Testing of 10 queries (simple, medium, and complex) with varying runtimes running against a 10TB database. The tests were run on 10x IBM Power System S822LC for Big Data
20 cores / 40 threads, 2 X POWER8 2.92GHz, 256 GB memory, RHEL 7.2, HDP 2.5.3 compared to the published x86/Hortonworks results running on 10x AWS d2.8xlarge EC2 nodes running HDP 2.5; details can be found at
https://hortonworks.com/blog/apache-hive-going-memory-computing/ . Data as of February 28, 2017.
• Big SQL Performance results are based on IBM Internal Testing of all 50 TPC-DS queries selected by Hortonworks. There is a major difference with the 10 longest running queries based on the 10TB result Hortonworks team achieved
with 10 x AWS d2.8xlarge EC2 dataNodes running with HDP 2.5.2 (details can be found at https://hortonworks.com/blog/apache-hive-going-memory-computing/). 11 x S822LC for Big Data Power servers were used as dataNodes running
Big SQL and IOP 4.2.5. Data as of July 12, 2017.
• Conducted under laboratory conditions; individual results can vary based on workload size, use of storage subsystems & other conditions.
• POWER8 and Hortonworks deliver 1.70X the throughput
compared to Hortonworks Hive/Tez running on x86
– 70% More QpH based on the average response time –
complete the same amount of work with fewer system
resources
– 41% Reduction on average in query response time –
reduced response time enables making business
decisions faster.
• IBM Big SQL on IBM Power Systems can deliver 3.5X faster
query times on average for the most complex queries
70%
More
Throughput
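The 1.70X throughput and 41% response-time figures on this slide are two views of the same measurement. The short sketch below shows the arithmetic, assuming that queries per hour (QpH) is inversely proportional to average response time, as the slide's "based on the average response time" wording suggests. The function name response_time_reduction is hypothetical.

```python
def response_time_reduction(throughput_ratio: float) -> float:
    """Fractional drop in average response time implied by a throughput ratio.

    Assumes throughput is inversely proportional to average response time,
    so new_time = old_time / throughput_ratio.
    """
    if throughput_ratio <= 0:
        raise ValueError("throughput_ratio must be positive")
    return 1.0 - 1.0 / throughput_ratio


if __name__ == "__main__":
    reduction = response_time_reduction(1.70)
    # 1 - 1/1.70 is roughly 0.41, matching the slide's 41% reduction.
    print(f"{reduction:.1%}")
```

Under that assumption, a 70% gain in QpH and a 41% cut in average response time are the same result stated two ways, not two independent improvements.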
26
Data lake with IBM Spectrum Scale
Unleash new storage economies on a global scale.
Block
iSCSI
Client
workstations Users and
applications
Compute
farm
Traditional
applications
GLOBAL Namespace
Analytics
Transparent
HDFS
OpenStack
Cinder
Glance
Manila
Object
Swift S3
Transparent Cloud
Powered by
IBM Spectrum Scale
Automated data placement and data migration
Disk Tape Shared Nothing
Cluster
Flash
New Gen
applications
Transparent Cloud
Tier
Worldwide Data
Distribution (R/W)
Site B
Site A
Site C
SMB NFS
POSIX
File
Consolidate all your unstructured data storage on Spectrum Scale with unlimited and painless scaling of capacity and performance
Encryption DR Site
AFM-DR
JBOD/JBOF
Spectrum Scale RAID
Compression
4000+
clients
27
Why Spectrum Scale for Big Data & Analytics
Extreme scalability with parallel file system architecture
Data + Metadata
Node
Data + Metadata
Node
Data + Metadata
Node
Data + Metadata
Node
No centralized metadata node bottleneck. Every node can serve as data and metadata in the cluster.
Global namespace that can span geographies
Active – Active replicas of data for real time global collaboration
Reduce datacenter footprint with industry’s best in-place analytics
True software defined storage that can be purchased as software only OR pre-integrated system
Data
NFS
SMB POSIX Object
HDFS API
Access to the data using any of the industry standard protocols
IBM Elastic Storage Server (ESS)
(Pre-integrated system)
IBM Spectrum Scale
(SW only)
28
Reduce data center footprint with Spectrum Scale
HDFS
Raw
data
ext4
ext4
write move copy
Traditional
applications
Copies in both HDFS and ext4
Spectrum
Scale
Application
writes direct
to Hadoop
Path with
NFS/SMB/
Object/POSIX
direct-read with
NFS/SMB/
Object/POSIX
Raw
data
Traditional
applications
Multiple copies with HDFS based workflow
Spectrum Scale in-place analytics (No copies required)
Hadoop
analysis Jobs Hadoop
analysis Jobs
Direct read, one version
Data Scientists waste days just copying data to HDFS
No copies required with Spectrum Scale
Costly data protection - Default uses 3-way replication with HDFS
IBM ESS Software RAID eliminates need for 3-way replication. Just 30%
extra storage requirement.
[Erasure coding in HDFS has limitations and is good only for cold data]
Traditional
applications Traditional
applications
HDFS
APIs
HDFS
APIs
Example: For 5PB of data, HDFS requires 15PB of storage
Example: For 5PB of data, ESS requires 6.5PB of storage
Copy process can take hours/days & eventually results are based on stale data.
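The storage figures in the examples above follow from simple multiplication. This sketch reproduces them, assuming raw capacity is usable data multiplied by a protection overhead factor: 3.0 for HDFS 3-way replication and 1.3 for the 30% extra that the slide attributes to ESS software RAID. The function and constant names are hypothetical.

```python
HDFS_THREE_WAY_FACTOR = 3.0   # three full copies of every block
ESS_RAID_FACTOR = 1.3         # data plus the 30% extra stated on the slide


def raw_storage_pb(usable_pb: float, overhead_factor: float) -> float:
    """Raw capacity (PB) needed to hold usable_pb of data under a protection scheme."""
    if usable_pb < 0 or overhead_factor < 1:
        raise ValueError("usable_pb must be >= 0 and overhead_factor >= 1")
    return usable_pb * overhead_factor


if __name__ == "__main__":
    data_pb = 5
    hdfs_pb = raw_storage_pb(data_pb, HDFS_THREE_WAY_FACTOR)
    ess_pb = raw_storage_pb(data_pb, ESS_RAID_FACTOR)
    print(f"HDFS: {hdfs_pb:.1f} PB, ESS: {ess_pb:.1f} PB, saved: {hdfs_pb - ess_pb:.1f} PB")
```

For the slide's 5PB case this gives 15PB for HDFS and 6.5PB for ESS, a difference of 8.5PB of raw capacity.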
29
HDP and IBM Systems – Better Together
1 Flexibility – Richest family of Linux servers to match your workload’s scale and reliability needs
2 Performance and Price/Performance – Leading performance for SQL and Spark workloads
• 1.70X the throughput compared to Hortonworks running on x86 and a 3X price performance
guarantee
3 Designed for Cognitive + AI – Obtain your ML/DL results faster with AI on Power servers
• PowerAI is the only commercial offering containing all key deep learning frameworks
§ Caffe, TensorFlow, Torch, Theano, OpenBLAS, NCCL, NVIDIA DIGITS*
4 TCO at Scale with HDP on Power Systems:
• Host mixed application workloads on a single global filesystem with IBM Spectrum Scale
• Up to 3X reduction of storage and compute infrastructure moving to Power Systems and IBM
Elastic Storage Server vs commodity scale out x86
5 Mission Critical Support
• Stable, trusted Hadoop platform on proven Power System with outstanding client support
30
Scaling Data Science on Big Data
Date: Wed, 9/20 @ 11:00 AM
Room: C2.3
Ingesting Data at Blazing Speed using Apache
ORC
Date: Wed, 9/20 @ 4:20 PM
Room: C4.7
Open metadata and governance with Apache
Atlas
Date: Wed, 9/20 @ 5:10 PM
Room: C4.6
Empowering YOU with Democratized Data
Access, Data Science and Machine Learning
Date: Wednesday, 9/20 @ 6:00 PM
Room: C4.5
Breaching the 100TB mark with SQL over
Hadoop
Date: Thurs, 9/21 @ 2:20 PM
Room: C2.3
Apache Spark, Apache Zeppelin
and Data Science
Date: Thurs, 9/21 @ 6:00 PM
Room: C4.5
Visit IBM Booth for More Information!
Check out the Breakout Sessions

More Related Content

What's hot

Delivering Data Science to the Business
Delivering Data Science to the BusinessDelivering Data Science to the Business
Delivering Data Science to the BusinessDataWorks Summit
 
Hadoop Journey at Walgreens
Hadoop Journey at WalgreensHadoop Journey at Walgreens
Hadoop Journey at WalgreensDataWorks Summit
 
Cost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationCost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationDataWorks Summit
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHortonworks
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep duttaCapgemini
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorialrustd
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureMicrosoft
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...DataWorks Summit/Hadoop Summit
 
Breakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data StoreBreakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data StoreCloudera, Inc.
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudDataWorks Summit/Hadoop Summit
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark DataWorks Summit/Hadoop Summit
 

What's hot (20)

Delivering Data Science to the Business
Delivering Data Science to the BusinessDelivering Data Science to the Business
Delivering Data Science to the Business
 
Hadoop Journey at Walgreens
Hadoop Journey at WalgreensHadoop Journey at Walgreens
Hadoop Journey at Walgreens
 
Cost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationCost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop Implementation
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - Jaspersoft
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
Beyond TCO
Beyond TCOBeyond TCO
Beyond TCO
 
Hadoop for the Masses
Hadoop for the MassesHadoop for the Masses
Hadoop for the Masses
 
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft Azure
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
 
Breakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data StoreBreakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data Store
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 

Similar to Empowering you with Democratized Data Access, Data Science and Machine Learning

Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...Seeling Cheung
 
Why You Need to Govern Big Data
Why You Need to Govern Big DataWhy You Need to Govern Big Data
Why You Need to Govern Big DataIBM Analytics
 
IMS10 unleash the capabilities of new technologies
IMS10   unleash the capabilities of new technologiesIMS10   unleash the capabilities of new technologies
IMS10 unleash the capabilities of new technologiesRobert Hain
 
Insight2014 ibm client_center_4_adv_analytics_7171
Insight2014 ibm client_center_4_adv_analytics_7171Insight2014 ibm client_center_4_adv_analytics_7171
Insight2014 ibm client_center_4_adv_analytics_7171IBMgbsNA
 
Advanced Analytics Platform for Big Data Analytics
Advanced Analytics Platform for Big Data AnalyticsAdvanced Analytics Platform for Big Data Analytics
Advanced Analytics Platform for Big Data AnalyticsArvind Sathi
 
IBM MQ on cloud and containers
Empowering you with Democratized Data Access, Data Science and Machine Learning

  • 1. Empowering YOU with Democratized Data Access, Data Science and Machine Learning
  • 2. Acknowledgements and Disclaimers
    Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.
    The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
    All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
    © Copyright IBM Corporation 2017. All rights reserved. U.S. Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
    IBM, the IBM logo, ibm.com, Big SQL are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or TM), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
    TPC Benchmark, TPC-DS, and QphDS are trademarks of Transaction Processing Performance Council.
    Cloudera, the Cloudera logo, Cloudera Impala are trademarks of Cloudera.
    Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries.
    Other company, product, or service names may be trademarks or service marks of others.
  • 3. Operationalizing machine learning and getting actionable insights from disparate sources of data has been a huge challenge
    Teams: Data Integration/Data Engineering Team; Data Science Team and Data Engineering Team; Application Development Team; Line of Business.
    Diagram labels: Data still lives in silos; IBM Db2; Operationalize Machine Learning; Organization needs to act fast; ACT NOW!
  • 4. Amy needs to work with different teams who perform specific tasks to execute the campaign
    People: Amy; Chris (Data Engineer); Ryan (Data Scientist); Nick (Application Developer).
    Data: Product details; Customer details; Sales campaign.
    Diagram label: Operationalize Machine Learning.
  • 5. With Big SQL, Amy’s team can save time on execution and enhance productivity
    Capabilities: Federation; Spark Integration; Application Integration.
    Team: Chris (Data Engineer); Ryan (Data Scientist); Nick (Application Developer).
    Data: Product Details; Sales Campaign; Customer Details.
    Diagram labels: IBM Big SQL; ONE TIME DEVELOPMENT.
  • 6. Democratize Data Science and Machine Learning
    Data Ingestion: Virtualize disparate data sources like Hadoop, RDBMS, and Object Stores (S3) to join data in a single query.
    Data Transformation / Data Science / Machine Learning: Manipulate data and operationalize data science models written in various languages.
    Data Visualization: Perform data discovery, analyze, and visualize business results in notebooks or other BI tools.
  • 7. How are other companies benefiting by using Big SQL?
  • 8. Do you have any of these challenges?
    Want to modernize your EDW without long and costly migration efforts.
    Offloading historical data from Oracle, Db2, or Netezza because they are reaching capacity.
    Operationalize machine learning.
    Need to query, optimize and integrate multiple data sources from one single endpoint.
    Slow query performance for SQL workloads.
    Require the skill set to migrate data from RDBMS to Hadoop / Hive.
  • 9. Big Fish Games – Uses Big SQL to combine disparate data to drive product innovation through customer feedback with the use of analytics
    “The ability to answer complicated questions with data from disparate sources will allow our analysts to focus on answering business questions without having to worry about where the data lives or waiting on a project to perform the data integration for them.” -- David Darden, BI Engineering Manager, Big Fish Games
    Business need: Understand which product features resonate the best with the gaming community; increase cross-sell and up-sell opportunities.
    Solution: Combines structured (customer) data from PureData System for Analytics with semi-structured (game log) data in Hadoop.
    Beta experience and outcomes: Puts users in charge of data analysis (access to data without technology getting in the way); faster insights (fast data movement to Hadoop, 3X faster than Sqoop); leverage existing skills (IBM Big SQL enables and leverages the existing SQL skill set).
  • 10. Southwest Power Pool – Uses Big SQL to federate and reuse applications while taking first steps in establishing an enterprise data lake
    Business need: Near-term requirement: offload less frequently used data. Long-term vision: build a data lake infrastructure.
    Solution: Offloaded cold data to Hadoop and reused applications; combines cold data on Hadoop with hot data on Netezza to derive insights.
    Business benefits: Application portability (SQL compatibility enables reuse of applications with minimal business query modifications; support for Netezza functions); federation (query capabilities between Netezza and Hadoop); leverage existing skills (IBM Big SQL enables and leverages the existing SQL skill set).
    Big SQL provides SQL query, federation, transactional, performance, and security capabilities, which will combine with streaming data, governance, and BI analytics in later phases.
  • 11. 11 EY – Query data from different sources using Big SQL to help prevent fraud Business need: • Analyze data to quickly detect fraud threats • Forensic data analytics is seen as a key capability to invest in Solution: • EY is now able to detect potential threats before they escalate Business Benefits: High performance helps queries to run in minutes, not hours, helping clients rapidly identify and eliminate threats Gathers data from multiple sources and applies real-time analytics to identify hidden patterns and anomalies Ability to process a wide variety of data types, from journal entries and payment streams to email, news feeds and social media. EY is a global leader in assurance, tax, transaction and advisory services. The insights and quality services EY delivers help build trust and confidence in the capital markets and in economies the world over, and help to build a better working world for EY’s people, clients and communities. Transformation EY provides its clients with comprehensive protection against fraud and security risks. IBM Analytics helps EY rapidly detect potential threats before they escalate.
  • 12. 12 Vestas – Leverages complex queries and performance capabilities to turn climate into capital with big data using IBM Big SQL “In our development strategy, we see growing our library in the range of 18 to 24 petabytes of data. And while it’s fairly easy to build that library, we needed to make sure that we could gain knowledge from that data.” — Lars Christian Christensen, vice president, Vestas Wind Systems The transformation: Successful analysis resulted in a 97% decrease in response times for wind forecasting information, which helps pinpoint optimal turbine placement, maximize power generation and reduce energy costs. Business Benefits: High performance helps queries to run in minutes, not hours, helping clients rapidly identify and eliminate threats Complex query processing helps manage and analyze weather and location data for calculating the right location for turbines Business need: • Need to process complex queries on large volumes of data • Analyze wind forecasting information for ideal placement of turbines Solution: • Analyzed petabytes of data using complex queries on weather and location data for turbine placements • Successful placement of turbines leads to increased ROI for customers
  • 13. 13 "In a half-day workshop, we were able to show the chemical company how big data analytics work and were able to identify four new customers." —Dr. Michael Kowolenko, Senior Research Scholar, Poole College of Management, NC State University The transformation: The Poole College of Management at NC State University is developing the next generation of data-driven decision makers, utilizing an IBM big data solution based on PowerLinux technology. This system allows its students to effectively manage and analyze large volumes of structured and unstructured data from a variety of sources. NC State University Poole College of Management – Helping businesses uncover new opportunities with IBM on PowerLinux Business Benefits: Application portability – SQL compatibility enables reuse of application with minimal business query modifications - Support for Netezza functions Federation – query capabilities between Netezza and Hadoop Leverage existing skills - IBM Big SQL enables and leverages existing SQL skill set Business need: • Make data-driven decisions on what new courses to add • Enable students to get real-world experience with big data Solution: • Created a curriculum that enables students to apply big data analytics to real-world problems • Help businesses identify new opportunities
  • 14. 14 Major North American Food Retailer implements HDP on IBM POWER Business need: • Gain a competitive advantage by retaining and analyzing their store-level loyalty program data • Bring outsourced analytics back in-house Solution: • Consolidation of client transaction data into a Hortonworks Data Platform on Linux on IBM Power Systems. • SAP Customer Activity Repository (CAR) application, powered by SAP HANA, connected to the data lake to enable real-time insights. Business Benefits: • More efficient and flexible in-store experiences for their clients to increase client loyalty and purchases. Time to Value: HDP 2.6 running on a cluster of 9 IBM Power System servers; full solution deployed by IBM Lab Services and an IBM Business Partner in < 2 weeks; trial to production in 2 months
  • 15. 15 Big SQL is the only SQL-on-Hadoop solution to understand SQL syntax from other vendors and products, including: Oracle, IBM DB2 and Netezza. For this reason, Big SQL is the ultimate hybrid engine to optimize EDW workloads on an open Hadoop platform What is IBM Big SQL?
  • 16. 16 Federation and Spark Performance Enterprise and Security SQL Compatibility Relational Databases Leads performance metrics on high volumes of data and concurrent streams Automatic memory management Role and Column level Security Ranger Integration NoSQL Object Stores Core Themes of Big SQL
  • 17. 17 Big SQL queries heterogeneous systems in a single query - only SQL-on-Hadoop that virtualizes more than 10 different data sources: RDBMS, NoSQL, HDFS or Object Store Big SQL Fluid Query (federation) Oracle SQL Server Teradata DB2 Netezza (PDA) Informix Microsoft SQL Server Hive HBase HDFS Object Store (S3) WebHDFS Big SQL allows query federation by virtualizing data sources and processing where data resides Hortonworks Data Platform (HDP) Data Virtualization
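The federation idea on this slide, one query spanning several stores with the data left where it resides, can be illustrated with a small analogy. The sketch below is not Big SQL: it uses Python's built-in sqlite3 module and SQLite's ATTACH DATABASE to stand in for a "hot" and a "cold" store, and every table and column name in it is hypothetical.

```python
import sqlite3


def federated_order_totals():
    """Sum order amounts per customer across two separate databases in one query."""
    # "Hot" store: the default in-memory database (schema name: main).
    conn = sqlite3.connect(":memory:")
    # "Cold" store: a second, independent in-memory database attached as "cold".
    conn.execute("ATTACH DATABASE ':memory:' AS cold")

    # Hypothetical tables standing in for recent and archived data.
    conn.execute("CREATE TABLE main.orders (customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE cold.orders_archive (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [("acme", 100.0), ("globex", 40.0)],
    )
    conn.executemany(
        "INSERT INTO cold.orders_archive VALUES (?, ?)",
        [("acme", 250.0), ("initech", 75.0)],
    )

    # A single statement reads from both stores; the caller never copies data
    # from one store into the other.
    rows = conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM (
            SELECT customer, amount FROM main.orders
            UNION ALL
            SELECT customer, amount FROM cold.orders_archive
        )
        GROUP BY customer
        ORDER BY customer
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for customer, total in federated_order_totals():
        print(customer, total)
```

In Big SQL the remote sources would be reached through its federation layer rather than ATTACH; the sketch only shows the shape of a query that treats two stores as one.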
  • 18. 18 § Easy porting of enterprise applications § Ability to work seamlessly with Business Intelligence tools like Cognos to gain insights § Big SQL integrates with Information Governance Catalog by enabling easy shared imports to InfoSphere Metadata Asset Manager, which allows: -Analyze assets -Utilize assets in jobs -Designate stewards for the assets Oracle SQL DB2 SQL Netezza SQL Big SQL SQL syntax tolerance (ANSI SQL Compliant) Cognos Analytics InfoSphere Metadata Asset Manager Big SQL is a synergetic SQL engine that offers SQL compatibility, portability and collaborative ability to get composite analysis on data Data Offloading and Analytics
  • 19. 19 BRANCH_A FINANCE (security admin) BRANCH_B Role Based Access Control enables separation of Duties / Audit Row Level Security Row and Column Level Security Big SQL offers row and column level access control (RBAC) among other security settings Data Security
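To make the row and column level access idea concrete, here is a minimal conceptual sketch in Python. It does not use Big SQL syntax or any Big SQL API. The role names BRANCH_A, BRANCH_B and FINANCE come from the slide; the sample rows, the salary column and the rule that only FINANCE sees every row and every column are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str  # "BRANCH_A", "BRANCH_B" or "FINANCE" (role names from the slide)


# Hypothetical sample data; the columns are invented for this sketch.
ROWS = [
    {"branch": "BRANCH_A", "customer": "Ann", "salary": 52000},
    {"branch": "BRANCH_B", "customer": "Bob", "salary": 61000},
]


def visible_rows(user, rows):
    """Apply a row filter and a column mask based on the user's role."""
    result = []
    for row in rows:
        # Row-level rule (assumed): FINANCE sees all branches,
        # a branch role sees only rows belonging to its own branch.
        if user.role != "FINANCE" and row["branch"] != user.role:
            continue
        masked = dict(row)  # copy so the stored data is never modified
        # Column-level rule (assumed): only FINANCE sees salary values.
        if user.role != "FINANCE":
            masked["salary"] = None
        result.append(masked)
    return result


if __name__ == "__main__":
    print(visible_rows(User("alice", "BRANCH_A"), ROWS))
    print(visible_rows(User("frank", "FINANCE"), ROWS))
```

The point of the sketch is that both rules are enforced at the data layer for every reader, so applications do not need their own filtering logic.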
  • 20. 20 PERFORMANCE Big SQL 5.0 is 3.2x faster than Spark SQL 2.1 (4 Concurrent Streams) SNAPSHOT OF 100TB HADOOP-DS I/O (vs Spark) Big SQL reads 12x less data Big SQL writes 30x less data COMPRESSION 60% SPACE SAVED WITH PARQUET AVERAGE CPU USAGE 76.4% MAX I/O THROUGHPUT READ 4.4 GB/SEC WRITE 2.8 GB/SEC WORKING QUERIES Big SQL’s Performance at a Glance Leads performance metrics on high volumes of data and concurrent streams
  • 21. 21 Right Tool for the Right Job Not Mutually Exclusive. Hive, Big SQL & Spark SQL can co-exist and complement each other in a cluster Big SQL Federation Complex Queries High Concurrency Enterprise ready Application portability All open source file formats Spark SQL Machine learning Data exploration Simpler SQL Hive In-memory cache Geospatial analytics ACID capabilities Fast ingest Ideal tool for Data Scientists and discovery Ideal tool for BI Data Analysts and production workloads Ideal tool for simple BI Data Analysts and production workloads
  • 22. 22 Summary – Get more for Less with Big SQL §Big SQL is really a powerful runtime that makes access to Hive Tables fast and secure. §Big SQL supports ANSI SQL 2003, 2008, and even parts of SQL 2011! §Though SQL for HBase can be achieved with projects like Apache Phoenix, Big SQL provides seamless access to both HBase and Hive tables with the ability to join them too! §Big SQL can start working with existing Hive and HBase tables in Hadoop §With Big SQL Nicknames, you can provide seamless access to remote data sources to allow users to see and experiment with data – without the time and cost associated with building ingest processes. §All of this capability, provided with a single driver, one connection, a single robust and consistent ANSI compliant SQL dialect – with unified security managed for all object types. Big SQL will save you Time and Money
  • 23. 23 Act Now - Available until Dec 31, 2017!
  • 24. 24 Innovation Pervasive in the Design Power Systems S822LC for Big Data Not Just Another Intel Server NVIDIA: Tesla K80 GPU Accelerator Linux by Red Hat: Red Hat 7.2 Linux OS Mellanox: InfiniBand/Ethernet Connectivity in and out of server HGST: Optional NVMe Adapters Alpha Data with Xilinx FPGA: Optional CAPI Accelerator Broadcom: Optional PCIe Adapters QLogic: Optional Fibre Channel PCIe Samsung: SSDs & NVMe Hynix, Samsung, Micron: DDR4 IBM: POWER8 CPU
  • 25. 25 IBM Power S822LC for Big Data and Hortonworks Combine to Deliver Leadership in Hadoop Environments • Hive/Tez Performance results are based on IBM Internal Testing of 10 queries (simple, medium, and complex) with varying runtimes running against a 10TB database. The tests were run on 10x IBM Power System S822LC for Big Data 20 cores / 40 threads, 2 X POWER8 2.92GHz, 256 GB memory, RHEL 7.2, HDP 2.5.3 compared to the published x86/Hortonworks results running on 10x AWS d2.8xlarge EC2 nodes running HDP 2.5; details can be found at https://hortonworks.com/blog/apache-hive-going-memory-computing/ . Data as of February 28, 2017. • BigSQL Performance results are based on IBM Internal Testing of all 50 TPC-DS queries selected by Hortonworks. There is a major difference with the 10 longest running queries based on the 10TB result the Hortonworks team achieved with 10 x AWS d2.8xlarge EC2 dataNodes running with HDP 2.5.2 (details can be found at https://hortonworks.com/blog/apache-hive-going-memory-computing/). 11 x S822LC for Big Data Power servers were used as dataNodes running BigSQL and IOP 4.2.5. Data as of July 12, 2017. • Conducted under laboratory conditions; individual results can vary based on workload size, use of storage subsystems & other conditions. • POWER8 and Hortonworks deliver 1.70X the throughput compared to Hortonworks Hive/Tez running on x86 – 70% More QpH based on the average response time – complete the same amount of work with less system resources – 41% Reduction on average in query response time – reduced response time enables making business decisions faster. • IBM BigSQL on IBM Power Systems can deliver 3.5X faster query times on average for the most complex queries 70% More Throughput
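The three headline numbers on this slide (1.70X throughput, 70% more QpH, 41% lower average response time) are consistent with one another if throughput is taken to be inversely proportional to average response time for a fixed set of queries. That proportionality is an assumption of this sketch, suggested by the slide's wording "based on the average response time"; the function names are hypothetical.

```python
def response_time_reduction(throughput_ratio):
    """Fractional drop in average response time implied by a throughput ratio.

    Assumes throughput is inversely proportional to average response time,
    so a ratio r means each query takes 1/r of its original time.
    """
    if throughput_ratio <= 0:
        raise ValueError("throughput_ratio must be positive")
    return 1.0 - 1.0 / throughput_ratio


def extra_throughput_percent(throughput_ratio):
    """Percentage of additional queries per hour implied by a throughput ratio."""
    return (throughput_ratio - 1.0) * 100.0


if __name__ == "__main__":
    ratio = 1.70  # throughput ratio quoted on the slide
    print(f"More QpH: {extra_throughput_percent(ratio):.0f}%")
    print(f"Response time reduction: {response_time_reduction(ratio):.0%}")
```

Under that assumption, 1 - 1/1.70 is roughly 0.41, which matches the 41% reduction the slide reports.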
  • 26. 26 Data lake with IBM Spectrum Scale Unleash new storage economies on a global scale. Block iSCSI Client workstations Users and applications Compute farm Traditional applications GLOBAL Namespace Analytics Transparent HDFS OpenStack Cinder Glance Manila Object Swift S3 Transparent Cloud Powered by IBM Spectrum Scale Automated data placement and data migration Disk Tape Shared Nothing Cluster Flash New Gen applications Transparent Cloud Tier Worldwide Data Distribution (R/W) Site B Site A Site C SMB NFS POSIX File Consolidate all your unstructured data storage on Spectrum Scale with unlimited and painless scaling of capacity and performance Encryption DR Site AFM-DR JBOD/JBOF Spectrum Scale RAID Compression 4000+ clients
  • 27. 27 Why Spectrum Scale for Big Data & Analytics Extreme scalability with parallel file system architecture Data + Metadata Node Data + Metadata Node Data + Metadata Node Data + Metadata Node No centralized metadata node bottleneck. Every node can serve as data and metadata in the cluster. Global namespace that can span geographies Active – Active replicas of data for real time global collaboration Reduce datacenter footprint with industry’s best in-place analytics True software defined storage that can be purchased as software only OR pre-integrated system Data NFS SMB POSIX Object HDFS API Access to the data using any of the industry standard protocols IBM Elastic Storage Server (ESS) (Pre-integrated system) IBM Spectrum Scale (SW only)
  • 28. 28 Reduce data center footprint with Spectrum Scale HDFS Raw data ext4 ext4 write move copy Traditional applications Copies in both HDFS and ext4 Spectrum Scale Application writes direct to Hadoop Path with NFS/SMB/ Object/POSIX direct-read with NFS/SMB/ Object/POSIX Raw data Traditional applications Multiple copies with HDFS based workflow Spectrum Scale in-place analytics (No copies required) Hadoop analysis Jobs Hadoop analysis Jobs Direct read, one version Data Scientists waste days just copying data to HDFS No copies required with Spectrum Scale Costly data protection - Default uses 3-way replication with HDFS IBM ESS Software RAID eliminates need for 3-way replication. Just 30% extra storage requirement. [Erasure coding in HDFS has limitations and is good only for cold data] Traditional applications Traditional applications HDFS APIs HDFS APIs Example: For 5PB of data, HDFS requires 15PB of storage Example: For 5PB of data, ESS requires 6.5PB of storage Copy process can take hours/days & eventually results are based on stale data.
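The two storage examples on this slide follow from simple multipliers: 3-way replication stores every byte three times, while the slide quotes 30% extra storage for IBM ESS Software RAID. The short sketch below reproduces that arithmetic; the function name and the representation of overhead as a single factor are assumptions made for illustration.

```python
def raw_storage_pb(usable_pb, overhead_factor):
    """Raw capacity needed to hold `usable_pb` petabytes of data.

    `overhead_factor` is the total stored size per unit of data:
    3.0 for 3-way replication, 1.3 for 30% protection overhead.
    """
    if usable_pb < 0 or overhead_factor < 1:
        raise ValueError("usable_pb must be >= 0 and overhead_factor >= 1")
    return usable_pb * overhead_factor


if __name__ == "__main__":
    data_pb = 5  # data volume used in the slide's example
    hdfs_pb = raw_storage_pb(data_pb, 3.0)  # HDFS default 3-way replication
    ess_pb = raw_storage_pb(data_pb, 1.3)   # ESS: 30% extra storage, per the slide
    print(f"HDFS: {hdfs_pb:.1f} PB, ESS: {ess_pb:.1f} PB, "
          f"difference: {hdfs_pb - ess_pb:.1f} PB")
```

The same function can be reused with other data volumes to see how the gap between the two protection schemes grows linearly with the amount of data.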
  • 29. 29 HDP and IBM Systems – Better Together 5 Mission Critical Support - Stable, trusted Hadoop platform on proven Power System with outstanding client support Performance and Price/Performance – Leading performance for SQL and Spark workloads - 1.70X the throughput compared to Hortonworks running on x86 and a 3X price performance guarantee 2 TCO at Scale with HDP on Power Systems: - Host mixed application workloads on a single global filesystem with IBM Spectrum Scale - Up to 3X reduction of storage and compute infrastructure moving to Power Systems and IBM Elastic Storage Server vs commodity scale out x86 4 Flexibility – Richest family of Linux servers to match your workload’s scale and reliability needs 1 Designed for Cognitive + AI – Obtain your ML/DL results faster with AI on Power servers - PowerAI is the only commercial offering containing all key deep learning frameworks: Caffe, TensorFlow, Torch, Theano, OpenBLAS, NCCL, NVIDIA DIGITS* 3
  • 30. 30 Check out the Breakout Sessions Scaling Data Science on Big Data Date: Wed, 9/20 @ 11:00 AM Room: C2.3 Ingesting Data at Blazing Speed using Apache ORC Date: Wed, 9/20 @ 4:20 PM Room: C4.7 Open metadata and governance with Apache Atlas Date: Wed, 9/20 @ 5:10 PM Room: C4.6 Empowering YOU with Democratized Data Access, Data Science and Machine Learning Date: Wednesday, 9/20 @ 6:00 PM Room: C4.5 Breaching the 100TB mark with SQL over Hadoop Date: Thurs, 9/21 @ 2:20 PM Room: C2.3 Apache Spark, Apache Zeppelin and Data Science Date: Thurs, 9/21 @ 6:00 PM Room: C4.5 Visit IBM Booth for More Information!