Vanilla Hadoop vs. the rest
1. Vanilla Hadoop vs. the Rest
Viet-Trung Tran
2. Why Hadoop
2012, Worldwide Hadoop-MapReduce Ecosystem Software 2012-2016 (IDC #234294) forecasts the Hadoop ecosystem will be worth $813 million by 2016
01.2013, IDC predicts a big data market that will grow revenue at 31.7 percent a year until it hits the $23.8 billion mark in 2016
The majority of Fortune 500 companies are at least experimenting
..
3. Hadoop in mainstream production apps
Companies are using Hadoop to offload data warehouse-bound data. Hadoop, on average, provides at least a 10x cost savings over data warehouse solutions.
Financial institutions are using Hadoop as a critical part of their security architecture, to predict phishing behavior and payments fraud in real time and minimize their impact. They hold on to data for longer periods and run more detailed analytics and forensics.
An online advertising company provides real-time trading technology to its users and relies on Hadoop to store and analyze petabytes' worth of data. 90 billion real-time ad auctions are processed each day on their Hadoop distribution.
A digital marketing intelligence provider uses Hadoop to process over 1.7 trillion Internet and mobile records per month, providing syndicated and custom digital marketing intelligence.
14. Choice of Hadoop distribution
Vanilla Hadoop
Open source Hadoop + Support
Open source Hadoop + Support + proprietary management improvements
Open source Hadoop + Support + proprietary architectural improvements
15. Big data begs a big question: does Hadoop replace your enterprise data warehouse or augment it?
Cloudera: Revolution
First Hadoop vendor
Introducing the Enterprise Data Hub, in which Hadoop replaces the data warehouse
+ Commercial software
Hortonworks: Evolution
Partnering with leading commercial data management and analytics vendors
Open source purist
16. "Increasingly, our customers are not viewing the relevant comparison as Cloudera versus Hortonworks,"
"They're viewing it as Cloudera versus Hortonworks plus Teradata Aster, or, if you're talking to an IBM shop, Cloudera versus IBM BigInsights plus Netezza."
Cloudera director of product marketing
17. Cloudera vs. Hortonworks business model
First mover momentum
The old “Nobody got fired for buying IBM” routine
03.2014, Intel ditches its own distro and invests $740 million to buy 18% of Cloudera
Hortonworks Wants To Own Big Data Without Owning Anything
“History repeats itself”
“This is Red Hat versus Oracle and IBM”
“We are focused on not moving up the stack”
“not stepping on the toes of anyone with the capacity to crush us”
Teradata, for instance, finds itself reselling Hortonworks’ cheaper product against its own higher-margin ones, a relationship that may not be built to last
18. MacOS/SUSE vs. Red Hat
Business model?
Hortonworks to have steady, long-term growth
Red Hat win? Community
Red Hat contributes more to the Linux kernel than any other single individual or company.
Red Hat attracts "professional developer"
The platform with the biggest community wins.
21. Spark the disrupter
Cloudera Impala vs. Hortonworks Stinger vs. SparkSQL
08.2014, Hortonworks: A shared vision for Apache Spark on Hadoop
25. Are there any reasons for using vendor-specific Hadoop distributions like Cloudera/Hortonworks instead of vanilla Apache Hadoop if I'm not using their support services?
Ask yourself, is your business to manage trillions of data objects, analyze customer behavior, device behavior or other analytic tasks, to find that strategic advantage, prevent fraud, prevent failure, and improve customer satisfaction?
Or is your mission to build and maintain dozens and dozens of open source components, troubleshoot arcane bugs and answer urgent questions at 2am?
Which is higher value to you and your organization?
Will your organization derive more benefit from you writing that one key Hive query, or from distributing packages across a cluster of machines?
26. Based on the alliances that Hortonworks and Cloudera have achieved, it makes sense to use their packages to ensure compatibility and integration with those tools: Cloudera for Oracle, and Hortonworks for Teradata and SAS, to name a few. This is by no means saying that you can't use other distributions for those integrations, but the alliance ensures a level of compatibility and integration testing you won't find with straight open source.
32. Hortonworks
Hortonworks (NASDAQ:HDP) opened at $24 and is now at $24.13, up 50.8% from its $16 IPO price.
Hortonworks, one of the two most prominent developers (along with Intel-backed Cloudera) of software distributions for the Hadoop big data framework, is now worth just over $1B, or ~15x gross billings from the 12 months ending Sep. 30.
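The quoted gain can be checked with a few lines of arithmetic. This is only a minimal sketch: the two prices are the ones given on the slide, and the function name `percent_gain` is a hypothetical helper introduced for illustration.

```python
def percent_gain(start_price: float, end_price: float) -> float:
    """Return the percentage change from start_price to end_price."""
    return (end_price - start_price) / start_price * 100.0


ipo_price = 16.00      # IPO price quoted on the slide
current_price = 24.13  # trading price quoted on the slide

gain = percent_gain(ipo_price, current_price)
print(f"Gain over IPO price: {gain:.1f}%")  # prints "Gain over IPO price: 50.8%"
```

The result agrees with the 50.8% figure stated above.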
33. Cloudera: Customers
100 clients: AMD, eBay, Western Union
Technology
Retail ecommerce
Healthcare
Energy
Telecommunication
Financial services
263 partners: Teradata (data warehouse), Microsoft (Azure cloud service), Intel
Analytics & Business Intelligence
Applications
Database
Data integration
Cloud
Virtualization
Security