Building A Modern Data Architecture (MDA) 
Using Enterprise Hadoop 
Slim Baltagi, Systems Architect 
Hortonworks Inc. 
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 
Open-BDA Hadoop Summit 2014 
November 18th, 2014
Your Presenter 
Slim Baltagi 
• Currently a Systems Architect in the Professional Services Organization of 
Hortonworks in the central region (US and Canada). 
• Over 4 years of Hadoop experience working on 9 Big Data projects. 
• Slim has over 16 years of IT experience working in various architecture, 
design, development and consulting roles. 
• Slim Baltagi holds a master’s degree in Mathematics and is an ABD in 
computer science from Université Laval, Québec, Canada. 
• Twitter: @SlimBaltagi 
Outline 
1. Drivers for an MDA 
2. What’s an MDA 
3. Hadoop’s role in an MDA 
4. Use Cases related to an MDA 
5. Learn More 
6. Q&A 
Traditional Data Architecture Under Pressure 
• Can’t manage new data paradigm 
• Constrains data to specific schema 
• Siloed data 
• Limited scalability 
• Economically unfeasible 
• Limited analytics 
85% data growth: new data types (Source: IDC) 
[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications) sit on a DATA SYSTEM of siloed RDBMS, EDW and MPP stores, fed by existing SOURCES (CRM, ERP, Clickstream, Logs). New data types arriving: OLTP, ERP, CRM systems; unstructured docs, emails; server logs; social/web data; sensor, machine data; geolocation; clickstream.] 
A Modern Data Architecture for New Data 
New Data Requirements: 
• Scale 
• Economics 
• Flexibility 
[Diagram: Traditional Data Architecture. APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications) sit on DATA SYSTEM REPOSITORIES (RDBMS, EDW, MPP), fed by existing SOURCES (CRM, ERP, Clickstream, Logs). New sources: OLTP, ERP, CRM systems; unstructured documents, emails; clickstream; server logs; sentiment, web data; sensor, machine data; geolocation.] 
Enterprise Goals for the Modern Data Architecture 
• Centrally manage new and existing data 
• Provide single view of the customer, product, supply chain 
• Run batch, interactive & real-time analytic applications on shared datasets 
• Assure enterprise-grade security, operations and governance 
• Leverage new and existing data center infrastructure investments 
• Scalable and affordable; low cost per TB 
• Deployment flexibility 
[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications) sit on a DATA SYSTEM in which RDBMS, EDW and MPP stores work alongside Hadoop: Batch, Interactive and Real-Time access on YARN: Data Operating System, over HDFS (Hadoop Distributed File System), nodes 1 to N. SOURCES: existing systems (CRM, ERP, Other), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured.] 
1. Drivers for a Modern Data Architecture (MDA) 
• Semi-Structured and Unstructured – NEW DATA 
Unstructured documents, emails, Sentiment, Web Data, Sensor, Machine Data, 
Geolocation, ... 
• Enterprise Data Warehouse Optimization – REDUCED COSTS 
Low-value computing tasks such as ETL consume significant EDW resources. 
When offloaded to Hadoop, these ETL processes can be performed much 
more efficiently, freeing up your data warehouse to perform high-value 
functions like analytics and operations. 
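As a rough illustration of the kind of low-value ETL step that could be moved off the warehouse, the sketch below parses raw clickstream lines, drops malformed records and produces a small per-page summary that would then be loaded into the EDW. It is plain Python standing in for a Hadoop job; the sample lines, the four-field layout and the function names are assumptions made for this example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical raw clickstream lines: "timestamp,user,page,bytes".
RAW_LINES = [
    "2014-11-18T10:00:01,u1,/home,512",
    "2014-11-18T10:00:02,u2,/home,2048",
    "malformed line",
    "2014-11-18T10:00:05,u1,/cart,1024",
]


def parse(line):
    """Return a cleaned record, or None if the line cannot be parsed."""
    parts = line.split(",")
    if len(parts) != 4:
        return None
    _timestamp, _user, page, size = parts
    try:
        return {"page": page, "bytes": int(size)}
    except ValueError:
        return None


def summarize(lines):
    """Aggregate hits and bytes per page; only this summary goes to the EDW."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for line in lines:
        record = parse(line)
        if record is None:
            continue  # skip bad input instead of failing the whole batch
        totals[record["page"]]["hits"] += 1
        totals[record["page"]]["bytes"] += record["bytes"]
    return dict(totals)


summary = summarize(RAW_LINES)
```

The point of the sketch is the division of labor: parsing and aggregation happen before the warehouse sees the data, so the warehouse only stores and queries the compact result.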
1. Drivers for a Modern Data Architecture 
(Continued) 
• Advanced Analytics – NEW ANALYTICS APPS 
Unlike schema-on-write, which transforms data into specified schema upon 
load, Hadoop empowers you to store data in any format, and then create 
schema at that moment when you choose to analyze your data. This 
unprecedented flexibility opens up new possibilities for iterative analytics and 
delivers new business value. 
• Single Cluster, Multiple Workloads – ANY WORKLOAD 
With Apache Hadoop YARN supporting multiple access methods (such as 
batch, interactive, streaming and real-time) on a common data set, Hadoop 
enables you to transform and view data in multiple ways simultaneously, 
dramatically reducing time to insight. 
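The schema-on-read idea described under Advanced Analytics can be sketched in a few lines. In this minimal example, raw events are kept exactly as they arrived, and each analysis applies its own schema (field selection, type coercion, defaults) only at read time. The sample events, the schema layout and the function name `read_with_schema` are hypothetical and chosen for illustration; a real deployment would more likely use a tool such as Hive over files in HDFS.

```python
import json

# Raw events stored as-is, with inconsistent fields and types (sample data).
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": "85", "referrer": "ad"}',
    '{"user": "u1", "page": "/cart"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    schema maps field name -> (cast function, default value).
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two analyses read the same raw data through different schemas.
latency_schema = {"page": (str, None), "ms": (int, 0)}
referrer_schema = {"user": (str, None), "referrer": (str, "direct")}

latency_rows = read_with_schema(RAW_EVENTS, latency_schema)
referrer_rows = read_with_schema(RAW_EVENTS, referrer_schema)
```

Nothing about the stored data had to change to support the second analysis; with schema-on-write, adding the `referrer` field would typically have required altering the table and reloading.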
2. What’s a Modern Data Architecture (MDA)? 
• Apache Hadoop is a core component of a Modern Data Architecture, 
allowing organizations to collect, store, analyze and manipulate massive 
quantities of data on their own terms—regardless of the source of that data, 
how old it is, where it is stored, or in what format. 
• The Hortonworks Data Platform (HDP) delivers Enterprise Apache Hadoop, 
deeply integrated with existing systems to create a highly efficient, highly 
scalable way to manage all your enterprise data. 
• Integrate new & existing data sets, with existing tools & skills. 
• Make all data available for shared access and processing in multitenant 
infrastructure 
• Batch, interactive & real-time use cases 
3. Hadoop’s role in an MDA 
2. What’s a Modern Data Architecture (MDA)? 
Key Drivers of Hadoop 
A. Unlock New Approach to Analytics 
• Agile analytics via “Schema on Read” with ability to store all data in native format 
• Create new apps from new types of data 
B. Optimize Investments, Cut Costs 
• Focus EDW on high value workloads 
• Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration 
C. Enable a Modern Data Architecture 
• Integrate new & existing data sets 
• Make all data available for shared access and processing in multitenant infrastructure 
• Batch, interactive & real-time use cases 
• Integrated with existing tools & skills 
[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DEV & DATA TOOLS (Build & Test); OPERATIONS TOOLS (Provision, Manage & Monitor); DATA SYSTEM REPOSITORIES (RDBMS, EDW, MPP) alongside Hadoop: Batch, Interactive and Real-Time access on YARN: Data Operating System, over HDFS: Hadoop Distributed File System. SOURCES: existing systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured.] 
Hadoop: It’s About Scale & Structure 
Traditional RDBMS vs. Hadoop: 
• schema: Required on write (Traditional RDBMS) vs. Required on read (Hadoop) 
• governance: Standards and structured (Traditional RDBMS) vs. Multiple structures (Hadoop) 
• processing: Limited, no data processing (Traditional RDBMS) vs. Processing coupled with data (Hadoop) 
• data types: Structured (Traditional RDBMS) vs. Multi and unstructured (Hadoop) 
• best fit use: Complex ACID Transactions, Operational Data Store (Traditional RDBMS) vs. Data Discovery, Processing unstructured data (Hadoop) 
• Optimized, reliable transactions (Traditional RDBMS) vs. Optimized for analytics (Hadoop) 
[Diagram labels: SCALE (storage & processing); Interactive Analytics] 
YARN and HDP Enable the Modern Data Architecture 
YARN is the architectural center of Hadoop and HDP 
• YARN enables a common data set across all applications 
• Batch, interactive & real-time workloads 
• Support multi-tenant access & processing 
HDP enables Apache Hadoop to become an enterprise-viable data platform with centralized services 
• Security 
• Governance 
• Operations 
• Productization 
Enabled broad ecosystem adoption 
Hortonworks drove this innovation of Hadoop through YARN 
[Diagram: Hortonworks Data Platform 2.2] 
• GOVERNANCE (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS 
• BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), NoSQL (HBase, Accumulo), In-Memory (Spark), Search (Solr), Others (ISV Engines); Tez and Slider shown as supporting layers 
• YARN: Data Operating System (Cluster Resource Management) 
• HDFS (Hadoop Distributed File System) 
• SECURITY: Authentication, Authorization, Audit, Data Protection; Storage: HDFS, Resources: YARN, Access: Hive, Pipeline: Falcon, Cluster: Ranger, Cluster: Knox 
• OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie) 
• Deployment Choice: Linux, Windows, On-Premises, Cloud 
Shift to Data-driven Means Treating Data like 
Capital 
A shift in Advertising 
From mass branding …to 1x1 targeting 
A shift in Financial Services 
From educated investing …to automated algorithms 
A shift in Healthcare 
From mass treatment …to designer medicine 
A shift in Retail 
From static branding …to real-time personalization 
A shift in Manufacturing 
From break then fix …to repair before break 
Hadoop enables organizations to cost-effectively store and use all of the data available, in a way that shifts the business from reactive to proactive. 
Create New Applications from New Types of Data 
INDUSTRY / USE CASE (data types marked per use case: Sentiment & Web, Clickstream & Behavior, Machine & Sensor, Geographic, Server Logs, Structured & Unstructured) 
Financial Services 
New Account Risk Screens ✔ ✔ 
Trading Risk ✔ ✔ 
Insurance Underwriting ✔ ✔ ✔ 
Telecom 
Call Detail Records (CDR) ✔ ✔ 
Infrastructure Investment ✔ ✔ 
Real-time Bandwidth Allocation ✔ ✔ ✔ ✔ ✔ 
Retail 
360° View of the Customer ✔ ✔ ✔ 
Localized, Personalized Promotions ✔ 
Website Optimization ✔ 
Manufacturing 
Supply Chain and Logistics ✔ 
Assembly Line Quality Assurance ✔ 
Crowd-sourced Quality Assurance ✔ 
Healthcare 
Use Genomic Data in Medical Trials ✔ ✔ 
Monitor Patient Vitals in Real-Time ✔ ✔ 
Pharmaceuticals 
Recruit and Retain Patients for Drug Trials ✔ ✔ 
Improve Prescription Adherence ✔ ✔ ✔ ✔ 
Oil & Gas 
Unify Exploration & Production Data ✔ ✔ ✔ ✔ 
Monitor Rig Safety in Real-Time ✔ ✔ ✔ 
Government 
ETL Offload/Federal Budgetary Pressures ✔ ✔ 
Sentiment Analysis for Government Programs ✔
4.1 Advertising 
• Mine Grocery & Drug Store POS Data to Identify High-Value Shoppers 
• Target Ads to Customers in Specific Cultural or Linguistic Segments 
• Syndicate Videos According to Behavior, Demographics & Channel 
• ETL Toy Market Research Data for Longer Retention & Deeper Insight 
• Optimize Online Ad Placement for Retail Websites 
4. Use Cases related to an MDA (Continued) 
4.2 Financial Services 
• Screen New Account Applications for Risk of Default 
• Monetize Anonymous Banking Data in Secondary Markets 
• Improve Underwriting Efficiency for Usage-Based Auto Insurance 
• Analyze Insurance Claims with a Shared Data Lake 
• Maintain Sub-Second SLAs with a Hadoop “Ticker Plant” 
• Surveillance of Trading Logs for Anti-Laundering Analysis 
4.3 Healthcare 
• Access Genomic Data for Medical Trials 
• Monitor Patient Vitals in Real-Time 
• Reduce Cardiac Re-Admittance Rates 
• Machine Learning to Screen for Autism with In-Home Testing 
• Store Medical Research Data Forever 
• Recruit Research Cohorts for Pharmaceutical Trials 
• Track Equipment and Medicines with RFID Data 
• Improve Prescription Adherence 
4.4 Manufacturing 
• Assure Just-In-Time Delivery of Raw Materials 
• Control Quality with Real-Time & Historical Assembly Line Data 
• Avoid Stoppages with Proactive Equipment Maintenance 
• Increase Yields in Drug Manufacturing 
• Crowdsource Quality Assurance 
4.5 Oil & Gas 
• Slow Decline Curves with Production Parameter Optimization 
• Define Operational Set Points for Each Well & Receive Alerts on Deviations 
• Optimize Lease Bidding with Reliable Yield Predictions 
• Report on Compliance with Environmental, Health and Safety Regulations 
• Repair Equipment Preventatively with Targeted Maintenance 
• Integrate Exploration with Seismic Image Processing 
4.6 Public Sector 
• Understand Public Sentiment About Government Performance 
• Protect Critical Networks from Threats (Both Internal and External) 
• Prevent Fraud and Waste 
• Analyze Social Media to Identify Terrorist Threats 
• Decrease Budget Pressures by Offloading Expensive SQL Workloads 
• Crowdsource Reporting for Repairs to Roads and Public Infrastructure 
• Fulfill “Open Records” and Freedom of Information Requests 
4.7 Retail 
• Build a 360-Degree View of the Customer 
• Analyze Brand Sentiment 
• Localize & Personalize Promotions 
• Optimize Websites 
• Optimize Store Layouts 
4.8 Telecom 
• Analyze Call Detail Records (CDRs) 
• Service Equipment Proactively 
• Rationalize Infrastructure Investments 
• Recommend Next Product to Buy (NPTB) 
• Allocate Bandwidth in Real-time 
• Develop New Products 
What is a Data Lake? 
• An architectural pattern in the data center that uses Hadoop to deliver 
deeper insight across a large, broad, diverse set of data at efficient scale 
§ But what is it? 
– It is a PLATFORM for your data. (It is not a database.) 
– Multipurpose open PLATFORM to land all data in a single place and interact with it in many ways. 
§ A platform that allows the ecosystem to provide higher-level services (SAS, SAP, Microsoft, Streaming, MPP, In-memory, etc.) 
§ Provides first-class APIs and frameworks to enable this integration 
§ Provides first-class data management capabilities (metadata management, security, transformation pipelines, replication, retention, etc.) 
HDP Data Lake Reference Architecture 
• Step 1: Extract & Load. SOURCE DATA (ClickStream Data, Sales Transaction/Data, Product Data, Marketing/Inventory, Social Data, EDW) arrives via File, JMS, REST, HTTP or Streaming; ingestion through SQOOP, FLUME, Web HDFS, NFS and STORM 
• Step 2: Model/Apply Metadata. HCATALOG (table & user-defined metadata) 
• Step 3: Transform, Aggregate & Materialize. Data processing with HIVE, PIG, Mahout, MR2, TEZ, SOLR, Storm and SAS 
• Step 4: Schedule and Orchestrate. FALCON (data pipeline & flow management), Oozie (Batch scheduler) 
• Manage Steps 1-4: Data Management with Falcon 
• Data Lake HDP Grid: compute & storage nodes on YARN; AMBARI; Knox – Perimeter Level Security 
• INTERACTIVE: Hive Server (Tez/Stinger); YARN Apps: Stream Processing, Real-time Search, MPI 
• Use Case Type 1: Materialize & Exchange. Opens up Hadoop to many new use cases; exchange via HBase Client and Sqoop/Hive to Downstream Data Sources (OLTP, HBase, EDW (Teradata)) 
• Use Case Type 2: Explore/Visualize. Query/Analytics/Reporting Tools: Tableau/Excel, Datameer/Platfora/SAP 
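The "Schedule and Orchestrate" step amounts to running the other steps in dependency order. The sketch below shows that idea with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). It is not how Falcon or Oozie are configured; the step names and the dependency map are assumptions derived from the step labels in the reference architecture, and the actions are placeholders.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must finish before it starts.
# Step names are hypothetical labels based on the reference architecture.
STEP_DEPENDENCIES = {
    "extract_and_load": set(),
    "apply_metadata": {"extract_and_load"},
    "transform_and_materialize": {"apply_metadata"},
    "exchange_downstream": {"transform_and_materialize"},
    "explore_and_visualize": {"transform_and_materialize"},
}


def run_pipeline(dependencies, actions):
    """Run every step's action once, never before its prerequisites."""
    completed = []
    for step in TopologicalSorter(dependencies).static_order():
        actions[step]()
        completed.append(step)
    return completed


# Placeholder actions that only record that the step ran.
log = []
actions = {name: (lambda n=name: log.append(n)) for name in STEP_DEPENDENCIES}

order = run_pipeline(STEP_DEPENDENCIES, actions)
```

The two use case types both depend only on the materialized data, so their relative order is left open; a real scheduler could run them in parallel.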
5. Learn More … 
Resource: Location 
• MDA White Paper: http://info.hortonworks.com/data-lake-hadoop-whitepaper.html (Learn more about Modern Data Architecture (MDA)) 
• MDA Web Page: http://hortonworks.com/hadoop-modern-data-architecture/ (Explore Use Cases by Industry) 
• Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/ (Get Started on Hadoop with Hortonworks Sandbox) 
• Hadoop Tutorials: http://info.hortonworks.com/On-demand-Tutorials_Sign-Up-Page.html (On-Demand Hadoop Tutorials Delivered to Your Inbox) 
• Enterprise Data Lake: http://hortonworks.com/blog/enterprise-hadoop-journey-data-lake/ (Enterprise Hadoop and the journey to Data Lake) 
6. Q&A… 
Thank you!

More Related Content

What's hot

Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalHortonworks
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopHortonworks
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsHortonworks
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageHortonworks
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceHortonworks
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache HadoopHortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchHortonworks
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Hortonworks
 
Hortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Bigger Data For Your Budget
Bigger Data For Your BudgetBigger Data For Your Budget
Bigger Data For Your BudgetHortonworks
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2DataWorks Summit
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 

What's hot (20)

Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
 
Hortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data London
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Bigger Data For Your Budget
Bigger Data For Your BudgetBigger Data For Your Budget
Bigger Data For Your Budget
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 

Viewers also liked

Openerp Project Slides
Openerp Project SlidesOpenerp Project Slides
Openerp Project SlidesOdoo
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks
 
OpenERP Benchmark : How to test performance and robustness against your volum...
OpenERP Benchmark : How to test performance and robustness against your volum...OpenERP Benchmark : How to test performance and robustness against your volum...
OpenERP Benchmark : How to test performance and robustness against your volum...Odoo
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
Intégration OpenErp par Targa
 Intégration OpenErp par Targa Intégration OpenErp par Targa
Intégration OpenErp par TargaNabil Majoul
 

Viewers also liked (6)

Openerp Project Slides
Openerp Project SlidesOpenerp Project Slides
Openerp Project Slides
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
 
XDS - Cross-Enterprise Document Sharing
XDS - Cross-Enterprise Document SharingXDS - Cross-Enterprise Document Sharing
XDS - Cross-Enterprise Document Sharing
 
OpenERP Benchmark : How to test performance and robustness against your volum...
OpenERP Benchmark : How to test performance and robustness against your volum...OpenERP Benchmark : How to test performance and robustness against your volum...
OpenERP Benchmark : How to test performance and robustness against your volum...
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Intégration OpenErp par Targa
 Intégration OpenErp par Targa Intégration OpenErp par Targa
Intégration OpenErp par Targa
 

Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Architecture with Enterprise Hadoop)

  • 1. Building A Modern Data Architecture (MDA) Using Enterprise Hadoop Slim Baltagi, Systems Architect Hortonworks Inc. Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Open-BDA Hadoop Summit 2014 November 18th, 2014
  • 2. Your Presenter: Slim Baltagi
      • Currently a Systems Architect in the Professional Services Organization of Hortonworks in the central region (US and Canada).
      • Over 4 years of Hadoop experience working on 9 Big Data projects.
      • Over 16 years of IT experience working in various architecture, design, development and consulting roles.
      • Holds a master’s degree in Mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
      • Twitter: @SlimBaltagi
  • 3. Outline: 1. Drivers for an MDA; 2. What’s an MDA; 3. Hadoop’s role in an MDA; 4. Use Cases related to an MDA; 5. Learn More; 6. Q&A
  • 4. Traditional Data Architecture Under Pressure
      Applications: Business Analytics, Custom Applications, Packaged Applications
      Data system: RDBMS, EDW, MPP, and multiple silos
      Sources: Existing Sources (CRM, ERP, Clickstream, Logs)
      85% data growth: new data types (Source: IDC): OLTP, ERP, CRM systems; unstructured docs, emails; server logs; social/web data; sensor, machine data; geolocation; clickstream
      • Can’t manage new data paradigm
      • Constrains data to specific schema
      • Siloed data
      • Limited scalability
      • Economically unfeasible
      • Limited analytics
  • 5. A Modern Data Architecture for New Data
      Traditional data architecture:
        Applications: Business Analytics, Custom Applications, Packaged Applications
        Data system (repositories): RDBMS, EDW, MPP
        Sources: Existing Sources (CRM, ERP, Clickstream, Logs)
      New data: OLTP, ERP, CRM systems; unstructured documents, emails; clickstream; server logs; sentiment, web data; sensor, machine data; geolocation
      New data requirements:
      • Scale
      • Economics
      • Flexibility
  • 6. Enterprise Goals for the Modern Data Architecture
      ✔ Centrally manage new and existing data
      ✔ Provide single view of the customer, product, supply chain
      ✔ Run batch, interactive & real-time analytic applications on shared datasets
      ✔ Assure enterprise-grade security, operations and governance
      ✔ Leverage new and existing data center infrastructure investments
      ✔ Scalable and affordable; low cost per TB
      ✔ Deployment flexibility
      Applications: Business Analytics, Custom Applications, Packaged Applications
      Data system: RDBMS, EDW, MPP; YARN: Data Operating System (batch, interactive, real-time) on HDFS (Hadoop Distributed File System), nodes 1 to N
      Sources: Existing Systems (CRM, ERP, Other), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
  • 7. 1. Drivers for a Modern Data Architecture (MDA)
      • Semi-Structured and Unstructured – NEW DATA: Unstructured documents, emails, Sentiment, Web Data, Sensor, Machine Data, Geolocation, ...
      • Enterprise Data Warehouse Optimization – REDUCED COSTS: Low-value computing tasks such as ETL consume significant EDW resources. When offloaded to Hadoop, these ETL processes can be performed much more efficiently, freeing up your data warehouse to perform high-value functions like analytics and operations.
  • 8. 1. Drivers for a Modern Data Architecture (Continued)
      • Advanced Analytics – NEW ANALYTICS APPS: Unlike schema-on-write, which transforms data into a specified schema upon load, Hadoop lets you store data in any format and then create the schema at the moment you choose to analyze your data. This flexibility opens up new possibilities for iterative analytics and delivers new business value.
      • Single Cluster, Multiple Workloads – ANY WORKLOAD: With Apache Hadoop YARN supporting multiple access methods (such as batch, interactive, streaming and real-time) on a common data set, Hadoop enables you to transform and view data in multiple ways simultaneously, dramatically reducing time to insight.
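The schema-on-read idea on this slide can be illustrated with a minimal Python sketch. It is not Hadoop or Hive code: the raw records, field names and the `read_with_schema` helper are all hypothetical, chosen only to show the same stored data being read through two different schemas that are defined at analysis time.

```python
import json

# Raw events are landed exactly as they arrive; no schema is enforced on write.
# The records and field names below are made up for illustration.
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": "95", "referrer": "ad"}',
    '{"user": "u3", "lat": 46.8, "lon": -71.2}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: keep only the requested fields, cast
    them to the requested types, and skip records that do not fit."""
    for line in raw_lines:
        record = json.loads(line)
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # this record does not match this particular view


# Two schemas defined only when the analysis needs them (hypothetical).
clickstream_schema = {"user": str, "page": str, "ms": int}
geo_schema = {"user": str, "lat": float, "lon": float}

clicks = list(read_with_schema(RAW_EVENTS, clickstream_schema))
locations = list(read_with_schema(RAW_EVENTS, geo_schema))
print(clicks)
print(locations)
```

Under schema-on-write, the third record would have been rejected or forced into the clickstream layout at load time; here it stays available for a later geolocation analysis.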
  • 9. Outline: 1. Drivers for an MDA; 2. What’s an MDA; 3. Hadoop’s role in an MDA; 4. Use Cases related to an MDA; 5. Learn More; 6. Q&A
  • 10. 2. What’s a Modern Data Architecture (MDA)?
      • Apache Hadoop is a core component of a Modern Data Architecture, allowing organizations to collect, store, analyze and manipulate massive quantities of data on their own terms, regardless of the source of that data, how old it is, where it is stored, or in what format.
      • The Hortonworks Data Platform (HDP) delivers Enterprise Apache Hadoop, deeply integrated with existing systems to create a highly efficient, highly scalable way to manage all your enterprise data.
      • Integrate new & existing data sets, with existing tools & skills.
      • Make all data available for shared access and processing in multitenant infrastructure.
      • Batch, interactive & real-time use cases.
  • 11. 4. Hadoop’s role in an MDA
  • 12. 3. What’s a Modern Data Architecture (MDA)?
  • 15. Outline: 1. Drivers for an MDA; 2. What’s an MDA; 3. Hadoop’s role in an MDA; 4. Use Cases related to an MDA; 5. Learn More; 6. Q&A
  • 16. Key Drivers of Hadoop
      A. Unlock New Approach to Analytics
      • Agile analytics via “Schema on Read” with ability to store all data in native format
      • Create new apps from new types of data
      B. Optimize Investments, Cut Costs
      • Focus EDW on high value workloads
      • Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration
      C. Enable a Modern Data Architecture
      • Integrate new & existing data sets
      • Make all data available for shared access and processing in multitenant infrastructure
      • Batch, interactive & real-time use cases
      • Integrated with existing tools & skills
      Applications: Business Analytics, Custom Applications, Packaged Applications
      Data system: repositories (RDBMS, EDW, MPP); YARN: Data Operating System (Batch, Interactive, Real-Time); HDFS: Hadoop Distributed File System
      Dev & data tools: Build & Test
      Operations tools: Provision, Manage & Monitor
      Sources: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
  • 17. Hadoop: It’s About Scale & Structure
      Traditional RDBMS vs. Hadoop:
        schema: Required on write vs. Required on read
        governance: Standards and structured vs. Multiple Structures
        processing: Limited, no data processing vs. Processing coupled with data
        data types: Structured vs. Multi and unstructured
        best fit use: Complex ACID Transactions, Operational Data Store, Data Discovery, Processing unstructured data, Interactive Analytics
        SCALE (storage & processing): Optimized, reliable transactions; Optimized for analytics
  • 18. YARN and HDP Enable the Modern Data Architecture
      Hortonworks Data Platform 2.2:
        Governance (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS
        Batch, interactive & real-time data access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), NoSQL (HBase, Accumulo), In-Memory (Spark), Search (Solr), Others (ISV Engines); Tez, Slider
        YARN: Data Operating System (Cluster Resource Management)
        HDFS (Hadoop Distributed File System)
        Security: Authentication, Authorization, Audit, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox)
        Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
        Deployment Choice: Linux, Windows, On-Premises, Cloud
      YARN is the architectural center of Hadoop and HDP
      • YARN enables a common data set across all applications
      • Batch, interactive & real-time workloads
      • Support multi-tenant access & processing
      HDP enables Apache Hadoop to become an Enterprise Viable Data Platform with centralized services
      • Security
      • Governance
      • Operations
      • Productization
      Enabled broad ecosystem adoption
      Hortonworks drove this innovation of Hadoop through YARN
  • 20. Outline: 1. Drivers for an MDA; 2. What’s an MDA; 3. Hadoop’s role in an MDA; 4. Use Cases related to an MDA; 5. Learn More; 6. Q&A
  • 21. Shift to Data-driven Means Treating Data like Capital
      • A shift in Advertising: from mass branding …to 1x1 targeting
      • A shift in Financial Services: from educated investing …to automated algorithms
      • A shift in Healthcare: from mass treatment …to designer medicine
      • A shift in Retail: from static branding …to real-time personalization
      • A shift in Manufacturing: from break then fix …to repair before break
      Hadoop enables organizations to cost-effectively store and use all of the data available in a way that shifts the business from reactive to proactive.
  • 22. Create New Applications from New Types of Data
      Industry use cases by data type (Sentiment & Web; Clickstream & Behavior; Machine & Sensor; Geographic; Server Logs; Structured & Unstructured):
      • Financial Services: New Account Risk Screens; Trading Risk; Insurance Underwriting
      • Telecom: Call Detail Records (CDR); Infrastructure Investment; Real-time Bandwidth Allocation
      • Retail: 360° View of the Customer; Localized, Personalized Promotions; Website Optimization
      • Manufacturing: Supply Chain and Logistics; Assembly Line Quality Assurance; Crowd-sourced Quality Assurance
      • Healthcare: Use Genomic Data in Medical Trials; Monitor Patient Vitals in Real-Time
      • Pharmaceuticals: Recruit and Retain Patients for Drug Trials; Improve Prescription Adherence
      • Oil & Gas: Unify Exploration & Production Data; Monitor Rig Safety in Real-Time
      • Government: ETL Offload/Federal Budgetary Pressures; Sentiment Analysis for Government Programs
  • 23. 4.1 Advertising
      • Mine Grocery & Drug Store POS Data to Identify High-Value Shoppers
      • Target Ads to Customers in Specific Cultural or Linguistic Segments
      • Syndicate Videos According to Behavior, Demographics & Channel
      • ETL Toy Market Research Data for Longer Retention & Deeper Insight
      • Optimize Online Ad Placement for Retail Websites
  • 24. 4. Use Cases related to an MDA (Continued)
  • 25. 4.2 Financial Services
      • Screen New Account Applications for Risk of Default
      • Monetize Anonymous Banking Data in Secondary Markets
      • Improve Underwriting Efficiency for Usage-Based Auto Insurance
      • Analyze Insurance Claims with a Shared Data Lake
      • Maintain Sub-Second SLAs with a Hadoop “Ticker Plant”
      • Surveillance of Trading Logs for Anti-Laundering Analysis
  • 27. 4.3 Healthcare
      • Access Genomic Data for Medical Trials
      • Monitor Patient Vitals in Real-Time
      • Reduce Cardiac Re-Admittance Rates
      • Machine Learning to Screen for Autism with In-Home Testing
      • Store Medical Research Data Forever
      • Recruit Research Cohorts for Pharmaceutical Trials
      • Track Equipment and Medicines with RFID Data
      • Improve Prescription Adherence
  • 29. 4.4 Manufacturing
      • Assure Just-In-Time Delivery of Raw Materials
      • Control Quality with Real-Time & Historical Assembly Line Data
      • Avoid Stoppages with Proactive Equipment Maintenance
      • Increase Yields in Drug Manufacturing
      • Crowdsource Quality Assurance
  • 31. 4.5 Oil & Gas
      • Slow Decline Curves with Production Parameter Optimization
      • Define Operational Set Points for Each Well & Receive Alerts on Deviations
      • Optimize Lease Bidding with Reliable Yield Predictions
      • Report on Compliance with Environmental, Health and Safety Regulations
      • Repair Equipment Preventatively with Targeted Maintenance
      • Integrate Exploration with Seismic Image Processing
  • 33. 4.6 Public Sector
      • Understand Public Sentiment About Government Performance
      • Protect Critical Networks from Threats (Both Internal and External)
      • Prevent Fraud and Waste
      • Analyze Social Media to Identify Terrorist Threats
      • Decrease Budget Pressures by Offloading Expensive SQL Workloads
      • Crowdsource Reporting for Repairs to Roads and Public Infrastructure
      • Fulfill “Open Records” and Freedom of Information Requests
  • 34. 4.7 Retail
      • Build a 360° View of the Customer
      • Analyze Brand Sentiment
      • Localize & Personalize Promotions
      • Optimize Websites
      • Optimize Store Layouts
  • 36. 4.8 Telecom
      • Analyze Call Detail Records (CDRs)
      • Service Equipment Proactively
      • Rationalize Infrastructure Investments
      • Recommend Next Product to Buy (NPTB)
      • Allocate Bandwidth in Real-time
      • Develop New Products
  • 38. What is a Data Lake?
      • An architectural pattern in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
      • But what is it?
          – It is a PLATFORM for your data. (It is not a database.)
          – Multipurpose open PLATFORM to land all data in a single place and interact with it in many ways.
      • A platform that allows the ecosystem to provide higher-level services (SAS, SAP, Microsoft, Streaming, MPP, In-memory, etc.)
      • Provides first-class APIs and frameworks to enable this integration
      • Provides first-class data management capabilities (metadata management, security, transformation pipelines, replication, retention, etc.)
  • 39. HDP Data Lake Reference Architecture
      Source data: ClickStream Data, Sales Transaction/Data, Product Data, Marketing/Inventory, Social Data, EDW
      Step 1: Extract & Load. Ingestion: SQOOP, FLUME, Web HDFS, NFS; File, JMS, REST, HTTP; Streaming: STORM
      Step 2: Model/Apply Metadata. HCATALOG (table & user-defined metadata)
      Step 3: Transform, Aggregate & Materialize. HIVE, PIG, Mahout, MR2, TEZ (data processing)
      Step 4: Schedule and Orchestrate. FALCON (data pipeline & flow management), Oozie (batch scheduler)
      Manage Steps 1-4: Data Management with Falcon
      Data Lake HDP Grid: compute & storage nodes under YARN; AMBARI; Knox (Perimeter Level Security); SOLR
      YARN Apps: Stream Processing, Real-time Search, MPI
      INTERACTIVE: Hive Server (Tez/Stinger)
      Use Case Type 1: Materialize & Exchange. Exchange: HBase Client, Sqoop/Hive. Downstream Data Sources: OLTP, HBase, EDW (Teradata), Storm, SAS
      Use Case Type 2: Explore/Visualize. Query/Analytics/Reporting Tools: Tableau/Excel, Datameer/Platfora/SAP
      Opens up Hadoop to many new use cases
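The four steps named on this slide (extract & load, apply metadata, transform & materialize, schedule & orchestrate) can be sketched as a toy in-memory pipeline in Python. This is only an illustration of how the steps relate: the sample data, the dictionary standing in for the lake, the catalog layout and every function name are hypothetical, and none of it uses Sqoop, HCatalog, Hive, Falcon or Oozie.

```python
import csv
import io
from collections import defaultdict

# Hypothetical source extract standing in for a sales feed.
SOURCE_CSV = "store,product,amount\nA,widget,10.5\nA,gadget,4.5\nB,widget,7.0\n"


def extract_and_load(source_text):
    """Step 1: land the source data unchanged in a raw zone of the lake."""
    return {"raw/sales.csv": source_text}


def apply_metadata():
    """Step 2: register table metadata (location and column types)."""
    return {
        "sales": {
            "path": "raw/sales.csv",
            "columns": {"store": str, "product": str, "amount": float},
        }
    }


def transform(lake, catalog):
    """Step 3: read through the catalog, aggregate, materialize a result."""
    table = catalog["sales"]
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(lake[table["path"]])):
        typed = {col: cast(row[col]) for col, cast in table["columns"].items()}
        totals[typed["store"]] += typed["amount"]
    lake["curated/sales_by_store"] = dict(totals)
    return lake


def run_pipeline(source_text):
    """Step 4: orchestrate the steps in dependency order."""
    lake = extract_and_load(source_text)
    catalog = apply_metadata()
    return transform(lake, catalog)


lake = run_pipeline(SOURCE_CSV)
print(lake["curated/sales_by_store"])
```

The raw file stays in the lake next to the materialized aggregate, which mirrors the slide's distinction between landing data once and serving it to both exchange and exploration use cases.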
  • 40. Outline: 1. Drivers for an MDA; 2. What’s an MDA; 3. Hadoop’s role in an MDA; 4. Use Cases related to an MDA; 5. Learn More; 6. Q&A
  • 41. 5. Learn More
      • MDA White Paper: http://info.hortonworks.com/data-lake-hadoop-whitepaper.html (Learn more about Modern Data Architecture (MDA))
      • MDA Web Page: http://hortonworks.com/hadoop-modern-data-architecture/ (Explore Use Cases by Industry)
      • Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/ (Get Started on Hadoop with Hortonworks Sandbox)
      • Hadoop Tutorials: http://info.hortonworks.com/On-demand-Tutorials_Sign-Up-Page.html (On-Demand Hadoop Tutorials Delivered to Your Inbox)
      • Enterprise Data Lake: http://hortonworks.com/blog/enterprise-hadoop-journey-data-lake/ (Enterprise Hadoop and the journey to Data Lake)
  • 42. Outline: 1. Drivers for an MDA; 2. What’s an MDA; 3. Hadoop’s role in an MDA; 4. Use Cases related to an MDA; 5. Learn More; 6. Q&A
  • 43. 6. Q&A. Thank you!