Big Data Architecture
for Enterprise
Wei Zhang
Big Data Architect
Up up consultant, LLC
Design Principles
• Future-proof, scalable and automatically recoverable;
compatible with existing technologies; loosely
coupled, layered architecture
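One way to read "automatically recoverable" at the level of a single pipeline step is a retry with exponential backoff. The sketch below is a hypothetical helper (`run_with_recovery` is not part of Hadoop, Spark or any named tool). It assumes failures are transient and that each step is safe to re-run. The `sleep` parameter is injectable so that the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def run_with_recovery(task: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a pipeline step with exponential backoff before giving up.

    Hypothetical helper: assumes the step is idempotent (safe to re-run).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception as exc:  # treat any failure as potentially transient
            last_error = exc
            if attempt + 1 < max_attempts:
                sleep(base_delay * 2 ** attempt)  # 0.1 s, 0.2 s, 0.4 s, ...
    assert last_error is not None
    raise last_error  # all attempts failed: surface the last error
```

In a real deployment, recovery is usually layered: the scheduler (for example Oozie) re-runs failed jobs and YARN restarts failed containers. This helper only illustrates the principle for one step.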
Centralized Data Governance Service
• Build a schema catalog service to track all data
entities and attributes for both structured and
unstructured data sets
• Establish and enforce proper practices, including
solution patterns and design, coding, test
automation and release procedures
Logical Architecture
Flow: Data Acquisition → Data Processing Pipeline (data transformation and storage) → Data Distribution
• Data Acquisition: text files, image files, XML files, EDI files, events, …
• Data Processing Pipeline (data transformation and storage):
  - Hadoop ecosystem: HDFS, MapReduce, Hive, Pig, Flume, Spark, Java/Scala
  - NoSQL: MongoDB, Cassandra
  - Relational database: MS SQL, Oracle, MySQL
• Data Distribution: BI, reports, text files, image files, XML files, EDI files, events, …
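The three layers above can be sketched as a pipeline of pluggable stages. Each layer exchanges only plain records, so a source or sink can be swapped without touching the other layers. This is a minimal illustration of the loose coupling the design principles call for, assuming records fit in memory. The `Pipeline` class and its fields are hypothetical names, not an API of Flume, Spark or any other listed tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

@dataclass
class Pipeline:
    """Hypothetical three-layer pipeline: acquire -> transform -> distribute."""
    sources: List[Callable[[], Iterable[Record]]] = field(default_factory=list)
    transforms: List[Callable[[Record], Record]] = field(default_factory=list)
    sinks: List[Callable[[List[Record]], None]] = field(default_factory=list)

    def run(self) -> List[Record]:
        # Data acquisition layer: pull records from every registered source.
        records = [r for source in self.sources for r in source()]
        # Data processing layer: apply each transform in order.
        for transform in self.transforms:
            records = [transform(r) for r in records]
        # Data distribution layer: hand the same result to every sink.
        for sink in self.sinks:
            sink(records)
        return records
```

In practice each callable would wrap a real connector, such as a file reader, a Kafka consumer or a BI export. Keeping them behind the same record interface is what makes the layers replaceable.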
Logical Architecture
• Data lifecycle control, access auditing, replication
and disaster recovery (DR)
• On-disk and in-memory data processing
technology stack: SQL or NoSQL databases,
Hadoop MapReduce, Spark, ETL tools, etc.
• Central data inventory services for discovery,
tracking and optimization
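Data lifecycle control and access auditing can be illustrated with two small pieces: a retention rule that decides whether a data set stays online, moves to archive, or is deleted, and an append-only access log. This is a sketch under assumed policy semantics. `RetentionPolicy`, `lifecycle_action` and `AccessAudit` are hypothetical names, and the day counts in any policy are up to the organization.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical lifecycle rule: days kept online, then days kept in archive."""
    online_days: int
    archive_days: int

def lifecycle_action(created: date, today: date, policy: RetentionPolicy) -> str:
    """Return 'keep', 'archive' or 'delete' based on the data set's age in days."""
    age = (today - created).days
    if age < policy.online_days:
        return "keep"
    if age < policy.online_days + policy.archive_days:
        return "archive"
    return "delete"

class AccessAudit:
    """Append-only access log: each entry is (user, data set, action)."""
    def __init__(self) -> None:
        self._entries: List[Tuple[str, str, str]] = []

    def record(self, user: str, dataset: str, action: str) -> None:
        self._entries.append((user, dataset, action))

    def history(self, dataset: str) -> List[Tuple[str, str, str]]:
        # Every recorded access to one data set, in the order it happened.
        return [e for e in self._entries if e[1] == dataset]
```

For example, a policy of 90 online days plus 275 archive days keeps a one-month-old data set online. It archives a seven-month-old one and deletes anything older than a year.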
Technology Stack
• HDFS, MapReduce, YARN
• Oozie, Hive, Spark, Kafka, Cassandra,
MongoDB
• BI and reporting, data acquisition and
distribution, data inventory and data model
Schema Catalog
• MongoDB as the schema store
• Schemas, entities and attributes defined in Avro
format
• Definitions of all data sources and destinations, including
format, transfer protocol, file system, schedule,
etc.
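A catalog entry could pair an Avro record schema (standard Avro JSON: `type`, `name`, `namespace`, `fields`) with the source definition described above. The sketch below uses an in-memory dict as a stand-in for the MongoDB store. It validates records only against a subset of Avro primitive types and unions such as `["null", "string"]`. `SchemaCatalog`, `matches` and the source-definition keys are hypothetical. A production catalog would use a full Avro library rather than this hand-rolled check.

```python
import json
from typing import Any, Dict

# Subset of Avro primitive type names mapped to Python types (illustrative only).
AVRO_PRIMITIVES = {"null": type(None), "boolean": bool, "int": int,
                   "long": int, "double": float, "string": str}

def matches(avro_type: Any, value: Any) -> bool:
    """Check a value against a primitive Avro type or a union (list) of them."""
    if isinstance(avro_type, list):  # union, e.g. ["null", "string"]
        return any(matches(t, value) for t in avro_type)
    py_type = AVRO_PRIMITIVES[avro_type]
    if py_type is int and isinstance(value, bool):
        return False  # bool subclasses int in Python; Avro keeps them distinct
    return isinstance(value, py_type)

class SchemaCatalog:
    """In-memory stand-in for a MongoDB-backed schema store (hypothetical)."""
    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def register(self, schema_json: str, source: Dict[str, str]) -> str:
        schema = json.loads(schema_json)
        key = f"{schema.get('namespace', '')}.{schema['name']}"
        # One document per entity: the Avro schema plus its source definition.
        self._docs[key] = {"schema": schema, "source": source}
        return key

    def validate(self, key: str, record: Dict[str, Any]) -> bool:
        # A missing field reads as None, which only a "null" union accepts.
        fields = self._docs[key]["schema"]["fields"]
        return all(matches(f["type"], record.get(f["name"])) for f in fields)
```

Storing each schema as a JSON document suits a document store such as MongoDB. The Avro schema text can be saved as-is and queried by namespace and name.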
Data Ledger
• Ledger inventory of all business data sets across
the enterprise
• Registration of data set producers and consumers
• Data sets are tagged and can be queried for
traceability and usage
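The ledger's three responsibilities are an inventory of data sets, producer and consumer registration, and tag-based queries. They map naturally onto a small registry with a tag index. This is a minimal in-memory sketch. `DataLedger` and its method names are hypothetical, and a real ledger would persist entries and handle updates, not just first-time registration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class LedgerEntry:
    name: str
    producer: str
    consumers: Set[str] = field(default_factory=set)
    tags: Set[str] = field(default_factory=set)

class DataLedger:
    """Hypothetical enterprise ledger of data sets, producers, consumers and tags."""
    def __init__(self) -> None:
        self._entries: Dict[str, LedgerEntry] = {}
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)  # tag -> data set names

    def register_dataset(self, name: str, producer: str, tags: List[str]) -> None:
        if name in self._entries:
            raise ValueError(f"data set already registered: {name}")
        self._entries[name] = LedgerEntry(name, producer, tags=set(tags))
        for tag in tags:
            self._by_tag[tag].add(name)

    def register_consumer(self, name: str, consumer: str) -> None:
        self._entries[name].consumers.add(consumer)  # KeyError if unregistered

    def find_by_tag(self, tag: str) -> List[str]:
        return sorted(self._by_tag.get(tag, set()))

    def lineage(self, name: str) -> Dict[str, object]:
        """Who produces a data set and who consumes it, for traceability."""
        entry = self._entries[name]
        return {"producer": entry.producer, "consumers": sorted(entry.consumers)}
```

Tags such as "pii" make audits straightforward: one query lists every data set carrying personal data, and `lineage` shows which systems read it.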
Data Processing and Persistence
• Relational databases for OLTP, data warehousing
and BI workloads that need SQL access or
integration with existing systems
• HDFS for source, destination and staging data,
unstructured documents, and large- to very-large-scale
processing; data is stored in Avro or Parquet
format for better interchange and performance
• Cassandra for high-frequency, write-heavy
transactional systems; MongoDB for document storage
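The placement guidelines above can be written as a simple decision function. The checking order is an assumption, since the slide does not say which rule wins when a workload fits several. The Avro/Parquet split follows the common convention that Parquet is columnar (suited to analytic scans) and Avro is row-based (suited to data exchange). `Workload` and `choose_store` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Workload:
    """Hypothetical workload descriptor used to pick a persistence layer."""
    needs_sql: bool = False       # OLTP, warehouse/BI, or existing SQL systems
    write_heavy: bool = False     # high-frequency, high-write transactions
    document_model: bool = False  # semi-structured, JSON-like documents
    large_batch: bool = False     # large files, staging, unstructured data
    analytic_scans: bool = False  # column-oriented analytical reads

def choose_store(w: Workload) -> str:
    """Map a workload to a store; rule order is an assumption, not from the slide."""
    if w.needs_sql:
        return "relational"  # MS SQL / Oracle / MySQL
    if w.write_heavy:
        return "cassandra"
    if w.document_model:
        return "mongodb"
    if w.large_batch:
        # Parquet (columnar) suits analytic scans; Avro (row-based) suits exchange.
        return "hdfs/parquet" if w.analytic_scans else "hdfs/avro"
    return "relational"  # conservative default (assumption)
```

Encoding the rules this way makes them reviewable and testable. It also gives the governance service one place to change when the guidelines evolve.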
Automated and Regression Testing
• Maven, SBT, JUnit, ScalaTest
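A common regression pattern is to pin a transformation's output against golden input/output pairs captured from a known-good run. This sketch uses Python's `unittest` in the role JUnit or ScalaTest would play on the JVM, and it runs with `python -m unittest` much as `mvn test` or `sbt test` would run a JVM suite. `normalize_customer` and the golden data are hypothetical examples.

```python
import unittest

def normalize_customer(raw: dict) -> dict:
    """Hypothetical transformation under test: cast the id, trim and lowercase email."""
    return {"id": int(raw["id"]), "email": raw["email"].strip().lower()}

# Golden input/output pairs captured from a known-good run (illustrative data).
GOLDEN_CASES = [
    ({"id": "7", "email": "  Ann@Example.COM "}, {"id": 7, "email": "ann@example.com"}),
    ({"id": "8", "email": "bob@example.com"}, {"id": 8, "email": "bob@example.com"}),
]

class NormalizeCustomerRegressionTest(unittest.TestCase):
    def test_golden_cases(self) -> None:
        # Any change in output for a golden input is reported as a regression.
        for raw, expected in GOLDEN_CASES:
            with self.subTest(raw=raw):
                self.assertEqual(normalize_customer(raw), expected)
```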
Physical Deployment
• Low end: 7.2K RPM disks / 75 IOPS, 16 cores, 128 GB
(data acquisition and distribution)
• Medium: 15K RPM disks / 175 IOPS, 24 cores, 512 GB
(batch processing)
• High end: 6K-500K IOPS, 80 cores, 1.5 TB
(real-time processing and analytics)
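The per-disk IOPS figures above support a quick sizing estimate: divide the back-end IOPS a workload needs by the IOPS of one drive, then round up. The sketch assumes the slide's 75 and 175 IOPS figures per drive. It models RAID overhead as a write penalty (commonly 2 for RAID 10 and 4 for RAID 5), defaulting to 1 for no RAID overhead. It does not model the high-end tier. `drives_needed` is a hypothetical helper.

```python
import math

# Per-drive IOPS figures taken from the low-end and medium tiers above.
DRIVE_IOPS = {"7.2k_rpm": 75, "15k_rpm": 175}

def drives_needed(read_iops: int, write_iops: int, drive: str,
                  write_penalty: int = 1) -> int:
    """Estimate drives needed to serve a read/write IOPS target.

    write_penalty models RAID write overhead; the default of 1 assumes none.
    """
    backend_iops = read_iops + write_iops * write_penalty
    return math.ceil(backend_iops / DRIVE_IOPS[drive])
```

For example, 1,500 read IOPS plus 500 write IOPS needs ceil(2000 / 75) = 27 drives at 7.2K RPM but only ceil(2000 / 175) = 12 drives at 15K RPM. With a RAID 10 write penalty of 2, the 15K RPM figure rises to ceil(2500 / 175) = 15.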
