
Big Data Architecture for Enterprise



  1. Big Data Architecture for Enterprise. Wei Zhang, Big Data Architect, Up up consultant, LLC
  2. Design Principles • Future-proof, scalable, and auto-recoverable • Compatible with existing technologies • Loosely coupled, layered architecture
  3. Centralized Data Governance Service • Build a schema catalog service to track all data entities and attributes for both structured and unstructured data sets • Establish and enforce proper practices, including solution patterns/design, coding, test automation, and release procedures
  4. Logical Architecture (diagram): Data Acquisition (text, image, XML, and EDI files; events; …) → Data Transformation and Storage / Data Processing Pipeline (Hadoop HDFS, MapReduce, Hive, Pig, Flume, Spark, Java/Scala; NoSQL: MongoDB, Cassandra; relational databases: MS SQL Server, Oracle, MySQL) → Data Distribution (BI reports; text, image, XML, and EDI files; events; …)
  5. Logical Architecture • Data lifecycle control, access audit, replication, and DR • On-disk and in-memory data processing technology stack: SQL or NoSQL databases, Hadoop MapReduce, Spark, ETL tools, etc. • Central data inventory services for discovery, tracking, and optimization
  6. Technology Stack • HDFS, MapReduce, YARN • Oozie, Hive, Spark, Kafka, Cassandra, MongoDB • BI & reporting, data acquisition and distribution, data inventory and data model
  7. Schema Catalog • MongoDB schema store • Schemas, entities, and attributes defined in Avro format (see the Avro schema sketch after this list) • Define all data sources and destinations, including format, transfer protocol, file system, schedule, etc.
  8. Data Ledger • Ledger inventory of all business data sets across the enterprise • Data set producer and consumer registration • Data sets are tagged and can be queried for traceability and usage (see the ledger-entry sketch after this list)
  9. Data Processing and Persistence • Relational databases for OLTP, data warehouse, and BI workloads that need SQL access and integration with existing systems • HDFS for source, destination, and staging data, unstructured documents, and large-scale processing; data saved in either Avro or Parquet format for better exchange and performance (see the Spark sketch after this list) • Cassandra for high-frequency, write-heavy transactional systems; MongoDB for document storage
  10. Automated and Regression Testing • Maven, SBT, JUnit, ScalaTest (see the ScalaTest sketch after this list)
  11. Physical Deployment • Low end: 7.2K RPM / 75 IOPS disks, 16 cores, 128 GB RAM (data acquisition and distribution) • Medium: 15K RPM / 175 IOPS disks, 24 cores, 512 GB RAM (batch processing) • High end: 6K-500K IOPS, 80 cores, 1.5 TB RAM (real-time processing/analytics)
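
A minimal sketch of how a schema catalog entry might be defined, using Avro's SchemaBuilder API as the deck suggests; the `registerSchema` helper and the `Customer` entity are hypothetical stand-ins for whatever the MongoDB-backed catalog service actually exposes.

```scala
import org.apache.avro.{Schema, SchemaBuilder}

object SchemaCatalogExample {
  // Define an entity schema in Avro, as the schema catalog slide describes.
  val customerSchema: Schema = SchemaBuilder
    .record("Customer").namespace("com.enterprise.catalog")
    .fields()
    .requiredString("customerId")
    .requiredString("name")
    .optionalString("email")
    .endRecord()

  // Hypothetical registration call; in this architecture the schema JSON
  // would be persisted as a document in the MongoDB schema store.
  def registerSchema(schema: Schema): Unit =
    println(s"registering ${schema.getFullName}:\n${schema.toString(true)}")

  def main(args: Array[String]): Unit = registerSchema(customerSchema)
}
```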
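A sketch of a ledger entry for the data-ledger slide; the field names (producer, consumers, tags) are assumptions inferred from the registration and tagging points above, not a published schema.

```scala
import java.time.Instant

// Hypothetical ledger record for one registered data set.
final case class LedgerEntry(
  dataSetId: String,      // unique id in the enterprise inventory
  producer: String,       // registered producing system
  consumers: Set[String], // registered consuming systems
  tags: Set[String],      // tags used for traceability queries
  registeredAt: Instant
)

object DataLedgerExample {
  private var ledger: Vector[LedgerEntry] = Vector.empty

  def register(entry: LedgerEntry): Unit = ledger :+= entry

  // Query by tag, supporting the traceability/usage use case on the slide.
  def byTag(tag: String): Vector[LedgerEntry] =
    ledger.filter(_.tags.contains(tag))
}
```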
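To illustrate the Avro/Parquet point on the processing slide, here is a small Spark sketch that reads Avro from an HDFS staging path and writes Parquet; the paths are placeholders, and the spark-avro module is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet")
      .getOrCreate()

    // Placeholder HDFS paths; real jobs would resolve these from the
    // schema catalog / data ledger described above.
    val source = spark.read.format("avro")
      .load("hdfs:///staging/customers/avro")

    // Parquet's columnar layout gives better scan performance for
    // downstream analytics, matching the slide's recommendation.
    source.write.mode("overwrite")
      .parquet("hdfs:///warehouse/customers/parquet")

    spark.stop()
  }
}
```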
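A minimal ScalaTest example of the kind the testing slide implies, assuming ScalaTest 3.x's AnyFunSuite; it exercises the hypothetical ledger query from the sketch above.

```scala
import org.scalatest.funsuite.AnyFunSuite
import java.time.Instant

class DataLedgerSpec extends AnyFunSuite {
  test("a registered data set is findable by tag") {
    val entry = LedgerEntry(
      dataSetId = "customers-v1",
      producer = "crm",
      consumers = Set("bi-reports"),
      tags = Set("pii"),
      registeredAt = Instant.now()
    )
    DataLedgerExample.register(entry)
    assert(DataLedgerExample.byTag("pii").contains(entry))
  }
}
```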