© 2014 MapR Technologies
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, Zookeeper & others
VP of Incubator at Apache Foundation
Email tdunning@apache.org tdunning@maprtech.com
Twitter @ted_dunning
Hashtags today: #stratahadoop #ojai
Don’t Miss These
• Just-in-time optimizing a database
– Me! at 4:20 PM, Room 230 C, today
• Why flow instead of state?
– Me! at 5:10 PM, Room 210 D/H, today
• High Frequency Decisioning
– Jack Norris! at 11:00 PM, Room 210 B/F, tomorrow
• Threat detection on streaming data
– Carol Macdonald! at 3:45 PM, Solutions Theater, tomorrow
• Scaling Your Business … Zeta Architecture
– Jim Scott! at 5:10 PM, Room 210 D/H, tomorrow
And Also, a Little Fun
Come jam with us
The Big Data Boys and the Real-time Stream Band
5:50 PM, MapR booth, today
Goals
• Real-time or near-time
– Includes situations with deadlines
– Also includes situations where delay is simply undesirable
– Even includes situations where delay is just fine
• Micro-services
– Streaming is a convenient idiom for design
– Micro-services … you know we wanted it
– Service isolation is a key requirement
Real-time or Near-time?
• The real point is flow versus state (see talk later today)
• One consequence of flow-based computing is that real-time and near-time become relatively easy
• Life may be a bitch, but it doesn’t happen in batches!
Agenda
• Background / micro-services
• Global requirements
• Scale
A microservice is
loosely coupled
with bounded context
How to Couple Services and Break micro-ness
• Shared schemas, relational stores
• Ad hoc communication between services
• Enterprise service busses
• Brittle protocols
• Poor protocol versioning
Don’t do this!
How to Decouple Services
• Use self-describing data
• Private databases
• Infrastructural communication between services
• Use modern protocols
• Adopt future-proof protocol practices
• Use shared storage where necessary due to scale
What is the Right Structure for Flow Compute?
• Traditional message queues?
– Message queues are the classic answer
– Key feature/bug is out-of-order acknowledgement
– Many implementations
– You pay a huge performance hit for persistence
• Kafka-esque Logs?
– Logs are like queues, but with ordering
– Out-of-order consumption is possible, acknowledgement not so much
– Canonical base implementation is Kafka
– Performance plus persistence
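The contrast between the two structures can be sketched in a few lines. This is only an in-memory illustration, not Kafka or any real message queue; the class names `AckQueue` and `Log` and their methods are hypothetical.

```python
class AckQueue:
    """Queue-style broker: each message is acknowledged individually, in any order."""

    def __init__(self):
        self._pending = {}   # message id -> payload, kept until acknowledged
        self._next_id = 0

    def send(self, payload):
        msg_id = self._next_id
        self._pending[msg_id] = payload
        self._next_id += 1
        return msg_id

    def ack(self, msg_id):
        # Out-of-order acknowledgement: any pending message may be removed.
        del self._pending[msg_id]

    def pending(self):
        return sorted(self._pending)


class Log:
    """Log-style broker: records stay in order; each consumer tracks one offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}   # consumer name -> next offset to read

    def append(self, payload):
        self._records.append(payload)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        start = self._offsets.get(consumer, 0)
        return start, self._records[start:start + max_records]

    def commit(self, consumer, offset):
        # Committing offset n means "everything before n is done".
        self._offsets[consumer] = offset


queue = AckQueue()
ids = [queue.send(x) for x in "abc"]
queue.ack(ids[1])                 # acknowledge the middle message only

log = Log()
for x in "abc":
    log.append(x)
start, batch = log.poll("detector", max_records=2)
log.commit("detector", start + len(batch))
```

In the queue, per-message state must be tracked until every acknowledgement arrives; in the log, a consumer's progress is a single number, and the records remain available for other consumers to read from their own offsets.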
Scenarios
Profile Database
The task
[Diagram: POS 1 and POS 2 each send (location, t, card #) to an unknown component ("?") and ask "yes/no?"]
Traditional Solution
[Diagram: POS 1..n, Fraud detector, Last card use]
What Happens Next?
[Diagram: three groups of POS 1..n, each with a Fraud detector, and one Last card use]
How to Get Service Isolation
[Diagram: POS 1..n, Fraud detector, Last card use, Updater, card activity]
New Uses of Data
[Diagram: POS 1..n, Fraud detector, Last card use, Updater, Card location history, Other, card activity]
Scaling Through Isolation
[Diagram: two groups of POS 1..n, Fraud detector, Last card use, and Updater, with one card activity]
Lessons
• De-coupling and isolation are key
• Private data stores/tables are important,
– but local storage of private data is a bug
• Propagate events, not table updates
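One way to picture "propagate events, not table updates" is sketched below, using the card-activity scenario from the earlier slides. Everything here is an assumption for illustration: a plain Python list stands in for the shared card activity stream, and `CardEvent`, `LastCardUse`, and `run_updater` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardEvent:
    card: str
    location: str
    t: float


class LastCardUse:
    """Private table owned by one service; only that service's updater writes to it."""

    def __init__(self):
        self._rows = {}

    def apply(self, event):
        prev = self._rows.get(event.card)
        if prev is None or event.t >= prev.t:
            self._rows[event.card] = event

    def get(self, card):
        return self._rows.get(card)


def run_updater(stream, table, start=0):
    """Consume events from the shared stream and fold them into a private table."""
    for event in stream[start:]:
        table.apply(event)
    return len(stream)  # next offset to read


card_activity = []            # shared, append-only stream of events
fraud_table = LastCardUse()   # private to the fraud detector
location_history = {}         # a second, independent consumer: card -> locations

card_activity.append(CardEvent("1234", "Tokyo", 1.0))
card_activity.append(CardEvent("1234", "Singapore", 2.0))

next_offset = run_updater(card_activity, fraud_table)
for event in card_activity:
    location_history.setdefault(event.card, []).append(event.location)
```

Because the stream carries events rather than row updates, the second consumer can build a completely different view of the same data without touching the fraud detector's table.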
Scenarios
IoT Data Aggregation
Basic Situation
[Diagram: multiple locations; each location has many pumps producing pump data]
What Does a Pump Look Like
inlet: Temperature, Pressure, Flow
outlet: Temperature, Pressure, Flow
motor: Winding temperature, Voltage, Current
Basic Architecture Reflects Business Structure
[Diagram: four pump data streams]
Lessons
• Data architecture should reflect business structure
• Even very modest designs involve multiple data centers
• Schemas cannot be frozen in the real world
• Security must follow data ownership
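The point about schemas can be illustrated with self-describing records. The sketch below assumes pump readings arrive as JSON objects; the field names (`pump_id`, `inlet_pressure`, `outlet_pressure`, `winding_temperature`) and the helper `parse_reading` are hypothetical, chosen to match the pump slide.

```python
import json

KNOWN_FIELDS = ("pump_id", "inlet_pressure", "outlet_pressure")


def parse_reading(line):
    """Parse one self-describing reading.

    Missing known fields become None; fields this reader does not know
    about are kept under "extra" rather than rejected.
    """
    record = json.loads(line)
    reading = {name: record.get(name) for name in KNOWN_FIELDS}
    reading["extra"] = {k: v for k, v in record.items() if k not in KNOWN_FIELDS}
    return reading


# An older producer and a newer one that added a sensor and dropped a field.
old = '{"pump_id": "p-1", "inlet_pressure": 2.1, "outlet_pressure": 5.4}'
new = '{"pump_id": "p-2", "inlet_pressure": 2.0, "winding_temperature": 71.5}'

old_reading = parse_reading(old)
new_reading = parse_reading(new)
```

A consumer written this way keeps working when a location starts reporting new measurements, which is what an unfrozen schema requires in practice.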
Scenarios
Global Data Recovery
[Diagram: Tokyo, Corporate HQ]
[Diagram: Singapore, Tokyo, Corporate HQ]
Lessons
• Arbitrary number of topics important for simplicity + performance
• Updates happen in many places
• Mobility implies change in replication patterns
• Multi-master updates simplify design massively
Converged Requirements
What Have We Learned?
• Need persistence and performance
– Possibly for years and up to 100s of millions t/s
• Must have convergence
– Need files, tables AND streams
– Need volumes, snapshots, mirrors, permissions and …
• Must have platform security
– Cannot depend on perimeter
– Must follow business structure
• Must have global scale and scope
– Millions of topics for natural designs
– Multi-master replication and update
The Importance of Common APIs
• Commonality and interoperability are critical
– Compare Hadoop eco-system and the NoSQL world
• Table stakes
– Persistence
– Performance
– Polymorphism
• Major trend so far is to adopt Kafka API
– 0.9 API and beyond remove major abstraction leaks
– Kafka API supported by all major Hadoop vendors
What we do
Evolution of Data Storage
[Chart: axes Functionality, Compatibility, Scalability; shown: Linux, POSIX]
Over decades of progress, Unix-based systems have set the standard for compatibility and functionality
Evolution of Data Storage
[Chart: axes Functionality, Compatibility, Scalability; shown: Linux, POSIX, Hadoop]
Hadoop achieves much higher scalability by trading away essentially all of this compatibility
Evolution of Data Storage
[Chart: axes Functionality, Compatibility, Scalability; shown: Linux, POSIX, Hadoop]
MapR enhanced Apache Hadoop by restoring the compatibility while increasing scalability and performance
Evolution of Data Storage
[Chart: axes Functionality, Compatibility, Scalability; shown: Linux, POSIX, Hadoop]
Adding tables and streams enhances the functionality of the base file system
http://bit.ly/fastest-big-data
How we do this with MapR
• MapR Streams is a C++ reimplementation of the Kafka API
– Advantages in predictability, performance, scale
– Common security and permissions with entire MapR converged data platform
• Semantic extensions
– A cluster contains volumes, files, tables … and now streams
– Streams contain topics
– Can have default stream or can name stream by path name
• Core MapR capabilities preserved
– Consistent snapshots, mirrors, multi-master replication
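The idea that a stream is named by a path and contains topics can be sketched as a small naming helper. The slides do not give the exact syntax, so the `/path/to/stream:topic` form used here is an assumption, as are the function name `split_topic` and the example paths.

```python
def split_topic(name, default_stream=None):
    """Split 'stream-path:topic' into (stream, topic).

    Assumed convention (not from the slides): a name starting with '/' is
    '<stream path>:<topic>'; a bare name refers to a topic in the default stream.
    """
    if name.startswith("/"):
        stream, sep, topic = name.rpartition(":")
        if not sep or not topic:
            raise ValueError(f"expected '/path/to/stream:topic', got {name!r}")
        return stream, topic
    if default_stream is None:
        raise ValueError("bare topic name requires a default stream")
    return default_stream, name


qualified = split_topic("/apps/pumps:readings")
defaulted = split_topic("readings", default_stream="/apps/default")
```

Naming streams by path is what lets them sit alongside files and tables in the same directory tree and inherit the same permissions.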
MapR core Innovations
• Volumes
– Distributed management
– Data placement
• Read/write random access file system
– Allows distributed meta-data
– Improved scaling
– Enables NFS access
• Application-level NIC bonding
• Transactionally correct snapshots and mirrors
MapR's Containers
Files/directories are sharded into blocks, which are placed into containers on disks
• Each container contains
– Directories & files
– Data blocks
• Replicated on servers
• No need to manage directly
Containers are 16-32 GB segments of disk, placed on nodes
MapR's Containers
• Each container has a replication chain
• Updates are transactional
• Failures are handled by rearranging replication
Container locations and replication
[Diagram: CLDB listing replication chains (N1, N2), (N3, N2), (N1, N2), (N1, N3), (N3, N2) for containers on nodes N1, N2, N3]
Container location database (CLDB) keeps track of nodes hosting each container and replication chain order
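A toy model may help make the CLDB's role concrete. This is a sketch under stated assumptions, not MapR's implementation: the class `ToyCLDB`, its methods, and the policy of appending the first available spare node when a node fails are all hypothetical.

```python
class ToyCLDB:
    """Toy container location database: container id -> ordered replication chain."""

    def __init__(self, chains):
        self._chains = {cid: list(nodes) for cid, nodes in chains.items()}

    def chain(self, cid):
        return list(self._chains[cid])

    def containers_on(self, node):
        return sorted(cid for cid, nodes in self._chains.items() if node in nodes)

    def node_failed(self, node, spares):
        """Drop a failed node from every chain, then append a spare not already in it."""
        for nodes in self._chains.values():
            if node not in nodes:
                continue
            nodes.remove(node)
            for spare in spares:
                if spare not in nodes:
                    nodes.append(spare)
                    break


cldb = ToyCLDB({1: ["N1", "N2"], 2: ["N3", "N2"], 3: ["N1", "N3"]})
before = cldb.containers_on("N1")
cldb.node_failed("N1", spares=["N2", "N3"])
```

After the failure, the surviving replica moves to the front of each affected chain and a new replica is added at the end, which is one simple reading of "failures are handled by rearranging replication".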
MapR Scaling
Containers represent 16 - 32 GB of data
• Each can hold up to 1 billion files and directories
• 100M containers = ~2 exabytes (a very large cluster)
250 bytes DRAM to cache a container
• 25 GB to cache all containers for 2 EB cluster
– But not necessary, can page to disk
• Typical large 10 PB cluster needs 2 GB
Container-reports are 100x - 1000x smaller than HDFS block-reports
• Serve 100x more data-nodes
• Increase container size to 64 GB to serve 4 EB cluster
• Map/reduce not affected
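The headline numbers on the scaling slide can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes); the variable names are illustrative.

```python
GB = 10**9
EB = 10**18

containers = 100_000_000      # 100M containers
cache_bytes_each = 250        # DRAM needed to cache one container

# Total data at the two ends of the 16 - 32 GB container size range.
low_eb = containers * 16 * GB / EB     # 1.6 EB
high_eb = containers * 32 * GB / EB    # 3.2 EB

# DRAM needed to cache every container.
cache_gb = containers * cache_bytes_each / GB   # 25.0 GB
```

The "~2 exabytes" figure sits inside the 1.6 to 3.2 EB range, and 100M containers at 250 bytes each gives the 25 GB quoted for caching all of them.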
But Wait, There’s More
• Directories and files are implemented in terms of B-trees
– Key is offset, value is data blob
– Internal transactional semantics guarantees safety and consistency
– Layout algorithms give very high layout linearization
• Tables are implemented in terms of B-trees
– Twisted B-tree implementation allows virtues of log-structured merge tree without the compaction delays
– Tablet splitting without pausing, integration with file system transactions
• Common security and permissions scheme
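The "key is offset, value is data blob" layout can be illustrated with an ordered map. In this sketch a sorted Python list searched with `bisect` stands in for the B-tree; it shows only the lookup idea and none of the transactional or layout behaviour. The class name `OffsetFile` and its methods are hypothetical.

```python
import bisect


class OffsetFile:
    """File contents as ordered (offset -> blob) entries; a sorted list stands in for a B-tree."""

    def __init__(self):
        self._offsets = []   # sorted starting offsets (the keys)
        self._blobs = []     # blob stored at the matching offset (the values)

    def put(self, offset, blob):
        i = bisect.bisect_left(self._offsets, offset)
        if i < len(self._offsets) and self._offsets[i] == offset:
            self._blobs[i] = blob          # overwrite an existing key
        else:
            self._offsets.insert(i, offset)
            self._blobs.insert(i, blob)

    def read(self, pos, length):
        """Return up to `length` bytes starting at `pos`, from the blob that covers `pos`."""
        i = bisect.bisect_right(self._offsets, pos) - 1   # largest key <= pos
        if i < 0:
            return b""
        start = pos - self._offsets[i]
        return self._blobs[i][start:start + length]


f = OffsetFile()
f.put(6, b"world")
f.put(0, b"hello ")
```

Random reads become an ordered-key lookup: find the entry with the largest offset not exceeding the requested position, then slice within that blob.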
And More …
• Streams are implemented in terms of B-trees as well
– Topics and consumer offsets are kept in stream, not ZK
– Similar splitting technology as MapR DB tables
– Consistent permissions, security, data replication
• Standard Kafka 0.9 API
• Plans to add OJAI for high-level structuring
• Performance is very high
Example
[Diagram: Cluster, Volume mount point, Directories, Files, Table, Streams]
Lessons
• APIs matter more than implementations
• There is plenty of room to innovate ahead of the community
• POSIX, HDFS, and HBase all define useful APIs
• Kafka 0.9+ does the same
Call to action:
Support the Kafka APIs
And come by the MapR booth
to check out MapR Streams
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 - 2016
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book signing today
http://bit.ly/mapr-ebook-streams
Thank You!
Q&A
@mapr maprtech
tdunning@maprtech.com
Engage with us!
MapR
maprtech
mapr-technologies

ย 
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online โ˜‚๏ธ
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online  โ˜‚๏ธCALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online  โ˜‚๏ธ
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online โ˜‚๏ธanilsa9823
ย 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
ย 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlanโ€™s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlanโ€™s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlanโ€™s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlanโ€™s ...OnePlan Solutions
ย 

Real time-hadoop

  • 1. © 2014 MapR Technologies
  • 2. Contact Information: Ted Dunning, Chief Applications Architect at MapR Technologies; Committer & PMC for Apache's Drill, ZooKeeper & others; VP of Incubator at Apache Foundation. Email: tdunning@apache.org, tdunning@maprtech.com. Twitter: @ted_dunning. Hashtags today: #stratahadoop #ojai
  • 3. Don't Miss These • Just-in-time optimizing a database – Me! at 4:20 PM, Room 230 C, today • Why flow instead of state? – Me! at 5:10 PM, Room 210 D/H, today • High Frequency Decisioning – Jack Norris! at 11:00 PM, Room 210 B/F, tomorrow • Threat detection on streaming data – Carol Macdonald! at 3:45 PM, Solutions Theater, tomorrow • Scaling Your Business … Zeta Architecture – Jim Scott! at 5:10 PM, Room 210 D/H, tomorrow
  • 4. And Also, a Little Fun: Come jam with us. The Big Data Boys and the Real-time Stream Band, 5:50 PM, MapR booth, today
  • 5. Goals • Real-time or near-time – Includes situations with deadlines – Also includes situations where delay is simply undesirable – Even includes situations where delay is just fine • Micro-services – Streaming is a convenient idiom for design – Micro-services … you know we wanted it – Service isolation is a key requirement
  • 6. Real-time or Near-time? • The real point is flow versus state (see talk later today) • One consequence of flow-based computing is that real-time and near-time become relatively easy • Life may be a bitch, but it doesn't happen in batches!
  • 7. Agenda • Background / micro-services • Global requirements • Scale
  • 8. A microservice is loosely coupled with bounded context
  • 9. How to Couple Services and Break micro-ness • Shared schemas, relational stores • Ad hoc communication between services • Enterprise service buses • Brittle protocols • Poor protocol versioning. Don't do this!
  • 10. How to Decouple Services • Use self-describing data • Private databases • Infrastructural communication between services • Use modern protocols • Adopt future-proof protocol practices • Use shared storage where necessary due to scale
  • 11. What is the Right Structure for Flow Compute? • Traditional message queues? – Message queues are the classic answer – Key feature/bug is out-of-order acknowledgement – Many implementations – You pay a huge performance hit for persistence • Kafka-esque Logs? – Logs are like queues, but with ordering – Out-of-order consumption is possible, acknowledgement not so much – Canonical base implementation is Kafka – Performance plus persistence
  • 12. Scenarios: Profile Database
  • 13. The task ? POS 1: location, t, card #; yes/no? POS 2: location, t, card #; yes/no?
  • 14. Traditional Solution: POS 1..n, Fraud detector, Last card use
  • 15. What Happens Next? POS 1..n, Fraud detector, Last card use; POS 1..n, Fraud detector; POS 1..n, Fraud detector
  • 16. What Happens Next? POS 1..n, Fraud detector, Last card use; POS 1..n, Fraud detector; POS 1..n, Fraud detector
  • 17. How to Get Service Isolation: POS 1..n, Fraud detector, Last card use, Updater, card activity
  • 18. New Uses of Data: POS 1..n, Fraud detector, Last card use, Updater, Card location history, Other, card activity
  • 19. Scaling Through Isolation: POS 1..n, Last card use, Updater; POS 1..n, Last card use, Updater; card activity; Fraud detector; Fraud detector
  • 20. Lessons • De-coupling and isolation are key • Private data stores/tables are important, – but local storage of private data is a bug • Propagate events, not table updates
  • 21. Scenarios: IoT Data Aggregation
  • 22. Basic Situation: Each location has many pumps; pump data; Multiple locations
  • 23. What Does a Pump Look Like? Inlet: Temperature, Pressure, Flow. Outlet: Temperature, Pressure, Flow. Motor: Winding temperature, Voltage, Current
  • 24. Basic Situation: Each location has many pumps; pump data; Multiple locations
  • 25. Basic Architecture Reflects Business Structure: pump data, pump data, pump data, pump data
  • 26. Lessons • Data architecture should reflect business structure • Even very modest designs involve multiple data centers • Schemas cannot be frozen in the real world • Security must follow data ownership
  • 27. Scenarios: Global Data Recovery
  • 28. Tokyo, Corporate HQ
  • 29. Singapore, Tokyo, Corporate HQ
  • 30. Singapore, Tokyo, Corporate HQ
  • 31. Singapore, Tokyo, Corporate HQ
  • 32. Lessons • Arbitrary number of topics important for simplicity + performance • Updates happen in many places • Mobility implies change in replication patterns • Multi-master updates simplify design massively
  • 33. Converged Requirements
  • 34. What Have We Learned? • Need persistence and performance – Possibly for years and to 100s of millions t/s • Must have convergence – Need files, tables AND streams – Need volumes, snapshots, mirrors, permissions and … • Must have platform security – Cannot depend on perimeter – Must follow business structure • Must have global scale and scope – Millions of topics for natural designs – Multi-master replication and update
  • 35. The Importance of Common APIs • Commonality and interoperability are critical – Compare the Hadoop eco-system and the NoSQL world • Table stakes – Persistence – Performance – Polymorphism • Major trend so far is to adopt the Kafka API – 0.9 API and beyond remove major abstraction leaks – Kafka API supported by all major Hadoop vendors
  • 36. What we do
  • 37. Evolution of Data Storage (axes: Functionality, Compatibility, Scalability; plotted: Linux, POSIX). Over decades of progress, Unix-based systems have set the standard for compatibility and functionality
  • 38. Evolution of Data Storage (axes: Functionality, Compatibility, Scalability; plotted: Linux, POSIX, Hadoop). Hadoop achieves much higher scalability by trading away essentially all of this compatibility
  • 39. Evolution of Data Storage (axes: Functionality, Compatibility, Scalability; plotted: Linux, POSIX, Hadoop). MapR enhanced Apache Hadoop by restoring the compatibility while increasing scalability and performance
  • 40. Evolution of Data Storage (axes: Functionality, Compatibility, Scalability; plotted: Linux, POSIX, Hadoop). Adding tables and streams enhances the functionality of the base file system
  • 41. http://bit.ly/fastest-big-data
  • 42. How we do this with MapR • MapR Streams is a C++ reimplementation of the Kafka API – Advantages in predictability, performance, scale – Common security and permissions with entire MapR converged data platform • Semantic extensions – A cluster contains volumes, files, tables … and now streams – Streams contain topics – Can have default stream or can name stream by path name • Core MapR capabilities preserved – Consistent snapshots, mirrors, multi-master replication
  • 43. MapR Core Innovations • Volumes – Distributed management – Data placement • Read/write random access file system – Allows distributed meta-data – Improved scaling – Enables NFS access • Application-level NIC bonding • Transactionally correct snapshots and mirrors
  • 44. MapR's Containers • Each container contains – Directories & files – Data blocks • Replicated on servers • No need to manage directly. Files/directories are sharded into blocks, which are placed into containers on disks. Containers are 16-32 GB segments of disk, placed on nodes
  • 45. MapR's Containers • Each container has a replication chain • Updates are transactional • Failures are handled by rearranging replication
  • 46. Container locations and replication: CLDB; N1, N2; N3, N2; N1, N2; N1, N3; N3, N2; N1, N2, N3. Container location database (CLDB) keeps track of nodes hosting each container and replication chain order
  • 47. MapR Scaling • Containers represent 16-32 GB of data – Each can hold up to 1 billion files and directories – 100M containers = ~2 exabytes (a very large cluster) • 250 bytes DRAM to cache a container – 25 GB to cache all containers for 2 EB cluster (but not necessary, can page to disk) – Typical large 10 PB cluster needs 2 GB • Container-reports are 100x-1000x smaller than HDFS block-reports – Serve 100x more data-nodes – Increase container size to 64 GB to serve 4 EB cluster – Map/reduce not affected
  • 48. But Wait, There's More • Directories and files are implemented in terms of B-trees – Key is offset, value is data blob – Internal transactional semantics guarantee safety and consistency – Layout algorithms give very high layout linearization • Tables are implemented in terms of B-trees – Twisted B-tree implementation allows virtues of log-structured merge tree without the compaction delays – Tablet splitting without pausing, integration with file system transactions • Common security and permissions scheme
  • 49. And More … • Streams are implemented in terms of B-trees as well – Topics and consumer offsets are kept in stream, not ZK – Similar splitting technology as MapR DB tables – Consistent permissions, security, data replication • Standard Kafka 0.9 API • Plans to add OJAI for high-level structuring • Performance is very high
  • 50. Example: Files, Table, Streams, Directories, Cluster, Volume mount point
  • 51. Cluster, Volume mount point
  • 52. Lessons • APIs matter more than implementations • There is plenty of room to innovate ahead of the community • POSIX, HDFS, HBase all define useful APIs • Kafka 0.9+ does the same
  • 53. Call to action: Support the Kafka APIs
  • 54. Call to action: Support the Kafka APIs. And come by the MapR booth to check out MapR Streams
  • 55. © 2014 MapR Technologies
  • 56. Short Books by Ted Dunning & Ellen Friedman • Published by O'Reilly in 2014-2016 • For sale from Amazon or O'Reilly • Free e-books currently available courtesy of MapR: http://bit.ly/ebook-real-world-hadoop http://bit.ly/mapr-tsdb-ebook http://bit.ly/ebook-anomaly http://bit.ly/recommendation-ebook
  • 57. Streaming Architecture by Ted Dunning and Ellen Friedman, © 2016 (published by O'Reilly). Free copies at book signing today. http://bit.ly/mapr-ebook-streams
  • 58. Thank You!
  • 59. Q&A. Engage with us! @mapr, maprtech, tdunning@maprtech.com, MapR, maprtech, mapr-technologies
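
The container scaling figures on the MapR Scaling slide can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes), and the function names are hypothetical helpers, not part of any MapR API. It covers only the 100M-container capacity and the 250-bytes-per-container cache figure.

```python
# Back-of-the-envelope check of the container scaling numbers.
# Assumption: decimal units; helper names are illustrative only.
GB = 10**9
EB = 10**18

def cluster_capacity_bytes(containers: int, container_size_bytes: int) -> int:
    """Total data addressed by `containers` containers of a given size."""
    return containers * container_size_bytes

def container_cache_bytes(containers: int, bytes_per_container: int = 250) -> int:
    """DRAM needed to cache location info for every container."""
    return containers * bytes_per_container

containers = 100_000_000  # 100M containers, as on the slide

low = cluster_capacity_bytes(containers, 16 * GB)   # all containers at 16 GB
high = cluster_capacity_bytes(containers, 32 * GB)  # all containers at 32 GB
cache = container_cache_bytes(containers)

print(f"capacity: {low / EB} to {high / EB} EB")
print(f"cache for all containers: {cache / GB} GB")
```

With 16-32 GB containers, 100M containers span 1.6 to 3.2 EB, which brackets the slide's "~2 exabytes", and 250 bytes per container gives the quoted 25 GB of cache.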