SlideShare a Scribd company logo
1 of 16
Download to read offline
Architecture for
Real-Time and Batch
Big-Data Analytics
The World's leading Mobile
Measurement Platform
Founded in 2011 by Reshef Mann
and Oren Kaniel
Just completed Round B funding -
total of $28M
Processing 3.5B daily events
(it was 1.9B just 2 months ago and
250M at the start of 2014!)
13 people in the development team
(we were just 6 people 12 months ago!)
AppsFlyer
Who?!
The Way We Were
•	We had no real concept of “Big Data”- we were
just occupied with making the system work
•	 Even though we were ignorant of the future, we
tried to adhere to a few abstractions:
- Small isolated services
- Central concept of message delivery via
- Different tech for different tasks
A message-bus
•	 CouchDB that served raw reports
via views couldn't keep up with view
generation
•	 Python processes that read from the
message bus (via pub sub) couldn't
keep up with the amount of data
•	 The split between aggregated and
raw data was good, but caused
discrepancies because each service
failed at a different time
•	 If the message bus failed (Redis), all
other services were also in a fail state –
single point of failure
First Creaks
In The
System
Part 1 of The Solution
•	Migrate raw reports from CouchDB to Google's
Bigquery (easiest solution at the time)
•	Rewrite some of the Python services in a new
language that:
– Deals better with strings and allocation of memory
– Can help us scale out
– Has a great ecosystem
– Functional
WHY FUNCTIONAL?!
Why Functional?
Why Clojure?
Sequence based processing
capabilities really fit in the
visualized data flow of
AppsFlyer (processing the
Event Stream)
Enforces use of
FP paradigm more
strictly than Scala
Repl based
development
Easy and
common Java
interop
JVM!
•	Python's proprietary serialization
•	Python custom data structure as the
base message in the system
The Hurdles
(or how 2 stupid mistakes can bite
you in the ass 2 years down the road)
Part 2
of The Solution
The Event
Stream
Small isolated services
Each service has a single business
responsibility
Each service encapsulates its own data
(if it has any) and he exposes it over a
well known interface
Data objects are always POCO/POJO
(simple data structures represented in
JSON or EDN)
Preference for queues and buffers
that pass isolated data for total async
processing
How We
Model
How We
Test
No QA team
Each new service can read from the
event stream to its heart content -
regular Kafka consumers behavior
Each new service handles real life
traffic and real life load because it's
connected to the event stream
Test DB if needed is easy to spin up on
the cloud
Once deemed ready, just throw the
switch to on
Service discovery
via Consul
How We
Orchestrate
MesosEach service is a
single uberjar in
a single docker
container
DBs get their
own dedicated
machines
Preference for ring
based architecture
for DBs
Marathon
Jenkins build server
Deployment via the in-house
Santa tool
Consul health checks
Statsd for the JVM and
application metrics
Sensu
End-to-end flows
Deployment
and
Monitoring
The Bird's Eye View

More Related Content

What's hot

Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientistsdatamantra
 
How To Buy Data Warehouse
How To Buy Data WarehouseHow To Buy Data Warehouse
How To Buy Data WarehouseEric Sun
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseRob Winters
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architectureSudheer Kondla
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Zaloni
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019Steven Moy
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataRob Winters
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshConfluentInc1
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data SolutionJames Serra
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Jordan Chung
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise ArchitectureMapR Technologies
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Dataconomy Media
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big DataInfochimps, a CSC Big Data Business
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data IntegrationsPat Patterson
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An OverviewC. Scyphers
 

What's hot (20)

Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
 
How To Buy Data Warehouse
How To Buy Data WarehouseHow To Buy Data Warehouse
How To Buy Data Warehouse
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big Data
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise Architecture
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data Integrations
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Data Federation
Data FederationData Federation
Data Federation
 

Viewers also liked

The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...Revolution Analytics
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...SoftServe
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics ArchitectureArvind Sathi
 
Top Agile Metrics
Top Agile MetricsTop Agile Metrics
Top Agile MetricsXBOSoft
 
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...Enrico Palumbo
 
Big Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveBig Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveSumit Kalra
 
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analyticsAst 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analyticsAccenture
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Real time data analytics - part 1 - backend infrastructure
Real time data analytics - part 1 - backend infrastructureReal time data analytics - part 1 - backend infrastructure
Real time data analytics - part 1 - backend infrastructureAmazon Web Services
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Neil Andrassy
 
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service Stefan Schwarz
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Tin Ho
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudDataWorks Summit
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageSandeep Patil
 
Real time analytics @ netflix
Real time analytics @ netflixReal time analytics @ netflix
Real time analytics @ netflixCody Rioux
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...DB Tsai
 

Viewers also liked (20)

The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 
Top Agile Metrics
Top Agile MetricsTop Agile Metrics
Top Agile Metrics
 
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
 
Big Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveBig Data Analytics: Architectural Perspective
Big Data Analytics: Architectural Perspective
 
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analyticsAst 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Real time data analytics - part 1 - backend infrastructure
Real time data analytics - part 1 - backend infrastructureReal time data analytics - part 1 - backend infrastructure
Real time data analytics - part 1 - backend infrastructure
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
 
963
963963
963
 
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better Storage
 
Real time analytics @ netflix
Real time analytics @ netflixReal time analytics @ netflix
Real time analytics @ netflix
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
 

Similar to Architecture for Real-Time and Batch Big Data Analytics

ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011marpierc
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...confluent
 
Managing Large Flask Applications On Google App Engine (GAE)
Managing Large Flask Applications On Google App Engine (GAE)Managing Large Flask Applications On Google App Engine (GAE)
Managing Large Flask Applications On Google App Engine (GAE)Emmanuel Olowosulu
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Societyconfluent
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streamsconfluent
 
Evolving from Messaging to Event Streaming
Evolving from Messaging to Event StreamingEvolving from Messaging to Event Streaming
Evolving from Messaging to Event Streamingconfluent
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPMax Neunhöffer
 
Amplitude wave architecture - Test
Amplitude wave architecture - TestAmplitude wave architecture - Test
Amplitude wave architecture - TestKiran Naiga
 
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...HostedbyConfluent
 
Melbourne Microservices Meetup: Agenda for a new Architecture
Melbourne Microservices Meetup: Agenda for a new ArchitectureMelbourne Microservices Meetup: Agenda for a new Architecture
Melbourne Microservices Meetup: Agenda for a new ArchitectureSaul Caganoff
 
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce ijujournal
 
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCEHMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCEijujournal
 
Asynchronous micro-services and the unified log
Asynchronous micro-services and the unified logAsynchronous micro-services and the unified log
Asynchronous micro-services and the unified logAlexander Dean
 
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSkynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSylvain Kalache
 
apidays Australia 2023 - The Playful Bond Between REST And Data Streams, Warr...
apidays Australia 2023 - The Playful Bond Between REST And Data Streams, Warr...apidays Australia 2023 - The Playful Bond Between REST And Data Streams, Warr...
apidays Australia 2023 - The Playful Bond Between REST And Data Streams, Warr...apidays
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019 Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019 confluent
 

Similar to Architecture for Real-Time and Batch Big Data Analytics (20)

ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
 
Managing Large Flask Applications On Google App Engine (GAE)
Managing Large Flask Applications On Google App Engine (GAE)Managing Large Flask Applications On Google App Engine (GAE)
Managing Large Flask Applications On Google App Engine (GAE)
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Society
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
 
Evolving from Messaging to Event Streaming
Evolving from Messaging to Event StreamingEvolving from Messaging to Event Streaming
Evolving from Messaging to Event Streaming
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTP
 
Streaming analytics
Streaming analyticsStreaming analytics
Streaming analytics
 
Amplitude wave architecture - Test
Amplitude wave architecture - TestAmplitude wave architecture - Test
Amplitude wave architecture - Test
 
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
 
Melbourne Microservices Meetup: Agenda for a new Architecture
Melbourne Microservices Meetup: Agenda for a new ArchitectureMelbourne Microservices Meetup: Agenda for a new Architecture
Melbourne Microservices Meetup: Agenda for a new Architecture
 
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
 
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCEHMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
 
Asynchronous micro-services and the unified log
Asynchronous micro-services and the unified logAsynchronous micro-services and the unified log
Asynchronous micro-services and the unified log
 
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSkynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
 
apidays Australia 2023 - The Playful Bond Between REST And Data Streams, Warr...
apidays Australia 2023 - The Playful Bond Between REST And Data Streams, Warr...apidays Australia 2023 - The Playful Bond Between REST And Data Streams, Warr...
apidays Australia 2023 - The Playful Bond Between REST And Data Streams, Warr...
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019 Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
 
Event Driven Architecture
Event Driven ArchitectureEvent Driven Architecture
Event Driven Architecture
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Architecture for Real-Time and Batch Big Data Analytics

  • 1. Architecture for Real-Time and Batch Big-Data Analytics
  • 2. The World's leading Mobile Measurement Platform Founded in 2011 by Reshef Mann and Oren Kaniel Just completed Round B funding - total of $28M Processing 3.5B daily events (it was 1.9B just 2 months ago and 250M at the start of 2014!) 13 people in the development team (we were just 6 people 12 months ago!) AppsFlyer Who?!
  • 3. The Way We Were • We had no real concept of “Big Data”- we were just occupied with making the system work • Even though we were ignorant of the future, we tried to adhere to a few abstractions: - Small isolated services - Central concept of message delivery via - Different tech for different tasks A message-bus
  • 4. • CouchDB that served raw reports via views couldn't keep up with view generation • Python processes that read from the message bus (via pub sub) couldn't keep up with the amount of data • The split between aggregated and raw data was good, but caused discrepancies because each service failed at a different time • If the message bus failed (Redis), all other services were also in a fail state – single point of failure First Creaks In The System
  • 5. Part 1 of The Solution • Migrate raw reports from CouchDB to Google's Bigquery (easiest solution at the time) • Rewrite some of the Python services in a new language that: – Deals better with strings and allocation of memory – Can help us scale out – Has a great ecosystem – Functional
  • 8. Why Clojure? Sequence based processing capabilities really fit in the visualized data flow of AppsFlyer (processing the Event Stream) Enforces use of FP paradigm more strictly than Scala Repl based development Easy and common Java interop JVM!
  • 9. • Python's proprietary serialization • Python custom data structure as the base message in the system The Hurdles (or how 2 stupid mistakes can bite you in the ass 2 years down the road)
  • 10. Part 2 of The Solution
  • 12. Small isolated services Each service has a single business responsibility Each service encapsulates its own data (if it has any) and he exposes it over a well known interface Data objects are always POCO/POJO (simple data structures represented in JSON or EDN) Preference for queues and buffers that pass isolated data for total async processing How We Model
  • 13. How We Test No QA team Each new service can read from the event stream to its heart content - regular Kafka consumers behavior Each new service handles real life traffic and real life load because it's connected to the event stream Test DB if needed is easy to spin up on the cloud Once deemed ready, just throw the switch to on
  • 14. Service discovery via Consul How We Orchestrate MesosEach service is a single uberjar in a single docker container DBs get their own dedicated machines Preference for ring based architecture for DBs Marathon
  • 15. Jenkins build server Deployment via the in-house Santa tool Consul health checks Statsd for the JVM and application metrics Sensu End-to-end flows Deployment and Monitoring