Narayan Kumar
Software Consultant
Knoldus Software LLP
Lambda Architecture with Spark
Agenda
● What is Lambda Architecture?
● Components of Lambda Architecture
● Advantages of Lambda Architecture
● Implementation with Spark and its Benefits
● Code Review & Demo
“Lambda architecture is a data-processing architecture designed to
handle massive quantities of data by taking advantage of both batch-
and stream-processing methods.”
Wikipedia
What is Lambda Architecture?
Coined by Nathan Marz
➢ Former Twitter engineer
➢ Creator of Apache Storm
Components of Lambda Architecture
Lambda architecture is broadly classified into three layers:
➢ Batch Layer
➢ Speed Layer
➢ Serving Layer
Overview of Lambda Architecture
https://www.mapr.com/developercentral/lambda-architecture
Batch Layer
In the Lambda Architecture, the batch layer precomputes the
master dataset into batch views so that queries can be resolved
with low latency.
Master Dataset
The master dataset is the source of truth in the Lambda Architecture.
Even if you were to lose all your serving layer datasets
and speed layer datasets, you could reconstruct your
application from the master dataset.
Data in the master dataset must have three properties:
➢ Data is raw
➢ Data is immutable
➢ Data is eternally true
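The three properties above can be sketched in plain Python. This is a minimal illustration, not code from the talk: the `PageView` record, its fields, and the `MasterDataset` class are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import List, Tuple


@dataclass(frozen=True)  # frozen=True makes each fact immutable once created
class PageView:
    """A raw fact: one user viewed one URL at one moment (hypothetical schema)."""
    user_id: str
    url: str
    timestamp: int  # tying the fact to a moment is what keeps it true forever


class MasterDataset:
    """Append-only store: facts can be added, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[PageView] = []

    def append(self, fact: PageView) -> None:
        self._facts.append(fact)

    def facts(self) -> Tuple[PageView, ...]:
        # Hand out a tuple so callers cannot mutate the underlying list.
        return tuple(self._facts)


master = MasterDataset()
master.append(PageView("alice", "/home", 1000))
master.append(PageView("alice", "/pricing", 1060))

try:
    master.facts()[0].url = "/changed"  # attempt to edit a stored fact
    mutated = True
except FrozenInstanceError:
    mutated = False  # the edit is rejected; the stored fact is unchanged
```

Storing the raw event with its timestamp, rather than a derived value such as "alice's current page", is what lets any view be rebuilt later from the same facts.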
Computing functions on the batch layer
As our master dataset is continually growing, we must have a
strategy for updating our batch views when new data becomes
available.
There are two suitable kinds of algorithm:
➢ Recomputation algorithms: throw away the old batch views
and recompute the functions over the entire master dataset.
➢ Incremental algorithms: update the batch views directly when
new data arrives.
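The difference between the two strategies can be sketched for a simple page-view count. This is a plain-Python illustration under assumed names (`recompute_view`, `increment_view`), not code from the talk.

```python
from collections import Counter
from typing import Dict, Iterable, List


def recompute_view(master_dataset: Iterable[str]) -> Dict[str, int]:
    """Recomputation: ignore any old view and scan the entire master dataset."""
    return dict(Counter(master_dataset))


def increment_view(view: Dict[str, int], new_data: Iterable[str]) -> Dict[str, int]:
    """Incremental: fold only the newly arrived records into the existing view."""
    updated = dict(view)
    for url in new_data:
        updated[url] = updated.get(url, 0) + 1
    return updated


master: List[str] = ["/home", "/pricing", "/home"]
view = recompute_view(master)

# New data arrives and is appended to the master dataset.
new_data = ["/home", "/docs"]
master.extend(new_data)

recomputed = recompute_view(master)            # scans all 5 records
incremented = increment_view(view, new_data)   # touches only the 2 new records
```

For a count, both strategies produce the same view. Recomputation does more work per run but needs no state beyond the master dataset; the incremental version is cheaper per update but depends on the previous view being correct.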
Speed Layer
There are two major facets of the speed layer: storing the
realtime views and processing the incoming data stream so as
to update those views.
Storing realtime views
The underlying storage layer must meet the following requirements:
➢ Random reads: A realtime view should support fast random reads to
answer queries quickly.
➢ Random writes: To support incremental algorithms, it must also be
possible to modify a realtime view with low latency.
➢ Scalability: As with the serving layer views, the realtime views should
scale with the amount of data they store and the read/write rates required
by the application.
➢ Fault tolerance: If a disk or a machine crashes, a realtime view should
continue to function normally.
Serving Layer
In the Lambda Architecture, the serving layer provides low-latency
access to the results of calculations performed on the master
dataset. The serving layer views are slightly out of date due to the
time required for batch computation.
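Because the batch views lag behind, a Lambda Architecture query typically combines the batch view with the realtime view from the speed layer, which covers only the data that arrived after the last batch run. A minimal sketch, assuming both views are page-view counts keyed by URL; the function name and the sample data are hypothetical.

```python
from typing import Dict


def query_page_views(url: str,
                     batch_view: Dict[str, int],
                     realtime_view: Dict[str, int]) -> int:
    """Merge the slightly stale batch count with the recent realtime count."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)


# Batch view: everything up to the last completed batch run.
batch_view = {"/home": 120, "/pricing": 45}
# Realtime view: only events seen since that batch run.
realtime_view = {"/home": 3, "/docs": 1}

home_total = query_page_views("/home", batch_view, realtime_view)
docs_total = query_page_views("/docs", batch_view, realtime_view)
```

This simple addition works because a count can be merged by summing; other aggregates may need a different merge step.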
Requirements for a serving layer database
As with the speed layer, the serving layer database must meet the following requirements:
➢ Random reads: A serving layer database must support random reads,
with indexes providing direct access to small portions of the view.
➢ Batch writable: The batch views for a serving layer are produced from
scratch. When a new version of a view becomes available, it must be
possible to completely swap out the older version with the updated view.
➢ Scalability: A serving layer database must be capable of handling views
of arbitrary size.
➢ Fault tolerance: Because a serving layer database is distributed, it must
be tolerant of machine failures.
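The "random reads" and "batch writable" requirements can be illustrated with a single-process, in-memory sketch. It is not a distributed database and not code from the talk; `ServingView` and its methods are hypothetical names.

```python
from typing import Dict, Optional


class ServingView:
    """Holds one complete version of a batch view at a time."""

    def __init__(self) -> None:
        self._current: Dict[str, int] = {}
        self.version = 0

    def get(self, key: str) -> Optional[int]:
        # Random read: direct lookup by key, no scan of the view.
        return self._current.get(key)

    def swap(self, new_view: Dict[str, int]) -> None:
        # Batch write: the whole view is replaced by one reference assignment,
        # so a reader sees either the old version or the new one, never a mix.
        self._current = dict(new_view)
        self.version += 1


serving = ServingView()
serving.swap({"/home": 120})               # first batch run
serving.swap({"/home": 125, "/docs": 4})   # next batch run replaces it entirely
```

There is no per-key update method: the serving layer only ever receives whole views produced by the batch layer.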
Advantages of Lambda Architecture
Lambda Architecture (LA) has the following advantages:
➢ Human fault tolerance: LA makes a big data system tolerant of
human error.
➢ Operational complexity: It reduces the operational complexity of large
historical queries by dividing them into a precomputed part and an on-the-fly part.
➢ Resilience: LA is resilient because it is difficult for human errors or
hardware faults to corrupt data stored in the system, since the system does
not allow update or delete operations on existing data.
➢ Simple and maintainable: It is simple and easy to understand,
and its flexible architecture helps with maintenance.
Implementation with Spark and its Benefits
Implementing LA with Spark has the following benefits:
➢ Spark provides a unified stack (Spark Core, Spark SQL, Spark
Streaming, MLlib, and GraphX), which makes LA easier to implement.
➢ Spark has clean and easy-to-use APIs (far more readable and with
less boilerplate code than MapReduce).
➢ The biggest advantage Spark gave us in this case is Spark Streaming,
which allowed us to reuse the same aggregates we wrote for our
batch application on a real-time data stream.
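The reuse described in the last point can be illustrated without Spark. The sketch below is plain Python, not the Spark API: one aggregate function is applied to the full historical dataset and, unchanged, to each micro-batch of a simulated stream. All names and data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_by_url(events: Iterable[str]) -> Counter:
    """The shared aggregate: page-view counts per URL."""
    return Counter(events)


# Batch path: run the aggregate once over the historical data.
historical = ["/home", "/pricing", "/home"]
batch_view = count_by_url(historical)

# Streaming path: run the same aggregate on each small batch of new
# events and fold the result into the realtime view.
micro_batches: List[List[str]] = [["/home"], ["/docs", "/home"]]
realtime_view: Counter = Counter()
for micro_batch in micro_batches:
    realtime_view.update(count_by_url(micro_batch))
```

Keeping the aggregate in one function means the batch and speed layers cannot drift apart in how they compute the same metric.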
References
Big Data: Principles and best practices of scalable real-time data systems,
Nathan Marz with James Warren
https://en.wikipedia.org/wiki/Lambda_architecture
https://www.mapr.com/developercentral/lambda-architecture
http://lambda-architecture.net/
Thank you

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 

Lambda Architecture with Spark

  • 1. Lambda Architecture with Spark. Narayan Kumar, Software Consultant, Knoldus Software LLP
  • 2. Agenda ● What is Lambda Architecture? ● Components of Lambda Architecture ● Advantages of Lambda Architecture ● Implementation with Spark and its Benefits ● Code Review & Demo
  • 3. Agenda ● What is Lambda Architecture? ● Components of Lambda Architecture ● Advantages of Lambda Architecture ● Implementation with Spark and its Benefits ● Code Review & Demo
  • 4. What is Lambda Architecture? “Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods.” (Wikipedia) Coined by Nathan Marz: ➢ Ex-Twitter engineer ➢ Creator of Apache Storm
  • 5. Agenda ● What is Lambda Architecture? ● Components of Lambda Architecture ● Advantages of Lambda Architecture ● Implementation with Spark and its Benefits ● Code Review & Demo
  • 6. Components of Lambda Architecture: The Lambda Architecture is broadly divided into three layers: ➢ Batch Layer ➢ Speed Layer ➢ Serving Layer
  • 7. Overview of Lambda Architecture: https://www.mapr.com/developercentral/lambda-architecture
  • 8. Batch Layer: In the Lambda Architecture, the batch layer precomputes the master dataset into batch views so that queries can be resolved with low latency.
  • 9. Master Dataset: The master dataset is the source of truth in the Lambda Architecture. Even if you were to lose all your serving layer datasets and speed layer datasets, you could reconstruct your application from the master dataset. Data in the master dataset must hold three properties: ➢ Data is raw ➢ Data is immutable ➢ Data is eternally true
  • 10. Computing functions on the batch layer: As the master dataset is continually growing, we must have a strategy for updating the batch views when new data becomes available. There are two suitable kinds of algorithm: ➢ Recomputation algorithms: throw away the old batch views and recompute the functions over the entire master dataset. ➢ Incremental algorithms: update the existing views directly when new data arrives.
  • 11. Speed Layer: There are two major facets of the speed layer: storing the realtime views, and processing the incoming data stream so as to update those views.
  • 12. Storing realtime views: The underlying storage layer must meet the following requirements: ➢ Random reads: A realtime view should support fast random reads to answer queries quickly. ➢ Random writes: To support incremental algorithms, it must also be possible to modify a realtime view with low latency. ➢ Scalability: As with the serving layer views, the realtime views should scale with the amount of data they store and the read/write rates required by the application. ➢ Fault tolerance: If a disk or a machine crashes, a realtime view should continue to function normally.
  • 13. Serving Layer: In the Lambda Architecture, the serving layer provides low-latency access to the results of calculations performed on the master dataset. The serving layer views are slightly out of date due to the time required for batch computation.
  • 14. Requirements for a serving layer database: As with the speed layer, there are four requirements: ➢ Random reads: A serving layer database must support random reads, with indexes providing direct access to small portions of the view. ➢ Batch writable: The batch views for a serving layer are produced from scratch. When a new version of a view becomes available, it must be possible to completely swap out the older version with the updated view. ➢ Scalability: A serving layer database must be capable of handling views of arbitrary size. ➢ Fault tolerance: Because a serving layer database is distributed, it must be tolerant of machine failures.
  • 15. Agenda ● What is Lambda Architecture? ● Components of Lambda Architecture ● Advantages of Lambda Architecture ● Implementation with Spark and its Benefits ● Code Review & Demo
  • 16. Advantages of Lambda Architecture: The Lambda Architecture (LA) has the following advantages: ➢ Human fault tolerance: LA makes a Big Data system tolerant of human faults. ➢ Operational complexity: It reduces the operational complexity of large historical queries by dividing them into precomputed queries and on-the-fly queries. ➢ Resilience: LA is resilient, because it is difficult for human errors or hardware faults to corrupt data stored in the system, since the system does not allow update or delete operations on existing data. ➢ Simple & maintainable: It is simple in nature, so it is easy to understand, and its flexible architecture helps with maintenance.
  • 17. Agenda ● What is Lambda Architecture? ● Components of Lambda Architecture ● Advantages of Lambda Architecture ● Implementation with Spark and its Benefits ● Code Review & Demo
  • 18. Implementation with Spark and its Benefits: Implementing LA with Spark has the following benefits: ➢ Spark gives us a unified stack (Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX), so that we can easily implement LA. ➢ Spark has clean and easy-to-use APIs (far more readable and with less boilerplate code than MapReduce). ➢ The biggest advantage Spark gave us in this case is Spark Streaming, which allowed us to re-use the same aggregates we wrote for our batch application on a real-time data stream.
  • 19. References: ➢ Nathan Marz with James Warren, Big Data: Principles and best practices of scalable real-time data systems ➢ https://en.wikipedia.org/wiki/Lambda_architecture ➢ https://www.mapr.com/developercentral/lambda-architecture ➢ http://lambda-architecture.net/
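
The master dataset properties (slide 9) and the two batch-view update strategies (slide 10) can be sketched in a few lines. This is a minimal illustration in plain Python rather than Spark, so that it runs without a cluster; the `PageView` schema, the `MasterDataset` class, and the function names are all hypothetical and do not come from the slides or the demo code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)  # frozen: a stored fact can never be modified
class PageView:
    """One raw, timestamped fact (hypothetical schema for illustration)."""
    url: str
    user: str
    ts: int


class MasterDataset:
    """Append-only store: facts can be added, never updated in place."""

    def __init__(self) -> None:
        self._facts: List[PageView] = []

    def append(self, fact: PageView) -> None:
        self._facts.append(fact)

    def all(self) -> Tuple[PageView, ...]:
        return tuple(self._facts)  # read-only snapshot for batch jobs


def recompute_view(facts: Iterable[PageView]) -> Counter:
    """Recomputation algorithm: rebuild the batch view from every fact."""
    return Counter(f.url for f in facts)


def update_view(view: Counter, new_facts: Iterable[PageView]) -> Counter:
    """Incremental algorithm: fold only the new facts into an existing view."""
    updated = Counter(view)
    updated.update(f.url for f in new_facts)
    return updated


master = MasterDataset()
for fact in [PageView("/home", "ann", 1), PageView("/docs", "bob", 2)]:
    master.append(fact)
batch_view = recompute_view(master.all())

# New data arrives after the first batch run.
new_facts = [PageView("/home", "cat", 3)]
for fact in new_facts:
    master.append(fact)

incremental_view = update_view(batch_view, new_facts)
recomputed_view = recompute_view(master.all())
```

For a simple count both strategies produce the same view. The recomputation path is the one that stays correct after a bug fix, because it never depends on a previously computed (and possibly wrong) view.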

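The interplay of the speed layer, the serving layer, and the re-use of one aggregate for both batch and streaming (slides 11 to 13 and 18) can be sketched as follows. This is plain Python, not Spark Streaming: the micro-batches are ordinary lists, and `count_by_url`, `RealtimeView`, and `query` are hypothetical names chosen for the illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, int]  # (url, timestamp); hypothetical event shape


def count_by_url(events: Iterable[Event]) -> Counter:
    """The single aggregate shared by the batch and speed layers."""
    return Counter(url for url, _ in events)


class RealtimeView:
    """Speed-layer view: random reads plus low-latency incremental writes."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def apply_micro_batch(self, events: Iterable[Event]) -> None:
        # Re-use the batch aggregate on each small slice of the stream.
        self._counts.update(count_by_url(events))

    def get(self, url: str) -> int:
        return self._counts[url]

    def expire(self) -> None:
        """Drop realtime state once a fresh batch view has absorbed it."""
        self._counts.clear()


def query(url: str, batch_view: Counter, realtime: RealtimeView) -> int:
    """Merge the slightly stale batch view with the realtime delta."""
    return batch_view[url] + realtime.get(url)


# Events already absorbed by the last batch run.
historical: List[Event] = [("/home", 1), ("/docs", 2), ("/home", 3)]
batch_view = count_by_url(historical)

# Events that arrived after that batch run started.
realtime = RealtimeView()
realtime.apply_micro_batch([("/home", 4)])
realtime.apply_micro_batch([("/docs", 5), ("/home", 6)])

total_home = query("/home", batch_view, realtime)
total_docs = query("/docs", batch_view, realtime)
```

Because `count_by_url` is written once and called from both paths, there is no second implementation of the business logic to test and maintain, which is the benefit the slides attribute to Spark Streaming.
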
Editor's Notes

  1. 1. The batch layer runs functions over the master dataset to precompute intermediate data called batch views. 2. The speed layer compensates for the high latency of the batch layer by providing low-latency updates using data that has yet to be precomputed into a batch view. 3. Queries are then satisfied by processing data from the serving layer views and the speed layer views, and merging the results.
  2. 1
  3. 1. The batch layer runs functions over the master dataset to precompute intermediate data called batch views. 2. The batch layer has three components: a) Master dataset: an immutable, append-only dataset. b) Precomputing function: generally a MapReduce function that operates on the master dataset and produces a batch view. Precomputing functions are used for high-latency queries such as historical queries. c) Batch view: the outcome of the precomputing function.
  4. 1. The master dataset is the only part of the Lambda Architecture that absolutely must be safeguarded from corruption. 2. Data is raw: When designing a Big Data system, you want to be able to answer as many questions as possible. To do so, we store raw data in the master dataset, because storing normalized data loses many facts. How raw the data needs to be depends on the use case. 3. Data is immutable: With immutability we cannot update or delete data; we can only append data to the dataset. This has some vital advantages: a) Human-fault tolerance: if bad data is added by mistake and discovered later, we just remove the bad data and recompute over the master dataset. b) Simplicity: an immutable dataset is simple because it does not need to store indexes as mutable data does. 4. Data is eternally true: The key consequence of immutability is that each piece of data is true in perpetuity. That is, a piece of data, once true, must always be true. Immutability wouldn’t make sense without this property.
  5. Comparison of recomputation algorithms (RA) and incremental algorithms (IA): 1. Performance: a) RA: requires computational effort to process the entire master dataset. b) IA: requires fewer computational resources but may generate much larger batch views. 2. Human-fault tolerance: a) RA: extremely tolerant of human errors because the batch views are continually rebuilt. b) IA: doesn’t facilitate repairing errors in the batch views; repairs are ad hoc and may require estimates. 3. Generality: a) RA: complexity of the algorithm is addressed during precomputation, resulting in simple batch views and low-latency, on-the-fly processing. b) IA: requires special tailoring; may shift complexity to on-the-fly query processing. 4. Conclusion: a) RA: essential to supporting a robust data-processing system. b) IA: can increase the efficiency of your system, but only as a supplement to recomputation algorithms.
  6. 1. The power of the Lambda Architecture lies in the separation of roles between the different layers. 2. The speed layer is one of the core layers of this architecture. It fills the gap left by the batch layer: combining the speed layer view and the batch view gives us the capability to run any ad hoc query over all the data, that is, query = function(all data). 3. We can also expire realtime views; for example, in memcache we set an expiry time for key/value pairs.
  7. 1. Random reads: This means the data the view contains must be indexed. 2. Scalability: Typically this implies that realtime views can be distributed across many machines. Nowadays sharding is widely used to meet the scalability requirement in databases. 3. Fault tolerance: Fault tolerance is accomplished by replicating data across machines so there are backups should a single machine fail.
  8. 1. The serving layer views being slightly out of date is not a concern, because the speed layer is responsible for any data not yet available in the serving layer. 2. We can also write on-the-fly query functions in the serving layer to return low-latency query results.
  9. 1. Human fault tolerance: a) If there is a bug in the batch job: discard the batch view and recompute it. b) If there is a bug in the master data: discard the buggy data and reprocess the remaining data. The master dataset is an immutable, append-only dataset, so buggy data can easily be discarded. c) If there is a bug in a query: redeploy the query layer. 2. In the Lambda Architecture we can use a different algorithm in each layer; for example, an exact search algorithm in the batch layer and an approximate search algorithm in the speed layer. 3. Under the Lambda Architecture, results from the batch layer are always eventually consistent. As soon as a fresh batch update is completed, results from the batch layer are consistent.
  10. 1. Spark also provides built-in common ML algorithms such as classification, regression, clustering, and collaborative filtering. 3. We didn’t need to re-implement the business logic, nor test and maintain a second code base.
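
The human-fault-tolerance cases in note 9, together with the "batch writable" requirement for the serving layer, can be sketched as below. This is an illustrative plain-Python model under stated assumptions: `ServingLayer`, `buggy_job`, `fixed_job`, and the `is_valid` rule (a negative timestamp marks a bad record) are hypothetical and are not taken from the talk's demo code.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, int]  # (url, timestamp); hypothetical event shape


class ServingLayer:
    """Holds one complete batch view and swaps it out wholesale."""

    def __init__(self) -> None:
        self._view: Counter = Counter()

    def swap(self, new_view: Counter) -> None:
        # Batch writable: replace the whole view, never patch it in place.
        self._view = new_view

    def get(self, url: str) -> int:
        return self._view[url]


def buggy_job(events: Iterable[Event]) -> Counter:
    """A batch job with a bug: every event is counted twice."""
    view: Counter = Counter()
    for url, _ in events:
        view[url] += 2
    return view


def fixed_job(events: Iterable[Event]) -> Counter:
    """The corrected batch job."""
    return Counter(url for url, _ in events)


def is_valid(event: Event) -> bool:
    """Hypothetical rule: a negative timestamp marks a bad record."""
    return event[1] >= 0


master: List[Event] = [("/home", 1), ("/docs", 2), ("/home", -7)]
serving = ServingLayer()

# Case a: bug in the batch job. Discard the view and recompute it.
serving.swap(buggy_job(master))
wrong_home = serving.get("/home")
serving.swap(fixed_job(master))
fixed_home = serving.get("/home")

# Case b: bad data in the master dataset. Drop it and recompute.
clean_master = [e for e in master if is_valid(e)]
serving.swap(fixed_job(clean_master))
clean_home = serving.get("/home")
```

In both cases the repair is a full recomputation followed by a swap, so no error from the old view can leak into the new one.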