How to extract valuable information from real-time data feeds
Gene Leybzon, February 2016
IoT Challenge
“The critical challenge is using this data when it is still in motion – and extracting valuable information from it.”
- Frédéric Combaneyre, SAS
Role of Data Streams
• Detect events of interest and trigger appropriate actions
• Aggregate information for monitoring
• Sensor data cleansing and validation
• Real-time predictive and optimized operations (support for real-time decision making)
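As a rough illustration of two of these roles, the sketch below cleanses sensor readings and then detects an event of interest. Cleansing drops missing or impossible values. Detection fires an action when a device's value crosses a threshold. The `Reading` type, the valid range and the threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional

@dataclass
class Reading:
    device_id: str
    value: Optional[float]  # None models a missing or malformed sample

# Hypothetical physical limits of a temperature sensor, in degrees Celsius.
VALID_RANGE = (-40.0, 125.0)

def cleanse(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Cleansing/validation: drop missing or physically impossible samples."""
    lo, hi = VALID_RANGE
    for r in readings:
        if r.value is not None and lo <= r.value <= hi:
            yield r

def detect(readings: Iterable[Reading], threshold: float,
           action: Callable[[Reading], None]) -> None:
    """Event detection: call `action` when a device's value crosses above `threshold`.

    Expects cleansed input (no None values).
    """
    above = {}  # device_id -> was the previous reading above the threshold?
    for r in readings:
        is_above = r.value > threshold
        # Fire only on the upward crossing, not on every sample above the threshold.
        if is_above and not above.get(r.device_id, False):
            action(r)
        above[r.device_id] = is_above

# Example usage: None and 500.0 are cleansed away; alerts fire at 31.0 and 33.0.
stream = [Reading("a", 20.0), Reading("a", None), Reading("a", 500.0),
          Reading("a", 31.0), Reading("a", 32.0), Reading("a", 25.0),
          Reading("a", 33.0)]
alerts = []
detect(cleanse(stream), threshold=30.0, action=alerts.append)
```

Triggering only on the crossing, rather than on every sample above the threshold, avoids flooding downstream actions while a condition persists.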
Platforms
Google Cloud Platform
AWS IoT Initiative
SAS
Role of “Pipelines”
• Transform data: convert the data into another format, for example, converting a captured device signal voltage into a calibrated temperature measurement.
• Aggregate and compute data: combining data lets you add checks, such as averaging data across multiple devices to avoid acting on a single spurious device, or ensuring you still have actionable data if a single device goes offline. Adding computation to the pipeline lets you apply streaming analytics to data while it is still in the processing pipeline.
• Enrich data: combine the device-generated data with other metadata about the device, or with other datasets such as weather or traffic data, for use in subsequent analysis.
• Move data: store the processed data in one or more final storage locations.
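The four pipeline stages above can be sketched as chained Python generators, with these assumptions:
- The calibration assumes a TMP36-style analog sensor (0.5 V at 0 °C, 10 mV per °C).
- `DEVICE_SITE` is a hypothetical metadata table.
- A plain list stands in for final storage.

Enrichment runs before aggregation here so that averages can be grouped by site. Real pipelines may order the stages differently.

```python
from itertools import groupby
from statistics import fmean

# Hypothetical device metadata (device id -> site) used for enrichment.
DEVICE_SITE = {"d1": "north", "d2": "north", "d3": "south"}

def transform(raw):
    """Transform: voltage -> degrees Celsius, assuming a TMP36-style sensor
    (0.5 V at 0 °C, 10 mV per °C). Rounded to 2 decimals for readability."""
    for r in raw:
        yield {**r, "temp_c": round((r["volts"] - 0.5) * 100.0, 2)}

def enrich(readings):
    """Enrich: attach metadata about the device (here, its site)."""
    for r in readings:
        yield {**r, "site": DEVICE_SITE.get(r["device"], "unknown")}

def aggregate(readings):
    """Aggregate: average temperatures per (tick, site) across devices, so one
    spurious device has less influence and an offline device still leaves a
    usable value. Input must already be ordered by tick ("ts")."""
    for ts, group in groupby(readings, key=lambda r: r["ts"]):
        by_site = {}
        for r in group:
            by_site.setdefault(r["site"], []).append(r["temp_c"])
        for site, temps in sorted(by_site.items()):
            yield {"ts": ts, "site": site,
                   "avg_temp_c": round(fmean(temps), 2), "devices": len(temps)}

def move(records, sink):
    """Move: write results to final storage (a list stands in for a real store)."""
    for rec in records:
        sink.append(rec)

# Example usage
raw = [
    {"ts": 0, "device": "d1", "volts": 0.75},
    {"ts": 0, "device": "d2", "volts": 0.77},
    {"ts": 0, "device": "d3", "volts": 0.70},
    {"ts": 1, "device": "d1", "volts": 0.76},  # d2 and d3 are offline at ts=1
]
storage = []
move(aggregate(enrich(transform(raw))), storage)
```

Because each stage is a generator, a reading flows through all four stages as soon as it arrives, rather than waiting for a full batch.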
Architecture
What do we want from a stream architecture?
• Fault tolerance against hardware failures and human errors
• Support for a variety of use cases, including low-latency queries as well as updates
• Linear scale-out: adding more machines should increase capacity proportionally
• Extensibility, so that the system stays manageable and can easily accommodate new features
• Consistency: data is the same across the cluster
• Availability: the cluster remains accessible even if a node goes down
• Partition tolerance: the cluster continues to function even if there is a "partition" (communication break) between two nodes
CAP Theorem
“It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
• Consistency (all nodes see the same data at the same time)
• Availability (a guarantee that every request receives a response about whether it succeeded or failed)
• Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)”
Facing the CAP Theorem
[Figure: diagram of Consistency, Availability, and Partition Tolerance; labels: ∅, Cassandra, Riak, CouchBase, MongoDB, λ, Paxos, Zab, Raft]
λ-Architecture
Limitations of the λ-Architecture
• One-way data flow (it does not transact or make per-event decisions on the streaming data, nor does it respond immediately to incoming events)
• Eventual consistency
• NoSQL
• Complexity (separate batch and speed layers must be built and maintained)
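A toy λ-architecture makes the complexity and eventual-consistency points concrete. The example counts events per key across three layers:
- A batch layer recomputes a view from the full master dataset.
- A speed layer counts events that arrived since the last batch run.
- Queries merge the two views.

The class and method names are hypothetical, and a real deployment would run the layers on separate systems.

```python
from collections import Counter

class LambdaCounter:
    """Toy λ-architecture counting events per key (illustrative only)."""

    def __init__(self):
        self.master = []             # immutable, append-only master dataset
        self.batch_view = Counter()  # precomputed by the batch layer
        self.speed_view = Counter()  # incremental view from the speed layer

    def ingest(self, key):
        self.master.append(key)      # every event is stored in the master dataset...
        self.speed_view[key] += 1    # ...and counted immediately by the speed layer

    def run_batch(self):
        # Batch layer: recompute the whole view from scratch over all data.
        self.batch_view = Counter(self.master)
        # Speed-layer results are now covered by the batch view, so discard them.
        self.speed_view = Counter()

    def query(self, key):
        # Serving layer: merge the batch and real-time views at query time.
        return self.batch_view[key] + self.speed_view[key]

# Example usage
lc = LambdaCounter()
for k in ["home", "home", "about"]:
    lc.ingest(k)
before_batch = lc.query("home")  # served entirely by the speed layer
lc.run_batch()
lc.ingest("home")
after_batch = lc.query("home")   # batch view plus one new speed-layer event
```

The counting logic appears twice, once as `Counter(self.master)` and once as the incremental `+= 1`. In production these are usually two different code bases that must be kept in agreement. That duplication is the source of the "Complexity" item above.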
Out-of-the-Box Solutions
Druid Project
• Designed for low latency
• Open-sourced in 2012
• Supports a long history of data
• Scales to more than 500K events/sec on average
Druid data store
Apache Samza
• Distributed stream processing framework
• Simple API
• Fault tolerance
• Manages stream state
• Guarantees that messages are processed in the order they were written to a partition, and that no messages are ever lost
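Samza's ordering guarantee is per partition:
- Messages with the same key land in the same partition.
- A task reads each partition in offset order.

The plain-Python model below shows that idea, together with a local state store and checkpoints that save offset and state together. A restarted task therefore replays uncommitted messages instead of losing them. This is not Samza's API (real Samza jobs are written in Java against Samza's own task interfaces). `PartitionedLog`, `CountingTask` and their methods are hypothetical names.

```python
import zlib
from collections import defaultdict

class PartitionedLog:
    """Append-only log split into partitions (a stand-in for a Kafka topic)."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> None:
        # Same key -> same partition, so all messages for a key stay in order.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))

class CountingTask:
    """Reads one partition strictly in offset order, keeping per-key counts as local state."""

    def __init__(self, log: PartitionedLog, partition: int):
        self.log, self.partition = log, partition
        self.offset = 0                # next offset to read
        self.state = defaultdict(int)  # local state store
        self.checkpoint = (0, {})      # last saved (offset, state) pair

    def process_available(self) -> None:
        messages = self.log.partitions[self.partition]
        while self.offset < len(messages):
            key, _value = messages[self.offset]
            self.state[key] += 1
            self.offset += 1

    def commit(self) -> None:
        # Save offset and state together so they can be restored consistently.
        self.checkpoint = (self.offset, dict(self.state))

    def restart(self) -> None:
        # Simulated crash: discard in-memory progress and resume from the checkpoint.
        self.offset, saved = self.checkpoint
        self.state = defaultdict(int, saved)

# Example usage
log = PartitionedLog(num_partitions=2)
tasks = [CountingTask(log, p) for p in range(2)]
for k in ["a", "b", "a"]:
    log.append(k, "click")
for t in tasks:
    t.process_available()
    t.commit()
log.append("a", "click")
for t in tasks:
    t.process_available()  # process the new message...
    t.restart()            # ...but crash before committing
    t.process_available()  # replay from the checkpoint: nothing is lost
totals = defaultdict(int)
for t in tasks:
    for k, v in t.state.items():
        totals[k] += v
```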
Samza Architecture
VoltDB
Stream Databases and Pipelines
Building Blocks
PipelineDB (example of usage)
AWS Kinesis
Apache Cassandra
• Decentralized (every node in the cluster has the same role)
• No single point of failure
• Scalable: read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications
• Fault-tolerant
• Tunable consistency, all the way from "writes never fail" to "block for all replicas to be readable"
• Hadoop integration, including MapReduce
• Query language (CQL)
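Cassandra's tunable consistency depends on two numbers out of the replication factor N:
- W: how many replicas must acknowledge a write.
- R: how many must answer a read.

When R + W > N, every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write. The sketch below checks that rule against a brute-force overlap test. It follows Cassandra's single-data-center definitions of ONE, QUORUM (N // 2 + 1) and ALL. The function names are illustrative, and special levels such as ANY are left out.

```python
from itertools import combinations

def replicas_for(level: str, n: int) -> int:
    """Replica acknowledgements a consistency level requires (single data center)."""
    return {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}[level]

def is_strongly_consistent(read_level: str, write_level: str, n: int) -> bool:
    """R + W > N: every read set overlaps every write set."""
    return replicas_for(read_level, n) + replicas_for(write_level, n) > n

def always_overlaps(r: int, w: int, n: int) -> bool:
    """Brute-force check that any r replicas share a node with any w replicas."""
    nodes = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(nodes, r)
               for ws in combinations(nodes, w))

# Example usage: compare the arithmetic rule with the brute-force check for N = 3.
LEVELS = ["ONE", "QUORUM", "ALL"]
N = 3
table = {(rl, wl): is_strongly_consistent(rl, wl, N) for rl in LEVELS for wl in LEVELS}
agrees = all(table[(rl, wl)] == always_overlaps(replicas_for(rl, N), replicas_for(wl, N), N)
             for rl in LEVELS for wl in LEVELS)
```

With N = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) see the latest write, while ONE/ONE (1 + 1 ≤ 3) trades that guarantee for lower latency and higher availability.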
Apache Flink
• High performance
• Low latency
• Support for out-of-order events
• Flexible streaming windows
• Fault tolerance
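Flink's support for out-of-order events rests on event time and watermarks. With a bounded out-of-orderness strategy, the watermark is roughly the largest event timestamp seen so far minus a fixed bound. A window is finalized only once the watermark passes its end. The plain-Python sketch below models a tumbling event-time window count that way. It illustrates the concept and is not Flink's DataStream API. The window size, the bound and the function name are example choices, and late events are simply dropped.

```python
from collections import defaultdict

def tumbling_counts(events, size, max_out_of_order):
    """Count events per key in tumbling event-time windows of `size` units.

    events: (timestamp, key) pairs in arrival order, possibly out of order.
    Returns (results, dropped): results are (window_start, key, count) tuples
    in firing order; dropped lists events whose window had already fired.
    """
    open_windows = defaultdict(lambda: defaultdict(int))  # start -> key -> count
    results, dropped = [], []
    watermark = float("-inf")
    for ts, key in events:
        start = ts - ts % size
        if start + size <= watermark:
            dropped.append((ts, key))  # too late: its window has already fired
            continue
        open_windows[start][key] += 1
        watermark = max(watermark, ts - max_out_of_order)
        # Fire every window whose end is at or before the watermark.
        for s in sorted(w for w in open_windows if w + size <= watermark):
            for k, c in sorted(open_windows.pop(s).items()):
                results.append((s, k, c))
    # End of stream: flush the remaining windows.
    for s in sorted(open_windows):
        for k, c in sorted(open_windows.pop(s).items()):
            results.append((s, k, c))
    return results, dropped

# Example usage: (8, "a") arrives after (12, "a") but is still counted in window 0;
# (3, "a") arrives after window 0 has fired and is dropped.
results, dropped = tumbling_counts(
    [(1, "a"), (12, "a"), (8, "a"), (16, "b"), (25, "a"), (3, "a")],
    size=10, max_out_of_order=5)
```

A larger `max_out_of_order` tolerates more disorder at the cost of higher result latency. That is the same trade-off a Flink watermark strategy makes.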
Stream Processing Algorithms
Popular Stream Algorithms
• Finding frequent items
• Estimating the number of distinct items
• Statistics
• Finding “signal”
• Error correction
• Filtering
• Anomaly detection
• Incremental learning
• Data clustering
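Two of the listed techniques are sketched below. Both use bounded memory regardless of stream length:
- **Finding frequent items:** the Misra-Gries summary. With k − 1 counters, any item occurring more than n/k times in n items is guaranteed to survive.
- **Running statistics:** Welford's online algorithm for mean and variance, reused as a simple z-score anomaly detector.

The z-score threshold of 3 and the warm-up of 5 samples are assumptions chosen for illustration.

```python
import math

def misra_gries(stream, k):
    """Misra-Gries summary with k - 1 counters.

    Every item occurring more than n/k times in a stream of n items is
    guaranteed to be among the returned keys; counts are underestimates.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all and drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

class RunningStats:
    """Welford's online algorithm for mean and sample variance in O(1) memory."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def zscore_anomalies(stream, threshold=3.0, warmup=5):
    """Flag values more than `threshold` standard deviations from the running mean."""
    stats = RunningStats()
    flagged = []
    for i, x in enumerate(stream):
        if stats.n >= warmup and stats.std > 0 and abs(x - stats.mean) / stats.std > threshold:
            flagged.append((i, x))
        stats.update(x)  # anomalies are folded into the statistics too (a simplification)
    return flagged

# Example usage
heavy = misra_gries("abacadaeaf", k=3)  # "a" occurs 5 times out of 10 (more than 10/3)
anomalies = zscore_anomalies([10, 11, 9, 10, 11, 9, 10, 50, 10])
```

Misra-Gries can report false positives (here "f" survives with a small count), so a second pass or a minimum count is typically used to confirm candidates.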
Machine Learning from Stream Data
How is ML from stream data different from traditional ML techniques?
• It takes recent history into account
• The ML model is updatable (it “evolves” as new data comes in)
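One simple way to weight recent history and let a model evolve is exponential forgetting: each new observation updates the estimate, and older data decays geometrically. The sketch below uses a one-parameter "model", an exponentially weighted moving average used as a one-step forecaster. It is compared with a static mean fit once over all history. The class name and the smoothing factor alpha = 0.3 are arbitrary example choices.

```python
class EwmaForecaster:
    """Predicts the next value as an exponentially weighted average of past values."""

    def __init__(self, alpha: float):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.level = None

    def predict(self):
        return self.level

    def update(self, x: float) -> None:
        # Blend in the new observation; older data decays by (1 - alpha) per step.
        if self.level is None:
            self.level = x
        else:
            self.level = self.alpha * x + (1 - self.alpha) * self.level

# Example usage: the data's level shifts from 10 to 20 halfway through.
series = [10.0] * 20 + [20.0] * 20
model = EwmaForecaster(alpha=0.3)
for x in series:
    model.update(x)
static_mean = sum(series) / len(series)  # a model fit once on all history
```

After the shift, the evolving model tracks the new level (close to 20), while the static fit is stuck at 15. Larger `alpha` adapts faster but is noisier.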
Two Approaches to Adapting ML to Stream Data
• Incremental algorithms (both support vector machines and neural networks can be trained incrementally)
• Periodic retraining on each new batch of data
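The first approach, incremental learning, can be sketched as a linear SVM trained with a Pegasos-style stochastic subgradient step on the hinge loss. Each arriving labeled example updates the model once and is then discarded. For simplicity the sketch uses the step size 1/(λt), has no bias term and omits Pegasos' optional projection step. The regularization constant and the synthetic data are assumptions. The method name `partial_fit` borrows scikit-learn's naming convention for incremental training.

```python
import random

class OnlineLinearSVM:
    """Linear SVM trained one example at a time (Pegasos-style SGD on hinge loss)."""

    def __init__(self, dim: int, lam: float = 0.01):
        self.w = [0.0] * dim
        self.lam = lam  # regularization strength (assumed value)
        self.t = 0      # number of updates so far

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def partial_fit(self, x, y):
        """One update from a single example with label y in {-1, +1}."""
        self.t += 1
        eta = 1.0 / (self.lam * self.t)
        margin = y * self.decision(x)   # computed with the current weights
        shrink = 1.0 - eta * self.lam   # regularization shrinks w each step
        self.w = [shrink * wi for wi in self.w]
        if margin < 1:                  # hinge loss is active: move toward y * x
            self.w = [wi + eta * y * xi for wi, xi in zip(self.w, x)]

def make_point(rng):
    """Synthetic separable data (assumption): label = sign(x0 + x1), with a margin."""
    while True:
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        s = x[0] + x[1]
        if abs(s) >= 0.2:
            return x, (1 if s > 0 else -1)

# Example usage
rng = random.Random(0)
model = OnlineLinearSVM(dim=2, lam=0.01)
for _ in range(2000):
    x, y = make_point(rng)   # each example arrives once...
    model.partial_fit(x, y)  # ...updates the model, and is not stored
test = [make_point(rng) for _ in range(500)]
accuracy = sum(model.predict(x) == y for x, y in test) / len(test)
```

The second approach, periodic retraining, would instead buffer examples and refit a model from scratch on each new batch. That is simpler to reason about, but the model lags behind the data between retraining runs.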
Questions?

Editor's Notes

  1. https://aws.amazon.com/iot/how-it-works/#shadows
  2. https://en.wikipedia.org/wiki/CAP_theorem
  3. http://www.slideshare.net/gakhov/bbuzz-overview-part1
  4. http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html https://www.mapr.com/developercentral/lambda-architecture
  5. http://radar.oreilly.com/2015/02/improving-on-the-lambda-architecture-for-streaming-analysis.html
  6. https://en.wikipedia.org/wiki/Druid_(open-source_data_store)
  7. https://en.wikipedia.org/wiki/Druid_(open-source_data_store)
  8. https://github.com/pipelinedb/pipelinedb
  9. https://github.com/pipelinedb/pipelinedb
  10. https://flink.apache.org/features.html https://flink.apache.org/
  11. Considerations: data horizon, data obsolescence