The Continuous Distributed Monitoring Model
Farzad Nozarian
fnozarian@aut.ac.ir
Chalmers University of Technology
18/04/2016
Outline
Introduction
Countdown Problem
Monitoring Entropy
Geometric Approach
Sampling
What Is the Problem?
Several processing nodes receive streams of data items
The goal is to monitor a function over the union of the items with minimum communication cost
Examples of monitoring functions: simple countdown, tracking the entropy, distinct elements, sampling, top-k items
Motivation and Applications
Monitoring the global health of the network in a large ISP
Tracking the usage of resources in distributed data centers by social networks
Tracking global changes by collecting information from sensors
What Are the Challenges?
Continuous monitoring: real-time tracking, rather than a one-shot query
Streaming: data is received at a very high speed
Distributed processing: each node sees only part of the global stream
Communication cost is important
Trivial Solutions
Centralizing all the items: high communication cost!
Periodic polling:
Summarizing information in complex functions
Parameter tuning for the polling frequency
Frequent polling: high communication
Infrequent polling: delay in identifying events
The Countdown Problem
The Countdown Problem
A threshold monitoring problem with many applications
Identifying when the total number of observations reaches $\tau$
Trivial solution: observers notify the coordinator by sending a bit whenever an event is observed, which costs $O(\tau)$ communication
But we can improve it!
A First Approach
Idea: there are many events at each site before reaching the threshold $\tau$
At least one site must see $\tau/k$ items before the threshold is reached
Every site waits to see at least $S/k$ items before reporting to the coordinator, where $S$ is the remaining slack
After receiving a report from an observer, the coordinator updates $S$ and informs all nodes
The total communication is $O(k^2 \log(\tau/k))$
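The slack-based protocol above can be sketched as a small simulation. This is only an illustration under stated assumptions: events arrive at uniformly random sites, each slack update is charged one report plus k-1 count replies plus k-1 broadcasts, and the name `first_approach` is hypothetical rather than taken from the slides.

```python
import random

def first_approach(tau, k, seed=0):
    """Simulate the slack-based countdown; returns (events_seen, messages)."""
    rng = random.Random(seed)
    unreported = [0] * k   # per-site counts not yet known to the coordinator
    known = 0              # N: count known to the coordinator
    slack = tau            # S = tau - N
    events = 0
    messages = 0
    while known < tau:
        site = rng.randrange(k)          # assumed: uniformly random arrivals
        events += 1
        unreported[site] += 1
        limit = max(1, slack // k)       # local bound S/k, never below 1
        if unreported[site] >= limit:
            # One report, then the coordinator gathers the other counts
            # and informs the other nodes of the new slack (assumed accounting).
            messages += 1 + 2 * (k - 1)
            known += sum(unreported)
            unreported = [0] * k
            slack = tau - known
    return events, messages

events, messages = first_approach(tau=100_000, k=10)
print(events, messages)
```

Because every slack update collects all local counts, the coordinator in this sketch detects the threshold exactly when the tau-th event arrives, while sending far fewer than tau messages when tau is much larger than k.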
A Quadratic Improvement
Waiting for more updates before reporting to the coordinator
In round $j$, each of the $k$ nodes waits to receive $2^{-j}\tau/k$ items before reporting to the coordinator
The coordinator starts round $j+1$ after receiving $k$ messages
The protocol runs over $\log(\tau/k)$ rounds
The total communication is $O(k \log(\tau/k))$
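A rough simulation of the round idea follows. It is a simplified variant, not the exact protocol on the slide: as an assumption, the coordinator collects the leftover local counts at the end of every round and sets the next per-site quota to half the remaining slack divided by k, which keeps the halving behaviour while making detection exact. The function name and the message accounting are illustrative.

```python
import random

def round_based_countdown(tau, k, seed=0):
    """Return (events_seen, messages, rounds) when the count reaches tau."""
    rng = random.Random(seed)
    local = [0] * k    # items at each site not yet reported
    known = 0          # count known to the coordinator
    events = messages = rounds = 0
    while known < tau:
        slack = tau - known
        quota = max(1, slack // (2 * k))   # per-site quota for this round
        rounds += 1
        reports = 0
        # A round ends after k reports (or as soon as tau is reached).
        while reports < k and known + reports * quota < tau:
            site = rng.randrange(k)        # assumed: uniformly random arrivals
            events += 1
            local[site] += 1
            if local[site] >= quota:
                local[site] -= quota
                reports += 1
                messages += 1              # site -> coordinator report
        # Assumed end-of-round step: leftover counts in, new quota out.
        known += reports * quota + sum(local)
        local = [0] * k
        messages += 2 * k
    return events, messages, rounds

events, messages, rounds = round_based_countdown(tau=100_000, k=10)
print(events, messages, rounds)
```

Each round costs O(k) messages and roughly halves the slack, which is where the $O(k \log(\tau/k))$ total comes from.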
Monitoring Entropy
Monitoring Entropy
Monitoring non-monotone functions
Let $f_i$ denote the number of occurrences of item $i$
Let $m$ denote the total number of items
The union of the input streams implicitly defines a probability distribution given by $\Pr[i] = f_i/m$
The goal is to monitor the entropy of this distribution
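As a concrete reading of these definitions, the sketch below computes the empirical entropy of the union of several streams in one place. Shannon entropy in bits (log base 2) is an assumption, since the slides do not fix the base, and `empirical_entropy` is an illustrative name.

```python
import math
from collections import Counter

def empirical_entropy(streams):
    """Entropy (in bits) of the distribution Pr[i] = f_i / m over the union."""
    counts = Counter()
    for stream in streams:
        counts.update(stream)          # f_i: occurrences of item i
    m = sum(counts.values())           # m: total number of items
    return -sum((f / m) * math.log2(f / m) for f in counts.values())

# Two sites, each seeing one "a" and one "b": a uniform distribution on 2 items.
print(empirical_entropy([["a", "b"], ["a", "b"]]))
```

This is the centralized computation. The point of the protocol is to track the same quantity without shipping every item to the coordinator.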
Entropy Protocol
The protocol proceeds in multiple rounds
In the first round, the coordinator collects a constant number of items from the sites
In each subsequent round $i$, the coordinator does the following:
Computes the parameter $\tau_i$
Runs the approximate countdown protocol with $\tau_i$
Collects the frequency distribution from all sites and computes the current entropy
The Geometric Approach
The Geometric Approach (1/2)
Goal: monitoring arbitrary non-linear threshold functions
Idea: break down the testing of $f(x) > \tau$ or $f(x) \le \tau$ into local conditions
A geometric fact: [figure]
The Geometric Approach (2/2)
Each site checks whether its sphere is monochromatic, that is, whether $f$ stays on the same side of $\tau$ everywhere in the sphere
When all the constraints are upheld: the query result remains unchanged and no communication is required
When a constraint is violated: new data is gathered from the streams and new constraints are set on the streams
Sampling
Sampling
Given inputs of total size $N$, draw a sample of size $s$, uniform over all subsets of size $s$
Sampling applications: approximate query answering, query planning, number of distinct elements, heavy hitters
Sampling cases: infinite windows, sliding windows
Infinite Windows (1/2)
Each site associates a random weight with each observation
The coordinator maintains the following variables:
Set $P$ of $s$ random samples with weight no more than $u$
Weight $u$: the $s$-th smallest weight so far in the system
Each site maintains only its local $s$-th smallest weight $u_i$
Infinite Windows (2/2)
Protocol outline:
Each site $i$ sends an element to the coordinator if its weight is smaller than $u_i$
The coordinator updates $P$ and $u$ if the weight of the received item is smaller than $u$
The coordinator replies to site $i$ with the current value of $u$
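The outline above can be sketched as a single-process simulation. Several details are assumptions made for illustration: weights are uniform in [0, 1), $u$ and every $u_i$ start at 1.0, $u_i$ is refreshed only by the coordinator's reply, arrivals are interleaved by a random shuffle, and `distributed_sample` is a hypothetical name.

```python
import heapq
import random

def distributed_sample(streams, s, seed=0):
    """Return (sample of s items from the union of streams, messages); s >= 1."""
    rng = random.Random(seed)
    arrivals = [(site, item) for site, stream in enumerate(streams) for item in stream]
    rng.shuffle(arrivals)               # assumed interleaving of the streams
    u_local = [1.0] * len(streams)      # u_i held at each site
    u = 1.0                             # coordinator's s-th smallest weight
    heap = []                           # P as a max-heap on weight (negated keys)
    messages = 0
    for idx, (site, item) in enumerate(arrivals):
        w = rng.random()                # random weight for this observation
        if w < u_local[site]:
            messages += 1               # site -> coordinator
            if len(heap) < s:
                heapq.heappush(heap, (-w, idx, item))
            elif w < u:
                heapq.heapreplace(heap, (-w, idx, item))   # evict largest weight
            if len(heap) == s:
                u = -heap[0][0]         # current s-th smallest weight
            u_local[site] = u
            messages += 1               # coordinator -> site reply with u
    return [item for _, _, item in heap], messages

streams = [list(range(0, 100)), list(range(100, 250)), list(range(250, 300))]
sample, messages = distributed_sample(streams, s=10)
print(sorted(sample), messages)
```

The sample is the set of items holding the $s$ smallest weights overall, so it is uniform over subsets of size $s$. Since each $u_i$ only decreases, sites forward fewer and fewer items as the stream grows.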
Thank You :)
Support Slides
A First Approach (Long Version)
Idea: there are many events at each site before reaching the threshold $\tau$
At least one site must see $\tau/k$ items before the threshold is reached
Algorithm steps:
Initially, each site reports to the coordinator whenever its number of observed items exceeds $\tau/k$
The coordinator computes the current slack from the sum of all local counts: $S = \tau - N$, where $N$ is the current count
Each site sets an upper bound of $S/k$ on its local count
The total communication is $O(k^2 \log(\tau/k))$
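One way to see where the communication bound comes from is the following short derivation. It is a sketch under two assumptions: every slack update is triggered by a site that has seen $S/k$ new items, and every update costs $O(k)$ messages because all nodes are informed. The index $t$ counting updates is introduced here for illustration.

```latex
\begin{align*}
S_{t+1} &\le S_t - \frac{S_t}{k} = S_t\left(1 - \frac{1}{k}\right), \qquad S_0 = \tau,\\
S_t &\le \tau\left(1 - \frac{1}{k}\right)^{t} \le \tau\, e^{-t/k},\\
S_t &< k \quad\text{once}\quad t > k \ln\frac{\tau}{k},\\
\text{total messages} &= \underbrace{O(k)}_{\text{per update}} \cdot
  \underbrace{O\!\left(k \log\frac{\tau}{k}\right)}_{\text{number of updates}}
  = O\!\left(k^{2} \log\frac{\tau}{k}\right).
\end{align*}
```

Once the slack drops below $k$, sites can report each event directly, which adds fewer than $k$ further messages.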
Approximate Countdown
Improve the cost by approximating the answer
Let $\epsilon$ be the approximation parameter
Report 0 if count $< (1 - \epsilon)\tau$
Report 1 if count $> \tau$
Similar to the previous approach, but now terminate when the bound on the unreported count reaches $\epsilon\tau$
The number of rounds is reduced to $\log(1/\epsilon)$
The total communication is $O(k \log(1/\epsilon))$
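The round count can be checked with a short calculation. It assumes the round structure of the slide "A Quadratic Improvement", in which each site holds back fewer than $2^{-j}\tau/k$ items in round $j$. The numbers in the last line are an illustrative example, not values from the slides.

```latex
\begin{align*}
\text{unreported count in round } j &< k \cdot \frac{2^{-j}\tau}{k} = 2^{-j}\tau,\\
2^{-j}\tau \le \epsilon\tau &\iff j \ge \log_2\frac{1}{\epsilon},\\
\text{total messages} &= O(k)\ \text{per round} \times \log_2\frac{1}{\epsilon}\ \text{rounds}
  = O\!\left(k \log\frac{1}{\epsilon}\right),\\
\text{e.g. } \tau = 10^{6},\ k = 100,\ \epsilon = 0.01:&\quad
  \log_2\frac{\tau}{k} \approx 13.3 \ \text{rounds (exact)}
  \quad\text{vs.}\quad \log_2\frac{1}{\epsilon} \approx 6.6 \ \text{rounds (approximate)}.
\end{align*}
```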
Randomized Countdown Protocol (1/2)
If $k$ grows very large, the cost will be high
Allow the algorithm to give a wrong answer with small probability
Randomization removes the dependence of the cost on $k$ in favor of the parameter $\epsilon$
Randomized Countdown Protocol (2/2)
With a randomization parameter $c$ determined by the analysis:
Each site collects $\epsilon^2\tau/(ck)$ observations
With probability $1/k$ it sends a message; otherwise it remains silent
The coordinator waits until it receives $c\left(1/\epsilon^2 - 1/(2\epsilon)\right)$ messages, then terminates
The total communication cost is $O(1/\epsilon^2)$
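The steps above can be sketched as a simulation. The value `c=4` is an arbitrary placeholder (the slides say only that $c$ is determined by the analysis), arrivals are assumed uniformly random across sites, the batch size is rounded down to an integer, and `randomized_countdown` is an illustrative name.

```python
import random

def randomized_countdown(tau, k, eps, c=4, seed=0):
    """Return (events_seen, messages) when the coordinator terminates."""
    rng = random.Random(seed)
    batch = max(1, int(eps * eps * tau / (c * k)))        # eps^2 * tau / (c k)
    needed = round(c * (1 / eps**2 - 1 / (2 * eps)))      # messages to wait for
    local = [0] * k
    events = messages = 0
    while messages < needed:
        site = rng.randrange(k)          # assumed: uniformly random arrivals
        events += 1
        local[site] += 1
        if local[site] == batch:         # a full batch has been collected
            local[site] = 0
            if rng.random() < 1 / k:     # send with probability 1/k
                messages += 1
    return events, messages

for k in (10, 100):
    print(k, randomized_countdown(tau=100_000, k=k, eps=0.1))
```

The number of messages depends only on $c$ and $\epsilon$, not on $k$. The event count at termination is random, and each message stands for about $\epsilon^2\tau/c$ events in expectation.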
Geometric Computational Model (1/2)
Each site has a $d$-dimensional vector $v_i(t)$, called the local statistics vector
Let $w_1, w_2, \dots, w_n$ be weights assigned to the streams
Define the global statistics vector $v(t)$ as the weighted average of the $v_i(t)$
Let $f: \mathbb{R}^d \to \mathbb{R}$ be an arbitrary monitoring function
Goal: determining whether $f(v(t)) > \tau$ at any given time $t$ for a threshold $\tau$
Geometric Computational Model (2/2)
$v'_i$ is the last statistics vector collected from node $p_i$
The coordinator constructs the estimate vector $e(t)$ as the weighted average of the $v'_i$
Each node $p_i$ also maintains the following parameters:
Delta vector: $\Delta v_i(t) = v_i(t) - v'_i$
Drift vector: $u_i(t) = e(t) + \Delta v_i(t)$
The decomposition relies on the following fact:
$$\frac{\sum_{i=1}^{n} w_i u_i(t)}{\sum_{i=1}^{n} w_i} = v(t)$$
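The identity can be checked numerically on a small example. The vectors and weights below are made-up test values, and the helper names are illustrative.

```python
import math

def weighted_average(vectors, weights):
    """Component-wise weighted average of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[j] for v, w in zip(vectors, weights)) / total
            for j in range(dim)]

def drift_vectors(current, last_sent, weights):
    """Return (e, [u_1..u_n]) with u_i = e + (v_i - v'_i)."""
    e = weighted_average(last_sent, weights)
    drifts = [[e[j] + (v[j] - p[j]) for j in range(len(e))]
              for v, p in zip(current, last_sent)]
    return e, drifts

current = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]     # v_i(t), hypothetical values
last_sent = [[0.0, 2.0], [2.0, 1.0], [4.0, 4.0]]   # v'_i, hypothetical values
weights = [1.0, 2.0, 1.0]                          # w_i, hypothetical values

e, drifts = drift_vectors(current, last_sent, weights)
v = weighted_average(current, weights)
avg_drift = weighted_average(drifts, weights)
print(v, avg_drift)
```

The weighted average of the drift vectors equals $v(t)$ because the weighted average of the $v'_i$ terms cancels against $e(t)$.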
Geometric Interpretation
Geometric interpretation: $v(t) \in \mathrm{Conv}(u_1(t), \dots, u_n(t))$
The convex hull can be fully covered by spheres with radius $\lVert e - u_i \rVert / 2$ centered at $(e + u_i)/2$
[Figure: the estimate vector $e$ and the drift vectors $u_1, \dots, u_5$]
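A local sphere test can be sketched for one concrete monitoring function. The choice $f(x) = \lVert x \rVert^2$ is an assumption made purely for illustration, because its minimum and maximum over a ball have closed forms. For a general $f$ the check needs a function-specific bound or an optimization step. All function names are hypothetical.

```python
import math

def local_sphere(e, u):
    """Sphere with center (e + u) / 2 and radius ||e - u|| / 2."""
    center = [(a + b) / 2 for a, b in zip(e, u)]
    radius = math.dist(e, u) / 2
    return center, radius

def is_monochromatic_sq_norm(e, u, tau):
    """True if f(x) = ||x||^2 is on one side of tau over the whole sphere."""
    center, radius = local_sphere(e, u)
    c = math.hypot(*center)                 # distance of the center from the origin
    lowest = max(0.0, c - radius) ** 2      # min of ||x||^2 over the ball
    highest = (c + radius) ** 2             # max of ||x||^2 over the ball
    return highest <= tau or lowest > tau

e, u = [1.0, 0.0], [3.0, 0.0]               # hypothetical estimate and drift vectors
for tau in (10.0, 5.0, 0.5):
    print(tau, is_monochromatic_sq_norm(e, u, tau))
```

If every site's sphere passes this test on the same side of $\tau$, the global vector, which lies inside the union of the spheres, cannot have crossed the threshold, so no communication is needed.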

More Related Content

Similar to The Continuous Distributed Monitoring Model

Hindin, David, US EPA, OECI, Next Generation Compliance Water and Waste Examp...
Hindin, David, US EPA, OECI, Next Generation Compliance Water and Waste Examp...Hindin, David, US EPA, OECI, Next Generation Compliance Water and Waste Examp...
Hindin, David, US EPA, OECI, Next Generation Compliance Water and Waste Examp...Kevin Perry
 
ONLINE E-WASTE COLLECTION SYSTEM project Report (Approved)
ONLINE E-WASTE COLLECTION SYSTEM project Report (Approved)ONLINE E-WASTE COLLECTION SYSTEM project Report (Approved)
ONLINE E-WASTE COLLECTION SYSTEM project Report (Approved)Amit Mangukiya
 
Multisink based approach for continous object tracking wsn
Multisink based approach for continous object tracking  wsnMultisink based approach for continous object tracking  wsn
Multisink based approach for continous object tracking wsnSajida Imran
 
Contention - Aware Scheduling (a different approach)
Contention - Aware Scheduling (a different approach)Contention - Aware Scheduling (a different approach)
Contention - Aware Scheduling (a different approach)Dimos Raptis
 
Network Measurement and Monitori - Assigment 1, Group3, "Classification"
Network Measurement and Monitori - Assigment 1, Group3, "Classification"Network Measurement and Monitori - Assigment 1, Group3, "Classification"
Network Measurement and Monitori - Assigment 1, Group3, "Classification"Valentin Thirion
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor NetworksOscar Corcho
 
Prognostic Meteorological Models and Their Use in Dispersion Modelling
Prognostic Meteorological Models and Their Use in Dispersion ModellingPrognostic Meteorological Models and Their Use in Dispersion Modelling
Prognostic Meteorological Models and Their Use in Dispersion ModellingIES / IAQM
 
Reactive by example - at Reversim Summit 2015
Reactive by example - at Reversim Summit 2015Reactive by example - at Reversim Summit 2015
Reactive by example - at Reversim Summit 2015Eran Harel
 
Congestion Control in Wireless Sensor Networks Using Genetic Algorithm
Congestion Control in Wireless Sensor Networks Using Genetic AlgorithmCongestion Control in Wireless Sensor Networks Using Genetic Algorithm
Congestion Control in Wireless Sensor Networks Using Genetic AlgorithmEditor IJCATR
 
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco IJECEIAES
 
Pittcon06 Auto Chrom
Pittcon06 Auto ChromPittcon06 Auto Chrom
Pittcon06 Auto Chromniharaina
 
An efficient recovery mechanism
An efficient recovery mechanismAn efficient recovery mechanism
An efficient recovery mechanismijcsa
 
Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDICO ...
Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDICO ...Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDICO ...
Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDICO ...Christo Ananth
 
 Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDIC...
 Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDIC... Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDIC...
 Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDIC...Christo Ananth
 
IRJET- IoT based Flow Analyzing and Alerting System
IRJET- IoT based Flow Analyzing and Alerting SystemIRJET- IoT based Flow Analyzing and Alerting System
IRJET- IoT based Flow Analyzing and Alerting SystemIRJET Journal
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Raja Chiky
 
Webinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computingWebinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computingCREATE-NET
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 

Similar to The Continuous Distributed Monitoring Model (20)

Hindin, David, US EPA, OECI, Next Generation Compliance Water and Waste Examp...
Hindin, David, US EPA, OECI, Next Generation Compliance Water and Waste Examp...Hindin, David, US EPA, OECI, Next Generation Compliance Water and Waste Examp...
Hindin, David, US EPA, OECI, Next Generation Compliance Water and Waste Examp...
 
Stream Processing Overview
Stream Processing OverviewStream Processing Overview
Stream Processing Overview
 
ONLINE E-WASTE COLLECTION SYSTEM project Report (Approved)
ONLINE E-WASTE COLLECTION SYSTEM project Report (Approved)ONLINE E-WASTE COLLECTION SYSTEM project Report (Approved)
ONLINE E-WASTE COLLECTION SYSTEM project Report (Approved)
 
Multisink based approach for continous object tracking wsn
Multisink based approach for continous object tracking  wsnMultisink based approach for continous object tracking  wsn
Multisink based approach for continous object tracking wsn
 
Contention - Aware Scheduling (a different approach)
Contention - Aware Scheduling (a different approach)Contention - Aware Scheduling (a different approach)
Contention - Aware Scheduling (a different approach)
 
Network Measurement and Monitori - Assigment 1, Group3, "Classification"
Network Measurement and Monitori - Assigment 1, Group3, "Classification"Network Measurement and Monitori - Assigment 1, Group3, "Classification"
Network Measurement and Monitori - Assigment 1, Group3, "Classification"
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor Networks
 
Prognostic Meteorological Models and Their Use in Dispersion Modelling
Prognostic Meteorological Models and Their Use in Dispersion ModellingPrognostic Meteorological Models and Their Use in Dispersion Modelling
Prognostic Meteorological Models and Their Use in Dispersion Modelling
 
Reactive by example - at Reversim Summit 2015
Reactive by example - at Reversim Summit 2015Reactive by example - at Reversim Summit 2015
Reactive by example - at Reversim Summit 2015
 
Congestion Control in Wireless Sensor Networks Using Genetic Algorithm
Congestion Control in Wireless Sensor Networks Using Genetic AlgorithmCongestion Control in Wireless Sensor Networks Using Genetic Algorithm
Congestion Control in Wireless Sensor Networks Using Genetic Algorithm
 
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
 
Pittcon06 Auto Chrom
Pittcon06 Auto ChromPittcon06 Auto Chrom
Pittcon06 Auto Chrom
 
An efficient recovery mechanism
An efficient recovery mechanismAn efficient recovery mechanism
An efficient recovery mechanism
 
Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDICO ...
Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDICO ...Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDICO ...
Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDICO ...
 
 Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDIC...
 Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDIC... Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDIC...
 Call for Papers- Thematic Issue: Food, Drug and Energy Production, PERIÓDIC...
 
IRJET- IoT based Flow Analyzing and Alerting System
IRJET- IoT based Flow Analyzing and Alerting SystemIRJET- IoT based Flow Analyzing and Alerting System
IRJET- IoT based Flow Analyzing and Alerting System
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Transition scope
Transition scopeTransition scope
Transition scope
 
Webinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computingWebinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computing
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 

More from Farzad Nozarian

SHARE Interface in Flash Storage for Relational and NoSQL Databases
SHARE Interface in Flash Storage for Relational and NoSQL DatabasesSHARE Interface in Flash Storage for Relational and NoSQL Databases
SHARE Interface in Flash Storage for Relational and NoSQL DatabasesFarzad Nozarian
 
Ultimate Goals In Robotics
Ultimate Goals In RoboticsUltimate Goals In Robotics
Ultimate Goals In RoboticsFarzad Nozarian
 
Apache HBase - Lab Assignment
Apache HBase - Lab AssignmentApache HBase - Lab Assignment
Apache HBase - Lab AssignmentFarzad Nozarian
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentFarzad Nozarian
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialFarzad Nozarian
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
Big Data Processing in Cloud Computing Environments
Big Data Processing in Cloud Computing EnvironmentsBig Data Processing in Cloud Computing Environments
Big Data Processing in Cloud Computing EnvironmentsFarzad Nozarian
 

More from Farzad Nozarian (10)

SHARE Interface in Flash Storage for Relational and NoSQL Databases
SHARE Interface in Flash Storage for Relational and NoSQL DatabasesSHARE Interface in Flash Storage for Relational and NoSQL Databases
SHARE Interface in Flash Storage for Relational and NoSQL Databases
 
Ultimate Goals In Robotics
Ultimate Goals In RoboticsUltimate Goals In Robotics
Ultimate Goals In Robotics
 
Shark - Lab Assignment
Shark - Lab AssignmentShark - Lab Assignment
Shark - Lab Assignment
 
Apache HBase - Lab Assignment
Apache HBase - Lab AssignmentApache HBase - Lab Assignment
Apache HBase - Lab Assignment
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce Tutorial
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
Big Data Processing in Cloud Computing Environments
Big Data Processing in Cloud Computing EnvironmentsBig Data Processing in Cloud Computing Environments
Big Data Processing in Cloud Computing Environments
 

Recently uploaded

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.MateoGardella
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 

Recently uploaded (20)

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 

The Continuous Distributed Monitoring Model

  • 1. The Continuous Distributed Monitoring Model Farzad Nozarian fnozarian@aut.ac.ir Chalmers University of Technology 18/04/2016
  • 2. 218/04/2016 Outline Chalmers University of Technology Countdown Problem Monitoring Entropy Geometric Approach Sampling Introduction
  • 3. 318/04/2016 What Is the Problem? Chalmers University of Technology Simple countdown! Tracking the entropy Distinct elements Sampling Top-k items Several processing nodes receive streams of data items The goal is how to monitor a function over the union of items Examples of monitoring functions: with minimum communication cost
  • 4. 418/04/2016 Motivation and Applications Chalmers University of Technology Monitoring the global health of the network in a large ISP Tracking the usage of resources in distributed data centers by social networks Tracking global changes by collecting information from sensors
  • 5. 518/04/2016 What Are the Challenges? Chalmers University of Technology Continuous Monitoring Real-time tracking, rather than one-shot query Streaming Data is received at a very high speed Distributed Processing Each node only sees part of the global stream Communication cost is important
  • 6. 618/04/2016 Trivial Solutions Chalmers University of Technology High communication cost! Summarizing information in complex functions Parameter tuning for frequency of the polling Infrequent polling Delay in identifying events Frequent polling High communication Centralizing all the items Periodic polling
  • 8. 818/04/2016 The Countdown Problem Chalmers University of Technology A threshold monitoring problem with many applications Identifying when the total number of observations reaches 𝜏 Trivial solution: Observers notify the coordinator by sending a bit when an event is observed But we can improve it! O τ communication
  • 9. 918/04/2016 A First Approach Chalmers University of Technology The total communication is 𝑂(𝑘2 𝑙𝑜𝑔 𝜏 𝑘 ) Idea: there are many events at each site before reaching the threshold 𝜏 At least one site should see 𝜏/𝑘 items before threshold Every site waits to see at least 𝑆/𝑘 items before reporting to the coordinator After receiving a report from observer the coordinator updates 𝑆 and informs all nodes
  • 10. 1018/04/2016 A Quadratic Improvement Chalmers University of Technology Waiting for more updates before reporting to coordinator Protocol runs over log( 𝜏 𝑘 ) rounds The total communication is 𝑂(𝑘 𝑙𝑜𝑔 𝜏 𝑘 ) In round 𝑗, all 𝑘 nodes wait to receive 2−𝑗 𝜏/𝑘 items before reporting to the coordinator Coordinator starts the 𝑗 + 1th round after receiving 𝑘 messages
  • 12. 1218/04/2016 Monitoring Entropy Chalmers University of Technology Monitoring non-monotone functions Let 𝑓𝑖 denote the number of occurrences of item 𝑖 Let 𝑚 denote the total number of items Union of input streams implicitly define a probability distribution given by Pr[𝑖] = 𝑓𝑖/𝑚, The goal is monitoring the entropy of this distribution
  • 13. 1318/04/2016 Entropy Protocol Chalmers University of Technology The protocol proceeds in multiple rounds In the first round, coordinator collects a constant number of items from sites In each subsequent round 𝑖 coordinator does the following: Computes the parameter 𝜏𝑖 Runs the approximate countdown protocol with 𝜏𝑖 Collects frequency distribution from all sites and computes current entropy
  • 15. 1518/04/2016 The Geometric Approach (1/2) Chalmers University of Technology Goal: monitoring of arbitrary threshold non-linear functions A geometric fact: Idea: break down the testing of 𝑓 𝑥 > 𝜏 or 𝑓 𝑥 ≤ 𝜏 into local conditions
  • 16. 1618/04/2016 The Geometric Approach (2/2) Chalmers University of Technology Each site checks whether its sphere is monochromatic When all the constraints are upheld: Query result remains unchanged No communication is required When a constraint is violated: New data is gathered from the streams New constraints are set on the streams
  • 18. 1818/04/2016 Sampling Chalmers University of Technology Given inputs of total size 𝑁, draw a sample of size 𝑠 Uniform over all subsets of size 𝑠 Sampling cases Sampling applications Approximate query answering Query planning Number of distinct elements Heavy hitters Infinite windows Sliding windows
  • 19. 1918/04/2016 Infinite Windows (1/2) Chalmers University of Technology Each site associates a random weight with each observation Coordinator maintains the following variables: Set 𝑃 of 𝑠 random sample with weight no more than 𝑢 Weight 𝑢: the 𝑠-th smallest weight so far in the system Each site only maintains its local 𝑠-th smallest weight 𝑢𝑖
  • 20. 2018/04/2016 Infinite Windows (2/2) Chalmers University of Technology Protocol outline: Each site 𝑖 sends an element with weight smaller than 𝑢𝑖 to the coordinator Coordinator updates 𝑃 and 𝑢, if weight of received item is smaller than 𝑢 Coordinator replies back to site 𝑖 with the current value of 𝑢
  • 23. 2318/04/2016 A First Approach (long Ver.) Chalmers University of Technology Algorithm steps: Initially, each site report the coordinator whenever its num. of observed items exceeds 𝜏/𝑘 Coordinator compute current slack based on the sum of all local count: 𝑆 = 𝜏 − 𝑁 (𝑁 is current count) Each site set upper bound 𝑆/𝑘 on its local count The total communication is 𝑂(𝑘2 𝑙𝑜𝑔 𝜏 𝑘 ) Idea: there are many events at each site before reaching the threshold 𝜏 At least one site should see 𝜏/𝑘 items before threshold
  • 24. Approximate Countdown. Improve the cost by approximating the answer. Let $\epsilon$ be the approximation parameter: report 0 if count $< (1 - \epsilon)\tau$, report 1 if count $> \tau$. Similar to the previous approach, but now terminate when the bound on the unreported count reaches $\epsilon\tau$. The number of rounds is reduced to $\log(1/\epsilon)$. The total communication is $O(k \log(1/\epsilon))$.
  • 25. Randomized Countdown Protocol (1/2). If $k$ grows very large, the cost will be high. Allow the algorithm to give a wrong answer with small probability. Randomization replaces the dependence on $k$ with a dependence on the parameter $\epsilon$.
  • 26. Randomized Countdown Protocol (2/2). With a randomization parameter $c$ determined by the analysis: each site collects $\epsilon^2 \tau / (ck)$ observations, then with probability $1/k$ it sends a message, otherwise it remains silent. The coordinator waits until it has received $c\left(\frac{1}{\epsilon^2} - \frac{1}{2\epsilon}\right)$ messages, then terminates. The total communication cost is $O(1/\epsilon^2)$.
  • 27. Geometric Computational Model (1/2). Each site has a $d$-dimensional vector $v_i(t)$, called the local statistics vector. Let $w_1, w_2, \dots, w_n$ be weights assigned to the streams. Define the global statistics vector $v(t)$ as the weighted average of the $v_i(t)$. Let $f: \mathbb{R}^d \to \mathbb{R}$ be an arbitrary monitoring function. Goal: determine whether $f(v(t)) > \tau$ at any given time $t$, for a threshold $\tau$.
  • 28. Geometric Computational Model (2/2). $v'_i$ is the last statistics vector collected from node $p_i$. The coordinator constructs the estimate vector $e(t)$, the weighted average of the $v'_i$. Each node $p_i$ also maintains the following parameters: the delta vector $\Delta v_i(t) = v_i(t) - v'_i$ and the drift vector $u_i(t) = e(t) + \Delta v_i(t)$. The decomposition relies on the following fact: $\frac{\sum_{i=1}^{n} w_i u_i(t)}{\sum_{i=1}^{n} w_i} = v(t)$.
  • 29. Geometric Interpretation. $v(t) \in \mathrm{Conv}(u_1(t), \dots, u_n(t))$. The convex hull can be fully covered by spheres with radius $\|e - u_i\|/2$ centered at $(e + u_i)/2$. (Figure: the estimate vector $e$ and the drift vectors $u_1, \dots, u_5$.)
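
The drift-vector identity and the sphere cover from slides 27 to 29 can be checked numerically. The following is only a minimal sketch under stated assumptions: the helper names (`weighted_average`, `drift_vectors`, `in_ball`) and the three-node data are hypothetical, the norm is assumed to be Euclidean, and the monitored function $f$ itself is left out.

```python
import math

def weighted_average(vectors, weights):
    """Weighted average of equal-length vectors (lists of floats)."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[j] for v, w in zip(vectors, weights)) / total
            for j in range(dim)]

def drift_vectors(current, last_sent, weights):
    """Return the estimate vector e and the drift vectors u_i = e + (v_i - v'_i)."""
    e = weighted_average(last_sent, weights)
    drifts = [[e[j] + (v[j] - vp[j]) for j in range(len(e))]
              for v, vp in zip(current, last_sent)]
    return e, drifts

def in_ball(point, e, u):
    """True if point lies in the ball centered at (e + u)/2 with radius |e - u|/2."""
    center = [(a + b) / 2 for a, b in zip(e, u)]
    radius = math.dist(e, u) / 2
    return math.dist(point, center) <= radius + 1e-12

# Hypothetical data: three nodes, 2-dimensional statistics vectors.
last_sent = [[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]]   # v'_i, last vectors collected
current = [[1.5, 2.5], [2.0, 1.0], [2.0, 5.0]]     # v_i(t), current local vectors
weights = [1.0, 2.0, 1.0]                          # w_i

e, drifts = drift_vectors(current, last_sent, weights)
v = weighted_average(current, weights)             # global statistics vector v(t)

# The weighted average of the drift vectors equals v(t).
identity_holds = all(math.isclose(a, b)
                     for a, b in zip(v, weighted_average(drifts, weights)))

# v(t) lies in the convex hull of the drift vectors, so at least one ball contains it.
covered = any(in_ball(v, e, u) for u in drifts)
print(identity_holds, covered)
```

In a full monitor, each node would additionally test whether $f$ stays on one side of the threshold over its own ball; how that test is done depends on $f$ and is not shown here.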

Editor's Notes

  1. Have you ever wondered how Twitter tracks current trends? In this presentation, I am going to talk about a paper that presents a model called continuous distributed monitoring, which is the basis not only of trend tracking but also of many other applications.
  2. We start by introducing the continuous distributed monitoring model, its applications and the motivation behind it, and we will see some challenges and trivial solutions for this model. Then, in the second section, I'll introduce a simple monitoring task, the countdown problem, and explain two simple solutions for it. In the third section, we will investigate a more complex problem in this model: how to monitor the entropy. After that, in an important section, we will introduce a general approach to arbitrary monitoring problems in this model. Finally, in the last section, we will see how to obtain a random sample drawn from the union of the streams.
  3. Let's start with a simple question: what is the problem? The continuous distributed monitoring model is one of the most important models in stream processing. In this model there are several physically distributed sites receiving high-volume local streams of data. These sites talk to a central coordinator, which has to continuously MONITOR A FUNCTION over the union of all streams observed so far. The challenge is to minimize the communication between the sites and the coordinator while providing an accurate answer to queries at all times. The monitoring functions can be as simple as counting the total number of observations, or more complex non-linear functions such as the entropy of the induced distribution.
  4. A common requirement in many emerging applications is the ability to process a continuous, high-volume stream of data. For example, network elements within the network of a large ISP observe local usage of links and wish to work together to compute functions that determine the overall health of the network. Or consider an Internet of Things scenario in which many sensors have been deployed in the field to collect environmental information and need to cooperate to track global changes in this data. As another case, think about Twitter wanting to monitor the usage of many resources in distributed data centers spread around the world.
  5. There are several challenges in continuous distributed monitoring. First of all, because the continuous distributed model is a special case within the general area of distributed processing, we inherit the challenges of distributed computing. For example, each node only sees part of the global stream, and communication cost is a central concern. In addition, the monitoring should be continuous, which means real-time tracking rather than a one-shot query. Finally, we have to process an infinite, high-speed stream of data with constrained CPU and memory.
  6. One might ask: why don't we simply send all the observations to a single, centralized location? If the stream has a sufficiently slow rate, that might be a good idea, but for high-speed streams it causes a lot of communication. Another simple approach is for a central monitor to poll each observer periodically and combine the observations into a snapshot of the current status. This approach also has limitations. First, the information needed in this protocol must be summarized compactly, and for some complex functions it is hard to provide such a summary. In addition, the protocol requires careful tuning of the polling frequency: if the polling interval is too long, there is a large delay in identifying events, and if it is too short, it can overload the network.
  7. Let's look at the first problem: the simple countdown problem.
  8. The countdown problem is an instance of the threshold monitoring problem, with many applications. We wish to determine when a total of $\tau$ observations have been seen. A trivial solution is for each observer to send one bit to the coordinator for each event, which uses $O(\tau)$ communication. But there are several better answers, which we will introduce in the following slides.
  9. A simple idea to improve the trivial solution is to note that there are many events at each site before the threshold can be reached. The smarter approach uses the fact that at least one … So instead of sending a bit whenever it sees an item, each site waits until at least $S/k$ items have been received and only then reports to the coordinator.
  10. The previous approach is inefficient in terms of communication, because updating all nodes whenever one node reports that it has exceeded its current local threshold is wasteful. We can improve the algorithm by tolerating more updates before a global communication is triggered. In round $j$, each observer sends a message to the coordinator when its local count reaches this amount; it subtracts this amount from $n_i$ and alerts the coordinator. The coordinator waits until it has received $k$ messages, then alerts all nodes to start round $j+1$. This process continues until the bound reaches 1, at which point each site reports each event as it occurs. This gives a factor-$k$ improvement over the previous approach.
  11. Next, we move on to monitoring the entropy.
  12. The solutions that we described for the countdown problem depend on the fact that the monitored function is monotone: the number of events keeps increasing. Monitoring non-monotone functions is more complex, and entropy is one such function. Suppose that $f_i$ denotes the number of occurrences of item $i$ observed across the whole system, and $m$ denotes the total number of items. Then we can use the empirical probabilities $f_i/m$ to implicitly define the probability distribution of the union of the input streams. So the goal is monitoring … Now let's describe this protocol.
  13. This is because the entropy can change quickly in this initial stage. Then it uses a special data structure called a sketch to get an estimate of the entropy with an additive error. It computes a parameter tau_i based on the entropy estimated in the first step. Then each site simulates the countdown monitoring algorithm with the calculated tau_i and error ½. The value of tau_i ensures that the countdown algorithm notifies the coordinator when enough items have been received in the round, thus avoiding unnecessary communication.
  14. Most of the proposed schemes deal with monitoring simple aggregate values, but we are looking for an approach that can handle any non-linear function in this model. An arbitrary global monitoring task can be split into a set of constraints applied locally on each of the streams. The heart of this idea is a geometric fact: the area of the convex hull of a set of points can be fully covered by a set of spheres, one sphere incident on each point. Each node can construct its sphere independently.
  15. As data arrives on the streams, the local constraint each node maintains is to verify that its sphere is monochromatic. As long as none of these constraints have been violated, the query result is guaranteed to remain unchanged, and thus no communication is required.
  16. A fundamental problem in distributed monitoring is to obtain a random sample drawn from the union of all distributed streams. There are many scenarios where we need sampling, for instance approximate query answering or query planning. There are two variations of sampling: in the first case we want a sample over all the events observed so far, and in the second case we want a sample only over the more recent events. Because we don't have enough time, I will explain only the first approach: infinite windows.
  17. When sampling over infinite windows, each site associates a random weight with each observation. The coordinator then maintains, at all times, the set of $s$ elements with the minimum weights in the union of the streams. Each site only maintains its local $s$-th smallest weight, denoted by $u_i$. Note that it is too expensive to keep the view of each site synchronized with the coordinator's view at all times.
  18. Then the coordinator discards the element with the largest weight from $P$ and inserts the new element. It updates $u$ to the current largest weight in $P$ … with the current value of $u$, which is the $s$-th smallest weight in the union of all streams.
  19. A simple idea to improve the trivial solution is to note that there are many events at each site before the threshold can be reached. The smarter approach uses the fact that at least one of … We can use this fact for a new algorithm. The steps of the proposed algorithm are as follows. Initially, each site sets its upper bound on observed items to $\tau/k$ and begins to observe events. Then the coordinator determines the current slack, that is, the difference between the threshold and the current count $N$. After that, the coordinator redistributes the updated slack to the observers and enforces an upper bound of $S/k$ on each of them.
  20. In many application scenarios, exact tracking is unnecessary: users are willing to trade accuracy for savings in communication. So to reduce the cost we can use approximation, that is, tolerate some imprecision in the result. The coordinator can determine that the true count is below … or above …, and when the true count is in between, it doesn't matter and the coordinator can indicate either state. This approach works with exactly the same algorithm illustrated on the previous slide, but now we terminate when we reach the upper bound of $(1 - \epsilon)\tau$.
  21. The idea here is that there will be enough opportunities to send messages that with high probability the coordinator will not declare too early or too late.
  22. We are interested in determining, at any given time $t$, whether or not $f(v(t)) > r$, where $r$ is a predetermined threshold value. The weight $w_i$ assigned to the node $p_i$ usually corresponds to the number of data items its local statistics vector is derived from.
  23. The last statistics vector collected from the node $p_i$ is denoted by $v'_i$. The estimate vector is the weighted average of the local statistics vectors collected from the nodes at certain times, as dictated by the algorithms. The delta vector is the difference between the current local statistics vector and the last statistics vector collected from the node. The drift vector is a displacement of the delta vector in relation to the estimate vector. The method for decomposing the monitoring task is based on the observation that, at any given time, the weighted average of the drift vectors held by the nodes is equal to the global statistics vector.
  24. The geometric interpretation of this fact is that the global statistics vector lies in the convex hull of the drift vectors held by the nodes. This observation lets us take advantage of the fact that this convex hull is fully covered by a set of spheres with radius … centered at … Each sphere is constructed independently by one of the nodes, and at any given time each node has all the information required to construct its sphere.
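
The slack-redistribution countdown described in note 19 can be simulated in a few lines. This is only a rough sketch under stated assumptions: the function name `countdown_first_approach` and the random event stream are hypothetical, local bounds are rounded down to integers (with a minimum of 1), and each synchronization is charged one alert plus $k$ count replies plus $k$ new-bound messages, which is one possible accounting rather than the paper's.

```python
import random

def countdown_first_approach(events, k, tau):
    """Simulate the slack-based countdown.

    events: iterable of site indices in [0, k), one per observed item.
    Returns (t, messages): the 1-based index of the event at which the
    coordinator declares the threshold reached (None if never), and the
    number of messages under the accounting assumed above.
    """
    unreported = [0] * k              # items each site has seen since the last sync
    known = 0                         # N: count known to the coordinator
    bound = max(1, (tau - known) // k)
    messages = 0
    for t, site in enumerate(events, start=1):
        unreported[site] += 1
        if unreported[site] >= bound:
            # One alert, k count replies, k new-bound messages (assumed accounting).
            messages += 1 + 2 * k
            known += sum(unreported)
            unreported = [0] * k
            if known >= tau:
                return t, messages
            slack = tau - known       # S = tau - N
            bound = max(1, slack // k)
    return None, messages

# Hypothetical workload: 2000 items spread uniformly at random over 4 sites.
rng = random.Random(0)
k, tau = 4, 1000
events = [rng.randrange(k) for _ in range(2000)]
detected_at, messages = countdown_first_approach(events, k, tau)
print(detected_at, messages)
```

Because no site may accumulate more than $S/k$ unreported items, the total unreported count can never exceed the slack, so in this sketch the threshold is declared exactly at the $\tau$-th item while using far fewer than $\tau$ messages.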