The Continuous Distributed Monitoring Model

Farzad Nozarian
fnozarian@aut.ac.ir
Chalmers University of Technology
18/04/2016

Outline
Introduction
Countdown Problem
Monitoring Entropy
Geometric Approach
Sampling

What Is the Problem?
Several processing nodes receive streams of data items
The goal is to monitor a function over the union of the items with minimum communication cost
Examples of monitoring functions:
Simple countdown
Tracking the entropy
Distinct elements
Sampling
Top-k items

Motivation and Applications
Monitoring the global health of the network in a large ISP
Tracking resource usage in the distributed data centers of social networks
Tracking global changes by collecting information from sensors

What Are the Challenges?
Continuous Monitoring
Real-time tracking, rather than one-shot query
Streaming
Data is received at a very high speed
Distributed Processing
Each node only sees part of the global stream
Communication cost is important

Trivial Solutions
Centralizing all the items
High communication cost!
Periodic polling
Summarizing information in complex functions
Parameter tuning for the polling frequency
Infrequent polling: delay in identifying events
Frequent polling: high communication

The Countdown Problem
A threshold monitoring problem with many applications
Identifying when the total number of observations reaches 𝜏
Trivial solution: observers notify the coordinator by sending a bit whenever an event is observed
𝑂(𝜏) communication
But we can improve it!

A First Approach
Idea: there are many events at each site before reaching the threshold 𝜏
At least one site must see 𝜏/𝑘 items before the threshold is reached
Every site waits to see at least 𝑆/𝑘 items before reporting to the coordinator, where 𝑆 is the current slack (𝜏 minus the items counted so far)
After receiving a report from an observer, the coordinator updates 𝑆 and informs all nodes
The total communication is 𝑂(𝑘² log(𝜏/𝑘))
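The slack-based protocol can be sketched as a small simulation. This is a minimal sketch under assumptions not stated on the slide: the function name, the message accounting (one report, 𝑘 − 1 polled counts, 𝑘 announcements per synchronization) and the rounding of 𝑆/𝑘 down to an integer of at least 1 are all illustrative choices.

```python
import random


def first_approach_countdown(stream, k, tau):
    """Simulate the slack-based countdown protocol (illustrative sketch).

    stream yields the index (0..k-1) of the site that observes each item.
    Returns (items seen when the coordinator fires, messages exchanged).
    """
    known = 0                  # N: the count known to the coordinator
    unreported = [0] * k       # items each site has seen since the last sync
    bound = max(1, tau // k)   # per-site bound S/k; initially S = tau
    messages = 0
    seen = 0
    for site in stream:
        seen += 1
        unreported[site] += 1
        if unreported[site] >= bound:
            # Assumed accounting: 1 report, k - 1 polled counts,
            # and k messages announcing the new bound.
            messages += 1 + (k - 1) + k
            known += sum(unreported)
            unreported = [0] * k
            slack = tau - known            # S = tau - N
            if slack <= 0:
                return seen, messages
            bound = max(1, slack // k)
    return None, messages                  # the stream ended before tau items


if __name__ == "__main__":
    rng = random.Random(0)
    items = (rng.randrange(4) for _ in range(2000))
    print(first_approach_countdown(items, k=4, tau=1000))
```

Because every synchronization collects all local counts and the per-site bound never exceeds 𝑆/𝑘, the coordinator in this sketch fires exactly when the 𝜏-th item arrives.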

A Quadratic Improvement
Waiting for more updates before reporting to the coordinator
The protocol runs over log(𝜏/𝑘) rounds
In round 𝑗, all 𝑘 nodes wait to receive 2^(−𝑗) 𝜏/𝑘 items before reporting to the coordinator
The coordinator starts round 𝑗 + 1 after receiving 𝑘 messages
The total communication is 𝑂(𝑘 log(𝜏/𝑘))
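The round structure can be sketched as follows. This is a hedged illustration, not the slide's exact protocol: the per-round quota is computed as (remaining count)/(2𝑘), which matches 2^(−𝑗) 𝜏/𝑘 up to integer rounding, and the function name and the 𝑘-message broadcast that opens each round are assumptions.

```python
import random


def round_based_countdown(stream, k, tau):
    """Simulate the round-based countdown protocol (illustrative sketch).

    Returns (items seen when the coordinator fires, messages exchanged).
    """
    accounted = 0                    # items the coordinator has been told about
    residue = [0] * k                # per-site items not yet reported
    quota = max(1, tau // (2 * k))   # round 1: roughly 2^-1 * tau / k
    reports = 0                      # reports received in the current round
    messages = 0
    seen = 0
    for site in stream:
        seen += 1
        residue[site] += 1
        pending = [site]             # sites whose residue may have reached the quota
        while pending:
            s = pending.pop()
            while residue[s] >= quota:
                residue[s] -= quota
                accounted += quota
                messages += 1
                reports += 1
                if accounted >= tau:
                    return seen, messages
                if reports == k and quota > 1:
                    # Start the next round with about half the quota.
                    quota = max(1, (tau - accounted) // (2 * k))
                    reports = 0
                    messages += k            # assumed: one broadcast per site
                    pending = list(range(k))  # re-check every site
    return None, messages


if __name__ == "__main__":
    rng = random.Random(0)
    items = (rng.randrange(4) for _ in range(2000))
    print(round_based_countdown(items, k=4, tau=1000))
```

Once the quota reaches 1 every item is reported immediately, so the final phase costs 𝑂(𝑘) messages and the coordinator still fires exactly at the 𝜏-th item.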

Monitoring Entropy
Monitoring non-monotone functions
Let 𝑓𝑖 denote the number of occurrences of item 𝑖
Let 𝑚 denote the total number of items
The union of the input streams implicitly defines a probability distribution given by Pr[𝑖] = 𝑓𝑖/𝑚
The goal is to monitor the entropy of this distribution
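As a reference point for what is being monitored, the empirical entropy of the combined streams can be computed centrally. A minimal sketch, assuming Shannon entropy with base-2 logarithms (the slide does not fix the base) and a hypothetical function name:

```python
import math
from collections import Counter


def empirical_entropy(streams):
    """Entropy of the distribution Pr[i] = f_i / m over the union of the streams."""
    freq = Counter()
    for stream in streams:
        freq.update(stream)          # f_i: occurrences of item i across all sites
    m = sum(freq.values())           # m: total number of items
    return -sum((f / m) * math.log2(f / m) for f in freq.values())


if __name__ == "__main__":
    site_a = ["x", "y", "x"]
    site_b = ["y", "z"]
    print(empirical_entropy([site_a, site_b]))
```

This is the one-shot, centralized computation; the monitoring problem is to track this value continuously without shipping every item to one place.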

Entropy Protocol
The protocol proceeds in multiple rounds
In the first round, the coordinator collects a constant number of items from the sites
In each subsequent round 𝑖, the coordinator does the following:
Computes the parameter 𝜏𝑖
Runs the approximate countdown protocol with 𝜏𝑖
Collects the frequency distribution from all sites and computes the current entropy

The Geometric Approach (1/2)
Goal: monitoring arbitrary non-linear threshold functions
Idea: break down the test of 𝑓(𝑥) > 𝜏 or 𝑓(𝑥) ≤ 𝜏 into local conditions
A geometric fact: a convex hull can be fully covered by spheres

The Geometric Approach (2/2)
Each site checks whether its sphere is monochromatic
When all the constraints are upheld:
Query result remains unchanged
No communication is required
When a constraint is violated:
New data is gathered from the streams
New constraints are set on the streams

Sampling
Given inputs of total size 𝑁, draw a sample of size 𝑠
Uniform over all subsets of size 𝑠
Sampling applications:
Approximate query answering
Query planning
Number of distinct elements
Heavy hitters
Sampling cases:
Infinite windows
Sliding windows

Infinite Windows (1/2)
Each site associates a random weight with each observation
The coordinator maintains the following variables:
Set 𝑃: a random sample of 𝑠 items, each with weight no more than 𝑢
Weight 𝑢: the 𝑠-th smallest weight seen so far in the system
Each site only maintains its local 𝑠-th smallest weight 𝑢𝑖

Infinite Windows (2/2)
Protocol outline:
Each site 𝑖 sends every element with weight smaller than 𝑢𝑖 to the coordinator
The coordinator updates 𝑃 and 𝑢 if the weight of the received item is smaller than 𝑢
The coordinator replies to site 𝑖 with the current value of 𝑢
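The protocol outline can be sketched with two small classes. This is a minimal sketch under stated assumptions: weights are drawn uniformly from [0, 1), 𝑢𝑖 is treated as the site's last known copy of the coordinator's 𝑢, and the class and method names (`Coordinator`, `Site`, `receive`, `observe`) are hypothetical.

```python
import heapq
import random


class Coordinator:
    def __init__(self, s):
        self.s = s
        self.heap = []   # max-heap on weight, stored as (-weight, seq, item)
        self.u = 1.0     # weights lie in [0, 1), so 1.0 admits everything
        self.seq = 0     # tie-breaker so items themselves are never compared

    def receive(self, weight, item):
        """Update P and u with a forwarded item, then reply with the current u."""
        self.seq += 1
        entry = (-weight, self.seq, item)
        if len(self.heap) < self.s:
            heapq.heappush(self.heap, entry)
        elif weight < self.u:
            heapq.heapreplace(self.heap, entry)   # evict the largest weight in P
        if len(self.heap) == self.s:
            self.u = -self.heap[0][0]             # s-th smallest weight so far
        return self.u

    def sample(self):
        return [item for _, _, item in self.heap]


class Site:
    def __init__(self, coordinator, rng):
        self.coordinator = coordinator
        self.rng = rng
        self.u_i = 1.0   # local (possibly stale) threshold
        self.sent = 0    # messages sent to the coordinator

    def observe(self, item):
        """Assign a random weight; forward the item only if it beats u_i."""
        weight = self.rng.random()
        if weight < self.u_i:
            self.sent += 1
            self.u_i = self.coordinator.receive(weight, item)
        return weight


if __name__ == "__main__":
    rng = random.Random(0)
    coordinator = Coordinator(s=5)
    sites = [Site(coordinator, rng) for _ in range(3)]
    for item in range(1000):
        sites[rng.randrange(3)].observe(item)
    print(sorted(coordinator.sample()), sum(site.sent for site in sites))
```

Since 𝑢 only decreases over time, a stale 𝑢𝑖 is never smaller than 𝑢, so no item that belongs in the sample is withheld; staleness only costs a few extra messages.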

A First Approach (Long Version)
Idea: there are many events at each site before reaching the threshold 𝜏
At least one site must see 𝜏/𝑘 items before the threshold is reached
Algorithm steps:
Initially, each site reports to the coordinator whenever its number of observed items exceeds 𝜏/𝑘
The coordinator computes the current slack from the sum of all local counts: 𝑆 = 𝜏 − 𝑁 (𝑁 is the current count)
Each site sets an upper bound of 𝑆/𝑘 on its local count
The total communication is 𝑂(𝑘² log(𝜏/𝑘))

Approximate Countdown
Improve the cost by approximating the answer
Let 𝜖 be the approximation parameter
Report 0 if count < (1 − 𝜖)𝜏
Report 1 if count > 𝜏
Similar to the previous approach, but now terminate when the bound on the unreported count reaches 𝜖𝜏
The number of rounds is reduced to log(1/𝜖)
The total communication is 𝑂(𝑘 log(1/𝜖))

Randomized Countdown Protocol (1/2)
If 𝑘 grows very large, the cost will be high
Allow the algorithm to give a wrong answer with small probability
Randomization replaces the dependence on 𝑘 with a dependence on the parameter 𝜖

Randomized Countdown Protocol (2/2)
With randomization parameter 𝑐 determined by analysis:
Each site collects 𝜖²𝜏/(𝑐𝑘) observations
With probability 1/𝑘 it then sends a message; otherwise it remains silent
The coordinator waits until it receives 𝑐(1/𝜖² − 1/(2𝜖)) messages, then terminates
The total communication cost is 𝑂(1/𝜖²)
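The randomized protocol can be sketched as a simulation. This is only an illustration: the slide leaves 𝑐 to the analysis, so the value used below (𝑐 = 100) is an arbitrary assumption, as are the function name and the rounding of the batch size and message target to integers.

```python
import math
import random


def randomized_countdown(stream, k, tau, eps, c, rng):
    """Simulate the randomized countdown protocol (illustrative sketch).

    Returns (items seen when the coordinator terminates, messages received).
    """
    batch = max(1, round(eps * eps * tau / (c * k)))          # eps^2 * tau / (c * k)
    target = math.ceil(c * (1 / eps ** 2 - 1 / (2 * eps)))    # c * (1/eps^2 - 1/(2 eps))
    pending = [0] * k     # observations in each site's current batch
    received = 0
    seen = 0
    for site in stream:
        seen += 1
        pending[site] += 1
        if pending[site] == batch:
            pending[site] = 0
            if rng.random() < 1 / k:       # send with probability 1/k
                received += 1
                if received >= target:
                    return seen, received
    return None, received                  # stream ended before termination


if __name__ == "__main__":
    rng = random.Random(0)
    k, tau = 10, 500_000
    items = (rng.randrange(k) for _ in range(600_000))
    print(randomized_countdown(items, k, tau, eps=0.1, c=100, rng=rng))
```

On average one message is sent per 𝑘 completed batches, so the target is reached after roughly (1 − 𝜖/2)𝜏 items, and the number of messages does not depend on 𝑘.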

Geometric Computational Model (1/2)
Each site has a 𝑑-dimensional vector 𝑣𝑖(𝑡), called the local statistics vector
Let 𝑤1, 𝑤2, …, 𝑤𝑛 be weights assigned to the streams
Define the global statistics vector 𝑣(𝑡) as the weighted average of the 𝑣𝑖(𝑡)
Let 𝑓: ℝ^𝑑 → ℝ be an arbitrary monitoring function
Goal: determining whether 𝑓(𝑣(𝑡)) > 𝜏 at any given time 𝑡 for a threshold 𝜏

Geometric Computational Model (2/2)
𝑣′𝑖 is the last statistics vector collected from node 𝑝𝑖
The coordinator constructs the estimate vector 𝑒(𝑡), the weighted average of the 𝑣′𝑖
Each node 𝑝𝑖 also maintains the following parameters:
Delta vector: Δ𝑣𝑖(𝑡) = 𝑣𝑖(𝑡) − 𝑣′𝑖
Drift vector: 𝑢𝑖(𝑡) = 𝑒(𝑡) + Δ𝑣𝑖(𝑡)
The decomposition relies on the following fact:
(∑ 𝑤𝑖 𝑢𝑖(𝑡)) / (∑ 𝑤𝑖) = 𝑣(𝑡), with both sums over 𝑖 = 1, …, 𝑛
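The fact that the weighted average of the drift vectors equals the global statistics vector follows by substitution: averaging 𝑒 + 𝑣𝑖 − 𝑣′𝑖 gives 𝑒 + 𝑣 − 𝑒 = 𝑣. A small numerical check, with hypothetical helper names and made-up example vectors:

```python
def weighted_average(vectors, weights):
    """Component-wise weighted average of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(dim)]


def drift_vectors(current, last_sent, weights):
    """Return the estimate vector e and the drift vectors u_i = e + (v_i - v'_i)."""
    e = weighted_average(last_sent, weights)
    drifts = [[e[d] + (v[d] - p[d]) for d in range(len(e))]
              for v, p in zip(current, last_sent)]
    return e, drifts


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0]
    last_sent = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]    # v'_i
    current = [[2.0, 1.0], [0.5, 1.5], [3.0, 5.0]]      # v_i(t)
    e, drifts = drift_vectors(current, last_sent, weights)
    print(weighted_average(current, weights))   # global vector v(t)
    print(weighted_average(drifts, weights))    # same value, up to rounding
```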

Geometric Interpretation
Geometric interpretation: 𝑣(𝑡) ∈ Conv(𝑢1(𝑡), …, 𝑢𝑛(𝑡))
The convex hull can be fully covered by spheres with radius ‖𝑒 − 𝑢𝑖‖/2 centered at (𝑒 + 𝑢𝑖)/2
[Figure: the estimate vector 𝑒 and the drift vectors 𝑢1, …, 𝑢5]
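A site's local test can be sketched for one concrete monitoring function. The choice 𝑓(𝑥) = ‖𝑥‖² is an assumption made here only because its minimum and maximum over a sphere have a closed form; for a general 𝑓 the site would need some other way to bound 𝑓 over its sphere. The function names are hypothetical.

```python
import math


def ball(e, u):
    """Sphere with center (e + u) / 2 and radius ||e - u|| / 2."""
    center = [(a + b) / 2 for a, b in zip(e, u)]
    radius = math.dist(e, u) / 2
    return center, radius


def is_monochromatic_sq_norm(e, u, tau):
    """True if f(x) = ||x||^2 stays on one side of tau over the whole sphere."""
    center, radius = ball(e, u)
    norm_c = math.hypot(*center)
    lowest = max(0.0, norm_c - radius) ** 2    # minimum of f over the sphere
    highest = (norm_c + radius) ** 2           # maximum of f over the sphere
    return highest <= tau or lowest > tau


if __name__ == "__main__":
    print(is_monochromatic_sq_norm([1.0, 0.0], [0.0, 1.0], tau=4.0))
    print(is_monochromatic_sq_norm([1.0, 0.0], [0.0, 1.0], tau=1.0))
```

As long as every site's sphere passes this test, the threshold answer for 𝑓(𝑣(𝑡)) cannot have changed, so no communication is needed.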
