Big data! Fast data! Real-time analytics! These are buzzwords commonly associated with platform offerings around IoT.
Although the law of large numbers always applies, just because you can deploy more sensors doesn't automatically mean that you should. After all, sensors cost money and bandwidth, and they can be a pain to maintain. Using the example of the Westminster Parking Trial, I'd like to show how analytics on preliminary survey data could have reduced the number of deployed sensors significantly.
The same logic applies to fast and real-time analytics. Although these are advertised as killer features, many people new to IoT and analytics are not even aware that they might get away with batch processing. Using the example of flying a drone, I'd like to discuss which use cases call for edge processing (on the drone), stream or micro-batch analytics (when data arrives at the platform), or work on batched data (stored in a database).
Just because you can doesn't mean that you should - thingmonk 2016
1. Just because you can doesn’t mean that you should
Dr. Boris Adryan
@BorisAdryan
2. The logarithmic history of things
Boris the Academic: “Give me £50M and I’ll build you the best IoT ontology money can buy.”
3. Talking about inflated expectations
“I wonder if anyone is making money with IoT”
“There may be money in IoT”
“I’m going to get rich with IoT”
“I’m making a decent salary with IoT”
4. The logarithmic history of things
Boris the Academic: “Give me £50M and I’ll build you the best IoT ontology money can buy.”
Boris the Freelancer: “If you want to pay £5M for machine learning, make sure it isn’t rude or annoying.”
Boris at Zühlke: “Don’t pay anyone £0.5M; I’ll show you how we can do it for half.”
6. Do I get more peanuts at Thing Monk or at Monki Gras?
[Figure: peanut counts (axis 0 to 100) “on average” at thingmonk vs. monkigras, 3 samples]
7. Do I get more peanuts at Thing Monk or at Monki Gras?
[Figure: peanut counts (axis 0 to 100) “on average” at thingmonk vs. monkigras, 4 samples]
8. Do I get more peanuts at Thing Monk or at Monki Gras?
[Figure: peanut counts (axis 0 to 100) “on average” at thingmonk vs. monkigras, n samples, with error bars labelled “deviation”]
Statistical power through large numbers of samples.
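Why more samples help can be shown with a short simulation: the spread of the average shrinks roughly as σ/√n. The sketch below uses made-up peanut counts (a normal distribution with mean 50 and standard deviation 20, both hypothetical) and compares the simulated spread of many survey averages with the textbook standard error.

```python
import math
import random
import statistics

def spread_of_sample_means(mu, sigma, n, repeats=2000, seed=1):
    """Simulate `repeats` surveys of n samples each and return the standard
    deviation of their averages (an empirical standard error)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(repeats)]
    return statistics.stdev(means)

# Hypothetical peanut counts per bowl: mean 50, standard deviation 20
for n in (3, 4, 30, 300):
    simulated = spread_of_sample_means(50, 20, n)
    analytic = 20 / math.sqrt(n)
    print(f"n={n:>3}: simulated SE={simulated:5.2f}, sigma/sqrt(n)={analytic:5.2f}")
```

Going from 3 to 300 samples narrows the spread about tenfold, which is the “statistical power through large numbers” on the slide.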
9. Statisticians and data scientists LOVE larger sample sizes!
…but if sampling costs time and resources, we need a compromise.
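One common way to strike that compromise is to size the sample for the difference you actually care about. A minimal sketch, assuming a two-sided z-test with equal, known standard deviations in both groups (a normal approximation); the effect sizes and σ below are hypothetical.

```python
import math
from statistics import NormalDist

def samples_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate samples per venue needed to detect a difference `delta`
    between two means (two-sided z-test, common standard deviation `sigma`)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the false-positive rate
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

print(samples_per_group(delta=10, sigma=20))  # → 63
print(samples_per_group(delta=20, sigma=20))  # → 16
```

Halving the difference you want to detect roughly quadruples the number of samples, which is where sampling cost starts to bite.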
10. Sampling strategy
Precision and accuracy that can be achieved theoretically vs. precision and accuracy that are needed to get a job done.
[Figure: four targets labelled “accurate and precise”, “not accurate, but precise”, “accurate, not precise” and “not what you want”]
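The targets map onto two numbers that can be computed from repeated readings: bias (accuracy, how far the average sits from the truth) and spread (precision, how much readings scatter around their own average). A minimal sketch; the readings and the true value are hypothetical.

```python
import statistics

def accuracy_and_precision(readings, true_value):
    """Return (bias, spread): bias measures accuracy, spread measures precision."""
    bias = statistics.fmean(readings) - true_value
    spread = statistics.stdev(readings)
    return bias, spread

# Hypothetical readings of a parking lot that truly holds 10 cars
print(accuracy_and_precision([10, 11, 9, 10], 10))   # accurate and precise
print(accuracy_and_precision([14, 15, 14, 15], 10))  # precise, not accurate
print(accuracy_and_precision([4, 16, 7, 13], 10))    # accurate, not precise
```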
12. “Why aren’t you doing IoT?”
39% of survey participants are worried about the upfront investment for an industrial IoT solution.
13. Sweetening IoT for your customer
A few recommendations from the trenches:
• how to cut down on hardware costs
• how to cut down on software costs
Insights from a project with OpenSensors
15. Can we learn an optimal deployment and sampling pattern?
• sampling interval of 5-10 min
• data over 2 weeks in May 2015
• overall 2.6 million data points
Can we make Ethos’ budget go further by
• distributing a given number of sensors over a wider geographic area?
• lowering the sampling rate for better battery life?
Labour: expensive. Sensor: cheap.
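The trade-off between sampling interval, data volume and battery life is simple arithmetic. The sketch below uses the slide’s two-week window; the battery capacity (1000 mAh) and the charge per reading (0.01 mAh) are hypothetical, and idle power draw is ignored.

```python
def readings_per_sensor(days, interval_min):
    """Readings one sensor produces over `days` at a fixed sampling interval."""
    return days * 24 * 60 // interval_min

def battery_days(capacity_mah, mah_per_reading, interval_min):
    """Battery life if every reading costs a fixed charge (idle draw ignored)."""
    readings_per_day = 24 * 60 / interval_min
    return capacity_mah / (mah_per_reading * readings_per_day)

for interval in (5, 10, 30):
    print(f"{interval:>2} min: {readings_per_sensor(14, interval):>5} readings in two weeks, "
          f"~{battery_days(1000, 0.01, interval):.0f} days of battery")
```

Under this simple model, battery life scales linearly with the sampling interval, so moving from 5 to 30 minutes buys six times the lifetime.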
16. Correlation and clustering
[Figure: three example scatter plots labelled “correlated”, “anti-correlated” and “independent”; a dendrogram grouping lorry, coach, car, bike and skateboard]
Hierarchical clustering on the basis of a feature matrix.
17. Good news: the temporal occupancy pattern roughly predicts neighbours
[Figure: map of 750 parking lots; annotations: “lots in Southampton”, “lots around the corner from each other”]
18. A caveat: Is a high degree of correlation a function of parking lot size?
[Figure: occupancy over one day (0:00 to 23:59) for two pairs of lots]
Finding two lots of 20 spaces that correlate: “more likely”.
Finding two lots of 3 spaces that correlate: “less likely”.
19. Bootstrapping in DBSCAN clusters
Simulation: Swap the occupancy vectors between parking lots of similar size and test per grid cell whether lots still correlate.
20. Verdict: In some grid cells, the occupancy level of one parking lot predicts the occupancy of most parking spaces.
[Figure: map of grid cells with sensor positions marked x; annotations: “Better for navigation”, “Better predictive power”]
We suggested that about ONE THIRD of the sensors may be sufficient.
21. Suggested technology for trials
A temporary survey would have allowed us to make the same recommendation, including the insight that the provided 5-minute resolution is probably not required.
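Whether 5-minute resolution is needed can be tested on survey data by throwing readings away and measuring what is lost. A minimal sketch, assuming linear interpolation between kept readings and mean absolute error as the yardstick; the occupancy curve is hypothetical.

```python
import math

def downsample_error(series, step):
    """Keep every `step`-th reading (plus the last one), rebuild the gaps by
    linear interpolation and return the mean absolute error. Needs len >= 2."""
    kept = list(range(0, len(series), step))
    if kept[-1] != len(series) - 1:
        kept.append(len(series) - 1)
    rebuilt = [0.0] * len(series)
    for left, right in zip(kept, kept[1:]):
        for i in range(left, right + 1):
            frac = (i - left) / (right - left)
            rebuilt[i] = series[left] + frac * (series[right] - series[left])
    return sum(abs(a - b) for a, b in zip(series, rebuilt)) / len(series)

# Hypothetical smooth occupancy curve: one day at 5-minute resolution (288 readings)
day = [10 + 8 * math.sin(2 * math.pi * t / 288) for t in range(288)]
for step in (1, 3, 6):
    print(f"{5 * step:>2}-minute resolution: mean error {downsample_error(day, step):.3f} spaces")
```

If the error at 15 or 30 minutes stays well below what the business case needs (say, a fraction of a space), the finer resolution is not buying anything.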
22. Monte Carlo simulations are great tools to assess the business value of IoT
[Figure: two routing scenarios from a base to a set of assets]
Scenario 1: “A tour of my assets every Friday.” ‘Cost function’: sum of all edges.
Scenario 2: “A demand-driven tour of my assets.” Each asset has a probability of being needed today, p1(need today) to p6(need today). ‘Cost function’: sum of edges needed in 7 days.
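The comparison can be sketched as a Monte Carlo simulation. To keep it short, the sketch assumes a star topology (every visit is a round trip from the base, so tours share no edges), independent daily needs per asset, and hypothetical distances and probabilities for six assets.

```python
import random

def fixed_weekly_cost(distances):
    """Friday tour: one round trip from base to every asset, once a week."""
    return sum(2 * d for d in distances)

def demand_driven_weekly_cost(distances, p_need, rng):
    """Each day, each asset independently needs a visit with probability p."""
    cost = 0.0
    for _day in range(7):
        for d, p in zip(distances, p_need):
            if rng.random() < p:
                cost += 2 * d
    return cost

def monte_carlo_cost(distances, p_need, weeks=10_000, seed=42):
    """Average demand-driven cost over many simulated weeks."""
    rng = random.Random(seed)
    return sum(demand_driven_weekly_cost(distances, p_need, rng)
               for _ in range(weeks)) / weeks

# Hypothetical network: six assets (distance from base in km) and daily need probabilities
distances = [5, 8, 3, 12, 7, 4]
p_need = [0.02, 0.05, 0.01, 0.03, 0.02, 0.04]
print("fixed tour:   ", fixed_weekly_cost(distances))
print("demand-driven:", round(monte_carlo_cost(distances, p_need), 1))
```

For this toy the expectation has a closed form (7 × Σ 2·d·p ≈ 16.66), which makes a handy check; simulation earns its keep once real routing, vehicle capacity or correlated failures make a closed form impractical.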
23. Hardware is often perceived as an investment that customers understand, so they anticipate its cost.
This talk is about unfounded IoT fears.
There’s an air of magic around data and analytics.
24. “My data problem must be special!”
Customers fear costs because they’re thinking about:
✓ unstructured data
✓ distributed ingestion and storage
Or they believe from hearsay that IoT automatically requires:
✓ real-time analytics
✓ sophisticated machine learning
[T-shirt: “My company went to an IoT conference & all I got was this t-shirt and a bunch of buzzwords.”]
25. “I need to do real-time analytics!”
Time scales: microseconds to seconds; seconds to minutes; minutes to hours; hours to weeks.
Where: on device; on stream; in batch (in process or in database).
Example questions when flying a drone: am I falling? (counteract); battery level: should I land?; how many times did I stall?; what’s the best weather for flying?
Type of insight: operational insight; performance insight; strategic insight.
Example methods: e.g. Kalman filter; e.g. with machine learning; e.g. rules engine; e.g. summary stats.
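The slide names a Kalman filter as an example of on-device analytics. The sketch below is the simplest scalar case, assuming a random-walk state model and hypothetical noise variances; a real flight controller would typically track several states at once (for example altitude and vertical speed) and fuse more than one sensor.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly drifting quantity (random-walk model)."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        p += process_var        # predict: the state may have drifted
        k = p / (p + meas_var)  # Kalman gain: how much to trust the new reading
        x += k * (z - x)        # update the estimate towards the measurement
        p *= (1 - k)            # the updated estimate is more certain
        estimates.append(x)
    return estimates

# Hypothetical altitude readings (metres) with alternating ±0.5 m sensor noise
readings = [10 + (0.5 if i % 2 else -0.5) for i in range(50)]
print(round(kalman_1d(readings, process_var=1e-3, meas_var=0.5)[-1], 2))
```

Each step costs a handful of arithmetic operations, which is why this kind of filter fits on the device itself, in the microseconds-to-seconds band.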
26. Can IoT ever be real-time?
zone 1: real-time [µs]
zone 2: real-time [ms]
zone 3: real-time [s]
27. Edge, fog and cloud computing
Edge
Pro:
- immediate compression from raw data to actionable information
- cuts down traffic
- fast response
Con:
- loses potentially valuable raw data
- developing analytics on embedded systems requires specialists
- compute costs valuable battery life
Cloud
Pro:
- compute power
- scalability
- familiarity for developers
- integration centre across all data sources
- cheapest ‘real-time’ option
Con:
- traffic
Fog
Pro:
- same as Edge
- closer to ‘normal’ development work
- gateways often mains-powered
Con:
- loses potentially valuable raw data
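The Edge pro “immediate compression from raw data to actionable information” can be illustrated with a threshold rule that turns a raw battery stream into a handful of events. The thresholds and event names are hypothetical; the second threshold (hysteresis) stops a noisy reading hovering around the limit from firing repeatedly.

```python
def battery_events(readings, land_below=20.0, clear_above=25.0):
    """Turn a raw (time, battery %) stream into a few actionable events."""
    events = []
    low = False
    for t, level in readings:
        if not low and level < land_below:
            low = True
            events.append((t, "LAND"))
        elif low and level > clear_above:
            low = False
            events.append((t, "OK"))
    return events

raw = [(0, 50), (1, 30), (2, 19), (3, 18), (4, 21), (5, 26)]
print(battery_events(raw))  # → [(2, 'LAND'), (5, 'OK')]
```

Only the events need to leave the device, which is where the traffic saving comes from; the raw readings are lost unless they are also stored, which is the matching Con.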
28. Options for real-time in cloud
Some features can cost a bit, especially when you don’t really know what you’re doing and want to ‘try it out’.
A badly configured SMACK stack on your own commodity hardware can be slow and unreliable.
[Figure label: your pre-trained classifier]
29. My current pet hate: Deep Learning
Deep learning has delivered impressive results mimicking human reasoning, strategic thinking and creativity.
At the same time, big players have released libraries that let even ‘script kiddies’ apply deep learning.
This is already leading to uncritical use of deep learning where other methods would be more appropriate.
30. Summary
‣ Preliminary surveys, data analysis and simulation can help to minimise the number of sensors and develop an optimal deployment strategy and sampling schedule.
‣ Faster analytics on bigger and better hardware are not automatically the most useful solution.
‣ A good understanding of the type of insight that is required by the business model is essential.
Zühlke can advise on options around IoT and data analytics, and provide complete solutions where needed.