6. IoT Predictive Maintenance Concepts
Predictive maintenance in IoT vs. traditional predictive maintenance:
• Goal: IoT: improve production and/or maintenance efficiency. Traditional: ensure the reliability of machine operation.
• Data: IoT: data streams (time-varying features) from multiple data sources. Traditional: very limited time-varying features.
• Scope: IoT: component level, system level. Traditional: parts level.
• Approach: IoT: data driven. Traditional: model driven.
• Tasks: IoT: failure prediction, fault/failure detection & diagnosis, maintenance action recommendation, etc.; essentially any task that improves production/maintenance efficiency. Traditional: failure prediction (prognosis), fault/failure detection & diagnosis (diagnosis).
8. IoT Predictive Maintenance – Qantas Airways
[Infographic] Qantas A380 fleet: ~24,000 sensors per A380; 400-700 fault/warning messages per day have potential for predictive modelling. Slide callouts on technical delays: $65M+ per A380; 50%; 12 months/year.
Project workflow:
• Develop ML model (MATLAB) alongside a local university
• Optimise code; reduce runtime
• Build evaluation module; refine model parameters
• Configure model in AML PM template
• Orchestrate data pipeline in Azure Data Factory
• Evaluate & refine model data & parameters
• Develop user web front end; visualize results in Power BI
Source: www.microsoft.com
9. Stay ahead of the curve with Cortana Intelligence Suite
[Diagram] Data sources (business apps, custom apps, sensors and devices, people, automated systems) feed Cortana Intelligence, which turns data into intelligence and drives action through people, automated systems, and apps.
10. The IoT Ecosystem Around ML
[Architecture diagram] Data flows from sources (apps, sensors and devices) through the suite to action (people, automated systems, apps: web, mobile, bots):
• Information Management: Data Factory, Data Catalog, Event Hubs
• Big Data Stores: SQL Data Warehouse, Data Lake Store
• Machine Learning and Analytics: Machine Learning, Data Lake Analytics, HDInsight (Hadoop and Spark), Stream Analytics
• Intelligence: Cognitive Services, Bot Framework, Cortana
• Dashboards & Visualizations: Power BI
15. Scope
The better the raw materials, the better the product. A well-scoped problem has:
• A sharp question. E.g. predict whether component X will fail in the next Y days, with a clear path of action given the answer.
• Data that measures what they care about. E.g. identifiers at the level they are predicting.
• A lot of data. E.g. it will be difficult to predict failure accurately with few examples.
• Accurate data. E.g. failures are really failures; human labels on root causes; domain knowledge translated into process.
• Connected data. E.g. machine information linkable to usage information.
21. Data Sources
• FAILURE HISTORY: the failure history of a machine or a component within the machine.
• REPAIR HISTORY: the repair history of a machine, e.g. previous maintenance records, components replaced, maintenance activities performed; maintenance types.
• MACHINE CONDITIONS: the operating conditions of a machine, e.g. data collected from sensors.
• MACHINE FEATURES: the features of a machine or its components, e.g. production date, technical specifications.
• OPERATING CONDITIONS: environmental features that may influence a machine's performance, e.g. location, temperature, other interactions.
• OPERATOR ATTRIBUTES: the attributes of the operator who uses the machine, e.g. the driver.
22. Sample training data: ~20k rows, 100 unique engine ids
Sample testing data: ~13k rows, 100 unique engine ids
Sample ground truth data: 100 rows
See the Data description section of the linked document:
https://gallery.cortanaintelligence.com/Experiment/df7c518dcba7407fb855377339d6589f
23. Classes
• Regression models: how many more cycles will an in-service engine last before it fails?
• Binary classification: is this engine going to fail within w1 cycles?
• Multi-class classification: is this engine going to fail within the window [1, w0] cycles, fail within the window [w0+1, w1] cycles, or not fail within w1 cycles?
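The three formulations above can be sketched on run-to-failure data, where each engine's last observed cycle is its failure point. This is a minimal Python/pandas sketch; the column names and the small thresholds used in the toy example are illustrative, not values from the deck.

```python
import pandas as pd

def add_labels(df: pd.DataFrame, w0: int = 15, w1: int = 30) -> pd.DataFrame:
    """Add regression, binary, and multi-class targets to run-to-failure data."""
    out = df.copy()
    # Regression target: remaining useful life, counted down to the
    # last observed cycle per engine (assumed to be the failure point).
    max_cycle = out.groupby("id")["cycle"].transform("max")
    out["RUL"] = max_cycle - out["cycle"]
    # Binary target: will the engine fail within w1 cycles?
    out["label1"] = (out["RUL"] <= w1).astype(int)
    # Multi-class target: 2 = fails within [1, w0] cycles,
    # 1 = fails within [w0+1, w1] cycles, 0 = survives beyond w1.
    out["label2"] = out["label1"]
    out.loc[out["RUL"] <= w0, "label2"] = 2
    return out

# Toy example: one engine observed for 5 cycles, failing at cycle 5
toy = pd.DataFrame({"id": [1] * 5, "cycle": [1, 2, 3, 4, 5]})
labeled = add_labels(toy, w0=1, w1=3)
```

With w0=1 and w1=3, the toy engine gets RUL values 4 down to 0, label1 turns on once RUL drops to 3, and label2 becomes 2 in the last two cycles.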
24. Feature Engineering
The process of creating features that provide better or additional predictive power to the learning algorithm.
Engineered columns: a1 … a21, sd1 … sd21, RUL, label1, label2; 40+ engineered features in total.
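Per the editor's notes, a1-a21 are moving averages and sd1-sd21 moving standard deviations of the 21 sensor readings over the w most recent cycles, computed per engine. A minimal pandas sketch of that aggregation (function name and toy values are illustrative):

```python
import pandas as pd

def add_rolling_features(df: pd.DataFrame, sensors, w: int = 5) -> pd.DataFrame:
    """Add rolling mean (a*) and rolling std (sd*) per engine id."""
    out = df.sort_values(["id", "cycle"]).copy()
    grouped = out.groupby("id")
    for i, s in enumerate(sensors, start=1):
        roll = grouped[s].rolling(window=w, min_periods=1)
        # Rolling results carry a (id, row) MultiIndex; drop the id level
        # to align back with the original rows.
        out[f"a{i}"] = roll.mean().reset_index(level=0, drop=True)
        # std of a single observation is NaN (ddof=1); fill with 0.
        out[f"sd{i}"] = roll.std().reset_index(level=0, drop=True).fillna(0.0)
    return out

# Toy example: one engine, one sensor, window of 2 cycles
toy = pd.DataFrame({"id": [1, 1, 1, 1],
                    "cycle": [1, 2, 3, 4],
                    "s1": [10.0, 12.0, 14.0, 16.0]})
feats = add_rolling_features(toy, ["s1"], w=2)
```

Passing all 21 sensor columns would produce the a1-a21 and sd1-sd21 columns listed on the slide.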
25. Data Labeling
How far ahead of time the failure alert should trigger before the actual failure event.
31. Modelling Techniques
• BINARY CLASSIFICATION: predict failures within a future period of time.
• MULTICLASS CLASSIFICATION: predict failures with their causes within a future time period; predict remaining useful life within ranges of future periods.
• REGRESSION: predict remaining useful life, the amount of time before the next failure.
• ANOMALY DETECTION: identify changes in normal trends to find anomalies.
33. Evaluation
• Time-dependent split
• Train in the past, validate in the future
• Class imbalance
• Few failure events
• Sampling, cost-sensitive learning
• Metrics
• Recall, Precision, F1
• Baselines: random guess, weighted guess
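The two evaluation ideas above can be sketched in plain Python: a time-dependent split that trains on the past and validates on the future, and precision/recall/F1 computed on the imbalanced binary labels. Helper names and toy values are illustrative; a real pipeline would typically use a library such as scikit-learn.

```python
def time_split(records, cutoff):
    """Train on observations up to the cutoff cycle, validate on later ones."""
    train = [r for r in records if r["cycle"] <= cutoff]
    valid = [r for r in records if r["cycle"] > cutoff]
    return train, valid

def precision_recall_f1(y_true, y_pred):
    """Binary-classification metrics; positives (1) are the rare failures."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy usage: split 10 cycles at cycle 7, then score some toy predictions
train, valid = time_split([{"cycle": c} for c in range(1, 11)], cutoff=7)
metrics = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Comparing these scores against a random-guess or class-frequency-weighted-guess baseline shows how much the model actually adds on imbalanced failure data.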
34. “Most IoT data are not used currently…
the data that are used today are mostly for
anomaly detection and control, not
optimization and prediction, which
provide the greatest value.”1
36. Acknowledgements
• We utilized the following publicly available data to help us generate realistic data for the demo shown. We received assistance in creating this solution from this repository and the donors of the data:
“A. Saxena and K. Goebel (2008). "PHM08 Challenge Data Set", NASA Ames Prognostics Data Repository (http://ti.arc.nasa.gov/project/prognostic-data-repository), NASA Ames Research Center, Moffett Field, CA.”
• McKinsey Global Institute, The Internet of Things: Mapping the Value Beyond the Hype
• Microsoft Cortana Gallery Experiments
37. Learn and try yourself!
• Learn from the Cortana Analytics Gallery
• Solution package material – deploy by hand to learn
• Try the Cortana Analytics Solution Template – Predictive Maintenance for Aerospace, in private preview
• Try the Azure IoT preconfigured solution for Predictive Maintenance
• Read the Predictive Maintenance Playbook for more details on how to approach these problems
• Run the Modelling Guide R Notebook for a data-science walk-through
38. • Contact us for 1 free consultation: giuseppe@valueamplify.com
• Twitter: @giuseppeHighTec
• LinkedIn: www.linkedin.com/in/giuseppemascarella
Editor's Notes
Designing with artificial intelligence
The secret to getting people to engage with products and services is to make interaction as simple as possible. Remove friction and people will embrace your product. But simplicity isn’t the same as minimalism. For IoT devices, the interface may be as minimal as a few LEDs and a touchpad—and that kind of minimalism can feel obscure and confusing to users. What’s more, IoT devices often need to operate in concert to create delightful services, such as coordinating the levels of light and sound in a room. This simply increases complexity. Unless we come up with new ideas, the world is about to feel terribly broken.
That’s why interfaces and services increasingly rely on artificial intelligence technologies. Algorithms make sense of contextual data, anticipate user needs, and accept more natural forms of input, like voice commands. Keeping the interface simple means the device has to become more intelligent.
AI isn’t magic—it’s engineering. To develop compelling products, designers and product managers need to understand the constraints and possibilities of AI. They also need to develop new ways of working together so that the resulting products and services feel more… human.
This session looks at how algorithms work, examines what they can and can’t do, and explores case studies and examples of how product teams have combined a deep understanding of people with clever design and smart algorithms to produce truly wonderful products.
Decisions about what data to keep, what to ignore, and what to forward to a centralized authority will be required. Many kinetic devices will be used in applications whose actions can neither tolerate long latency nor risk the possibility that the connection with the centralized authority (“the cloud”) is unavailable. Their decisions must be made instantly with local information and knowledge. Most IoT endpoints will be limited in capability due to size, cost, and power requirements, and will need companion computing that is either embedded in the larger system or in a companion gateway. These gateways will primarily bridge between local device communication domains and higher-level network domains, and will in most cases make behavioral decisions. As the industry matures, these gateways will also be responsible for allowing data to be exchanged between intended devices and for ensuring the information is protected. Network traffic patterns will be significantly impacted as more device-to-endpoint traffic occurs and more machine-to-machine communication materializes, shifting from today’s patterns. However, these solutions will not be static, and their evolving behavior will need to vary depending on local characteristics, giving rise to more software-defined functions at both the edge and within the datacenter. Further, their numbers will be vast and their operation cannot require human intervention.
The input data consists of "train_FD001.txt", "test_FD001.txt", and "RUL_FD001.txt" in the original data source [1].
The training data ("train_FD001.txt") consists of multiple multivariate time series with "cycle" as the time unit, together with 21 sensor readings for each cycle. Each time series can be assumed as being generated from a different engine of the same type. Each engine is assumed to start with different degrees of initial wear and manufacturing variation, and this information is unknown to the user. In this simulated data, the engine is assumed to be operating normally at the start of each time series. It starts to degrade at some point during the series of the operating cycles. The degradation progresses and grows in magnitude. When a predefined threshold is reached, then the engine is considered unsafe for further operation. In other words, the last cycle in each time series can be considered as the failure point of the corresponding engine. Taking the sample training data shown in the following table as an example, the engine with id=1 fails at cycle 192, and engine with id=2 fails at cycle 287.
The testing data ("test_FD001.txt") has the same data schema as the training data. The only difference is that the data does not indicate when the failure occurs (in other words, the last time period does NOT represent the failure point). Taking the sample testing data shown in the following table as an example, the engine with id=1 runs from cycle 1 through cycle 31. It is not shown how many more cycles this engine can last before it fails.
The ground truth data ("RUL_FD001.txt") provides the number of remaining working cycles for the engines in the testing data. Taking the sample ground truth data shown in the following table as an example, the engine with id=1 in the testing data can run another 112 cycles before it fails.
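Loading an FD001-style file and marking each engine's failure point can be sketched as follows. The 26-column layout (id, cycle, three operational settings, 21 sensors) follows the dataset description above; the tiny in-memory stand-in replaces a real path to train_FD001.txt, and all names are placeholders.

```python
import io
import pandas as pd

# Column layout per the PHM08/FD001 description: unit id, cycle,
# 3 operational settings, 21 sensor readings.
COLS = (["id", "cycle", "setting1", "setting2", "setting3"]
        + [f"s{i}" for i in range(1, 22)])

def load_run_to_failure(src):
    """Read a whitespace-separated FD001-style file.

    In the training data, the last observed cycle of each engine is
    its failure point, so we record it as failure_cycle per row.
    """
    df = pd.read_csv(src, sep=r"\s+", header=None, names=COLS)
    df["failure_cycle"] = df.groupby("id")["cycle"].transform("max")
    return df

# In-memory stand-in for train_FD001.txt: engine 1 runs 3 cycles,
# engine 2 runs 2 cycles; settings and sensors zeroed for brevity.
rows = []
for engine, n_cycles in [(1, 3), (2, 2)]:
    for cycle in range(1, n_cycles + 1):
        rows.append(" ".join([str(engine), str(cycle)] + ["0.0"] * 24))
df = load_run_to_failure(io.StringIO("\n".join(rows)))
```

Subtracting cycle from failure_cycle then yields the remaining-useful-life target described earlier; for the test set, the RUL file supplies the ground truth instead.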
Selected raw features: The raw features are those included in the original input data. To decide which raw features should be included in the training data, both the detailed data field description and domain knowledge are helpful. In this template, all the sensor measurements (s1-s21) are included in the training data. Other raw features used are: cycle, setting1-setting3.
Aggregate features: These features summarize the historical activity of each asset. In the template, two types of aggregate features are created for each of the 21 sensors:
a1-a21: the moving average of sensor values over the w most recent cycles
sd1-sd21: the standard deviation of sensor values over the w most recent cycles