1. Data Analytics in Manufacturing
Gian Antonio Susto
Statwolf LTD
gianantonio.susto@statwolf.com
2. Outline
1. The Data Analytics Environment
2. Principles of Manufacturing Informatics
3. Machine (Statistical) Learning
4. Machine Learning in Manufacturing
a) Virtual Metrology
b) Root Cause Analysis
c) Predictive Maintenance
d) Fault Detection
4. The (Big) Data Era
• Data Explosion
– Increased storage capability (Moore’s Law)
– Internet of Things
Gartner: 26 billion IoT objects by 2020
Sources: Techradar.com; data aggregated by Gongos Research
7. We are drowning in information and starving for
knowledge
- Rutherford D. Rogers
• Insights/learning
• Predictions
• Decision making suggestions
• ...
The (Big) Data Deluge
8. Data Analytics: an Interdisciplinary Field
[Diagram: Data Analytics at the intersection of Statistical Learning and Software Engineering, with application domains including Finance, Manufacturing, Biology, Robotics, ...]
10. (Big) Data in Manufacturing
• Manufacturing companies record enormous amounts of process data
• Example: a Consumer Packaged Goods company that produces a personal care product generates the data volumes detailed in [1]
[1] ‘The Rise of Industrial Big Data’ – General Electric
11. (Big) Data in Manufacturing
• ‘Leveraging big data is imperative as information is at the heart of competition and growth for industrial businesses. Data-driven strategies based on real-time and historical process information will help companies optimize performance’ [1]
• Possible improvements:
- Proving quality to trading partners/customers
- Maximizing yield
- Reducing downtime
- Recovering capacity
[1] ‘The Rise of Industrial Big Data’ – General Electric
12. The Manufacturing Data Analysis Process
• Problem: definition, expected impact, evaluation metric
• Collection: conversion, parsing, aggregation, alignment
• Cleaning: quality, reconciliation, missing data handling, denoising, outlier detection
• Modelling: feature extraction, building, evaluation/comparison
• Roll-out: on-line implementation, business outcome, improvement
13. The Manufacturing Data Analysis Process
• Problem: definition, expected impact, evaluation metric
• Collection: conversion, parsing, aggregation, alignment
• Cleaning: quality, reconciliation, missing data handling, denoising, outlier detection
• Modelling: feature extraction, building, evaluation/comparison
• Roll-out: on-line implementation, business outcome, improvement
• Machine Learning enters at the Modelling phase: models are built from a historical dataset Z of n observations (samples) and p variables (features)
15. Machine Learning Problems
• Two classes of modeling problems, depending on the type of data
– Supervised if labeled data (Z = [X Y], X input, Y output)
– Unsupervised if unlabeled data (Z = X)
Modeling Problem
– Supervised → Regression / Classification
– Unsupervised
16. Machine Learning Problems
• Two categories in the case of supervised learning, depending on the output type
– Regression if Y is continuous
– Classification if Y is discrete/categorical
17. Supervised Learning: a Regression example
• Example: house pricing for the real estate market [2]
• Historical dataset of n house transactions with
information regarding
– House price (output - Y)
– Land square footage (input - X)
– Living square feet (input - X)
– Effective year built (input - X)
– Mailing address (input - X)
[2] ‘Machine Learning and the Spatial Structure of House Prices and Housing Returns’ – A. Caplin et al.
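A minimal sketch of such a regression, assuming scikit-learn and synthetic data: the feature set mirrors the inputs listed above, but the price-generating rule and all numbers are invented for illustration, not taken from [2].

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a historical dataset of n house transactions.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(1000, 10000, n),   # land square footage
    rng.uniform(500, 4000, n),     # living square feet
    rng.uniform(1950, 2015, n),    # effective year built
])
# Hypothetical price rule plus noise (illustrative only).
y = 50 * X[:, 0] + 120 * X[:, 1] + 300 * (X[:, 2] - 1950) + rng.normal(0, 1e4, n)

# Supervised regression: learn price = f(inputs) from labeled data Z = [X y].
model = LinearRegression().fit(X, y)
predicted_price = model.predict(X[:1])[0]  # price estimate for one house
```

A new transaction is then priced by feeding its input features to `model.predict`, exactly the supervised-regression setting described above.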
18. Supervised Learning: a Classification example
• Example: Shazam
• A ‘digital fingerprint’ (X) is extracted from a
song sample and compared with a database
of 11 million songs (classes - Y)
Tip 1 - Defining good features is generally half of the battle
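A toy sketch of fingerprint-based classification as 1-nearest-neighbour matching; the real Shazam pipeline is far more sophisticated, and the "fingerprints" here are just random vectors standing in for the features mentioned in Tip 1.

```python
import numpy as np

# Toy 'digital fingerprints': each song in the database is a fixed-length
# feature vector; classification = nearest fingerprint in the database.
rng = np.random.default_rng(1)
database = rng.normal(size=(1000, 32))   # stand-in for the song database
song_ids = np.arange(1000)               # one class label (Y) per song

query = database[42] + rng.normal(0, 0.05, 32)  # noisy sample of song 42

# 1-nearest-neighbour match on Euclidean distance.
distances = np.linalg.norm(database - query, axis=1)
predicted_song = song_ids[np.argmin(distances)]
```

With well-designed features, the noisy query still lands closest to its own song, which is exactly why feature definition is "half of the battle".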
19. Unsupervised Learning
• Unlabeled data: the quest for hidden structure in the data
– Market Basket/Affinity Analysis
• Patterns in the purchases: what is bought together?
• Amazon 2009 revenue $24.5B, $5B from recommended products
– Clustering
• Grouping of a set of ‘similar’ objects
• ‘You may also like’
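Clustering can be sketched with k-means on toy purchase profiles; the customer groups and product categories below are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy purchase profiles: rows are customers, columns are product categories.
rng = np.random.default_rng(2)
group_a = rng.normal([5, 0, 0], 0.5, size=(50, 3))   # buys mostly category 1
group_b = rng.normal([0, 5, 5], 0.5, size=(50, 3))   # buys categories 2 and 3
X = np.vstack([group_a, group_b])

# Unsupervised: no labels Y, k-means discovers the two groups from X alone.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
```

The recovered cluster of a customer can then drive "you may also like" suggestions from what similar customers bought.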
21. Manufacturing Data Analytics Examples
Four examples of manufacturing data analytics problems:
A. Regression – Virtual Metrology (Semiconductor)
B. Regression – Root Cause Analysis (Pharmaceutical)
C. Classification – Predictive Maintenance
(Semiconductor)
D. Unsupervised Learning – Fault/Novelty Detection
(Semiconductor/HVAC)
22. [A] Regression – Virtual Metrology (VM)
• Semiconductor Manufacturing
• Production based on wafers
• Organization in lots (25 wafers each)
• Hundreds (thousands!) of processes:
- Etching
- Lithography
- Chemical Vapor Deposition (CVD)
- ...
• Goodness of a process is assessed by measuring one or more parameters (Y) on the wafer (for CVD, the thickness of the deposited layer)
• Unfortunately, measuring is costly and time-consuming
23. [A] Regression – Virtual Metrology (VM)
[Figure: wafer with metrology data vs. wafer without metrology data]
• Common practice to save money/time: measuring just 1 wafer per lot
• Drawbacks:
- Delays in detecting drifts in production
- No quality check for unmeasured wafers
- Any feedback controller is updated only once every 25 process iterations
24. [A] Regression – Virtual Metrology (VM)
• Tool data X available for every iteration (temperatures, pressures, flows, …)
• Exploit tool/logistic/production data to estimate Y
• Each wafer now has at least an estimate for quality/control purposes
• i.e., moving from Lot-to-Lot to Run-to-Run control [3]
[3] ‘Virtual Metrology and Feedback Control for Semiconductor Manufacturing Processes using Recursive Partial Least Squares’ – Khan, Moyne, Tilbury, Journal of Process Control
25. [A] Regression – Virtual Metrology (VM)
• Modeling difficulties:
1. Data fragmentation: several multiple-chamber machines, multiple products/recipes
2. High dimensionality: thousands of variables
3. ‘Skinny problem’ (p >> n): numerical problems for model estimation
• Example: prediction of thickness for CVD on a tool with 3 chambers, each with 2 sub-chambers – exploiting clustering for subset modeling
Tip 2 – ‘Visualize’/examine data before modeling
26. Dealing with high-dimensionality: Regularization methods
• Not all regression techniques are suitable for high-dimensional problems
• Simplest approach: Least Squares Regression
• Objective: minimization of the prediction error on the training data
• OLS solutions on high-dimensional datasets are often ill-conditioned: the predicted output can change drastically with small perturbations of the input, causing poor prediction performance
27. Dealing with high-dimensionality: Regularization methods
• Regularization methods overcome the issue
• Ridge Regression (RR) [L2]: stable (“easier”) solutions are encouraged by penalizing the squared magnitude of the coefficients (ill-posed problems and over-fitting issues are generally resolved)
• Least Absolute Shrinkage and Selection Operator (LASSO) [L1]: penalizes the sum of the absolute values of the coefficients, driving many of them exactly to zero
28. Dealing with high-dimensionality: Regularization methods
• A penalty on model complexity generally enhances performance
• Different behaviour: LASSO provides sparse results!
• E.g., diabetes data: p = 10, n = 367 [4]
• Sparsity provides interpretable models
‘Essentially, all models are wrong, but some are useful’ – George E.P. Box
[4] ‘The Elements of Statistical Learning: Data Mining, Inference, and Prediction’ – Hastie, Tibshirani, Friedman, 2009
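The different behaviour of the two penalties can be reproduced on synthetic data with scikit-learn (not the diabetes dataset of [4]); the dimensions and regularization strengths below are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Toy high-dimensional problem: only the first 3 of 50 inputs matter.
rng = np.random.default_rng(4)
n, p = 80, 50
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + rng.normal(0, 0.1, n)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: drives many coefficients to zero

ridge_nonzero = np.sum(np.abs(ridge.coef_) > 1e-6)   # essentially all survive
lasso_nonzero = np.sum(np.abs(lasso.coef_) > 1e-6)   # sparse: few survive
```

Inspecting which coefficients LASSO keeps is precisely what makes the sparse model interpretable.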
29. Regularization methods: guidelines
• RR & LASSO: no a-priori guarantee of best prediction accuracy (cross-validation is always a necessary step to evaluate how well results generalize)
• LASSO is generally outperformed by RR when:
– p > n
– there are high correlations between predictors
• Elastic Nets combine the two techniques
• Kernel Methods: non-linear solutions embedded in a linear framework (augmented space)
(Image from Chris Thornton, U. Sussex)
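The Elastic Net guideline can be sketched with scikit-learn's `ElasticNetCV`, which mixes the L1 and L2 penalties (via `l1_ratio`) and uses cross-validation to pick the regularization strength, as recommended above; the correlated, p > n data are synthetic.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# The regime where Elastic Net shines: p > n with highly correlated predictors.
rng = np.random.default_rng(5)
n, p = 60, 100
base = rng.normal(size=(n, 10))                       # 10 underlying signals
X = np.repeat(base, 10, axis=1) + 0.01 * rng.normal(size=(n, p))  # 10 copies each
y = base[:, 0] + rng.normal(0, 0.1, n)

# l1_ratio=0.5 weights the two penalties equally; cv=5 selects alpha.
enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
score = enet.score(X, y)   # in-sample R^2
```

Unlike pure LASSO, which tends to pick one predictor per correlated group arbitrarily, the L2 part spreads weight across the group.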
30. Non-linear Regression: Neural Networks (NNs)
• NNs mimic the structure of the brain and how it learns from experience
• Example architecture: variables are associated with nodes and functions with arcs
[Diagram: a single neuron with inputs u1, …, un, weights w1, …, wn, bias b, and activation a, computing y = a(Σᵢ wᵢuᵢ + b)]
31. Non-linear Regression: Neural Networks (NNs)
• PROS:
- Great prediction accuracy
- Flexibility in modeling non-linearities
• CONS:
- Time-consuming tuning
- Not suitable for high-dimensional problems
• In case of high dimensionality, a two-step procedure is applied:
1. Dimensionality reduction (correlation, PCA, etc.)
2. Modeling
Tip 3 - The choice between linear vs non-linear approaches should be
tailored to the problem at hand
32. [B] Regression – Root Cause Analysis (RCA)
• Pharmaceutical Manufacturing
• Slow-Release (Time-Release) technologies: capsules that dissolve over time for a controlled release of the drug into the bloodstream
• Dissolution profiles (y1,2,3,4) over different time intervals (T1, T2, T3, T4) are required to fall within given intervals
• Variability in the production: where does it come from? → Root Cause Analysis
33. [B] Regression – Root Cause Analysis (RCA)
• Several production steps (Process #1–#4, with data x1–x4), each of which can be influenced by many factors (e.g. raw materials quality)
• All the available data sources are exploited for modeling the dissolution curves (y1,2,3,4)
• Modeling with sparse approaches to pinpoint the most influential parameters for variability
[Figure: process chain x0 → Process #1–#4 (x1–x4) → y1,2,3,4; bar chart of RCA importance for parameter groups X1–X4]
34. [C] Classification – Predictive Maintenance (PdM)
• Data analytics enables more sophisticated approaches to maintenance handling
• 3 groups of approaches in manufacturing for dealing with maintenance: R2F, PvM, PdM
1. Run-to-Failure (R2F)
• Repair or restore actions performed only after the occurrence of a failure
• ‘If it’s not broken, don’t fix it’
35. [C] Classification – Predictive Maintenance (PdM)
2. Preventive Maintenance (PvM)
• Planned schedule of maintenance actions with the aim of anticipating failures
• Failures are generally warded off
• However, unnecessary maintenance is performed
36. [C] Classification – Predictive Maintenance (PdM)
3. Predictive Maintenance (PdM)
• Maintenance actions based on suggestions provided by a data analytics module
• The PdM module is based on data available from the tool/production
37. [C] Classification – Predictive Maintenance (PdM)
• Semiconductor Manufacturing
• Forecast of integral-type faults (caused by machine usage)
• Use case: breaking of the tungsten filament in ion implanters
• Goal: define an indicator (y) – a health factor – of the current component status from process parameters (X)
38. [C] Classification – Predictive Maintenance (PdM)
• The health factor indicator is a quantitative index; however, we treat this as a Classification problem
• Observations divided into:
o ‘Non-Faulty’ (data of process iterations with a working component)
o ‘Faulty’ (data of process iterations with a broken component)
• Use of Support Vector Machines: the distance from the decision boundary is exploited as a ‘distance to fail’
[Figure: decision boundary separating the two classes; adapted from [4]]
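The distance-to-fail idea can be sketched with scikit-learn's `SVC`, whose `decision_function` returns the signed distance from the decision boundary; the two-dimensional data and class centres below are synthetic illustrations, not ion-implanter data.

```python
import numpy as np
from sklearn.svm import SVC

# Toy health-factor setup: process data from the two classes of iterations.
rng = np.random.default_rng(7)
healthy = rng.normal([0, 0], 0.5, size=(100, 2))   # 'Non-Faulty' iterations
faulty = rng.normal([3, 3], 0.5, size=(100, 2))    # 'Faulty' iterations
X = np.vstack([healthy, faulty])
y = np.array([0] * 100 + [1] * 100)

svm = SVC(kernel="linear").fit(X, y)

# Signed distance from the decision boundary, used as 'distance to fail':
# strongly negative = safely healthy, near zero = approaching failure.
new_points = np.array([[0.0, 0.0], [1.5, 1.5], [3.0, 3.0]])
distance_to_fail = svm.decision_function(new_points)
```

Tracking this signed distance over successive process iterations yields the health-factor trajectory on which maintenance decisions are based.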
44. [C] Classification – Predictive Maintenance (PdM)
• Minimization of the overall costs
• Support Decision System: from process data and production/maintenance costs, the PdM module suggests when actions should be taken to minimize costs
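One simple way such a support-decision module could trade off costs is a threshold search on the health factor; all costs and health-factor distributions below are hypothetical, chosen only to illustrate the mechanism.

```python
import numpy as np

# Pick the alarm threshold on a health factor that minimizes expected cost:
# cost = (unnecessary maintenance actions) + (missed failures).
rng = np.random.default_rng(8)
healthy_hf = rng.normal(-2, 1, 500)    # health factor when no failure follows
failing_hf = rng.normal(2, 1, 100)     # health factor shortly before a failure

C_MAINT = 1.0                          # cost of one (possibly unneeded) maintenance
C_FAIL = 20.0                          # cost of an unexpected failure

thresholds = np.linspace(-5, 5, 201)
costs = []
for t in thresholds:
    false_alarms = np.sum(healthy_hf > t)    # maintenance triggered needlessly
    missed = np.sum(failing_hf <= t)         # failure not caught in time
    costs.append(false_alarms * C_MAINT + missed * C_FAIL)

best_threshold = thresholds[int(np.argmin(costs))]
```

Because an unexpected failure is far costlier than a maintenance action here, the optimal threshold sits well below the failing-class mean, accepting some false alarms.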
45. [D] Unsupervised Learning – Fault Detection
• Two classes of failure-related problems:
1) Prediction (breakages in the future)
2) Detection (breakages that have already happened)
• With thousands of variables, detecting a breakage is not always a trivial task
• Univariate monitoring can be misleading
Tip 4 - Multivariate systems need
multivariate approaches
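Tip 4 can be made concrete on a synthetic two-sensor example: a fault that breaks the correlation between the sensors while each one individually stays within its 3-sigma band evades univariate monitoring but is caught by a multivariate (Mahalanobis-distance) check.

```python
import numpy as np

# Normal operation: two strongly correlated sensors.
rng = np.random.default_rng(9)
n = 1000
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.1, n)        # x2 tracks x1 closely
X = np.column_stack([x1, x2])

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis2(point):
    """Squared Mahalanobis distance from the normal-operation distribution."""
    d = point - mean
    return d @ cov_inv @ d

# Fault: correlation broken, yet each sensor individually within 3 sigma.
fault = np.array([2.0, -2.0])
univariate_alarm = np.any(np.abs(fault) > 3 * X.std(axis=0))       # misses it
normal_scores = np.array([mahalanobis2(p) for p in X])
multivariate_alarm = mahalanobis2(fault) > np.quantile(normal_scores, 0.99)
```

The 99th percentile of the normal-operation scores serves as the control limit, the same role the limit plays in classical multivariate SPC charts.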
46. [D] Unsupervised Learning – Fault Detection
• Usage in practice:
1. Issue recognized by the system
2. Drill-down into the ‘guilty’ parameter(s)
3. Inspection of the original data