The document discusses autocorrelation and cross correlation analysis of time series data. It provides an example of measuring daily body weight over 4 weeks and finds autocorrelation at a lag of 1 day. This indicates dependence between successive daily measurements. The document also analyzes viscosity measurements taken hourly and finds autocorrelation up to a lag of 4 hours. An autoregressive model is fitted to account for this autocorrelation. Finally, the document examines cross correlation between methane feed rate and CO2 concentration measurements taken minute-by-minute. The largest correlation is found at a lag of -1 minute, suggesting the CO2 is affected by methane feed rate from the previous minute.
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W4 Autocorrelation and Cross Correlation
1. Page 1/3111b BB W4 Auto & Cross Correlation, 04, D. Szemkus/H. Winkler
Week 4
When we analyze processes, we take samples according to a sampling plan prepared in advance. These samples are usually taken at fixed time intervals.
A sound analysis assumes that the observed values are independent of each other. A deviation from this assumption can lead to misleading conclusions and, as a result, to the wrong actions.
Statistics offers a way to determine the degree of dependence within a set of data. The method is called autocorrelation. If a significant autocorrelation is detected, it can be included in the mathematical model, so that its effect on the data set can be calculated and corrected.
Introduction
Autocorrelation is the correlation between two observations of the same time series, separated by a given lag.
The smaller the time difference between two samples, the more likely they are to be autocorrelated.
Disturbances caused by noise variables act over a period of time that may be longer than the sampling interval. In addition, many technical processes settle only slowly after changes or disturbances. In such cases, observations one hour apart are more likely to be correlated than observations ten hours apart.
If observations are taken at such short intervals that they correlate strongly with each other, they do not deliver independent information. Autocorrelation analysis is a good tool for checking this.
Introduction
File: Autocorrelation1.mtw
Example: Daily Body Weight
We measured body weight daily over a period of 4 weeks.
The data are normally distributed.
The analysis shows an unwelcome upward trend.
Is that everything? Is there a relationship among the values?
We check this with autocorrelation.
Week Day Weight
1 1 77,2
1 2 77,1
1 3 76,9
1 4 76,8
1 5 77,1
1 6 77,2
1 7 77,3
2 1 77,4
2 2 77,3
2 3 77,1
2 4 77,2
2 5 77,4
2 6 77,4
2 7 77,6
3 1 77,6
3 2 77,5
3 3 77,3
3 4 77,4
3 5 77,5
3 6 77,5
3 7 77,8
4 1 77,9
4 2 77,8
4 3 77,6
4 4 77,7
4 5 77,6
4 6 77,8
4 7 77,9
The plot shows the autocorrelation coefficients for lags 1 to 7 together with the 95 % confidence limits. There is a large, significant positive spike at lag 1, followed by smaller positive autocorrelations at the higher lags.
Example: Daily Body Weight
Stat
>Time Series…
>Autocorrelation…
Lag ACF T LBQ
1 0,805006 4,26 20,16
2 0,581408 2,03 31,08
3 0,428218 1,31 37,24
4 0,301430 0,87 40,42
5 0,304455 0,86 43,81
6 0,363586 1,00 48,86
7 0,328108 0,87 53,16
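The ACF and T columns above can be reproduced outside Minitab. The following is a minimal sketch in plain Python, assuming the usual sample autocorrelation (deviations from the overall mean) and a large-lag standard error for the T value; the function names `acf` and `t_values` are illustrative only, and decimal commas are converted to decimal points.

```python
import math

# Daily body weight from the slide table (4 weeks x 7 days).
weight = [77.2, 77.1, 76.9, 76.8, 77.1, 77.2, 77.3,
          77.4, 77.3, 77.1, 77.2, 77.4, 77.4, 77.6,
          77.6, 77.5, 77.3, 77.4, 77.5, 77.5, 77.8,
          77.9, 77.8, 77.6, 77.7, 77.6, 77.8, 77.9]


def acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag, using the overall mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def t_values(r, n):
    """T = r_k / SE(r_k), with SE from the lower-lag coefficients (assumed formula)."""
    result = []
    for k, r_k in enumerate(r):
        se = math.sqrt((1 + 2 * sum(v * v for v in r[:k])) / n)
        result.append(r_k / se)
    return result


r = acf(weight, 7)
t = t_values(r, len(weight))
for lag, (r_k, t_k) in enumerate(zip(r, t), start=1):
    print(f"lag {lag}: ACF = {r_k:.3f}, T = {t_k:.2f}")
```

A T value with magnitude above roughly 2 corresponds to a coefficient outside the 95 % limits drawn in the plot.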
Final Estimates of Parameters
Type Coef SE Coef T P
AR 1 0,9155 0,1042 8,79 0,000
Constant 6,54385 0,02951 221,73 0,000
Mean 77,4802 0,3494
In the second step we fit an autoregressive model (ARIMA) to account for the autocorrelation. With the model we can calculate fitted values that are free of the autocorrelation effect and perform the residual diagnostics. If the residuals are normally distributed and show no trend, we can accept the measured values.
Example: Daily Body Weight
Stat
>Time Series…
>ARIMA…
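As a rough cross-check of the AR 1 estimate, the body weight series can be regressed on its own lag 1 values. This is a sketch only: it uses ordinary least squares rather than Minitab's ARIMA estimation, so the coefficients come out close to, but not equal to, the values in the table above. The helper name `fit_ar1` is hypothetical.

```python
# Daily body weight from the slide table (decimal commas converted).
weight = [77.2, 77.1, 76.9, 76.8, 77.1, 77.2, 77.3,
          77.4, 77.3, 77.1, 77.2, 77.4, 77.4, 77.6,
          77.6, 77.5, 77.3, 77.4, 77.5, 77.5, 77.8,
          77.9, 77.8, 77.6, 77.7, 77.6, 77.8, 77.9]


def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_(t-1); returns (c, phi, residuals)."""
    x = series[:-1]   # y_(t-1)
    y = series[1:]    # y_t
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    residuals = [b - (c + phi * a) for a, b in zip(x, y)]
    return c, phi, residuals


c, phi, residuals = fit_ar1(weight)
print(f"c = {c:.3f}, phi = {phi:.3f}")
print(f"largest absolute residual = {max(abs(e) for e in residuals):.3f}")
```

The residuals returned here are the values one would inspect for normality and trend in the residual diagnostics.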
Example: Daily Body Weight
Below is the evaluation of viscosity measurements from a chemical process, taken once per hour. File: Autocorrelation2.mtw. Note that the number of observations outside the +/-3 sigma control limits would indicate that the process is not in control. Viscosity, however, is not expected to change suddenly, which suggests that these observations are autocorrelated. Let's evaluate this.
Example: Viscosity Measurement
Stat
>Time Series…
>Autocorrelation…
Minitab computes the autocorrelation of the original measurements at each lag, in steps of one sampling interval. The significance of each correlation is shown by the coefficient and the T value. The plot shows the 95% confidence limits as additional information. In this case the autocorrelation is significant up to approximately lag 4.
Example: Viscosity Measurement
Lag ACF T LBQ
1 0,820897 8,21 69,43
2 0,699538 4,57 120,36
3 0,601390 3,30 158,39
4 0,489573 2,43 183,86
5 0,410502 1,93 201,95
6 0,366249 1,66 216,51
7 0,349833 1,54 229,93
8 0,352668 1,52 243,72
Stat
>Time Series…
>Lag…
Description with Fitted Line Plot
Stat
>Regression…
>Fitted Line Plot…
To do this, generate a new column that holds the observations shifted by a lag of 1 hour and plot it against the original observations.
Note the strong positive correlation, with a coefficient of about 0,8.
Similar plots can be generated for lags 2, 3, etc. to display the correlation at those lags.
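The lag-column step can be sketched in a few lines of Python. The viscosity file is not reproduced in these slides, so the series below is simulated from a hypothetical autoregressive process (mean 85, coefficient 0.8, noise standard deviation 2, all assumed values); only the mechanics of shifting and correlating are the point.

```python
import math
import random

random.seed(1)

# Hypothetical stand-in for the hourly viscosity series (all parameters assumed).
MEAN, PHI, SIGMA, N = 85.0, 0.8, 2.0, 500
y = [MEAN]
for _ in range(N - 1):
    y.append(MEAN + PHI * (y[-1] - MEAN) + random.gauss(0, SIGMA))


def lagged(series, k):
    """Return (values at t-k, values at t) as two aligned lists."""
    return series[:-k], series[k:]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


# One "lag column" per lag, each correlated with the original observations.
r_by_lag = {k: pearson(*lagged(y, k)) for k in (1, 2, 3)}
for k, r in r_by_lag.items():
    print(f"lag {k}: r = {r:.2f}")
```

Plotting the two lists returned by `lagged(y, 1)` against each other gives the same kind of picture as the fitted line plot.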
We notice that observations 1 hour apart are correlated with a coefficient of r1 = 0,82, and observations 2 hours apart with r2 = 0,70.
For the 3 and 4 hour lags the coefficients are r3 = 0,60 and r4 = 0,49.
Because all of these coefficients fall outside the +/- 2 sigma limits (red lines), these lagged autocorrelations are statistically significant.
The correlation decreases steadily with increasing lag. This pattern is typical of an autoregressive process.
To illustrate the correlation at a lag of 1 hour, we plot the viscosity at time t-1, called Y_{t-1}, against the viscosity at time t, called Y_t.
We obtain the best model by regressing each value on the previous values. The model can be of 1st or higher order. In this case we use the 1st order model for autoregressive processes (AR 1):
$$Y_t = c + \varphi\, Y_{t-1} + \varepsilon_t$$
c is a constant determined by the process mean, c = μ(1 − φ), φ is the autoregressive coefficient, and ε_t is the residual error, which we call white noise.
The Autoregressive Model
ARIMA = Autoregressive Integrated Moving Average
Final Estimates of Parameters
Type Coef SE Coef T P
AR 1 0,8467 0,0549 15,42 0,000
Constant 13,1219 0,3806 34,48 0,000
Mean 85,588 2,482
The fitted model:
Y_t = 13,12 + 0,847 Y_{t-1} + ε_t
The ARIMA Model in Minitab
Stat
>Time Series…
>ARIMA…
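Two quick consistency checks on the fitted viscosity model are possible with the numbers already shown. For a stationary AR 1 process the mean equals c / (1 − φ) and the theoretical autocorrelation at lag k equals φ^k; the sketch below compares both with the Minitab output (values copied from the tables, decimal commas converted).

```python
# Estimates from the "Final Estimates of Parameters" table.
c = 13.1219
phi = 0.8467
mean_reported = 85.588

# Observed autocorrelations at lags 1..4 from the viscosity ACF table.
acf_observed = [0.820897, 0.699538, 0.601390, 0.489573]

# Mean implied by the AR 1 model.
mean_implied = c / (1 - phi)

# Theoretical AR 1 autocorrelations: rho_k = phi ** k.
acf_model = [phi ** k for k in range(1, 5)]
differences = [abs(m - o) for m, o in zip(acf_model, acf_observed)]

print(f"implied mean = {mean_implied:.3f}, reported mean = {mean_reported}")
for k, (m, o) in enumerate(zip(acf_model, acf_observed), start=1):
    print(f"lag {k}: model {m:.3f}, observed {o:.3f}")
```

The close agreement at lags 1 to 4 is what one would expect if a 1st order model describes the data well.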
The residuals are independent and normally distributed, which indicates that the model is valid.
The Residual Analysis
The fitted model:
Y_t = 13,12 + 0,847 Y_{t-1} + ε_t
Notice that the process shows no evidence of being out of control once the autocorrelation structure in the data, caused by the dependency among successive observations, is modeled.
Such dependency often arises because the process changes slowly relative to the sampling frequency. That is, the sampling interval is much shorter than the time constant of the process dynamics.
This is an opportunity to reduce the sampling rate without compromising control of the process. It may matter little if there is an on-line viscometer or an APC system, where data gathering is cheap and control is automatic. But it could be a major saving if viscosity is measured in the lab and false alarms lead to over-correction, which increases process variation.
Interpretation of the Results
• Observations taken in time series are frequently autocorrelated and hence are not independently distributed
• In this case, the assumptions for using conventional control charts are violated
• This can increase the number of false out-of-control signals if the process is being monitored via an SPC chart
• The autocorrelation can be modeled and the control chart can be applied to the residuals to identify out-of-control situations more correctly
• In practice, if the process is under manual control rather than APC, it is often better to alter the sampling plan so that the sampling interval exceeds the time constant
• This saves the costs of sampling and the costs of introducing more variation through unnecessary process interference.
Results & Learnings
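The effect of autocorrelation on a control chart can be illustrated with a small simulation. The sketch below assumes an individuals chart with limits from the average moving range (factor 2.66) and a hypothetical, in-control AR 1 process (coefficient 0.85); it counts points outside the limits for the raw data and for the residuals of a least-squares AR 1 fit. All parameter values and function names are illustrative assumptions.

```python
import random

random.seed(7)

# Hypothetical in-control but autocorrelated process (all parameters assumed).
PHI, MEAN, SIGMA, N = 0.85, 85.0, 1.0, 300
y = [MEAN]
for _ in range(N - 1):
    y.append(MEAN + PHI * (y[-1] - MEAN) + random.gauss(0, SIGMA))


def individuals_limits(values):
    """Lower limit, center line, upper limit of an individuals chart."""
    center = sum(values) / len(values)
    mr_bar = sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)
    half_width = 2.66 * mr_bar  # 3 / d2, with d2 = 1.128 for moving ranges of 2
    return center - half_width, center, center + half_width


def count_outside(values):
    """Number of points beyond the 3 sigma limits."""
    lcl, _, ucl = individuals_limits(values)
    return sum(1 for v in values if v < lcl or v > ucl)


def ar1_residuals(series):
    """Residuals of a least-squares fit of y_t on y_(t-1)."""
    x, z = series[:-1], series[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    phi = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sum((a - mx) ** 2 for a in x)
    c = mz - phi * mx
    return [b - (c + phi * a) for a, b in zip(x, z)]


raw_out = count_outside(y)
resid_out = count_outside(ar1_residuals(y))
print(f"points outside limits: raw data {raw_out}, residuals {resid_out}")
```

Because the simulated process has no real disturbance, every signal on the raw chart is a false alarm caused by the autocorrelation alone.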
Cross correlation is the correlation between two variables, e.g. an input x_t at time t and an output y_{t+k} observed at time t+k. In this case the lag is k sampling intervals between the observations.
Example:
In a multi-step chemical process where 5 hours pass between a change in reactor temperature and the point at which the yield is affected and measured, the correlation is strongest when yield lags temperature by 5 hours.
Time Lagged Cross Correlation
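For reference, a common textbook definition of the sample cross correlation at lag k is given below. It is stated here as an assumption about the usual convention (deviations from the overall means, division by n), with n observations of the series x and y:

```latex
c_{xy}(k) =
\begin{cases}
\dfrac{1}{n}\displaystyle\sum_{t=1}^{n-k} (x_t - \bar{x})(y_{t+k} - \bar{y}), & k \ge 0 \\[2ex]
\dfrac{1}{n}\displaystyle\sum_{t=1-k}^{n} (x_t - \bar{x})(y_{t+k} - \bar{y}), & k < 0
\end{cases}
\qquad
r_{xy}(k) = \frac{c_{xy}(k)}{\sqrt{c_{xx}(0)\, c_{yy}(0)}}
```

With x as the input and y as the output, a peak of r_{xy}(k) at a positive k means that the output responds k sampling intervals after the input.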
Time Lagged Cross Correlation
• Some reasons why we might need to know the time-lagged relationships between KPIVs and one or more KPOVs
– Analyze multi-vari type data
• see which input and noise variables show some relationship with the output variables in order to better control the process
• allow for the possibility of a time difference or lag between a change in one variable and its effect on the other
– To better know when a change in an input variable will impact the output variable
– To identify and develop a transfer function model that can be used for Automatic Process Control (APC)
In a chemical process the CO2 content is to be controlled at a level of 47-53% by means of the methane feed rate. An experiment is conducted to determine the control parameter. After the first evaluation, it seems that other factors also affect the CO2 content. The data were collected at one-minute intervals. File: Cross Correlation.mtw
Only 31% of the variation is explained?
Example: Time Lagged Cross Correlation
Methane Feed CO2 Conc.
0,37 53,4
-0,18 52
-1,302 54,9
0,435 55,7
0,987 51,6
1,866 49,2
0,79 47,5
0,645 51,1
2,812 50
1,239 46
0,535 47,9
1,019 50,6
1,223 50,1
0,255 49,2
[Figure: time series plots of CO2 concentration and methane feed rate against the observation index]
The time series plots look as expected: a negative correlation produces mirror images.
A closer look reveals that the plots are shifted by 1 observation!
Example: Time Lagged Cross Correlation
Minitab shows the correlation coefficient for every lag. The lag of –1 min has by far the largest correlation (-0,965).
Stat
>Time Series…
>Cross Correlation…
Cross Correlation Function: CO2 Conc.; Methane Feed
CCF - correlates CO2 Conc.(t) and Methane Feed(t+k)
-1,0 -0,8 -0,6 -0,4 -0,2 0,0 0,2 0,4 0,6 0,8 1,0
+----+----+----+----+----+----+----+----+----+----+
-15 0,109 XXXX
-14 0,081 XXX
-13 0,164 XXXXX
-12 0,362 XXXXXXXXXX
-11 0,261 XXXXXXXX
-10 0,133 XXXX
-9 -0,004 X
-8 0,006 X
-7 0,044 XX
-6 0,060 XXX
-5 0,049 XX
-4 -0,018 X
-3 -0,215 XXXXXX
-2 -0,527 XXXXXXXXXXXXXX
-1 -0,965 XXXXXXXXXXXXXXXXXXXXXXXXX
0 -0,556 XXXXXXXXXXXXXXX
1 -0,185 XXXXXX
2 -0,068 XXX
3 0,094 XXX
4 0,087 XXX
5 0,047 XX
6 -0,014 X
7 -0,049 XX
8 0,142 XXXXX
9 0,282 XXXXXXXX
10 0,382 XXXXXXXXXXX
11 0,198 XXXXXX
12 0,072 XXX
13 0,113 XXXX
14 0,053 XX
15 0,067 XXX
Cross Correlation in Minitab
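The same calculation can be sketched in Python for the 14 rows printed in the data table of this example. Because this is only an excerpt of the file, the coefficients differ from the Minitab output above, but the lag with the largest magnitude is the same. The convention follows the Minitab header (first series at time t, second series at time t+k); the function name `ccf` is illustrative.

```python
import math

# First 14 rows of the example data as printed on the slide (decimal commas converted).
co2 = [53.4, 52.0, 54.9, 55.7, 51.6, 49.2, 47.5,
       51.1, 50.0, 46.0, 47.9, 50.6, 50.1, 49.2]
methane = [0.37, -0.18, -1.302, 0.435, 0.987, 1.866, 0.79,
           0.645, 2.812, 1.239, 0.535, 1.019, 1.223, 0.255]


def ccf(x, y, k):
    """Sample cross correlation of x(t) with y(t+k), using the overall means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Keep only the time points where both x(t) and y(t+k) exist.
    pairs = [(x[t], y[t + k]) for t in range(n) if 0 <= t + k < n]
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return sxy / math.sqrt(sxx * syy)


values = {k: ccf(co2, methane, k) for k in range(-3, 4)}
best_lag = max(values, key=lambda k: abs(values[k]))

for k, r in values.items():
    print(f"lag {k:+d}: r = {r:+.3f}")
print(f"largest magnitude at lag {best_lag}")
```

A peak at lag –1 means that the CO2 concentration at time t is most strongly related to the methane feed rate one minute earlier.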
Generate a new column with the methane feed rate lagged by 1 min.
Now we obtain the expected strong correlation between input and output.
Creation of the Time Lagged Model
Stat
>Time Series…
>Lag…
Stat
>Regression…
>Fitted Line Plot…
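The gain from lagging the input can be sketched with the 14 rows printed in the data table of this example. The code fits a straight line of CO2 concentration on methane feed rate, once without a lag and once with the feed rate shifted by 1 minute, and then inverts the lagged line for a 47-53 % band. This is an illustration on an excerpt only: the numbers will differ from a fit on the full file, and the inversion ignores prediction uncertainty. The helper name `fit_line` is hypothetical.

```python
# First 14 rows of the example data as printed on the slide (decimal commas converted).
co2 = [53.4, 52.0, 54.9, 55.7, 51.6, 49.2, 47.5,
       51.1, 50.0, 46.0, 47.9, 50.6, 50.1, 49.2]
methane = [0.37, -0.18, -1.302, 0.435, 0.987, 1.866, 0.79,
           0.645, 2.812, 1.239, 0.535, 1.019, 1.223, 0.255]


def fit_line(x, y):
    """Least-squares line y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy * sxy / (sxx * syy)


# No lag: CO2(t) against methane(t).
_, _, r2_same = fit_line(methane, co2)

# Lag of 1 minute: CO2(t) against methane(t-1).
a, b, r2_lag = fit_line(methane[:-1], co2[1:])

# Feed rates at which the fitted line reaches the band edges.
feed_for_53 = (53 - a) / b
feed_for_47 = (47 - a) / b

print(f"R-sq without lag: {r2_same:.2f}, with 1 min lag: {r2_lag:.2f}")
print(f"fitted line: CO2 = {a:.2f} + ({b:.2f}) * methane(t-1)")
print(f"band 47-53 % reached for feed between {feed_for_53:.2f} and {feed_for_47:.2f}")
```

Since the slope is negative, the lower feed rate corresponds to the upper CO2 limit and vice versa.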
The residuals are normally distributed and show no trend. Therefore the model can be used.
Within which range must the methane feed rate be controlled if the CO2 content is to be kept between 47-53%?
Do you think that there are other items which should be considered?
The Residual Analysis
Stat
>Regression
>Fitted Line Plot…
>Graphs
>Four in one
Are the values independent of each other?
Even if we control the process inputs within tighter limits in the future, we must always keep watching the results.
The Values of the CO2 Content
Final Estimates of Parameters
Type Coef SE Coef T P
AR 1 0,6409 0,1450 4,42 0,000
Constant 18,8337 0,4761 39,55 0,000
Mean 52,448 1,326
Autocorrelation Function: CO2 Conc.
Lag ACF T LBQ
1 0,638884 3,50 13,51
2 0,258750 1,05 15,81
3 0,048163 0,19 15,89
4 -0,060800 -0,24 16,03
5 -0,072229 -0,28 16,23
6 -0,048721 -0,19 16,32
7 -0,011212 -0,04 16,33
8 0,000434 0,00 16,33
The Check for Autocorrelation
Lag 1 shows a significant autocorrelation, which indicates the next step.
Here, too, we fit the autoregressive model and analyze the residuals.
Stat
>Time Series…
>ARIMA…
No trend and normally distributed: the AR 1 model fits.
This short evaluation can be important for deciding the future check frequency or how the CO2 content is displayed.
The Residual Analysis
P-Value = 0,934
• Often important correlations are missed between
variables because the time lag is not considered
• Time lagged cross correlation analysis is a valuable
tool to use with multi-vari data to discover and
quantify relationships
• Cross correlation studies are an important step in
developing control plans, as well as controller or
transfer function relationships for APC
Results & Learnings
Appendix for Time Series Analysis
•Autocorrelation
•Time Lagged Cross Correlation
• Use baseline data over 5 – 10 days with more than 25 observations to capture the typical process variation.
• Plot the effect or the KPOV on a control chart of the original data and check for observations outside the “3 sigma limits” and for “9 points in a row above or below the center line”.
• Create an autocorrelation plot: Stat>Time Series>Autocorrelation.
• If the autocorrelation decreases exponentially, with some significant values at the first lags, we assume that the data are autocorrelated and can be described by a 1st order autoregressive model.
• Create a 1st order autoregressive model: Stat>Time Series>ARIMA (Autoregressive Integrated Moving Average). Use the original KPOV for "series" and enter "1" in the field "autoregressive". Store the residuals and fits.
Steps for the check of Autocorrelation
and Creation of a Control Chart
• Analyze the residuals with the residual plots and check whether they are normally and randomly distributed. Little or no pattern should be noticeable. (This checks whether the 1st order autoregressive model adequately describes the autocorrelation structure.)
• Now create a control chart for the residuals. If the “out of control” signals (tests 1 and 2) differ significantly from those of the original data, use the residuals as the output variable on the control chart.
• An alternative is to extend the sampling interval. A rough rule for a manual SPC system (that is, not an APC system) is to choose a new interval such that the lag 1 autocorrelation is < 0,5. In the viscosity example we could change the sampling interval from 1 hour to about 4 - 5 hours. For other reasons it may be necessary to keep the existing interval, e.g. if another critical KPOV has to be checked frequently and accurately.
Steps for the check of Autocorrelation
and Creation of a Control Chart
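The rough rule for the sampling interval can be applied directly to the viscosity ACF table. The sketch below assumes that the autocorrelation at lag k of the hourly series approximates the lag 1 autocorrelation that would be seen when sampling every k hours; the function name `min_interval` is illustrative.

```python
# ACF of the hourly viscosity series (lag in hours), copied from the Minitab table.
acf_viscosity = {1: 0.820897, 2: 0.699538, 3: 0.601390, 4: 0.489573,
                 5: 0.410502, 6: 0.366249, 7: 0.349833, 8: 0.352668}

THRESHOLD = 0.5  # rough rule: lag 1 autocorrelation of the new series below 0,5


def min_interval(acf_by_lag, threshold):
    """Smallest lag whose autocorrelation is below the threshold, or None."""
    for lag in sorted(acf_by_lag):
        if acf_by_lag[lag] < threshold:
            return lag
    return None


interval_hours = min_interval(acf_viscosity, THRESHOLD)
print(f"shortest interval meeting the rule: {interval_hours} hours")
```

This agrees with the 4 - 5 hours suggested for the viscosity example.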
• If the autocorrelation looks like a sine wave, we can use the 2nd order autoregressive model. Enter “2” in the field "autoregressive” of ARIMA. (Another way to check for 2nd order autoregression is the partial autocorrelation function: Stat>Time Series>Partial Autocorrelation. If 2 significant spikes occur we have a 2nd order model; if only 1 spike occurs it is a 1st order model.) Analyze the residuals (e) with a control chart and compare it with the chart of the original measurements (yt).
• Decide which output variable (original measurements or residuals) and which sampling interval you want to use for your SPC plan.
Steps for the check of Autocorrelation
and Creation of a Control Chart
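The partial autocorrelation check for the model order can be sketched from the autocorrelations alone, using the Durbin-Levinson recursion. The example applies it to the first four viscosity autocorrelations. The number of observations is not stated on the slides; n = 100 is an assumption inferred from the lag 1 T value, and the bound 2/sqrt(n) is a rough significance limit.

```python
import math


def pacf_from_acf(rho):
    """Partial autocorrelations phi_11..phi_KK from rho_1..rho_K (Durbin-Levinson)."""
    pacf = []
    prev = []  # coefficients phi_(k-1),1 .. phi_(k-1),(k-1) of the previous order
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            current = [phi_kk]
        else:
            num = rho[k - 1] - sum(prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Viscosity autocorrelations at lags 1..4 from the Minitab table.
acf_viscosity = [0.820897, 0.699538, 0.601390, 0.489573]
N_ASSUMED = 100                      # assumption, not stated on the slides
bound = 2 / math.sqrt(N_ASSUMED)     # rough 95 % limit around zero

pacf = pacf_from_acf(acf_viscosity)
spikes = [k for k, value in enumerate(pacf, start=1) if abs(value) > bound]
for k, value in enumerate(pacf, start=1):
    print(f"lag {k}: PACF = {value:+.3f}")
print(f"significant spikes at lags: {spikes}")
```

Under these assumptions only the lag 1 value exceeds the bound, which is consistent with choosing a 1st order model for the viscosity data.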
• Montgomery, “Intro to SQC”, John Wiley and Sons, 3rd Edition, 1996, pp. 374-398.
• Box, Hunter and Hunter, “Statistics for Experimenters”, John Wiley and Sons, 1978, Chapter 18.
• Box, Jenkins and Reinsel, “Time Series Analysis, Forecasting and Control”, Prentice Hall, 3rd Edition, 1994.
Note: There is a whole family of autoregressive integrated moving average
(ARIMA) time series models that are discussed in the above books. In this
module we only covered in detail the simplest situations. Those seriously
interested in the subject should read / refer to the above texts.
Literature