This slide deck presents a technique for scalable conformance checking of business process models against event logs. Existing approaches (trace alignment and behavioral alignment) scale poorly to large logs. The research aims to improve scalability while still providing a complete set of differences between the model and the log. The approach converts the event log into a DAFSA (deterministic acyclic finite state automaton) and the process model into a reachability graph, compares the two in a partially synchronized product (PSP), and derives optimal alignments and behavioral difference statements from it. An evaluation on real-world and artificial datasets shows that the approach outperforms trace alignment by 1-2 orders of magnitude when computing all optimal alignments, although one-optimal trace alignment remains faster on the BPIC13 log.
Scalable Conformance Checking of Business Processes
1. Scalable Conformance Checking for Business Processes
Daniel Reißner, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, Abel Armas-Cervantes
2. Process mining
Process mining is a family of methods for analyzing business processes
based on event logs.
• Some of the most important process mining operations:
• Discovery
• Conformance checking
• Enhancement
3. Process mining
[Figure: the same overview as on the previous slide, illustrated with a process model (Model v1) and an event log (Log).]
4. Applications of conformance checking
• Compliance auditing: How well do process executions fit to a normative model? What are the current compliance risks?
• Model quality measures: What is the quality of a discovered process model? ➢ Fitness, precision etc.
• Model repair: How can we adapt the process model to fit reality better?
• Deviance mining: Are there any employee innovations?
5. Existing approaches: Trace Alignments
Event log:
id   trace
(1)  ⟨ C, B, D, F, E ⟩
(2)  ⟨ B, C, D, E, I, G, D, F ⟩
[Figure: Trace Alignment (1), parts 1/2 and 2/2. The event log is compared with the process model; trace (1) is aligned move by move against model paths, shown once as one optimal alignment and once as all optimal alignments, with ≫ marking a skipped move.]
• Mismatches are reported as task misalignments, i.e. moves on model or moves on log (≫)
• The one-optimal variant returns one model path with a minimal number of misalignments
• The all-optimal variant aims at returning all model paths with a minimal number of misalignments
• Adopt interleaving semantics
• Build a synchronous net for each trace
• Use an A* algorithm to find the closest trace in the model for each trace in the log
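To make the notion of "closest trace in the model" concrete, here is a minimal sketch under simplifying assumptions. It does not build a synchronous net or run A*; instead it compares a log trace against a hand-written list of model runs (the list `MODEL_RUNS` is hypothetical and not taken from the slides) and counts moves on log and moves on model, each at cost 1, using a longest-common-subsequence table.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two label sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def misalignments(trace, run):
    """Moves on log plus moves on model needed to align trace with run."""
    return len(trace) + len(run) - 2 * lcs_length(trace, run)


def closest_run(trace, runs):
    """One-optimal variant: return a single run with minimal misalignments."""
    return min(runs, key=lambda run: misalignments(trace, run))


# Hypothetical model runs, assumed for illustration only.
MODEL_RUNS = [["B", "C", "D", "E"], ["C", "B", "D", "F"], ["B", "C", "D", "F"]]
TRACE_1 = ["C", "B", "D", "F", "E"]

best = closest_run(TRACE_1, MODEL_RUNS)
print(best, misalignments(TRACE_1, best))
```

Enumerating runs only works for tiny acyclic models; the search over a synchronous net exists precisely to avoid that enumeration.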
6. Existing approaches: Behavioral Alignment
Event log:
id   trace
1    ⟨ C, B, D, F, E ⟩
2    ⟨ B, C, D, E, I, G, D, F ⟩
[Figure: the event log is compared with the process model.]
• Adopts true concurrency semantics
• Translates model and log to prime event structures (PES)
• Uses an A* algorithm to find the closest run in the model PES for each run in the log PES
8. Existing approaches: Behavioral Alignment
[Figure: the PES of the event log is compared with the PES of the process model; both are drawn over the events B, C, D, E, F, G and I.]
Behavioral Mismatch (1):
In the log, after "D", "F" is substituted by "E".
Behavioral Mismatch (2):
In the log, after "D", "F" occurs before "E", while in the model they are mutually exclusive.
• Mismatches are gathered as event misalignments, i.e. moves on model or moves on log (≫)
• Mismatches of behavioral relations can be detected
• Differences can be reported as natural language statements
9. Scalability challenges of current approaches
• Trace alignment does not scale up with large logs. In some cases trace alignment
is not capable of computing all optimal alignments
• Behavioral alignment is generally slower than trace alignment
• Scalability issues of the conformance checkers can affect other techniques, such
as model repair or process discovery, which rely on conformance checking to
justify the quality of their outputs
10. Research question and desiderata
RQ: How can we improve scalability of conformance checking techniques with large
and noisy event logs while still providing a complete set of differences?
Desiderata:
• Compute one- or all-optimal alignments
• Report the results of the conformance checking as trace alignments and
behavioral statements
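11. Overview and general idea
[Figure: pipeline with three numbered steps, (1) to (3), labelled compress, expand and compare. The event log becomes a DAFSA, the Petri net becomes a reachability graph, and comparing the two yields a PSP, from which optimal alignments and difference statements are derived.]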
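12. From event log to DAFSA
Log:
Trace             N
⟨ B, D, E ⟩       5
⟨ B, D, F ⟩       10
⟨ C, B, D, E ⟩    15
⟨ C, B, D, F ⟩    5
[Figure: automaton of the log with start state s, states n1 to n5 and final states f1 to f4, with arcs labelled B, C, D, E and F, shown as equivalent (=) to the DAFSA.]
Prefixes: ⟨ B, D ⟩, ⟨ C, B, D ⟩
Suffixes: ⟨ D, F ⟩, ⟨ D, E ⟩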
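The compression of a log into a DAFSA can be sketched as follows. This is a minimal illustration, not the tool's implementation: it builds a prefix tree of the distinct traces and then merges states that accept the same suffixes. The function names are assumptions, and the trace frequencies (column N) are ignored.

```python
def build_dafsa(traces):
    """Return (start, states); states[i] = (is_final, {label: target state})."""
    # Step 1: prefix tree, so traces with a common prefix share states.
    trie, final = [{}], [False]
    for trace in traces:
        node = 0
        for label in trace:
            if label not in trie[node]:
                trie.append({})
                final.append(False)
                trie[node][label] = len(trie) - 1
            node = trie[node][label]
        final[node] = True

    # Step 2: bottom-up merge of states with identical outgoing behavior,
    # so traces with a common suffix share states as well.
    registry, states = {}, []

    def canon(node):
        children = {label: canon(child) for label, child in trie[node].items()}
        signature = (final[node], tuple(sorted(children.items())))
        if signature not in registry:
            registry[signature] = len(states)
            states.append((final[node], children))
        return registry[signature]

    return canon(0), states


def accepts(start, states, trace):
    """Check whether the automaton accepts the given trace."""
    node = start
    for label in trace:
        node = states[node][1].get(label)
        if node is None:
            return False
    return states[node][0]


LOG = [("B", "D", "E"), ("B", "D", "F"), ("C", "B", "D", "E"), ("C", "B", "D", "F")]
start, states = build_dafsa(LOG)
print(len(states))  # number of states after merging
```

For the four traces of the example the prefix tree has ten states and the merged automaton has five, which is consistent with the state names s, n1, n2, n3 and f1 used in the pattern detection example later in the deck.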
13. From process model to reachability graph
[Figure: Petri net of the process model with places p1 to p10 and transitions labelled B, C, D, E, F, G, I and τ.]
[Figure: reachability graph of the Petri net with markings [p1], [p2, p3], [p5, p3], [p2, p4], [p5, p4], [p6], [p7], [p8], [p9] and [p10]; arcs are labelled B, C, D, E, F, G, I and τ.]
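A reachability graph can be computed by a breadth-first exploration of markings. The sketch below assumes a safe (1-bounded) net, so a marking is simply a set of places. The net `NET` is a simplified fragment loosely modelled on the slide (a τ fork, B and C in parallel, a τ join, D, then E or F); its exact arcs are an assumption, since the full net cannot be recovered from the figure.

```python
from collections import deque


def reachability_graph(transitions, initial):
    """transitions: list of (label, preset, postset); initial: set of places.

    Returns the set of reachable markings and the list of arcs
    (source marking, label, target marking). Assumes a safe, bounded net.
    """
    initial = frozenset(initial)
    markings, arcs = {initial}, []
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for label, pre, post in transitions:
            if pre <= marking:  # transition is enabled
                target = frozenset((marking - pre) | post)
                arcs.append((marking, label, target))
                if target not in markings:
                    markings.add(target)
                    queue.append(target)
    return markings, arcs


# Assumed fragment of the example net (not the complete net from the slide).
NET = [
    ("tau", {"p1"}, {"p2", "p3"}),
    ("B", {"p2"}, {"p5"}),
    ("C", {"p3"}, {"p4"}),
    ("tau", {"p4", "p5"}, {"p6"}),
    ("D", {"p6"}, {"p7"}),
    ("E", {"p7"}, {"p10"}),
    ("F", {"p7"}, {"p10"}),
]

markings, arcs = reachability_graph(NET, {"p1"})
print(len(markings), len(arcs))
```

The concurrency of B and C shows up as the diamond of markings [p2, p3], [p5, p3], [p2, p4] and [p5, p4].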
Why remove τ-transitions:
• Reduce state space for conformance checking
• Reduce uninterpretable conformance results for end user
How to remove τ-transitions:
• For each τ not targeting a final marking, insert a copy of each outgoing arc of the target of τ and link it to the source of τ
• Otherwise, insert a copy of each incoming arc of the source of τ and link it to the target of τ
• Remove unconnected markings
[Figure: τ-less reachability graph]
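The τ-removal rules can be sketched on a plain arc list. This is a simplified reading of the slide, not the authors' algorithm: it assumes the graph has no cycles made only of τ arcs, and the marking names m1 to m8 in `GRAPH` are hypothetical stand-ins for [p1], [p2, p3], [p5, p3], [p2, p4], [p5, p4], [p6], [p7] and [p10].

```python
TAU = "tau"


def remove_tau(arcs, initial, finals):
    """arcs: iterable of (source, label, target). Returns the tau-less arc set."""
    arcs = set(arcs)
    while True:
        tau_arcs = sorted(a for a in arcs if a[1] == TAU)
        if not tau_arcs:
            break
        src, _, tgt = tau_arcs[0]
        arcs.discard((src, TAU, tgt))
        if tgt not in finals:
            # Copy every outgoing arc of the target and attach it to the source.
            arcs |= {(src, label, nxt) for (s, label, nxt) in arcs if s == tgt}
        else:
            # Target is final: redirect copies of the source's incoming arcs.
            arcs |= {(prev, label, tgt) for (prev, label, t) in arcs if t == src}
    # Drop markings that are no longer connected to the initial marking.
    reachable, stack = {initial}, [initial]
    while stack:
        node = stack.pop()
        for s, _, t in arcs:
            if s == node and t not in reachable:
                reachable.add(t)
                stack.append(t)
    return {a for a in arcs if a[0] in reachable}


GRAPH = [
    ("m1", TAU, "m2"), ("m2", "B", "m3"), ("m2", "C", "m4"),
    ("m3", "C", "m5"), ("m4", "B", "m5"), ("m5", TAU, "m6"),
    ("m6", "D", "m7"), ("m7", "E", "m8"), ("m7", "F", "m8"),
]

tau_less = remove_tau(GRAPH, "m1", {"m8"})
print(sorted(tau_less))
```

In this example the markings m2 and m6 become unconnected once their only incoming τ arcs are gone, so they disappear from the result.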
15. Patterns for conformance checking diagnosis
Unfitting behavior:
• Relation mismatch:
1. Causality-Concurrency
2. Conflict
• Event mismatch:
3. Task skipping
4. Task substitution
5. Unmatched repetition
6. Task relocation
7. Task insertion / absence
L. García-Bañuelos, N. R. T. P. van Beest, M. Dumas, M. La Rosa, and W. Mertens: Complete and interpretable conformance checking of business processes. IEEE Trans. Softw. Eng., 2017
16. Pattern detection in the example
[Figure: PSP of the example. Each node pairs a marking of the reachability graph with a state of the DAFSA, e.g. ([p1], s), ([p2, p4], s), ([p2, p4], n3), ([p5, p3], n1), ([p5, p4], n1), ([p7], n2) and ([p10], f1). Arcs are labelled with operations such as ⟨match, B⟩, ⟨match, C⟩, ⟨rhide, C⟩, ⟨match, D⟩, ⟨match, E⟩ and ⟨match, F⟩.]
Behavioral alignment feedback:
• In the log, at the start of the trace, "C" is optional
• In the model, after "B", "C" occurs before "D"
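The operations on the PSP arcs (match, plus rhide for hiding a model task and lhide, used later in the deck, for hiding a log event) can be illustrated with a small search. The sketch below is a simplification under stated assumptions: it aligns a single trace directly against a τ-less reachability graph using uniform-cost search (A* with a zero heuristic), rather than synchronizing the whole DAFSA. The marking names m1 to m8 and the function name `align` are hypothetical.

```python
import heapq


def align(trace, arcs, initial, finals):
    """Return (cost, operations) of one optimal alignment, or None."""
    outgoing = {}
    for src, label, tgt in arcs:
        outgoing.setdefault(src, []).append((label, tgt))

    heap = [(0, 0, (initial, 0), ())]  # (cost, tie-breaker, state, operations)
    counter, seen = 1, set()
    while heap:
        cost, _, (node, pos), ops = heapq.heappop(heap)
        if (node, pos) in seen:
            continue
        seen.add((node, pos))
        if node in finals and pos == len(trace):
            return cost, list(ops)
        successors = []
        if pos < len(trace):
            # lhide: skip the current log event (move on log), cost 1.
            successors.append((cost + 1, (node, pos + 1), ("lhide", trace[pos])))
        for label, tgt in outgoing.get(node, []):
            # rhide: skip a model task (move on model), cost 1.
            successors.append((cost + 1, (tgt, pos), ("rhide", label)))
            if pos < len(trace) and label == trace[pos]:
                # match: log event and model task agree, cost 0.
                successors.append((cost, (tgt, pos + 1), ("match", label)))
        for new_cost, state, op in successors:
            if state not in seen:
                heapq.heappush(heap, (new_cost, counter, state, ops + (op,)))
                counter += 1
    return None


# Assumed tau-less graph: B and C in parallel, then D, then E or F.
TAU_LESS = [
    ("m1", "B", "m3"), ("m1", "C", "m4"), ("m3", "C", "m5"), ("m4", "B", "m5"),
    ("m5", "D", "m7"), ("m7", "E", "m8"), ("m7", "F", "m8"),
]

cost, ops = align(["B", "D", "E"], TAU_LESS, "m1", {"m8"})
print(cost, ops)
```

For the trace ⟨B, D, E⟩ every optimal alignment hides the model task C exactly once, which corresponds to the ⟨rhide, C⟩ arcs in the example. Which of the equally cheap alignments is returned depends on the tie-breaking order.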
17. Evaluation setup
• Implemented the approach in an open-source Java tool: ProConformance 2.0
(available from http://apromore.org/platform/tools)
• Tested the approach in three setups:
• Road traffic fines management process (RTFMP) ➢ publicly available model - log pair
• BPI Challenge Log 2013 (BPIC13) ➢ artificially generated process model
• SAP R/3 model collection (120 models) ➢ artificially created logs (2.5% → 10% noise)
• 480 model-log pairs
18. Evaluation results
Key findings:
• In the case of all-optimal, our technique outperforms
trace alignments by 1-2 orders of magnitude
• Trace alignments timed out in 207 / 480 SAP cases
(given a time bound)
• In the case of one-optimal, our technique performs
from 1.5 to nearly 40 times faster than trace alignment
• In BPIC13, one-optimal trace alignment outperforms
our technique
19. Evaluation results
Optimal alignments (upper bound of 95% confidence interval)
All optimal
Dataset         DAFSA              Trace align. [#unfiltered]
RTFMP           467                338 [1,898,182]
BPIC13 cp.      28,656             22,259 [1,904,057]
SAP R/3 2.5%    4,253 (22,675)     1,233 [1,067,533] (6,470 [1,929,629])
SAP R/3 5%      7,672 (41,133)     1,751 [1,224,079] (9,178 [2,199,248])
SAP R/3 7.5%    11,652 (61,504)    2,154 [1,283,583] (14,207 [3,039,240])
SAP R/3 10%     15,754 (84,167)    2,809 [1,286,568] (22,883 [3,302,068])
We detected 5 times more (all optimal) alignments
20. Future work
• Improve the handling of concurrency and nested loops
• Evaluate our technique using more complex models and logs
• Extend our technique to detect additional model behavior
• Explore different applications for our technique, e.g., process model repair,
drift detection, log delta analysis, etc.
22. Pattern detection in PSP
Detecting task skips
• Model: C, A, B
• Log: C, B
• PSP: match(C), rhide(A), match(B)
• Statement: In the log, after "C", "A" is optional.
Detecting task relocations
• Model: A, B, C
• Log: B, C, A
• PSP: rhide(A), match(B), match(C), lhide(A)
• Statement: In the log, "A" appears after "C" instead of the initial marking.
Detecting Causality-Conflict mismatches
• Model: C, A, B
• Log: after C, either A or B
• PSP: match(C), then either match(A), rhide(B) or rhide(A), match(B)
• Statement: In the model, after "C", "A" occurs before "B", while in the log they are mutually exclusive.
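Two of these event mismatch patterns can be approximated on a single sequence of PSP operations. The sketch below is a hypothetical simplification and not the deck's pattern definitions, which work on the PSP as a whole: it calls a task relocated when the same label is hidden on both sides, skipped when it is only hidden in the model, and inserted when it is only hidden in the log. The function name `diagnose` and the tuple format are assumptions.

```python
def diagnose(ops):
    """ops: list of (operation, label), operation in {match, lhide, rhide}.

    Returns a list of (pattern, label, last matched label before it).
    """
    hidden_in_log = {label for op, label in ops if op == "lhide"}
    hidden_in_model = {label for op, label in ops if op == "rhide"}
    findings, last_match = [], None
    for op, label in ops:
        if op == "match":
            last_match = label
        elif op == "rhide":
            # A model task without a matching log event at this point.
            if label in hidden_in_log:
                findings.append(("task relocation", label, last_match))
            else:
                findings.append(("task skip", label, last_match))
        elif op == "lhide" and label not in hidden_in_model:
            # A log event that the model cannot replay anywhere.
            findings.append(("task insertion", label, last_match))
    return findings


skip = diagnose([("match", "C"), ("rhide", "A"), ("match", "B")])
relocation = diagnose(
    [("rhide", "A"), ("match", "B"), ("match", "C"), ("lhide", "A")]
)
print(skip)
print(relocation)
```

The third element of each finding gives the context needed to verbalize a statement such as: in the log, after "C", "A" is optional.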
Editor's Notes
ADOPT
[1] H. M. W. Verbeek and W. M. P. van der Aalst. Merging alignments for decomposed replay. In International Conference on Applications and Theory of Petri Nets and Concurrency, pp. 219-239. Springer International Publishing, 2016.
[2] L. García-Bañuelos, N. van Beest, M. Dumas, M. La Rosa, and W. Mertens. Complete and interpretable conformance checking of business processes. IEEE TSE, 43, 2017. In press.
Translate nondeterministic to deterministic automaton
Unmatched behavior as a way to avoid reporting in the generalization
We identified a complete set of mismatch patterns (these in the slide are those for conformance checking, we have similar ones for log delta analysis)
For each of these patterns we have a verbalization in natural language
---
We only report on immediate causality (not transitive causality) and direct conflict (not inherited conflict) because we want to report each mismatch once:
1. Immediate causality vs concurrency
2. Direct conflict vs concurrency
3. Direct conflict vs immediate causality
Each mismatch occurs in a given context, i.e. a pair of configurations, one for each PES
Detecting relation mismatch patterns takes O(n) time, where n is the number of arcs of the PSP (after optimizations, down from O(n^3))
---
Task absence / insertion is a “catch all” pattern, essentially saying that there is a task at a given configuration in the PES of the log but not in the corresponding configuration in the PES of the model
---
Complete finding fitness-related differences:
Concurrency (Log) – Conflict
Concurrency (Log) – Causality
(S-components)
Additional model behavior: behavior allowed by the model but not observed in the log
We proposed a scalable conformance checking technique for handling large and nonconforming event logs
We remapped the problem of conformance checking to automaton synchronization: the DAFSA of an event log vs the reachability graph of a model
We show that our technique scales well with big event logs, but imprecise process models impose a challenge