Scalable Conformance Checking of Business Processes

Scalable Conformance Checking for Business Processes
Daniel Reißner, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, Abel Armas-Cervantes

Process mining
Process mining is a family of methods for analyzing business processes
based on event logs.
• Some of the most important process mining operations:
• Discovery
• Conformance checking
• Enhancement

Applications of conformance checking
• Compliance auditing: How well do process executions fit a normative model? What are the current compliance risks?
• Model quality measures: What is the quality of a discovered process model? ➢ Fitness, precision, etc.
• Model repair: How can we adapt the process model to fit reality better?
• Deviance mining: Are there any employee innovations?

Trace Alignment of trace (1)
Log:
id trace
(1) ⟨C, B, D, F, E⟩
(2) ⟨B, C, D, E, I, G, D, F⟩
[Figure: the event log is compared with the process model; for trace (1), one optimal alignment and all optimal alignments are shown, with ≫ marking a misaligned move]
Existing approaches: Trace Alignments
• Adopt interleaving semantics
• Build a synchronous net for each trace
• Use an A* algorithm to find the closest trace in the model for each trace in the log
• Mismatches are reported as task misalignments, i.e. moves on model or moves on log (≫)
• The one-optimal variant returns one model path with a minimal number of misalignments
• The all-optimal variant aims at returning all model paths with a minimal number of misalignments
Behavioral Alignment
id trace
1 ⟨C, B, D, F, E⟩
2 ⟨B, C, D, E, I, G, D, F⟩
• Adopts true concurrency semantics
• Translates model and log to prime event structures (PES)
• Uses an A* algorithm to find the closest run in the model PES for each run in the log PES

[Figure: the PES of the event log (traces 1 and 2) is compared with the PES of the process model]
Behavioral Mismatch (1):
In the log, after "D", "F" is substituted by "E".
Behavioral Mismatch (2):
In the log, after "D", "F" occurs before "E", while in the model they are mutually exclusive.
• Mismatches are gathered as event misalignments, i.e. moves on model or moves on log (≫)
• Mismatches of behavioral relations can be detected
• Differences can be reported as natural language statements

Scalability challenges of current approaches
• Trace alignment does not scale up with large logs. In some cases trace alignment
is not capable of computing all optimal alignments
• Behavioral alignment is generally slower than trace alignment
• Scalability issues of the conformance checkers can affect other techniques, such
as model repair or process discovery, which rely on conformance checking to
justify the quality of their outputs

Research question and desiderata
RQ: How can we improve scalability of conformance checking techniques with large
and noisy event logs while still providing a complete set of differences?
Desiderata:
• Compute one- or all-optimal alignments
• Report the results of the conformance checking as trace alignments and
behavioral statements

Overview and general idea
• Event Log → compress → DAFSA
• Petri Net → expand → Reachability Graph
• DAFSA and Reachability Graph → compare → PSP → Optimal Alignments and Difference Statements

From event log to DAFSA
Log:
Trace            N
⟨B, D, E⟩        5
⟨B, D, F⟩        10
⟨C, B, D, E⟩     15
⟨C, B, D, F⟩     5
[Figure: DAFSA of the log, with start state s, intermediate states n1, n2, ..., final states f1, ... and arcs labeled B, C, D, E, F]
Prefixes: ⟨B, D⟩, ⟨C, B, D⟩
Suffixes: ⟨D, F⟩, ⟨D, E⟩
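The compression step can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: it treats the log as a set of distinct traces (the frequencies N are ignored), builds a prefix tree, and then merges states that accept the same suffixes. The function names `build_dafsa` and `accepts` are hypothetical.

```python
def build_dafsa(traces):
    """Return (root, arcs, finals) of a minimal acyclic automaton for the traces."""
    # Step 1: prefix tree. trie[state] maps a label to the child state.
    trie, final = [{}], [False]
    for trace in traces:
        state = 0
        for label in trace:
            if label not in trie[state]:
                trie.append({})
                final.append(False)
                trie[state][label] = len(trie) - 1
            state = trie[state][label]
        final[state] = True

    # Step 2: merge states with identical suffix behavior, bottom-up.
    registry = {}   # signature -> canonical state id
    arcs = {}       # canonical state id -> {label: canonical state id}
    finals = set()  # canonical ids of accepting states

    def canonical(state):
        children = tuple(sorted((label, canonical(child))
                                for label, child in trie[state].items()))
        signature = (final[state], children)
        if signature not in registry:
            registry[signature] = len(registry)
            arcs[registry[signature]] = dict(children)
            if final[state]:
                finals.add(registry[signature])
        return registry[signature]

    return canonical(0), arcs, finals

def accepts(root, arcs, finals, trace):
    """Check whether the automaton accepts the given trace."""
    state = root
    for label in trace:
        if label not in arcs[state]:
            return False
        state = arcs[state][label]
    return state in finals

LOG = [("B", "D", "E"), ("B", "D", "F"), ("C", "B", "D", "E"), ("C", "B", "D", "F")]
root, arcs, finals = build_dafsa(LOG)
```

For the four traces of the example log, the prefix tree has ten states, while the merged automaton shares the common suffixes ⟨D, E⟩ and ⟨D, F⟩ between the traces that start with B and those that start with C.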

From process model to reachability graph
[Figure: Petri net of the process model, with places p1 to p10 and transitions B, C, D, E, F, G, I and τ]
[Figure: reachability graph with markings [p1], [p2, p3], [p5, p3], [p2, p4], [p5, p4], [p6], [p7], [p8], [p9], [p10] and arcs labeled B, C, D, E, F, G, I and τ]
Why remove τ-transitions:
• Reduce the state space for conformance checking
• Reduce uninterpretable conformance results for the end user
How to remove τ-transitions:
• For each τ not targeting a final marking, insert a copy of each outgoing arc of the target of τ and link it to the source of τ
• Otherwise, use each incoming arc of its source
• Remove unconnected markings
[Figure: τ-less reachability graph]
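The expansion step and the first τ-removal rule can be sketched as below. Several assumptions apply: the net is safe (at most one token per place), so a marking can be stored as a set of places; the four-transition net `NET` is a hypothetical fragment loosely based on the first markings of the slide, not the full model; τ arcs are assumed not to form cycles; and only the rule for a τ that does not target a final marking is covered.

```python
from collections import deque

TAU = "tau"

def reachability_graph(transitions, initial):
    """Explore all markings reachable from the initial marking (breadth-first)."""
    start = frozenset(initial)
    markings, arcs = {start}, set()
    queue = deque([start])
    while queue:
        marking = queue.popleft()
        for label, pre, post in transitions:
            if pre <= marking:  # the transition is enabled
                target = (marking - pre) | post
                arcs.add((marking, label, target))
                if target not in markings:
                    markings.add(target)
                    queue.append(target)
    return markings, arcs

def remove_tau(arcs, initial):
    """Replace each tau arc by copies of the outgoing arcs of its target."""
    arcs = set(arcs)
    while True:
        tau_arc = next((arc for arc in arcs if arc[1] == TAU), None)
        if tau_arc is None:
            break
        source, _, target = tau_arc
        arcs.remove(tau_arc)
        arcs |= {(source, label, nxt) for (src, label, nxt) in arcs if src == target}
    # Drop arcs that start in markings no longer connected to the initial marking.
    reachable, frontier = {initial}, [initial]
    while frontier:
        marking = frontier.pop()
        for src, _, nxt in arcs:
            if src == marking and nxt not in reachable:
                reachable.add(nxt)
                frontier.append(nxt)
    return {arc for arc in arcs if arc[0] in reachable}

# Hypothetical fragment: tau splits into two branches, B and C run
# concurrently, and D joins them.
NET = [
    (TAU, frozenset({"p1"}), frozenset({"p2", "p3"})),
    ("B", frozenset({"p2"}), frozenset({"p5"})),
    ("C", frozenset({"p3"}), frozenset({"p4"})),
    ("D", frozenset({"p5", "p4"}), frozenset({"p6"})),
]
markings, arcs = reachability_graph(NET, {"p1"})
tau_free = remove_tau(arcs, frozenset({"p1"}))
```

In this fragment the marking [p2, p3] becomes unconnected once the τ arc is replaced, so [p1] leads directly to [p5, p3] via B and to [p2, p4] via C.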

PSP construction with the A* algorithm
[Figure: A* search over pairs of a marking of the τ-less reachability graph and a DAFSA state, starting at ([p1], s) for the current trace ⟨B, D, E⟩ and ending at ([p10], f1). Arcs are labeled ⟨match, x⟩, ⟨lhide, x⟩ or ⟨rhide, x⟩, and each node carries a cost c = g + h, for example c = 1 with g = 0 and h = 1, or c = 3 with g = 1 and h = 2. The traces ⟨B, D, F⟩, ⟨C, B, D, E⟩ and ⟨C, B, D, F⟩ are processed next and reuse parts of the search.]
Prefix memoization:
⟨B, D⟩ → Node 1, Node 2
⟨C, B, D⟩ → Node 4
Suffix memoization (node, suffix):
([p5, p4], n1), ⟨D, E⟩ → Path to node 3
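The search for one optimal alignment of a single trace can be sketched as follows. This is a simplified illustration, not the PSP construction of the tool: it pairs a marking with a position in the trace rather than with a DAFSA state, omits the prefix and suffix memoization, and uses the trivial heuristic h = 0 (with which A* behaves like uniform-cost search), because the heuristic used on the slide is not specified here. The graph `GRAPH` is a hypothetical τ-less reachability graph and all names are illustrative.

```python
import heapq
from itertools import count

def align(graph, initial, finals, trace):
    """Return (cost, operations) of one optimal alignment, or None."""
    tie = count()  # tie-breaker so the heap never compares states
    start = (initial, 0)
    best = {start: 0}
    queue = [(0, next(tie), start, [])]
    while queue:
        g, _, (marking, pos), ops = heapq.heappop(queue)
        if g > best.get((marking, pos), float("inf")):
            continue  # a cheaper path to this state was already expanded
        if marking in finals and pos == len(trace):
            return g, ops
        successors = []
        for label, target in graph.get(marking, []):
            if pos < len(trace) and trace[pos] == label:
                successors.append((0, ("match", label), (target, pos + 1)))
            successors.append((1, ("rhide", label), (target, pos)))  # model only
        if pos < len(trace):
            successors.append((1, ("lhide", trace[pos]), (marking, pos + 1)))  # log only
        for step, op, state in successors:
            if g + step < best.get(state, float("inf")):
                best[state] = g + step
                heapq.heappush(queue, (g + step, next(tie), state, ops + [op]))
    return None

# Hypothetical tau-less reachability graph: marking -> [(label, next marking)]
GRAPH = {
    "p1": [("B", "p5,p3"), ("C", "p2,p4")],
    "p5,p3": [("C", "p5,p4")],
    "p2,p4": [("B", "p5,p4")],
    "p5,p4": [("D", "p7")],
    "p7": [("E", "p10"), ("F", "p10")],
}
cost, ops = align(GRAPH, "p1", {"p10"}, ("B", "D", "E"))
```

For the trace ⟨B, D, E⟩ the only misalignment is the model task C, which the log does not contain, so the alignment has cost 1 and includes the operation ⟨rhide, C⟩.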

Patterns for conformance checking diagnosis
Unfitting behavior:
• Relation mismatch:
1. Causality-Concurrency
2. Conflict
• Event mismatch:
3. Task skipping
4. Task substitution
5. Unmatched repetition
6. Task relocation
7. Task insertion / absence
L. García-Bañuelos, N. R. T. P. van Beest, M. Dumas, M. La Rosa, and W. Mertens: Complete and interpretable conformance checking of business processes. IEEE Trans. Softw. Eng., 2017

Pattern detection in the example
[Figure: PSP of the example, from ([p1], s) to ([p10], f1), with the operations ⟨match, B⟩, ⟨rhide, C⟩ and ⟨match, C⟩ highlighted]
Behavioral alignment feedback:
• In the log, at the start of the trace, "C" is optional
• In the model, after "B", "C" occurs before "D"

Evaluation setup
• Implemented the approach in an open-source Java tool: ProConformance 2.0
(available from http://apromore.org/platform/tools)
• Tested the approach in three setups:
• Road traffic fines management process (RTFMP) ➢ publicly available model-log pair
• BPI Challenge Log 2013 (BPIC13) ➢ artificially generated process model
• SAP R/3 model collection (120 models) ➢ artificially created logs (2.5%, 5%, 7.5% and 10% noise)
• 480 model-log pairs

Evaluation results
Key findings:
• In the case of all-optimal, our technique outperforms
trace alignments by 1-2 orders of magnitude
• Trace alignments timed out in 207 / 480 SAP cases
(given a time bound)
• In the case of one-optimal, our technique performs
from 1.5 to nearly 40 times faster than trace alignment
• In BPIC13, one-optimal trace alignment outperforms
our technique

Evaluation results
All optimal alignments (upper bound of the 95% confidence interval in parentheses)
Dataset         DAFSA              Trace align. [#unfiltered]
RTFMP           467                338 [1,898,182]
BPIC13 cp.      28,656             22,259 [1,904,057]
SAP R/3 2.5%    4,253 (22,675)     1,233 [1,067,533] (6,470 [1,929,629])
SAP R/3 5%      7,672 (41,133)     1,751 [1,224,079] (9,178 [2,199,248])
SAP R/3 7.5%    11,652 (61,504)    2,154 [1,283,583] (14,207 [3,039,240])
SAP R/3 10%     15,754 (84,167)    2,809 [1,286,568] (22,883 [3,302,068])
We detected 5 times more (all optimal) alignments

Future work
• Improve the handling of concurrency and nested loops
• Evaluate our technique using more complex models and logs
• Extend our technique to detect additional model behavior
• Explore different applications for our technique, e.g., process model repair,
drift detection, log delta analysis, etc.

Pattern detection in PSP

Detecting task skips
Model: C, A, B
Log: C, followed by either B or A, B
PSP: match(C), then either rhide(A), match(B) or match(A), match(B)
Statement:
In the log, after "C", "A" is optional.

Detecting task relocations
Model: A, B, C
Log: B, C, A
PSP: rhide(A), match(B), match(C), lhide(A)
Statement:
In the log, "A" appears after "C" instead of at the initial marking.

Detecting Causality-Conflict mismatches
Model: C, A, B
Log: C, followed by either A or B
PSP: match(C), then either match(A), rhide(B) or rhide(A), match(B)
Statement:
In the model, after "C", "A" occurs before "B", while in the log they are mutually exclusive.
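The task-skip pattern can be illustrated with a small sketch. It is a hypothetical simplification and not the pattern-matching procedure of the tool: it takes the PSP paths as plain lists of operations and reports a task as optional when one path hides it on the model side (rhide) and another path matches it after the same preceding matched task. The function name `task_skips` and the input format are assumptions.

```python
def task_skips(paths):
    """Derive 'optional task' statements from lists of (operation, task) pairs."""
    skipped, matched = set(), set()
    for ops in paths:
        previous = None  # most recently matched task, None at the start
        for kind, task in ops:
            if kind == "rhide":
                skipped.add((previous, task))
            elif kind == "match":
                matched.add((previous, task))
                previous = task
    statements = []
    # A task is optional if it is skipped on one path and matched on another.
    for previous, task in sorted(skipped & matched, key=str):
        where = 'at the start of the trace' if previous is None else f'after "{previous}"'
        statements.append(f'In the log, {where}, "{task}" is optional.')
    return statements

# The two PSP paths of the task-skip example.
PATHS = [
    [("match", "C"), ("rhide", "A"), ("match", "B")],
    [("match", "C"), ("match", "A"), ("match", "B")],
]
statements = task_skips(PATHS)
```

Relocations and causality-conflict mismatches would need further rules that compare lhide and rhide operations across paths; they are left out of this sketch.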
