EIS 2011
1. FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
Merging Computer Log Files for Process Mining:
An Artificial Immune System Technique
Jan Claes and Geert Poels
http://processmining.ugent.be
Ghent University, Faculty of Economics and Business Administration Jan Claes for EIS 2011
Department of Management Information and Operations Management 30 October, 2011
2. Process Mining
Processes are supported by IT systems
IT systems record actual process data
Process data can be used to
Discover process model
Check conformance with existing process info
Improve or extend existing process model
Attention Process Mining
Only As-Is
Only (correctly) recorded information
3. Process data in event logs
Figure: the process is supported by IT (process support), which records events; the recorded events are grouped into an event log.
4. Process Mining steps
Preparation
Collect data: find event information
Merge data: from different sources
Structure data: group per instance
Convert data: to tool specific format
Process mining
Make decisions, take action
Legend: manual task = analysts needed in most cases; automated task = less human involvement needed
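The "structure data: group per instance" step can be sketched as follows. This is a minimal illustration, not the authors' tool: the (case_id, activity, timestamp) event layout and the example values are hypothetical, and timestamps are assumed to be ISO-8601 strings so that they sort chronologically as text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical recorded event: (case_id, activity, ISO-8601 timestamp).
Event = Tuple[str, str, str]

def group_per_instance(events: List[Event]) -> Dict[str, List[Event]]:
    """Group recorded events into traces, one trace per process instance."""
    traces: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        traces[event[0]].append(event)
    # Order the events inside each trace by their timestamp.
    return {cid: sorted(evs, key=lambda e: e[2]) for cid, evs in traces.items()}

recorded = [
    ("order-1", "ship", "2011-10-30T10:00"),
    ("order-2", "register", "2011-10-30T09:10"),
    ("order-1", "register", "2011-10-30T09:00"),
]
log = group_per_instance(recorded)
print([e[1] for e in log["order-1"]])  # ['register', 'ship']
```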
5. Merging log files
My research:
Merging log files
6. Merging log files
1. Find links between traces
2. Merge events chronologically
3. Add unlinked traces
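Steps 2 and 3 are mechanical once step 1 has produced the links, so they can be sketched as below. This assumes the links are already known (finding them is the actual research problem) and uses a hypothetical log layout: a dict from trace id to a list of (activity, ISO-8601 timestamp) events. The merged-trace keys such as "16+A" and "log1:17" are illustrative naming choices only.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]       # hypothetical (activity, ISO-8601 timestamp)
Log = Dict[str, List[Event]]  # trace id -> events of that trace

def merge_logs(log1: Log, log2: Log, links: Dict[str, str]) -> Log:
    """Merge two logs, given the links found in step 1 (log-1 id -> log-2 id)."""
    merged: Log = {}
    # Step 2: merge the events of each linked pair chronologically.
    for id1, id2 in links.items():
        merged[f"{id1}+{id2}"] = sorted(log1[id1] + log2[id2], key=lambda e: e[1])
    # Step 3: add the unlinked traces of both logs unchanged.
    linked2 = set(links.values())
    for tid, events in log1.items():
        if tid not in links:
            merged[f"log1:{tid}"] = list(events)
    for tid, events in log2.items():
        if tid not in linked2:
            merged[f"log2:{tid}"] = list(events)
    return merged

log1 = {
    "16": [("register", "2011-10-30T09:00"), ("ship", "2011-10-30T11:00")],
    "17": [("register", "2011-10-30T09:30")],
}
log2 = {
    "A": [("invoice", "2011-10-30T10:00")],
    "B": [("invoice", "2011-10-30T12:00")],
}
merged = merge_logs(log1, log2, {"16": "A"})
print([a for a, _ in merged["16+A"]])  # ['register', 'invoice', 'ship']
```

Unlinked traces are kept rather than dropped, so no recorded behaviour is lost from the merged log.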
7. Find links
Required properties of the solution
Find traces in both log files that belong to the same process execution
Without prior knowledge about the provided log files (as generic as possible)
But with maximal possibilities for the (expert) user to include their knowledge about the log files
8. Find links
Proposed solution
Take the best possible guess based on assumptions
Include multiple indicator factors in the analysis
Calculate factor scores for each analysed solution
Combine the factor scores into a global score per solution
The ‘best guess’ is the solution with the highest combined score, because, given the assumed indicators, most of the indicator evidence points to this solution
Provide user interaction possibilities
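The scoring idea can be sketched as follows. This is an illustration under stated assumptions, not the method from the paper: the two factors (`same_id`, `one_to_one`), their weights and the weighted-average combination are all hypothetical choices, and the candidates are a fixed hand-written list. The presentation's title points to an artificial immune system technique for searching the solution space, which is not shown here.

```python
from typing import Callable, Dict, List, Tuple

Link = Tuple[str, str]                # (trace id in log 1, trace id in log 2)
Solution = List[Link]                 # one candidate set of links
Factor = Callable[[Solution], float]  # scores a solution in [0, 1]

def same_id(solution: Solution) -> float:
    """Hypothetical factor: share of links whose two trace ids are equal."""
    return sum(a == b for a, b in solution) / len(solution) if solution else 0.0

def one_to_one(solution: Solution) -> float:
    """Hypothetical factor: 1 if no trace is linked twice, else 0."""
    left = [a for a, _ in solution]
    right = [b for _, b in solution]
    return 1.0 if len(set(left)) == len(left) and len(set(right)) == len(right) else 0.0

def global_score(solution: Solution, factors: Dict[str, Factor],
                 weights: Dict[str, float]) -> float:
    """Combine factor scores into one global score (assumed: weighted average)."""
    total = sum(weights[name] for name in factors)
    return sum(weights[name] * f(solution) for name, f in factors.items()) / total

def best_guess(candidates: List[Solution], factors: Dict[str, Factor],
               weights: Dict[str, float]) -> Solution:
    """The best guess is the analysed solution with the highest global score."""
    return max(candidates, key=lambda s: global_score(s, factors, weights))

factors = {"same_id": same_id, "one_to_one": one_to_one}
weights = {"same_id": 2.0, "one_to_one": 1.0}
candidates = [
    [("16", "16"), ("17", "18")],
    [("16", "16"), ("18", "18")],
    [("16", "16"), ("17", "16")],
]
best = best_guess(candidates, factors, weights)
print(best)  # [('16', '16'), ('18', '18')]
```

Keeping each factor as a separate function makes it easy for an expert user to add, drop or reweight indicators.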
9. Decisions to make
Which indicator factors?
How to calculate a score for each factor?
How to combine the factor scores into a global score?
Which solutions to analyse (analyse = calculate and compare scores)?
Which user interactions to include for (expert) user knowledge?
See the paper for more details
10. Indicator factors
Same trace identifier
Assumption: if both logs contain a trace with the same id, there is a very high chance they match
Not always, though (e.g. customer id vs. order id)
Example trace ids (log 1 | log 2):
16 | 10
17 | 12
18 | 14
19 | 16
20 | 18
21 | 20
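The same-trace-identifier indicator can be sketched on the example ids above (log 1: 16 to 21; log 2: the even numbers 10 to 20). The helper name `same_id_links` is hypothetical. Because of the customer-id vs. order-id caveat, the proposed links are meant as evidence feeding a factor score, not as a final answer.

```python
from typing import List, Tuple

log1_ids = ["16", "17", "18", "19", "20", "21"]
log2_ids = ["10", "12", "14", "16", "18", "20"]

def same_id_links(ids1: List[str], ids2: List[str]) -> List[Tuple[str, str]]:
    """Propose a link for every trace id that occurs in both logs."""
    shared = set(ids2)
    # Keep the order of log 1 so the output is deterministic.
    return [(tid, tid) for tid in ids1 if tid in shared]

print(same_id_links(log1_ids, log2_ids))  # [('16', '16'), ('18', '18'), ('20', '20')]
```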
11. Indicator factors
Equal attribute values
Assumption: the more attributes of a trace and its events are equal in both logs, the higher the chance that the traces match
Example traces (log 1 | log 2):
16 JAN 12:00 | 17 JC 14 14:00
17 JAN 12:10 | 18 JC 15 14:10
18 JAN 12:20 | 19 JC 16 14:20
19 JAN 12:30 | 1A JC 17 14:30
20 JAN 12:40 | 1B JC 18 14:40
21 JAN 12:50 | 1C JC 19 14:50
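One way to turn the equal-attribute-values assumption into a factor score is sketched below. The measure is a swapped-in choice (Jaccard similarity of the value sets), not necessarily the one used in the paper. Values are compared regardless of attribute name, on the assumption that the two logs may name the same attribute differently. The `Trace` layout and the example data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Trace:
    attributes: Dict[str, str]                                  # trace-level attributes
    events: List[Dict[str, str]] = field(default_factory=list)  # event-level attributes

def attribute_values(trace: Trace) -> Set[str]:
    """All attribute values of a trace and of its events."""
    values = set(trace.attributes.values())
    for event in trace.events:
        values.update(event.values())
    return values

def equal_attribute_score(t1: Trace, t2: Trace) -> float:
    """Jaccard similarity of the value sets: 0 = nothing shared, 1 = identical."""
    v1, v2 = attribute_values(t1), attribute_values(t2)
    union = v1 | v2
    return len(v1 & v2) / len(union) if union else 0.0

order = Trace({"customer": "JAN"}, [{"user": "JC", "amount": "250"}])
invoice = Trace({"client": "JAN"}, [{"clerk": "JC", "total": "300"}])
print(equal_attribute_score(order, invoice))  # 0.5
```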
12. Test results
Simulated data (300-400 ms on a standard laptop)
Benefits: controllable parameters, known solution
Correct number of linked traces in all tests
Perfect results for same trace ids and up to 50% noise; worse results for higher overlap of traces
Real data (6-10 min on a standard laptop)
Correct number of linked traces in all tests
Almost perfect results for same trace ids and up to 50% noise; worse results for higher overlap of traces
13. New approach
Rule Based Merger
The user has to configure rules for linking traces
Rule = relationship between attributes in both logs
Events of linked traces are merged chronologically
Example rule: “Merge all traces where attribute A of the trace in log 1 equals attribute B of any event in the trace in log 2”
Select attributes, contexts and operator
Research focus: suggesting merging rules
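The example rule can be sketched as a predicate over a pair of traces. This is a hypothetical illustration, not the Rule Based Merger's actual implementation: the context (trace attribute in log 1, event attribute in log 2) and the operator (equality) are fixed here, and the attribute names `order_no` and `ref` are invented for the example. The chronological merge of the linked traces' events is not repeated here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Trace:
    trace_id: str
    attributes: Dict[str, str]
    events: List[Dict[str, str]] = field(default_factory=list)

Rule = Callable[[Trace, Trace], bool]

def any_event_equals(attr_a: str, attr_b: str) -> Rule:
    """Rule: attribute A of the log-1 trace equals attribute B of any event in the log-2 trace."""
    def rule(t1: Trace, t2: Trace) -> bool:
        value = t1.attributes.get(attr_a)
        return value is not None and any(e.get(attr_b) == value for e in t2.events)
    return rule

def link_by_rule(log1: List[Trace], log2: List[Trace], rule: Rule) -> List[Tuple[str, str]]:
    """Link every pair of traces for which the configured rule holds."""
    return [(t1.trace_id, t2.trace_id) for t1 in log1 for t2 in log2 if rule(t1, t2)]

orders = [Trace("o1", {"order_no": "A17"}), Trace("o2", {"order_no": "B42"})]
invoices = [
    Trace("i1", {}, [{"activity": "create", "ref": "B42"}]),
    Trace("i2", {}, [{"activity": "create", "ref": "A17"}, {"activity": "pay", "ref": "A17"}]),
]
rule = any_event_equals("order_no", "ref")
print(link_by_rule(orders, invoices, rule))  # [('o1', 'i2'), ('o2', 'i1')]
```

Building rules from a small factory like `any_event_equals` mirrors the "select attributes, contexts and operator" step: other contexts or operators would be further factories of the same shape.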
14. New approach
15. Contact information
Jan Claes
jan.claes@ugent.be
http://processmining.ugent.be
Twitter: @janclaesbelgium