Process Mining for Dummies (part 2)

Alexander Ruehle
July 13, 2017

Event Logs are the foundation for reconstructing process models and thus for process mining. But how does an Event Log look like and what are the steps to be taken in a process mining project? Answers on these questions can be read in our first part of this blog series called: “Process Mining for Dummies“. This time I would like to introduce the process mining algorithms needed for transforming an event log into a process model. The third part of this series will cover the different types of process mining.

Process Mining Algorithms

The main component in process mining is the mining algorithm. It determines how the process models are created. A broad variety of mining algorithms does exist. The following three categories will be discussed in more detail.

Deterministic mining algorithms
Heuristic mining algorithms
Genetic mining algorithms

Determinism means that an algorithm only produces defined and reproducible results. It always delivers the same result for the same input. A representative of this category is the α-Algorithm. It was one of the first algorithms to be able to deal with concurrency. It takes an event log as input and calculates the ordering relation of the events contained in the log. Heuristic mining also uses deterministic algorithms but they incorporate frequencies of events and traces for reconstructing a process model. A common problem in process mining is the fact that real processes are highly complex and their discovery leads to complex models. This complexity can be reduced by disregarding infrequent paths in the models.

Genetic mining algorithms use an evolutionary approach that mimics the process of natural evolution. They are not deterministic. Genetic mining algorithms follow four steps: initialization, selection, reproduction and termination. The idea behind these algorithms is to generate a random population of process models and to find a satisfactory solution by iteratively selecting individuals and reproducing them by crossover and mutation over different generations. The initial population of process models is generated randomly and might have little in common with the event log. But due to the high number of models in the population, as well as selection and reproduction, better fitting models are created in each generation.

The outcomes of mining differ depending on the algorithm used. We shall use the event log displayed in Table 1 to illustrate the models created by different mining algorithms and to provide a picture of how mining works.

Table 1 Sample Event Log

Case ID	Event ID	Timestamp	Activity	Resource
1	1000	01/01/2013	(A) Order Goods	Peter
	1001	10/01/2013	(B) Receive Goods	Michael
	1002	13/01/2013	(C) Receive Invoice	Frank
	1003	20/01/2013	(D) Pay Invoice	Tanja
2	1004	02/01/2013	(A) Order Goods	Peter
	1005	03/02/2013	(B) Receive Goods	Michael
	1006	05/02/2013	(C) Receive Invoice	Frank
	1007	06/02/2013	(D) Pay Invoice	Tanja
3	1008	01/01/2013	(A) Order Goods	Louise
	1009	04/01/2013	(C) Receive Invoice	Frank
	1010	05/01/2013	(B) Receive Goods	Michael
	1011	10/01/2013	(D) Pay Invoice	Tanja
4	1016	15/01/2013	(A) Order Goods	Peter
	1017	20/01/2013	(C) Receive Invoice	Claire
	1018	25/01/2013	(D) Pay Invoice	Frank
5	1023	01/01/2013	(A) Order Goods	Michael
	1024	10/01/2013	(B) Receive Goods	Michael
	1025	13/01/2013	(C) Receive Invoice	Michael
	1026	20/01/2013	(D) Pay Invoice	Michael

The event log contains a very limited number of events and cases. But it is nevertheless suitable to demonstrate key aspects that are relevant for process mining and the selection of mining algorithms.

Figure 1 shows a mined process model that has been reconstructed by applying the α- Algorithm to the sample event log. It has been translated into a BPMN model for better comparability. In case 3, the invoice is received before the goods. Due to the fact that both possibilities are included in the event log (goods received before the invoice in cases 1, 2, and 5, and invoice received before the ordered goods in case 3), the mining algorithm assumes that these activities can be carried out concurrently.

Figure 1: Mined Process Model Using the α-Algorithm

When we look at the model in Figure 1 a little bit closer, we can see another important fact. Case 4 is not actually reflected in the process model. No execution sequence in the model is able to reproduce the trace of case 4. It only allows for two possible execution sequences: ABCD and ACBD, but not ACD. The end-gateway after the “Order Goods” activity requires that both following branches are executed. Therefore, it is not possible to have an execution sequence without the execution of activity B. The model is a poor fit because it is not able to reflect all relevant traces in the event log. The model shown in Figure 2 is able to replay all traces. Due to the exclusive-gateways, all three traces ABCD, ACBD and ACD are possible. But now the problem is that there are many more execution sequences that are possible than are reflected in the event log. In fact the process model allows for an infinite set of sequences. Now loops are possible either starting from B to C or from C to B leading to possible sequences with infinite iterations of B and C or C and B. The sequence ABCBCD would, for example, be possible although it is not included as a trace in the event log.

If a process model is too general, it is said to be “under-fitting”. It is insufficiently precise. A major challenge in process mining is finding an adequate solution between fitness, precision, simplicity and generalizability (taken from: Buijs et al.: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery.):

Fitness: Fitness quantifies the extent to which the discovered model can accurately reproduce the cases recorded in the log.

Precision: Precision quantifies the fraction of the behavior allowed by the model which is not seen in the event log.

Simplicity: The complexity of a process model is captured by the simplicity dimension.

Generalizability: Generalization assesses the extent to which the resulting model will be able to reproduce future behavior of the process.

Figure 2: Under-fitting Process Model

Various advanced mining algorithms do exist that can be used for different purposes. Figure 3 illustrates the mined model using the heuristic Fuzzy Miner algorithm. The model does not follow the BPMN notation but instead uses a dependency graph representation. It does not contain any gateway operators but shows the dependencies between different activities. The dependency graph illustrates, for example, that A was followed three times by B and two times by C.

Figure 3: Mined Process Model Using the Fuzzy Miner Algorithm

It is important to identify which requirements need to be considered for achieving the intended objectives for each individual process mining project. The appropriateness of an algorithm should be evaluated depending on the area of application.

In the next series you can read more about the different types of Process Mining. If you can not wait any longer, you can download the scientific paper including several questions for self-study and the corresponding answers:

This article appeared as a long version: Gehrke, N., Werner, M.: Process Mining, in: WISU – Wirtschaft und Studium, Juli edition 2013, pp. 934 – 943

Alexander Ruehle

is CEO & Co-Founder of zapliance and has been an auditor at heart for over 15 years. He is both a Certified Internal Auditor and Certified Information Systems Auditor and is moving forward as a thought leader to transform the finance, audit and compliance industry - people-centric, data-driven and context-driven.

ALLE BEITRÄGE

Process Mining for Dummies (part 2)

Process Mining Algorithms

Alexander Ruehle

Artikel teilen

Auch interessant

MENÜ