Simplified Alpha algorithm

During this and the next laboratory, we will discuss a simple process mining solution. For this purpose, the widely known Alpha algorithm was simplified and adjusted to discover a BPMN process model.

Let's consider the following traces (variants) of some workflow log:

a b c d e g
a b c d f g 
a c b d e g
a c b d f g

For this workflow log, based on the Alpha algorithm, we can find the following relations (x,y - events):

(>) direct succession: x>y iff for some case x is directly followed by y,
(→) causality: x→y iff x>y and not y>x,
(||) parallelism: x||y iff x>y and y>x,
(#) choice/unrelated/no direct succession: x#y iff not x>y and not y>x.

In the below implementation, we do not use directly „no direct succession” relation, as it can be just checked if there is causality or parallelism instead.

direct_succession = {
    'a': {'b','c'},
    'b': {'c','d'},
    'c': {'b','d'},
    'd': {'e','f'},
    'e': {'g'},
    'f': {'g'}
}

causality = {
    'a': {'b', 'c'},
    'b': {'d'},
    'c': {'d'},
    'd': {'e','f'},
    'e': {'g'},
    'f': {'g'}
}

parallel_events = {('b', 'c'), ('c', 'b')}

as well as we can find the following sets of start and end events:

start_set_events = {'a'}
end_set_events = {'g'}

For the purpose of coding, we can also define the inverted causality for these events in causality relation which have just one successor (this is not the part of the alpha algorithm itself, but can be helpful as a temporary variable for using the alpha mining patterns):

inv_causality = {
    'd': {'b', 'c'}, 
    'g': {'e', 'f'}
}

Next, use the following code to generate a simple BPMN process model:

bpmnalphatemplate.py

import graphviz
 
class MyGraph(graphviz.Digraph):
 
    def __init__(self, *args):
        super(MyGraph, self).__init__(*args)
        self.graph_attr['rankdir'] = 'LR'
        self.node_attr['shape'] = 'Mrecord'
        self.graph_attr['splines'] = 'ortho'
        self.graph_attr['nodesep'] = '0.8'
        self.edge_attr.update(penwidth='2')
 
    def add_event(self, name):
        super(MyGraph, self).node(name, shape="circle", label="")
 
    def add_and_gateway(self, *args):
        super(MyGraph, self).node(*args, shape="diamond",
                                  width=".6",height=".6",
                                  fixedsize="true",
                                  fontsize="40",label="+")
 
    def add_xor_gateway(self, *args, **kwargs):
        super(MyGraph, self).node(*args, shape="diamond",
                                  width=".6",height=".6",
                                  fixedsize="true",
                                  fontsize="35",label="×")
 
    def add_and_split_gateway(self, source, targets, *args):
        gateway = 'ANDs '+str(source)+'->'+str(targets)        
        self.add_and_gateway(gateway,*args)
        super(MyGraph, self).edge(source, gateway)
        for target in targets:
            super(MyGraph, self).edge(gateway, target)
 
    def add_xor_split_gateway(self, source, targets, *args):
        gateway = 'XORs '+str(source)+'->'+str(targets) 
        self.add_xor_gateway(gateway, *args)
        super(MyGraph, self).edge(source, gateway)
        for target in targets:
            super(MyGraph, self).edge(gateway, target)
 
    def add_and_merge_gateway(self, sources, target, *args):
        gateway = 'ANDm '+str(sources)+'->'+str(target)
        self.add_and_gateway(gateway,*args)
        super(MyGraph, self).edge(gateway,target)
        for source in sources:
            super(MyGraph, self).edge(source, gateway)
 
    def add_xor_merge_gateway(self, sources, target, *args):
        gateway = 'XORm '+str(sources)+'->'+str(target)
        self.add_xor_gateway(gateway, *args)
        super(MyGraph, self).edge(gateway,target)
        for source in sources:
            super(MyGraph, self).edge(source, gateway)
 
 
G = MyGraph()
 
# adding split gateways based on causality
for event in causality:
    if len(causality[event]) > 1:
        if tuple(causality[event]) in parallel_events:
            G.add_and_split_gateway(event,causality[event])
        else:
            G.add_xor_split_gateway(event,causality[event])
 
# adding merge gateways based on inverted causality
for event in inv_causality:
    if len(inv_causality[event]) > 1:
        if tuple(inv_causality[event]) in parallel_events:
            G.add_and_merge_gateway(inv_causality[event],event)
        else:
            G.add_xor_merge_gateway(inv_causality[event],event)
    elif len(inv_causality[event]) == 1:
        source = list(inv_causality[event])[0]
        G.edge(source,event)
 
# adding start event
G.add_event("start")
if len(start_set_events) > 1:
    if tuple(start_set_events) in parallel_events: 
        G.add_and_split_gateway(event,start_set_events)
    else:
        G.add_xor_split_gateway(event,start_set_events)
else: 
    G.edge("start",list(start_set_events)[0])
 
# adding end event
G.add_event("end")
if len(end_set_events) > 1:
    if tuple(end_set_events) in parallel_events: 
        G.add_and_merge_gateway(end_set_events,event)
    else:
        G.add_xor_merge_gateway(end_set_events,event)    
else: 
    G.edge(list(end_set_events)[0],"end")
 
G.render('simple_graphviz_graph')
G.view('simple_graphviz_graph')

Alpha Plus

Alpha+ is a simple extension of Alpha algorithm which can deal with short-loops and self-loops.

In the picture below, the models on the left-hand side are the models discovered by Alpha algorithm. The models on the right-hand side are the models which could be discovered using Alpha+ algorithm.

On the UPEL platform, there are articles about the algorithms that use Petri nets. There is no simple translation from the Petri nets to BPMN, however, you for Alpha+, the situation is similar like in the case of Heuristic miner (presented during the lecture).

So for discovering self-loops and short loops, you need to construct the additional matrix and exclude the discovered relations from the footprint matrix of the Alpha algorithm.

	a	b	c	…
aτa		✓
bτb
cτc
…

Excercise

Based on some log, the following direct succession relations were discovered:

direct_succession = {
    'a': {'b', 'f'},
    'b': {'c', 'd'},
    'c': {'d', 'e'},
    'd': {'c', 'e'},
    'e': {'h'},
    'f': {'g'},
    'g': {'h'},
    'h': {'i', 'j'},
    'i': {'k'},
    'j': {'k'}
}

Calculate (first manually, then in Python) the causality relations, determine parallel events as well as the set of start and end events, and try to discover the BPMN process model.

Warning: Alpha miner was originally invented to discover Petri net, so discovering directly BPMN with it is not straightforward. However, it shows the basic process mining idea and allows to notice some theoretical limits of process mining. Thus, Alpha miner is only of theoretical interest as it is too simple to be applicable to real-life logs.

It does not address various issues such as noise, frequency etc.
It has problems with some control-flow constructs (e.g. is not able to discover some kinds of structures like short loops, but also with some more complex structures).
It does not allow for duplicate events in the trace.
It can discover inconsistent models.

Report

This lab will be conducted 26.10.2020 and 2.11.2020.
The deadline for the report is ~~10.11.2020~~ 23.11.2020.

The lab on ~~9.11.2020~~ 16.11.2020 will be only as consultation hours for people who want to consult their mini-project concerning the implementation of Heuristic miner or some other advanced miner.

On the UPEL platform, you can find some log examples of various complexity:
https://upel2.cel.agh.edu.pl/weaiib/mod/resource/view.php?id=40748
Be aware that not for all of these logs you will be able to get a correct solution, as some of the constructs may not be supported by your algorithm. However, as the provided logs are quite simple, it should be easy to check your solutions using them.

The basic report should contain:

a python file or an ipython notebook implementing the algorithm,
an example input file for the implementation,
a short pdf report on the using of the implementation and the results for the provided example, with some discussion of the interesting elements of your implementation.

For 3 points (i.e. satisfactory grade 3.0):

It is required to provide the implementation of the Alpha algorithm which may mostly be based on the code from the lab instruction.

For 4 points (i.e. good grade 4.0):

It is required to provide your implementation of any extension of the Alpha algorithm (e.g. Alpha+, elements of Alpha++ or Alpha#) which may be based on the code from the lab instruction.

For 5 points (i.e. very good grade 5.0) [mini-project]:

It is required at least one consultation on ~~9.11.2020~~ 16.11.2020 (or other date according to the individual arrangement).
It is required to provide your implementation of Heuristic miner based on the previous implementation of an extension of the Alpha algorithm.
It is also possible to choose some other advanced miner like ILP Miner, Split Miner, Evolutionary Tree Miner, Local Process Model Miner. For such a choice the extension of the deadline till the end of December is possible. In the case of correct implementation and evaluation of these complex miners, no other reports will be necessary and the final score will be 5.0 (bdb).

Additional 0.5-1 point may be granted for the groups which will cover some of the following issues (this should be explained and justified in the report):

Optimize the code in some smart way and justify this in the report.
Provide a better visualization for the BPMN process model (as the existing layout may not be suitable for some complex examples, so the group may enhance the layouting mechanism or use some external library for visualizing BPMN models, and compare it to the simple mechanism provided in the lab instruction).