pl:dydaktyka:dss:lab02 [2018/10/17 03:32] kkluza [Implementing a simple heuristic miner] |
pl:dydaktyka:dss:lab02 [2020/10/18 19:42] (current) kkluza [Excercise] |
===== Requirements ===== | ===== Requirements ===== |
| |
Python 3.x, opyenxes, pygraphviz. | Python 3.x, opyenxes, pygraphviz (or graphviz). |
| |
| For this class you can use any available Python environment that has the above-mentioned libraries installed. \\ |
| Google Colab is also an option: https://colab.research.google.com. |
| |
| The code in this lab instruction is based on the code from the book \\ |
| [[https://www.springer.com/gp/book/9783319564272|A Primer on Process Mining. Practical Skills with Python and Graphviz]]. \\ The code is not optimized; it is meant to show a step-by-step process mining solution. |
===== Implementing a simple heuristic miner ===== | ===== Implementing a simple heuristic miner ===== |
| |
Using the following excerpt of code import a ''repairExample.xes'' file into your Python script: | Using [[https://opyenxes.readthedocs.io/en/latest/_modules/opyenxes/data_in/XUniversalParser.html|XUniversalParser]] in the following excerpt of code, import a {{ :pl:dydaktyka:dss:lab:repairexample.txt |repairexample.xes}} file into your Python script: |
| |
<code python> | <code python> |
| End | | | End | |
| |
| ===== Visualizing results using Pygraphviz ===== |
| |
Using [[https://pygraphviz.github.io/|Pygraphviz]], we can render an image depicting the process: | Using [[https://pygraphviz.github.io/|Pygraphviz]], we can render an image depicting the process: |
{{:pl:dydaktyka:dss:lab:simple_heuristic_net.png?550|}} | {{:pl:dydaktyka:dss:lab:simple_heuristic_net.png?550|}} |
| |
| If you don't have pygraphviz, you can use graphviz ([[#graphviz_instead_of_pygraphviz|check instruction at the bottom of the page]]). |
===== Diagram enhancing ===== | ===== Diagram enhancing ===== |
| |
| In Disco, we could see the frequency of each task. Let's count these frequencies: |
| |
| <code python> |
| ev_counter = dict() |
| for w_trace in workflow_log: |
|     for ev in w_trace: |
|         ev_counter[ev] = ev_counter.get(ev, 0) + 1 |
| </code> |
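The same count can also be sketched with ''collections.Counter'' from the standard library. The tiny ''workflow_log'' below is made-up sample data (a list of traces, each a list of event names), assumed here only to keep the snippet self-contained:

```python
from collections import Counter
from itertools import chain

# Hypothetical sample log: each trace is a list of event names.
workflow_log = [
    ["Register", "Analyze Defect", "Repair (Simple)", "End"],
    ["Register", "Analyze Defect", "Repair (Complex)", "End"],
    ["Register", "Analyze Defect", "End"],
]

# Flatten all traces lazily and count every event occurrence.
ev_counter = Counter(chain.from_iterable(workflow_log))

for event, count in ev_counter.items():
    print(event, count)
```

A ''Counter'' behaves like a dictionary, so lookups such as ''ev_counter[event]'' keep working unchanged.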
| |
| Then, in our model, we can change each node's label to include the computed count: |
| |
| <code python> |
| text = event + ' (' + str(ev_counter[event]) + ")" |
| G.add_node(event, label=text, style="rounded,filled", fillcolor="#ffffcc") # code for Pygraphviz |
| </code> |
| |
| We can also vary the transparency of the discovered tasks according to their frequencies (the code below is for Pygraphviz and needs to be adjusted for graphviz): |
| |
| <code python> |
| import pygraphviz as pgv |
| |
| color_min = min(ev_counter.values()) |
| color_max = max(ev_counter.values()) |
| # avoid division by zero when all tasks are equally frequent |
| color_range = max(color_max - color_min, 1) |
| |
| G = pgv.AGraph(strict=False, directed=True) |
| G.graph_attr['rankdir'] = 'LR' |
| G.node_attr['shape'] = 'Mrecord' |
| for event in w_net: |
|     value = ev_counter[event] |
|     # alpha channel: 0 for the most frequent task, 100 for the least frequent one |
|     color = int((color_max - value) / color_range * 100) |
|     # zero-pad so the alpha channel is always two hex digits |
|     my_color = "#ff9933{:02x}".format(color) |
|     G.add_node(event, style="rounded,filled", fillcolor=my_color) |
|     for preceding in w_net[event]: |
|         G.add_edge(event, preceding) |
| |
| G.draw('simple_heuristic_net_with_colors.png', prog='dot') |
| </code> |
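The colour computation above can be isolated into a small helper, which makes the mapping easy to check without drawing anything. This is only a sketch: ''frequency_to_fill'' is a hypothetical name, and it assumes the same convention as the loop above, where the value appended to ''#ff9933'' is an alpha channel between ''00'' and ''64'' (hex for 100).

```python
def frequency_to_fill(value, lowest, highest, base="#ff9933"):
    """Return an RGBA colour string for a task that occurred `value` times.

    The alpha channel is 00 for the most frequent task and 64 (decimal 100)
    for the least frequent one, always written as two hex digits.
    """
    span = highest - lowest
    if span == 0:
        # All tasks are equally frequent: fall back to alpha 00.
        scaled = 0
    else:
        scaled = int((highest - value) / span * 100)
    return "{}{:02x}".format(base, scaled)


# Hypothetical counts: the rarest task occurs 2 times, the most common 10 times.
print(frequency_to_fill(10, 2, 10))
print(frequency_to_fill(6, 2, 10))
print(frequency_to_fill(2, 2, 10))
```

Swapping ''highest - value'' for ''value - lowest'' would reverse the mapping, if more frequent tasks should be the more opaque ones.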
| |
| We can also try to discover start and end events and correct the model: |
| |
| <code python> |
| from functools import reduce |
| # events that have at least one outgoing flow |
| ev_source = set(w_net.keys()) |
| # events that have at least one incoming flow |
| ev_target = reduce(lambda x,y: x|y, w_net.values()) |
| ev_start_set = ev_source - ev_target |
| print("start set: {}".format(ev_start_set)) |
| ev_end_set = ev_target - ev_source |
| print("end set: {}".format(ev_end_set)) |
| |
| # draw end events as empty circles |
| for ev_end in ev_end_set: |
|     end = G.get_node(ev_end) |
|     end.attr['shape']='circle' |
|     end.attr['label']='' |
| |
| # add an artificial start node connected to every start event |
| G.add_node("start", shape="circle", label="") |
| for ev_start in ev_start_set: |
|     G.add_edge("start", ev_start) |
| |
| G.draw('simple_heuristic_net_with_events.png', prog='dot') |
| </code> |
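To see what the set arithmetic does, here is a minimal self-contained sketch on a toy ''w_net''. The event names are invented for illustration, and ''w_net'' is assumed to map each event to the set of events that directly follow it, as the edge direction in the code above suggests.

```python
# Toy net (hypothetical data): event -> set of directly following events.
w_net = {
    "Register": {"Analyze"},
    "Analyze": {"Repair", "Inform"},
    "Repair": {"Test"},
    "Inform": {"Archive"},
    "Test": {"Archive", "Repair"},
}

ev_source = set(w_net)                    # events with at least one successor
ev_target = set().union(*w_net.values())  # events with at least one predecessor

ev_start_set = ev_source - ev_target      # never appear as a successor
ev_end_set = ev_target - ev_source        # never have a successor

print("start set: {}".format(ev_start_set))
print("end set: {}".format(ev_end_set))
```

''set().union(*w_net.values())'' is used here instead of ''reduce'' only because it also works when ''w_net'' is empty; both give the same result otherwise.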
| |
| {{:pl:dydaktyka:dss:lab:simple_heuristic_net_colors.png?570|}} |
| |
| ===== graphviz instead of pygraphviz ===== |
| |
| graphviz can be used instead of pygraphviz, but its syntax is different, e.g.: |
| |
| <code python> |
| import graphviz |
| G = graphviz.Digraph() |
| for event in w_net: |
|     G.node(event, style="rounded,filled", fillcolor="#ffffcc") |
|     for preceding in w_net[event]: |
|         G.edge(event, preceding) |
| |
| G.graph_attr['rankdir'] = 'LR' |
| G.node_attr['shape'] = 'Mrecord' |
| G.edge_attr.update(penwidth='2') |
| G.node("End", shape="circle", label="") |
| G.render('simple_graphviz_graph') |
| # display() is available in notebook environments such as Jupyter or Colab |
| display(G) |
| </code> |
| |
| {{:pl:dydaktyka:dss:lab:graphviz-example.png?570|}} |
| ===== Exercise ===== |
| |
| Extend process discovery with additional features: |
| - Try to discover the frequency of each transition (flow) and render the number of occurrences both as an edge label and as the thickness of the line. |
| - Add a filtering option to show or hide tasks or flows according to a chosen threshold. |
| - Optimize the code by avoiding the creation of additional lists, e.g. using ''itertools'', ''more_itertools'' or other Python tools. |
| - 8-o Only for interested students: Try to implement and discover relations according to the Alpha algorithm. |
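As a possible starting point for the first and third items, directly-follows pairs can be counted without building intermediate lists. This is a rough sketch on invented traces, not a full solution: ''pairwise_events'' is a hypothetical helper, and the pen-width scaling (between 1 and 5) is just one arbitrary choice.

```python
from collections import Counter
from itertools import tee


def pairwise_events(trace):
    """Return an iterator over (event, next_event) pairs of a trace."""
    first, second = tee(trace)
    next(second, None)  # advance the second iterator by one event
    return zip(first, second)


# Hypothetical sample log: each trace is a list of event names.
workflow_log = [
    ["Register", "Analyze", "Repair", "End"],
    ["Register", "Analyze", "End"],
    ["Register", "Analyze", "Repair", "End"],
]

# Count how often each flow (pair of consecutive events) occurs.
flow_counter = Counter(
    pair for trace in workflow_log for pair in pairwise_events(trace)
)

# Scale each count to a line thickness between 1 and 5 (arbitrary range).
max_count = max(flow_counter.values())
pen_widths = {
    pair: 1 + 4 * count / max_count for pair, count in flow_counter.items()
}

for (source, target), count in flow_counter.items():
    print(source, "->", target, count, round(pen_widths[(source, target)], 2))
```

The counts and widths could then be passed as the ''label'' and ''penwidth'' attributes when adding each edge to the graph.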
| |
| <fc #ff0000>There is no report required after this lab.</fc> However, it is possible to submit an additional report for 5 points (for a very good score) presenting the implementation of at least two of the above exercises. |