--- pl:dydaktyka:dss:lab02 [2018/10/17 03:32]
kkluza [Implementing a simple heuristic miner]
+++ pl:dydaktyka:dss:lab02 [2020/10/18 19:42] (current)
kkluza [Exercise]
@@ Line 4: / Line 4: @@
 ===== Requirements =====
-Python 3.x, opyenxes, pygraphviz.
+Python 3.x, opyenxes, pygraphviz (or graphviz).
+For this class you can use any Python environment that has the above-mentioned libraries installed. \\
+You can also use Google Colab: https://colab.research.google.com.
+The code in this lab instruction is based on the code from the book \\
+[[https://www.springer.com/gp/book/9783319564272|A Primer on Process Mining. Practical Skills with Python and Graphviz]]. \\ The code is not optimized; it is meant to show a step-by-step process mining solution.
 ===== Implementing a simple heuristic miner =====
-Using the following excerpt of code import a ''repairExample.xes'' file into your Python script:
+Using [[https://opyenxes.readthedocs.io/en/latest/_modules/opyenxes/data_in/XUniversalParser.html|XUniversalParser]] in the following excerpt of code, import a {{ :pl:dydaktyka:dss:lab:repairexample.txt |repairexample.xes}} file into your Python script:
 <code python>
@@ Line 72: / Line 77: @@
 | End |
+===== Visualizing results using Pygraphviz =====
 Using [[https://pygraphviz.github.io/|Pygraphviz]], we can render an image depicting the process:
@@ Line 90: / Line 96: @@
 {{:pl:dydaktyka:dss:lab:simple_heuristic_net.png?550|}}
+If you don't have pygraphviz, you can use graphviz ([[#graphviz_instead_of_pygraphviz|check the instructions at the bottom of the page]]).
 ===== Diagram enhancing =====
+In Disco, we could see how often each task occurred. Let's count these frequencies:
+<code python>
+ev_counter = dict()
+for w_trace in workflow_log:
+    for ev in w_trace:
+        ev_counter[ev] = ev_counter.get(ev, 0) + 1
+</code>
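The same count can be written more compactly with ''collections.Counter'' from the standard library. The sketch below is self-contained and runs on a small hypothetical log; it assumes, as in this lab, that each trace in ''workflow_log'' is a list of event names, but the traces themselves are made up and are not taken from ''repairExample.xes''.

```python
from collections import Counter

# Hypothetical toy log: each trace is a list of event names, mimicking the
# structure of `workflow_log` in this lab (not real repairExample data).
workflow_log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "e", "d"],
]

# Counter.update() adds one for every element of the iterable,
# so this is equivalent to the ev_counter loop above.
ev_counter = Counter()
for w_trace in workflow_log:
    ev_counter.update(w_trace)

print(dict(ev_counter))  # -> {'a': 3, 'b': 2, 'c': 2, 'd': 3, 'e': 1}
```

One difference worth knowing: looking up a missing key in a ''Counter'' returns 0 instead of raising ''KeyError''.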
+Then, in our model, we can change each node label to include the computed count:
+<code python>
+text = event + ' (' + str(ev_counter[event]) + ")"
+G.add_node(event, label=text, style="rounded,filled", fillcolor="#ffffcc") # code for Pygraphviz
+</code>
+We can also set the transparency of each discovered task according to its frequency (this code is for Pygraphviz; for graphviz, it has to be adjusted):
+<code python>
+import pygraphviz as pgv
+color_min = min(ev_counter.values())
+color_max = max(ev_counter.values())
+spread = float(color_max - color_min) or 1.0  # avoid division by zero if all counts are equal
+G = pgv.AGraph(strict=False, directed=True)
+G.graph_attr['rankdir'] = 'LR'
+G.node_attr['shape'] = 'Mrecord'
+for event in w_net:
+    value = ev_counter[event]
+    # alpha between 0 (fully transparent, most frequent task) and 100 (0x64, least frequent task)
+    color = int(float(color_max-value)/spread*100.00)
+    # append the alpha channel as exactly two hex digits (#RRGGBBAA)
+    my_color = "#ff9933{:02x}".format(color)
+    G.add_node(event, style="rounded,filled", fillcolor=my_color)
+    for preceding in w_net[event]:
+        G.add_edge(event, preceding)
+G.draw('simple_heuristic_net_with_colors.png', prog='dot')
+</code>
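The color computation can also be pulled out into a small pure function, which makes it easy to test without Pygraphviz. The sketch below keeps the formula used above (so the most frequent task gets alpha ''00'', a fully transparent fill, and the least frequent one gets ''64''), pads the alpha channel to two hex digits and, as an assumption of this sketch, gives every task the maximum alpha when all frequencies are equal. The name ''frequency_to_fill'' is hypothetical.

```python
def frequency_to_fill(value, v_min, v_max, base="#ff9933", max_alpha=100):
    """Return an RGBA color string (#RRGGBBAA) for a task with frequency `value`.

    Uses the same formula as the Pygraphviz code above:
    alpha = (v_max - value) / (v_max - v_min) * max_alpha.
    `frequency_to_fill` is a hypothetical helper, not part of any library.
    """
    if v_max == v_min:
        # Assumption: with no spread in frequencies, use the maximum alpha
        alpha = max_alpha
    else:
        alpha = int((v_max - value) / (v_max - v_min) * max_alpha)
    # Two hex digits keep the string a valid #RRGGBBAA color
    return "{}{:02x}".format(base, alpha)


print(frequency_to_fill(3, 1, 3))  # most frequent task  -> #ff993300
print(frequency_to_fill(1, 1, 3))  # least frequent task -> #ff993364
```

Inside the drawing loop it could be used as ''fillcolor=frequency_to_fill(ev_counter[event], color_min, color_max)''. If you prefer frequent tasks to be more opaque, replace ''v_max - value'' with ''value - v_min''.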
+We can also discover the start events (no incoming arcs) and the end events (no outgoing arcs) and mark them in the model:
+<code python>
+from functools import reduce
+ev_source = set(w_net.keys())
+ev_target = reduce(lambda x,y: x|y, w_net.values())
+ev_start_set = ev_source - ev_target
+print("start set: {}".format(ev_start_set))
+ev_end_set = ev_target - ev_source
+print("end set: {}".format(ev_end_set))
+for ev_end in ev_end_set:
+    end = G.get_node(ev_end)
+    end.attr['shape']='circle'
+    end.attr['label']=''
+G.add_node("start", shape="circle", label="")
+for ev_start in ev_start_set:
+    G.add_edge("start", ev_start)
+G.draw('simple_heuristic_net_with_events.png', prog='dot')
+</code>
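The start and end sets can also be computed without a graph object, which makes the logic easy to check. This sketch assumes, as in the code above, that ''w_net'' maps each event to the set of events that directly follow it; the function name ''start_end_events'' and the small net are hypothetical. ''set().union(*...)'' replaces ''reduce'' here, which also avoids the ''TypeError'' that ''reduce'' raises on an empty ''w_net''.

```python
def start_end_events(w_net):
    """Return (start_set, end_set) for a net given as {event: set of following events}.

    Start events appear as sources but never as targets (no incoming arcs);
    end events appear as targets but never as sources (no outgoing arcs).
    """
    sources = set(w_net)
    targets = set().union(*w_net.values())
    return sources - targets, targets - sources


# Hypothetical net: a -> b/c/e, b <-> c, and b, c, e all lead to d
w_net = {"a": {"b", "c", "e"}, "b": {"c", "d"}, "c": {"b", "d"}, "e": {"d"}}
start, end = start_end_events(w_net)
print(start, end)  # -> {'a'} {'d'}
```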
+{{:pl:dydaktyka:dss:lab:simple_heuristic_net_colors.png?570|}}
+===== graphviz instead of pygraphviz =====
+You can use graphviz instead of pygraphviz, but its syntax is different, e.g.:
+<code python>
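+import graphviz
+G = graphviz.Digraph()
+for event in w_net:
+    G.node(event, style="rounded,filled", fillcolor="#ffffcc")
+    for preceding in w_net[event]:
+        G.edge(event, preceding)
+G.graph_attr['rankdir'] = 'LR'
+G.node_attr['shape'] = 'Mrecord'
+G.edge_attr.update(penwidth='2')
+# a later statement for the same node overrides its attributes: draw End as an empty circle
+G.node("End", shape="circle", label="")
+# writes the DOT source file and a rendered file (PDF by default)
+G.render('simple_graphviz_graph')
+# display() is predefined in Jupyter/Colab notebooks; omit this line in a plain script
+display(G)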
+</code>
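Both packages ultimately rely on Graphviz, which reads the DOT language. As a dependency-free alternative, you can build the DOT text yourself and render it with the ''dot'' command-line tool, e.g. ''dot -Tpng net.dot -o net.png''. This is only an illustrative sketch: the helper ''to_dot'' is hypothetical, it expects ''w_net'' in the same form as above and an event counter such as ''ev_counter'', and it assumes event names contain no double quotes.

```python
def to_dot(w_net, ev_counter, fillcolor="#ffffcc"):
    """Build DOT source for a net given as {event: set of following events}.

    Each node is labelled "event (count)"; `to_dot` is a hypothetical helper.
    Assumes event names contain no double quotes.
    """
    lines = [
        "digraph G {",
        "  rankdir=LR;",
        '  node [shape=Mrecord, style="rounded,filled", fillcolor="%s"];' % fillcolor,
    ]
    # Events that only appear as targets (e.g. end events) need nodes as well
    events = set(w_net) | set().union(*w_net.values())
    for event in sorted(events):
        label = "%s (%d)" % (event, ev_counter.get(event, 0))
        lines.append('  "%s" [label="%s"];' % (event, label))
    # One edge per directly-follows relation
    for event in sorted(w_net):
        for following in sorted(w_net[event]):
            lines.append('  "%s" -> "%s";' % (event, following))
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot({"a": {"b"}, "b": {"c"}}, {"a": 2, "b": 2, "c": 1})
print(dot_text)
```

Save the string to a ''.dot'' file and run ''dot'' on it, or, in Colab, show it with ''graphviz.Source(dot_text)''.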
+{{:pl:dydaktyka:dss:lab:graphviz-example.png?570|}}
+===== Exercise =====
+Extend the process discovery code with additional features:
+  - Discover the frequency of each transition (flow) and render the number of occurrences both as an edge label and as the line thickness.
+  - Add a filtering option that shows or hides tasks or flows depending on a chosen threshold.
+  - Optimize the code by avoiding the creation of intermediate lists, e.g. using ''itertools'', ''more_itertools'' or other Python tools.
+  - 8-o For interested students only: try to implement the Alpha algorithm and discover the relations it defines.
+<fc #ff0000>No report is required after this lab.</fc> However, you can submit an additional report, worth 5 points (for a very good score), presenting your implementation of at least two of the above exercises.
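As a possible starting point for the first and third exercises (not a complete solution), directly-follows pairs can be counted without building intermediate lists by pairing each trace with a lazily shifted view of itself. The function name ''directly_follows_counts'' and the toy log are hypothetical; on Python 3.10+, ''itertools.pairwise'' performs the same pairing.

```python
from collections import Counter
from itertools import islice


def directly_follows_counts(workflow_log):
    """Count how often event x is directly followed by event y across all traces.

    Assumes each trace is a list (or another re-iterable sequence) of event names.
    """
    counts = Counter()
    for trace in workflow_log:
        # zip stops at the shorter input, yielding (t[0], t[1]), (t[1], t[2]), ...
        # islice skips the first event lazily instead of copying trace[1:]
        counts.update(zip(trace, islice(trace, 1, None)))
    return counts


# Hypothetical toy log, not taken from repairExample.xes
workflow_log = [["a", "b", "c"], ["a", "b", "c"], ["a", "c"]]
flows = directly_follows_counts(workflow_log)
print(flows.most_common())
```

The resulting counts could then drive an edge ''label'' and ''penwidth'' when drawing the net.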

pl/dydaktyka/dss/lab02.1539739962.txt.gz · last modified: 2019/06/27 15:57 (external edit)