Differences

Differences between the selected revision and the current revision.

pl:dydaktyka:dss:lab02 [2018/10/17 03:32]
kkluza [Implementing a simple heuristic miner]
pl:dydaktyka:dss:lab02 [2020/10/18 19:42] (current)
kkluza [Excercise]
Line 4: Line 4:
 ===== Requirements ===== ===== Requirements =====
  
-Python 3.x, opyenxes, pygraphviz.
+Python 3.x, opyenxes, pygraphviz (or graphviz).
  
 +For this class, you can use any Python environment that has the above-mentioned libraries installed. \\ 
 +It is also possible to use Google Colab: https://colab.research.google.com.
 +
 +The code in these lab instructions is based on the code from the book \\
 +[[https://www.springer.com/gp/book/9783319564272|A Primer on Process Mining. Practical Skills with Python and Graphviz]]. \\ The code is not optimized; it is meant to show a step-by-step process mining solution.
 ===== Implementing a simple heuristic miner ===== ===== Implementing a simple heuristic miner =====
  
-Using the following excerpt of code import a ''repairExample.xes'' file into your Python script:
+Using [[https://opyenxes.readthedocs.io/en/latest/_modules/opyenxes/data_in/XUniversalParser.html|XUniversalParser]] in the following excerpt of code, import a {{ :pl:dydaktyka:dss:lab:repairexample.txt |repairexample.xes}} file into your Python script:
  
 <code python> <code python>
Line 72: Line 77:
 | End | | End |
  
 +===== Visualizing results using Pygraphviz =====
  
 Using [[https://​pygraphviz.github.io/​|Pygraphviz]],​ we can render an image depicting the process: Using [[https://​pygraphviz.github.io/​|Pygraphviz]],​ we can render an image depicting the process:
Line 90: Line 96:
 {{:​pl:​dydaktyka:​dss:​lab:​simple_heuristic_net.png?​550|}} {{:​pl:​dydaktyka:​dss:​lab:​simple_heuristic_net.png?​550|}}
  
 +If you don't have pygraphviz, you can use graphviz instead ([[#graphviz_instead_of_pygraphviz|see the instructions at the bottom of the page]]).
 ===== Diagram enhancing ===== ===== Diagram enhancing =====
  
 +In Disco, we could see the frequency of each task. Let's count these frequencies:
 +
 +<code python>
 +# Count how often each task occurs across all traces
 +ev_counter = dict()
 +for w_trace in workflow_log:
 +    for ev in w_trace:
 +        ev_counter[ev] = ev_counter.get(ev, 0) + 1
 +</code>
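The same counting step can be sketched with ''collections.Counter'' from the standard library. The toy ''workflow_log'' below is made up for illustration; it only assumes the structure used in this lab (a list of traces, each being a list of task names).

```python
from collections import Counter

# Hypothetical toy log: each trace is a list of task names.
workflow_log = [
    ["Register", "Analyze Defect", "Repair (Simple)", "Test Repair", "Archive Repair"],
    ["Register", "Analyze Defect", "Repair (Complex)", "Test Repair", "Archive Repair"],
    ["Register", "Analyze Defect", "Inform User"],
]

# Count how many times each task occurs across all traces.
ev_counter = Counter(ev for w_trace in workflow_log for ev in w_trace)

# List tasks from the most to the least frequent.
for task, count in ev_counter.most_common():
    print(f"{task}: {count}")
```

A ''Counter'' behaves like a dictionary, so it can be used wherever ''ev_counter'' is read later on.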
 +
 +Then, in our model, we can change each node's label to include the computed count:
 +
 +<code python>
 +text = f"{event} ({ev_counter[event]})"
 +G.add_node(event, label=text, style="rounded,filled", fillcolor="#ffffcc") # code for Pygraphviz
 +</code>
 +
 +We can also vary the transparency of the discovered tasks according to their frequencies (the code below is for Pygraphviz and must be adjusted for graphviz):
 +
 +<code python>
 +color_min = min(ev_counter.values())
 +color_max = max(ev_counter.values())
 +# Avoid division by zero when all tasks are equally frequent
 +color_range = (color_max - color_min) or 1
 +
 +G = pgv.AGraph(strict=False, directed=True)
 +G.graph_attr['rankdir'] = 'LR'
 +G.node_attr['shape'] = 'Mrecord'
 +for event in w_net:
 +    value = ev_counter[event]
 +    # Scale the frequency to 0..100 and use it as the alpha channel
 +    color = int((color_max - value) / color_range * 100)
 +    # Two hex digits are required for a valid #RRGGBBAA color
 +    my_color = "#ff9933" + format(color, "02x")
 +    G.add_node(event, style="rounded,filled", fillcolor=my_color)
 +    for preceding in w_net[event]:
 +        G.add_edge(event, preceding)
 +
 +G.draw('simple_heuristic_net_with_colors.png', prog='dot')
 +</code>
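The frequency-to-color mapping can be checked without Graphviz by isolating it in a small helper. This is only a sketch: the function name ''frequency_to_fillcolor'' and the task counts are hypothetical. It follows the formula above (a 0..100 value appended to ''#ff9933'' as the alpha channel) and pads the value to two hex digits so that the result is always of the form ''#RRGGBBAA''.

```python
def frequency_to_fillcolor(value, lowest, highest, base="#ff9933"):
    """Return the base color with a two-digit hex alpha channel (hypothetical helper)."""
    if highest == lowest:
        scaled = 0  # all tasks equally frequent: avoid division by zero
    else:
        scaled = int((highest - value) / (highest - lowest) * 100)
    return f"{base}{scaled:02x}"


# Made-up task frequencies, for illustration only.
ev_counter = {"Register": 10, "Analyze Defect": 10, "Inform User": 4, "Restart Repair": 1}
lowest, highest = min(ev_counter.values()), max(ev_counter.values())

colors = {ev: frequency_to_fillcolor(n, lowest, highest) for ev, n in ev_counter.items()}
print(colors)
```

Each resulting string can be passed as ''fillcolor'' when adding a node.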
 +
 +We can also try to discover start and end events and correct the model:
 +
 +<code python>
 +from functools import reduce
 +# Tasks with outgoing flows and tasks with incoming flows
 +ev_source = set(w_net.keys())
 +ev_target = reduce(lambda x, y: x | y, w_net.values())
 +# Start tasks have no incoming flow; end tasks have no outgoing flow
 +ev_start_set = ev_source - ev_target
 +print("start set: {}".format(ev_start_set))
 +ev_end_set = ev_target - ev_source
 +print("end set: {}".format(ev_end_set))
 +
 +# Draw end events as empty circles
 +for ev_end in ev_end_set:
 +    end = G.get_node(ev_end)
 +    end.attr['shape'] = 'circle'
 +    end.attr['label'] = ''
 +
 +# Add an artificial start event connected to every start task
 +G.add_node("start", shape="circle", label="")
 +for ev_start in ev_start_set:
 +    G.add_edge("start", ev_start)
 +
 +G.draw('simple_heuristic_net_with_events.png', prog='dot')
 +</code>
 +
 +{{:​pl:​dydaktyka:​dss:​lab:​simple_heuristic_net_colors.png?​570|}}
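The start/end detection itself is plain set arithmetic and can be tried without any graph library. The sketch below uses a made-up ''w_net'' (task mapped to the set of directly following tasks), which is assumed to mirror the structure built earlier in this lab; ''set().union(*...)'' is used here in place of ''reduce''.

```python
# Hypothetical toy net: task -> set of directly following tasks.
w_net = {
    "Register": {"Analyze Defect"},
    "Analyze Defect": {"Repair", "Inform User"},
    "Repair": {"Test Repair"},
    "Test Repair": {"Archive Repair", "Repair"},
    "Inform User": {"Archive Repair"},
}

ev_source = set(w_net)                    # tasks with at least one outgoing flow
ev_target = set().union(*w_net.values())  # tasks with at least one incoming flow

ev_start_set = ev_source - ev_target      # never reached from another task
ev_end_set = ev_target - ev_source        # never followed by another task

print("start set:", ev_start_set)
print("end set:", ev_end_set)
```

Unlike ''reduce'' without an initial value, ''set().union(*...)'' also works when ''w_net'' is empty.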
 +
 +===== graphviz instead of pygraphviz =====
 +
 +It is possible to use graphviz instead of pygraphviz, but its syntax is different, e.g.:
 +
 +<code python>
 +import graphviz
 +G = graphviz.Digraph()
 +for event in w_net:
 +    G.node(event, style="rounded,filled", fillcolor="#ffffcc")
 +    for preceding in w_net[event]:
 +        G.edge(event, preceding)
 +
 +G.graph_attr['rankdir'] = 'LR'
 +G.node_attr['shape'] = 'Mrecord'
 +G.edge_attr.update(penwidth='2')
 +G.node("End", shape="circle", label="")
 +G.render('simple_graphviz_graph')  # writes the DOT source and the rendered file
 +display(G)  # shows the graph inline; available in Jupyter/Colab notebooks
 +</code>
 +
 +{{:​pl:​dydaktyka:​dss:​lab:​graphviz-example.png?​570|}}
 +===== Exercise =====
  
 +Extend process discovery with additional features:
 +  - Try to discover the frequency of each transition (flow) and render the number of occurrences both as a label and as the thickness of the line.
 +  - Add a filtering option to show or hide tasks or flows according to a chosen threshold.
 +  - Optimize the code by avoiding the creation of additional lists, e.g. using ''itertools'', ''more_itertools'' or other Python tools.
 +  - 8-o Only for interested students: Try to implement relation discovery according to the Alpha algorithm.
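As a possible starting point for the first three items, the sketch below counts how often one task directly follows another and then filters the flows by a threshold. It is not a reference solution: the toy ''workflow_log'', the helper ''consecutive_pairs'' and the ''threshold'' value are all hypothetical, and rendering the counts as labels and line widths is left out.

```python
from collections import Counter
from itertools import chain, islice

# Hypothetical toy log: each trace is a list of task names.
workflow_log = [
    ["Register", "Analyze Defect", "Repair", "Test Repair"],
    ["Register", "Analyze Defect", "Inform User"],
    ["Register", "Analyze Defect", "Repair", "Test Repair"],
]


def consecutive_pairs(trace):
    """Yield (task, next_task) pairs lazily, without copying the trace."""
    return zip(trace, islice(trace, 1, None))


# Count every directly-follows pair over all traces.
flow_counter = Counter(chain.from_iterable(consecutive_pairs(t) for t in workflow_log))

# Keep only flows that occur at least `threshold` times (hypothetical cut-off).
threshold = 2
frequent_flows = {flow: n for flow, n in flow_counter.items() if n >= threshold}

print(frequent_flows)
```

The counts in ''flow_counter'' could then be passed as edge attributes when the graph is built.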
  
 +<fc #ff0000>There is no report required after this lab.</fc> However, you may submit an additional report, worth 5 points (for a very good score), presenting the implementation of at least two of the above exercises.
pl/dydaktyka/dss/lab02.1539739962.txt.gz · last modified: 2019/06/27 15:57 (external edit)