Process mining in Python

Requirements

Python 3.x, opyenxes, pygraphviz.

Implementing a simple heuristic miner

Use the following code excerpt to import the repairExample.xes file into your Python script:

from opyenxes.data_in.XUniversalParser import XUniversalParser
 
path = 'repairExample.xes'
 
with open(path) as log_file:
    # parse the log
    log = XUniversalParser().parse(log_file)[0]

Take a look at the log variable. Calling log.get_features() or log.get_attributes() returns some basic information about the log. The parsed log is a list of traces, and each trace is a list of events, so you can also select a single event and check its attributes:

event = log[0][0]
event.get_attributes()

To simplify further work, we will create a workflow_log that contains only the names of the events:

workflow_log = []
for trace in log: 
    workflow_trace = []
    # trace[0::2] keeps every second event, starting with the first one
    for event in trace[0::2]:
        # get the event name from the event in the log
        event_name = event.get_attributes()['Activity'].get_value()
        workflow_trace.append(event_name)
    workflow_log.append(workflow_trace)
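
The slice trace[0::2] can be tried out on a plain list. The sketch below uses a made-up trace in which every activity is recorded twice in a row (an assumption for illustration only, for instance one start and one complete entry); whether repairExample.xes really follows this pattern should be checked against the parsed log.

```python
# Made-up trace: each activity name appears twice in a row (illustrative assumption).
toy_trace = [
    'Analyze Defect', 'Analyze Defect',
    'Inform User', 'Inform User',
    'Archive Repair', 'Archive Repair',
]

# [0::2] starts at index 0 and steps by 2, so one entry per pair is kept.
toy_names = toy_trace[0::2]
print(toy_names)
```

If the entries of a trace do not come in pairs, the same slice silently drops real events, so it is worth comparing len(trace) with len(trace[0::2]) on a few traces.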

To create a simple heuristic net of tasks (a simplified process model like the one shown in the Disco tool), we will build a structure in which, for each event, we gather the set of all events that directly follow it:

w_net = dict()
for w_trace in workflow_log:
    # pair each event with the one that directly follows it
    for ev_i, ev_j in zip(w_trace, w_trace[1:]):
        # create the successor set on first use, then add the follower
        w_net.setdefault(ev_i, set()).add(ev_j)

Take a closer look at the w_net dictionary:

{'Analyze Defect': {'Inform User', 'Repair (Complex)', 'Repair (Simple)'},
 'Archive Repair': {'End'},
 'Inform User': {'Archive Repair', 'End', ...}, 
 ...}

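It represents the connections between events. In the table below, an arrow means that the event in the row is directly followed by the event in the column:

                  Analyze Defect   Archive Repair   Inform User   …   End
Analyze Defect                                      →
Archive Repair                                                        →
Inform User                        →                                  →
…
End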
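
To see how the dictionary is built on a small scale, the sketch below applies the same idea to a hand-written toy log. The two traces and the helper name directly_follows are invented for illustration and are not taken from repairExample.xes.

```python
# Two made-up traces (illustrative only, not the real log).
toy_log = [
    ['Analyze Defect', 'Repair (Simple)', 'Inform User', 'Archive Repair', 'End'],
    ['Analyze Defect', 'Inform User', 'Repair (Complex)', 'Archive Repair', 'End'],
]


def directly_follows(workflow_log):
    """Map each event to the set of events that directly follow it."""
    net = {}
    for trace in workflow_log:
        # zip pairs every event with its immediate successor in the trace
        for ev_i, ev_j in zip(trace, trace[1:]):
            net.setdefault(ev_i, set()).add(ev_j)
    return net


toy_net = directly_follows(toy_log)
for source in sorted(toy_net):
    print(source, '->', sorted(toy_net[source]))
```

Note that 'End' never becomes a key, because no event follows it; it only appears inside the successor sets.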

Using PyGraphviz, we can render an image of the process:

import pygraphviz as pgv
G = pgv.AGraph(strict=False, directed=True)
G.graph_attr['rankdir'] = 'LR'
G.node_attr['shape'] = 'Mrecord'
for event in w_net:
    G.add_node(event, style="rounded,filled", fillcolor="#ffffcc")
    # draw an edge to every event that directly follows this one
    for following in w_net[event]:
        G.add_edge(event, following)
 
G.draw('simple_heuristic_net.png', prog='dot')

Enhancing the diagram

Disco shows how frequently each task occurs. Let's count how many times each event appears in the log:

ev_counter = dict()
for w_trace in workflow_log:
    for ev in w_trace:
        ev_counter[ev] = ev_counter.get(ev, 0) + 1

Then, in the loop that builds the model, we can set the node label so that it includes the count:

text = event + ' (' + str(ev_counter[event]) + ")"
G.add_node(event, label=text, style="rounded,filled", fillcolor="#ffffcc")
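
The counting and labelling steps can also be sketched with collections.Counter from the standard library. The toy log below is made up for illustration, and the names toy_counter and labels are not part of the original script.

```python
from collections import Counter

# Made-up traces (illustrative only, not the real log).
toy_log = [
    ['Analyze Defect', 'Repair (Simple)', 'Inform User', 'Archive Repair', 'End'],
    ['Analyze Defect', 'Inform User', 'Repair (Complex)', 'Archive Repair', 'End'],
    ['Analyze Defect', 'Repair (Simple)', 'End'],
]

# Counter tallies every occurrence of every event across all traces.
toy_counter = Counter(ev for trace in toy_log for ev in trace)

# Build the node labels in the same "name (count)" form as above.
labels = {ev: '{} ({})'.format(ev, n) for ev, n in toy_counter.items()}
print(labels['Analyze Defect'])
```

A Counter behaves like a dictionary, so it can be passed to the rest of the script in place of a hand-built dict of counts.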

We can also change the transparency of the discovered tasks based on their frequencies:

color_min = min(ev_counter.values())
color_max = max(ev_counter.values())
 
G = pgv.AGraph(strict=False, directed=True)
G.graph_attr['rankdir'] = 'LR'
G.node_attr['shape'] = 'Mrecord'
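for event in w_net:
    value = ev_counter[event]
    # scale to 0..100: the most frequent task gets 0 (fully transparent);
    # max(..., 1) avoids division by zero when all frequencies are equal
    color = int((color_max - value) / max(color_max - color_min, 1) * 100)
    # append the value as a two-digit hex alpha channel (#RRGGBBAA)
    my_color = "#ff9933{:02x}".format(color)
    G.add_node(event, style="rounded,filled", fillcolor=my_color)
    for following in w_net[event]:
        G.add_edge(event, following)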
 
G.draw('simple_heuristic_net_with_colors.png', prog='dot')
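
The mapping from frequency to fill color can be checked without Graphviz. The helper below is a hypothetical sketch (the name frequency_to_fill and its parameters are not part of the original script). It assumes the "#RRGGBBAA" color form, which needs exactly two hex digits for the alpha channel, spreads alpha over the full 00..ff range, and keeps the direction used above, so that more frequent tasks are more transparent.

```python
def frequency_to_fill(value, lowest, highest, base='#ff9933'):
    """Return base color plus a two-digit hex alpha channel (hypothetical helper)."""
    span = highest - lowest
    # if every task has the same frequency, fall back to fully opaque
    ratio = (highest - value) / span if span else 1.0
    alpha = int(ratio * 255)
    # {:02x} pads to two hex digits, so alpha 10 becomes '0a' rather than 'a'
    return '{}{:02x}'.format(base, alpha)


print(frequency_to_fill(1, 1, 3))  # least frequent task
print(frequency_to_fill(3, 1, 3))  # most frequent task
```

Swapping the numerator to (value - lowest) inverts the scale, so that the most frequent task becomes the most opaque one.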

We can also try to discover start and end events and correct the model:

# events that have at least one successor
ev_source = set(w_net.keys())
# events that directly follow some other event
ev_target = set().union(*w_net.values())
ev_start_set = ev_source - ev_target
print("start set: {}".format(ev_start_set))
ev_end_set = ev_target - ev_source
print("end set: {}".format(ev_end_set))
 
for ev_end in ev_end_set:
    end = G.get_node(ev_end)
    end.attr['shape']='circle'
    end.attr['label']=''
 
G.add_node("start", shape="circle", label="")
for ev_start in ev_start_set:
    G.add_edge("start", ev_start)
 
G.draw('simple_heuristic_net_with_events.png', prog='dot')
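
The set arithmetic behind the start and end detection can be illustrated on a small hand-written net. The dictionary below is made up for the example and only mimics the shape of w_net.

```python
# Made-up directly-follows net (illustrative only).
toy_net = {
    'Analyze Defect': {'Repair (Simple)', 'Inform User'},
    'Repair (Simple)': {'Inform User'},
    'Inform User': {'Archive Repair'},
    'Archive Repair': {'End'},
}

sources = set(toy_net)                    # events with at least one successor
targets = set().union(*toy_net.values())  # events that follow some other event

start_events = sources - targets  # never follow anything
end_events = targets - sources    # are never followed by anything
print(sorted(start_events), sorted(end_events))
```

This approach has a limitation: an event that both starts a trace and reappears later in some trace (for example inside a loop) is also a target, so it is not reported as a start event. The same holds symmetrically for end events.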

Exercise

pl/dydaktyka/dss/lab02.1539741116.txt.gz · last modified: 2019/06/27 15:57 (external edit)
