--- pl:dydaktyka:ml:lab3 [2013/03/12 20:51]
esimon [Narzędzia]
+++ pl:dydaktyka:ml:lab3 [2019/06/27 15:50]
@@ Linia 1: / Linia 1: @@
-====== Laboratorium 3 - Drzewa Decyzyjne ======
-Literatura: Tom M. Mitchell, Machine Learning, Rozdział 3.
-===== Lista i opis plików  =====
-Potrzebne pliki to
-program
-{{:pl:prolog:prolog_lab:ml:id3.pl}}
-oraz wcześniejsze dane
-{{:pl:prolog:prolog_lab:ml:animals.pl}},
-{{:pl:prolog:prolog_lab:ml:loandata.pl}}.
-Pliki uzupełniające dla przykładów z PMG:
-program
-{{:pl:prolog:prolog_lab:ml:dtlearn_bool.pl}},
-dane
-{{:pl:prolog:prolog_lab:ml:dtlearn_t2.pl}}
-i dla zadanie
-{{:pl:prolog:prolog_lab:ml:as11data.pl}}.
-===== Tworzenie, przeglądanie i dostęp do drzew decyzyjnych i reguł =====
-<code prolog>
-?- [id3].
-?- [animals].
-?- id3(2).
-</code>
-The parameter we specify (2) is used to control the growing of the tree. If a node covers
-less than or equal to 2 examples, then it becomes a leaf no matter what the class distribution
-is.
-To see the tree we use:
-<code prolog>
-?- showtree.
-  has_covering=feathers => [bird/2]
-  has_covering=scales
-    gills=f => [reptile/2]
-    gills=t => [fish/1]
-  has_covering=none => [amphibian/1, mammal/1]
-  has_covering=hair => [mammal/3]
-</code>
-The lists at the leaves show the class distribution in the examples covered. If we change
-the threshold we get a different tree. With threshold 1 we will have one-class examples
-at the leaves. For example:
-<code prolog>
-?- id3(1),showtree.
-  has_covering=feathers => [bird/2]
-  has_covering=scales
-    gills=f => [reptile/2]
-    gills=t => [fish/1]
-  has_covering=none
-    eggs=t => [amphibian/1]
-    eggs=f => [mammal/1]
-  has_covering=hair => [mammal/3]
-</code>
-The "id3" query also generates "if-then" rules. We can see these rules by using "listing":
-<code prolog>
-?- listing(if).
-:- dynamic if/1.
-if[has_covering=feathers]then bird.
-if[has_covering=scales, gills=f]then reptile.
-if[has_covering=scales, gills=t]then fish.
-if[has_covering=none, eggs=t]then amphibian.
-if[has_covering=none, eggs=f]then mammal.
-if[has_covering=hair]then mammal.
-</code>
-NOTE: "listing" is a Prolog built-in procedure that lists the contents of the database
-by specifying a name. This means that the if-then strucures are already in the database
-and their name is "if". In fact, the ID3 program when started by "?- id3(1)." creates a
-decision tree (that can be viewed by "?- showtree.") and loads in the database a set of
-rules, that equivalently represent the tree.
-The rule created by ID3 are Prolog facts, but put in a special prefix format.
-They can also be accessed directly by Prolog queries like these:
-<code prolog>
-?- if X then mammal.
-X = [has_covering=none, eggs=f] ;
-X = [has_covering=hair] ;
-?- if X then Y.
-X = [has_covering=feathers]
-Y = bird ;
-X = [has_covering=scales, gills=f]
-Y = reptile ;
-X = [has_covering=scales, gills=t]
-Y = fish ;
-X = [has_covering=none, eggs=t]
-Y = amphibian ;
-X = [has_covering=none, eggs=f]
-Y = mammal ;
-X = [has_covering=hair]
-Y = mammal
-</code>
-If you want to run ID3 with another data file, load the data only (no need to reload id3.pl).
-However NOTE that after executing "?- id3(1)" the previous tree and rules are deleted and
-replaced with those corresponding to the currently loaded data file. For example:
-<code prolog>
-?- [loandata].
-?- id3(1),showtree.
-  emp=no => [reject/3]
-  emp=yes
-    buy=car
-      married=no => [reject/1]
-      married=yes => [approve/1]
-    buy=comp => [approve/7]
-?- id3(9),showtree.
-  emp=no => [reject/3]
-  emp=yes => [approve/8, reject/1]
-</code>
-===== Narzędzia =====
-Wykorzystując  dane z pliku {{:pl:prolog:prolog_lab:ml:animals.pl}}, przetestuj działanie [[http://www.aispace.org/dTree/index.shtml|Narzędzie do budowania drzew decyzyjnych]].
-**Uwaga**  Oblicz Entropię i tzw. //Knowledge Gain// dla poszczególnych parametrów i wskaż najlepszy korzeń drzewa decyzyjnego. Czy narzędzie poprawnie wyznaczyło korzeń?
-===== Porównanie z uczeniem pojęć =====
-Assuming that the loandata tree and rules created by "?- id3(1)" in the database, try this
-(note the sets [2, 5, 11, 12] = [2, 11, 12] + [5]):
-<code prolog>
-?- if H then [reject/N],model(H,M).
-H = [emp=no]
-N = 3
-M = [2, 11, 12] ;
-H = [emp=yes, buy=car, married=no]
-N = 1
-M = [5] ;
-</code>
-This query accesses the two disjunctive components of the hypothesis for class "reject"
-(the two paths in the decision tree following to the "reject" leaves).
-Now, obviously we should be able to run VS on each of the two sets and produce
-consistent hypotheses. For the first part, [2, 11, 12] we put these examples in the
-beginning and all from class "approve" to follow (we may also leave the other "reject"
-example at the end). Let's call this new data file loandata1.pl and load it into the
-database.
-<code prolog>
-?- [loandata1].
-</code>
-We may try "listing" to verify our data:
-<code prolog>
-?- listing(example).
-example(2, reject, [emp=no, buy=comp, sex=f, married=yes]).
-example(11, reject, [emp=no, buy=comp, sex=m, married=yes]).
-example(12, reject, [emp=no, buy=car, sex=f, married=yes]).
-example(1, approve, [emp=yes, buy=comp, sex=f, married=no]).
-example(3, approve, [emp=yes, buy=comp, sex=m, married=no]).
-example(4, approve, [emp=yes, buy=car, sex=f, married=yes]).
-example(6, approve, [emp=yes, buy=comp, sex=f, married=yes]).
-example(7, approve, [emp=yes, buy=comp, sex=f, married=no]).
-example(8, approve, [emp=yes, buy=comp, sex=m, married=no]).
-example(9, approve, [emp=yes, buy=comp, sex=m, married=yes]).
-example(10, approve, [emp=yes, buy=comp, sex=m, married=yes]).
-example(5, reject, [emp=yes, buy=car, sex=f, married=no]).
-</code>
-Now, we load and run the candiate elimination algorithm (vs.pl) in batch mode.
-<code prolog>
-?- [vs].
-?- batch.
-+[no, comp, f, yes]
-G-set: [[?, ?, ?, ?]]
-S-set: [[no, comp, f, yes]]
-+[no, comp, m, yes]
-G-set: [[?, ?, ?, ?]]
-S-set: [[no, comp, ?, yes]]
-+[no, car, f, yes]
-G-set: [[?, ?, ?, ?]]
-S-set: [[no, ?, ?, yes]]
--[yes, comp, f, no]
-G-set: [[?, ?, ?, yes], [no, ?, ?, ?]]
-S-set: [[no, ?, ?, yes]]
--[yes, comp, m, no]
--[yes, car, f, yes]
-G-set: [[no, ?, ?, ?]]
-S-set: [[no, ?, ?, yes]]
--[yes, comp, f, yes]
--[yes, comp, f, no]
--[yes, comp, m, no]
--[yes, comp, m, yes]
--[yes, comp, m, yes]
-+[yes, car, f, no]
-G-set: []
-S-set: []
-There is no consistent concept description in this language !
-</code>
-NOTE that the inconsistancy occurs because of the last example (# 5) which is a part
-of the other hypothesis. The hypothesis created by ID3, [emp=no] is contained
-in the version space:
-<code prolog>
-G-set: [[no, ?, ?, ?]]
-S-set: [[no, ?, ?, yes]]
-</code>
-In fact, it is the one in the G-set. Interestingly, we have another one (the S-set).
-This means that there may be another more specific decision tree, which looks like this:
-<code prolog>
-  emp=no
-    married=yes => reject        THIS IS THE NEW BRANCH
-    married=no => approve        THIS HAS TO BE ADDED TO COMPLETE THE SUBTREE
-  emp=yes
-    buy=car
-      married=no => reject
-      married=yes => approve
-    buy=comp => approve
-<code>
-NOTE that this tree is created MANUALLY. With loandata1.pl the ID3 algorithm will
-create the previous tree, because the order of the examples does not matter.
-Let's verify what the new brach covers. First, we need the "model" procedure back
-(it's in id3.pl).
-<code prolog>
-?- [id3].
-?- model([emp=no,married=yes],M).
-M = [2, 11, 12]
-</code>
-NOTE that this is the same coverage as before.
-<code prolog>
-?- model([emp=no,married=no],M).
-M = []
-</code>
-The brach we added to complete the tree does not cover anything, but this is ok.
-Similarly the other disjunctive componenet of the ID3's "reject" hypothesis,
-<code prolog>
-if [emp=yes, buy=car, married=no] then reject
-</code>
-may be learned with VS. First, we create file loandata2.pl with the following
-order of the examples: [5, 1, 3, 4, 6, 7, 8, 9, 10, 2, 11, 12].
-Then running VS:
-<code prolog>
-?- batch.
-+[yes, car, f, no]
-G-set: [[?, ?, ?, ?]]
-S-set: [[yes, car, f, no]]
--[yes, comp, f, no]
-G-set: [[?, car, ?, ?]]
-S-set: [[yes, car, f, no]]
--[yes, comp, m, no]
--[yes, car, f, yes]
-G-set: [[?, car, ?, no]]
-S-set: [[yes, car, f, no]]
--[yes, comp, f, yes]
--[yes, comp, f, no]
--[yes, comp, m, no]
--[yes, comp, m, yes]
--[yes, comp, m, yes]
-+[no, comp, f, yes]
-G-set: []
-S-set: []
-There is no consistent concept description in this language !
-</code>
-NOTE that the version space here is bigger:
-<code prolog>
-G-set: [[?, car, ?, no]]
-S-set: [[yes, car, f, no]]
-</code>
-It has four hypotheses, among which is the one created by ID3.
-===== Agorytm OneR =====
-<code prolog>
-?- [id3].
-?- [loandata].
-?- id3(1),showtree.
-  emp=no => [reject/3]
-  emp=yes
-    buy=car
-      married=no => [reject/1]
-      married=yes => [approve/1]
-    buy=comp => [approve/7]
-</code>
-The id3.pl file defines procedures to compute the entopy and the distribution
-of classes in a set of examples. For example:
-<code prolog>
-?- if H then C, model(H,M),entropy(M,I).
-H = [emp=no]
-C = [reject/3]
-M = [2, 11, 12]
-I = 0 ;
-H = [emp=yes, buy=car, married=no]
-C = [reject/1]
-M = [5]
-I = 0 ;
-H = [emp=yes, buy=car, married=yes]
-C = [approve/1]
-M = [4]
-I = 0 ;
-H = [emp=yes, buy=comp]
-C = [approve/7]
-M = [1, 3, 6, 7, 8, 9, 10]
-I = 0
-</code>
-These are all 100% correct hypotheses and that's way the entropy of the
-class distribution is 0. If we allow mixed classes at the leaves we may get
-non-zero entropy. For example:
-<code prolog>
-?- id3(2),showtree.
-  emp=no => [reject/3]
-  emp=yes
-    buy=car => [approve/1, reject/1]
-    buy=comp => [approve/7]
-?- if H then C, model(H,M),entropy(M,I).
-H = [emp=no]
-C = [reject/3]
-M = [2, 11, 12]
-I = 0 ;
-H = [emp=yes, buy=car]
-C = [approve/1, reject/1]
-M = [4, 5]
-I = 1 ;
-H = [emp=yes, buy=comp]
-C = [approve/7]
-M = [1, 3, 6, 7, 8, 9, 10]
-I = 0 ;
-</code>
-The id3.pl program inlcudes a procedure to compute the class distribution of a set of
-examples. For example:
-<code prolog>
-?- if H then C, model(H,M),distr(M,D).
-H = [emp=no]
-C = [reject/3]
-M = [2, 11, 12]
-D = [reject/3] ;
-H = [emp=yes, buy=car]
-C = [approve/1, reject/1]
-M = [4, 5]
-D = [approve/1, reject/1] ;
-H = [emp=yes, buy=comp]
-C = [approve/7]
-M = [1, 3, 6, 7, 8, 9, 10]
-D = [approve/7] ;
-</code>
-This we may see the class distribution and the entropy in a set of examples.
-For example:
-<code prolog>
-?- distr([1,2,3,4],D),entropy([1,2,3,4],I).
-D = [approve/3, reject/1]
-I = 0.811278
-</code>
-Here is the entropy of a hypothesis that covers mixed class examples:
-<code prolog>
-?- model([buy=car],M),distr(M,D),entropy(M,I).
-M = [4, 5, 12]
-D = [approve/1, reject/2]
-I = 0.918296
-</code>
-The distribution of classes can be used to compute the error.
-For example, asuming that we assigned the majority class label to
-the set above (reject), the error will be 1/3 (one wrong classification
-out of 3).
-The OneR algorithm generates a one-level decision tree, using an
-error based attribute selection. The following query can be used to
-find the OneR decision tree for our loan data ("attr" and "vals" are
-described in the header of id3.pl. Try the query without writeln and fail to
-understand better how it works).
-<code prolog>
-?- attr(L),member(A,L),vals(A,W),nl,member(V,W),model([V],M),distr(M,D),writeln([V]-D),fail.
-[buy=car]-[approve/1, reject/2]
-[buy=comp]-[approve/7, reject/2]
-[emp=no]-[reject/3]
-[emp=yes]-[approve/8, reject/1]
-[married=no]-[approve/4, reject/1]
-[married=yes]-[approve/4, reject/3]
-[sex=f]-[approve/4, reject/3]
-[sex=m]-[approve/4, reject/1]
-</code>
-The class distributions above allow us to compute the total error in each
-of the four possible one-level trees:
-<code prolog>
-. buy=car => reject [approve/1, reject/2]
-   buy=comp => approve [approve/7, reject/2]
-   -----------------------------------------
-   Error = (1+2)/12 = 0.25
-. emp=no => reject [reject/3]
-   emp=yes => approve [approve/8, reject/1]
-   ----------------------------------------
-   Error = (0+1)/12 = 0.0833333
-. married=no => approve [approve/4, reject/1]
-   married=yes => approve [approve/4, reject/3]
-   --------------------------------------------
-   Error = (1+3)/12 = 0.333333
-. sex=f => approve [approve/4, reject/3]
-   sex=m => approve [approve/4, reject/1]
-   --------------------------------------
-   Error = (3+1)/12 = 0.333333
-</code>
-Obviously the best OneR tree is #2, because it has the lowest error rate of 0.0833333.
-Interestingly the entropy-based attribute selection used by ID3 is in agreement
-with the error-based one. This is clear from the fact the pruned ID tree is
-the same as the OneR one.

pl/dydaktyka/ml/lab3.txt · ostatnio zmienione: 2019/06/27 15:50 (edycja zewnętrzna)

Pokaż stronę Poprzednie wersje

Menadżer multimediów Do góry

AIwiki

Menu

Dla Studentów

Old specialized AI courses

SMaDA/SMaIDA/AIDA

Informatyka (EAIiIB)

Studia Dr

Inne materialy dydaktyczne

Archiwum

Dyplomanci

Geist Season of Code

HeKatE

Public

Różnice