====== Lab 3 - Decision Trees ======
Reading: Tom M. Mitchell, Machine Learning, Chapter 3.

===== List and description of files =====
The files needed are the program
{{:pl:prolog:prolog_lab:ml:id3.pl}}
and the data files used previously:
{{:pl:prolog:prolog_lab:ml:animals.pl}},
{{:pl:prolog:prolog_lab:ml:loandata.pl}}.

Supplementary files for the examples from PMG:
the program
{{:pl:prolog:prolog_lab:ml:dtlearn_bool.pl}},
the data
{{:pl:prolog:prolog_lab:ml:dtlearn_t2.pl}},
and, for the exercise,
{{:pl:prolog:prolog_lab:ml:as11data.pl}}.

===== Creating, viewing and accessing decision trees and rules =====

<code prolog>
?- [id3].
?- [animals].

?- id3(2).
</code>

The parameter we specify (2) controls the growing of the tree: if a node covers two or fewer examples, it becomes a leaf no matter what the class distribution is.

To see the tree we use:
<code prolog>
?- showtree.

  has_covering=feathers => [bird/2]
  has_covering=scales
    gills=f => [reptile/2]
    gills=t => [fish/1]
  has_covering=none => [amphibian/1, mammal/1]
  has_covering=hair => [mammal/3]
</code>

The lists at the leaves show the class distribution in the examples covered. If we change the threshold we get a different tree. With threshold 1 we will have one-class examples at the leaves. For example:

<code prolog>
?- id3(1),showtree.

  has_covering=feathers => [bird/2]
  has_covering=scales
    gills=f => [reptile/2]
    gills=t => [fish/1]
  has_covering=none
    eggs=t => [amphibian/1]
    eggs=f => [mammal/1]
  has_covering=hair => [mammal/3]
</code>

The "id3" query also generates "if-then" rules. We can see these rules by using "listing":
<code prolog>
?- listing(if).

:- dynamic if/1.

if[has_covering=feathers]then bird.
if[has_covering=scales, gills=f]then reptile.
if[has_covering=scales, gills=t]then fish.
if[has_covering=none, eggs=t]then amphibian.
if[has_covering=none, eggs=f]then mammal.
if[has_covering=hair]then mammal.
</code>

NOTE: "listing" is a Prolog built-in procedure that lists the contents of the database by name. This means that the if-then structures are already in the database under the name "if". In fact, when started by "?- id3(1).", the ID3 program creates a decision tree (which can be viewed with "?- showtree.") and loads into the database a set of rules that equivalently represent the tree.
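
This tree-to-rules equivalence is easy to reproduce outside Prolog. Below is a minimal sketch in Python (not part of id3.pl; the nested-dict tree is transcribed by hand from the id3(1) output above) that collects one if-then rule per root-to-leaf path:

```python
# Sketch: a decision tree as nested dicts; each root-to-leaf path yields one rule.
# The tree below is the id3(1) animals tree transcribed by hand (an assumption).
tree = {"has_covering": {
    "feathers": "bird",
    "scales": {"gills": {"f": "reptile", "t": "fish"}},
    "none": {"eggs": {"t": "amphibian", "f": "mammal"}},
    "hair": "mammal",
}}

def tree_to_rules(node, path=()):
    """Walk the tree; a leaf (a string) closes one 'if conditions then class' rule."""
    if isinstance(node, str):
        return [(list(path), node)]
    (attr, branches), = node.items()
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, path + (f"{attr}={value}",))
    return rules

for conds, cls in tree_to_rules(tree):
    print(f"if {conds} then {cls}.")
```

The six printed rules correspond one-to-one to the six "if ... then ..." facts listed above.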

The rules created by ID3 are Prolog facts, stored in a special prefix format.
They can also be accessed directly by Prolog queries like these:

<code prolog>
?- if X then mammal.

X = [has_covering=none, eggs=f] ;

X = [has_covering=hair] ;

?- if X then Y.

X = [has_covering=feathers]
Y = bird ;

X = [has_covering=scales, gills=f]
Y = reptile ;

X = [has_covering=scales, gills=t]
Y = fish ;

X = [has_covering=none, eggs=t]
Y = amphibian ;

X = [has_covering=none, eggs=f]
Y = mammal ;

X = [has_covering=hair]
Y = mammal
</code>

If you want to run ID3 with another data file, load the data only (there is no need to reload id3.pl). However, NOTE that after executing "?- id3(1)" the previous tree and rules are deleted and replaced with those corresponding to the currently loaded data file. For example:

<code prolog>
?- [loandata].

?- id3(1),showtree.

  emp=no => [reject/3]
  emp=yes
    buy=car
      married=no => [reject/1]
      married=yes => [approve/1]
    buy=comp => [approve/7]

?- id3(9),showtree.

  emp=no => [reject/3]
  emp=yes => [approve/8, reject/1]

</code>

===== Tools =====

Using the data from the file {{:pl:prolog:prolog_lab:ml:animals.pl}}, test the [[http://www.aispace.org/dTree/index.shtml|decision tree building tool]].

**Note**: Compute the entropy and the //Information Gain// for each attribute and indicate the best root for the decision tree. Did the tool determine the root correctly?
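
As a cross-check for this exercise, entropy and information gain can be computed with a few lines of Python. This is only a sketch, not part of the lab files; the class counts per value of has_covering are read off the id3(1) animals tree shown earlier:

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, split):
    """Information gain = parent entropy minus the weighted entropy of the split."""
    total = sum(parent_counts)
    remainder = sum(sum(c) / total * entropy(c) for c in split)
    return entropy(parent_counts) - remainder

# Class counts per has_covering value, read off the id3(1) animals tree:
# feathers: 2 bird; scales: 2 reptile + 1 fish; none: 1 amphibian + 1 mammal; hair: 3 mammal
parent = [2, 2, 1, 1, 4]            # bird, reptile, fish, amphibian, mammal (10 examples)
split = [[2], [2, 1], [1, 1], [3]]
print(round(entropy(parent), 4), round(info_gain(parent, split), 4))
```

For this split the root entropy comes out to about 2.122 bits and the gain to about 1.646 bits, so has_covering is a very informative root attribute.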

===== Comparison with Concept Learning =====

Assuming that the loandata tree and rules created by "?- id3(1)" are in the database, try this
(note that [2, 5, 11, 12] = [2, 11, 12] + [5]):

<code prolog>
?- if H then [reject/N],model(H,M).

H = [emp=no]
N = 3
M = [2, 11, 12] ;

H = [emp=yes, buy=car, married=no]
N = 1
M = [5] ;
</code>

This query accesses the two disjunctive components of the hypothesis for class "reject"
(the two paths in the decision tree leading to the "reject" leaves).

Now, obviously we should be able to run VS on each of the two sets and produce consistent hypotheses. For the first part, [2, 11, 12], we put these examples at the beginning and all examples from class "approve" after them (we may also leave the other "reject" example at the end). Let's call this new data file loandata1.pl and load it into the database.
<code prolog>
?- [loandata1].
</code>

We may try "listing" to verify our data:

<code prolog>
?- listing(example).

example(2, reject, [emp=no, buy=comp, sex=f, married=yes]).
example(11, reject, [emp=no, buy=comp, sex=m, married=yes]).
example(12, reject, [emp=no, buy=car, sex=f, married=yes]).
example(1, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(3, approve, [emp=yes, buy=comp, sex=m, married=no]).
example(4, approve, [emp=yes, buy=car, sex=f, married=yes]).
example(6, approve, [emp=yes, buy=comp, sex=f, married=yes]).
example(7, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(8, approve, [emp=yes, buy=comp, sex=m, married=no]).
example(9, approve, [emp=yes, buy=comp, sex=m, married=yes]).
example(10, approve, [emp=yes, buy=comp, sex=m, married=yes]).
example(5, reject, [emp=yes, buy=car, sex=f, married=no]).
</code>

Now we load and run the candidate elimination algorithm (vs.pl) in batch mode.

<code prolog>
?- [vs].

?- batch.
+[no, comp, f, yes]
G-set: [[?, ?, ?, ?]]
S-set: [[no, comp, f, yes]]
+[no, comp, m, yes]
G-set: [[?, ?, ?, ?]]
S-set: [[no, comp, ?, yes]]
+[no, car, f, yes]
G-set: [[?, ?, ?, ?]]
S-set: [[no, ?, ?, yes]]
-[yes, comp, f, no]
G-set: [[?, ?, ?, yes], [no, ?, ?, ?]]
S-set: [[no, ?, ?, yes]]
-[yes, comp, m, no]
-[yes, car, f, yes]
G-set: [[no, ?, ?, ?]]
S-set: [[no, ?, ?, yes]]
-[yes, comp, f, yes]
-[yes, comp, f, no]
-[yes, comp, m, no]
-[yes, comp, m, yes]
-[yes, comp, m, yes]
+[yes, car, f, no]
G-set: []
S-set: []
There is no consistent concept description in this language !
</code>
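
The G-set and S-set updates printed by batch can be mimicked with a short sketch of candidate elimination. This is a simplified, hand-written version over nominal attributes with '?' as the wildcard; the names and details below are my assumptions, not a transcription of vs.pl:

```python
Q = "?"  # wildcard: matches any attribute value

def covers(h, x):
    """True if hypothesis h matches example x (wildcards match anything)."""
    return all(hv == Q or hv == xv for hv, xv in zip(h, x))

def run(examples):
    """Simplified candidate elimination; examples are ('+'/'-', [emp,buy,sex,married])."""
    n = len(examples[0][1])
    S, G = None, [[Q] * n]
    for label, x in examples:
        if label == "+":
            # minimally generalize S to cover x; drop G members that miss x
            S = list(x) if S is None else [sv if sv == xv else Q for sv, xv in zip(S, x)]
            G = [g for g in G if covers(g, x)]
        else:
            # minimally specialize every G member that wrongly covers the negative x
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):
                    if g[i] == Q and S and S[i] != Q and S[i] != x[i]:
                        new_G.append(g[:i] + [S[i]] + g[i + 1:])
            # keep only the maximally general members
            G = [g for g in new_G if not any(h != g and covers(h, g) for h in new_G)]
        if S is not None and not G:
            return None, None  # version space collapsed: no consistent hypothesis
    return S, G

loandata1 = [
    ("+", ["no", "comp", "f", "yes"]), ("+", ["no", "comp", "m", "yes"]),
    ("+", ["no", "car", "f", "yes"]), ("-", ["yes", "comp", "f", "no"]),
    ("-", ["yes", "comp", "m", "no"]), ("-", ["yes", "car", "f", "yes"]),
    ("-", ["yes", "comp", "f", "yes"]), ("-", ["yes", "comp", "f", "no"]),
    ("-", ["yes", "comp", "m", "no"]), ("-", ["yes", "comp", "m", "yes"]),
    ("-", ["yes", "comp", "m", "yes"]), ("+", ["yes", "car", "f", "no"]),
]
print(run(loandata1[:6]))  # S and G after the first six examples
print(run(loandata1))      # the final positive collapses the version space
```

On the loandata1 order it reproduces the trace above: after the first six examples S = [no, ?, ?, yes] and G = [[no, ?, ?, ?]], and the last positive example empties the version space.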

NOTE that the inconsistency occurs because of the last example (#5), which belongs to the other hypothesis. The hypothesis created by ID3, [emp=no], is contained in the version space:

<code prolog>
G-set: [[no, ?, ?, ?]]
S-set: [[no, ?, ?, yes]]
</code>

In fact, it is the one in the G-set. Interestingly, we have another one (the S-set).
This means that there may be another, more specific decision tree, which looks like this:

<code prolog>
  emp=no
    married=yes => reject        THIS IS THE NEW BRANCH
    married=no => approve        THIS HAS TO BE ADDED TO COMPLETE THE SUBTREE
  emp=yes
    buy=car
      married=no => reject
      married=yes => approve
    buy=comp => approve
</code>

NOTE that this tree is created MANUALLY. With loandata1.pl the ID3 algorithm will create the previous tree, because the order of the examples does not matter.

Let's verify what the new branch covers. First, we need the "model" procedure back (it's in id3.pl).

<code prolog>
?- [id3].

?- model([emp=no,married=yes],M).

M = [2, 11, 12]
</code>

NOTE that this is the same coverage as before.

<code prolog>
?- model([emp=no,married=no],M).

M = []
</code>

The branch we added to complete the tree does not cover anything, but this is ok.

Similarly, the other disjunctive component of ID3's "reject" hypothesis,
<code prolog>
if [emp=yes, buy=car, married=no] then reject
</code>

may be learned with VS. First, we create file loandata2.pl with the following order of the examples: [5, 1, 3, 4, 6, 7, 8, 9, 10, 2, 11, 12].

Then running VS:

<code prolog>
?- batch.
+[yes, car, f, no]
G-set: [[?, ?, ?, ?]]
S-set: [[yes, car, f, no]]
-[yes, comp, f, no]
G-set: [[?, car, ?, ?]]
S-set: [[yes, car, f, no]]
-[yes, comp, m, no]
-[yes, car, f, yes]
G-set: [[?, car, ?, no]]
S-set: [[yes, car, f, no]]
-[yes, comp, f, yes]
-[yes, comp, f, no]
-[yes, comp, m, no]
-[yes, comp, m, yes]
-[yes, comp, m, yes]
+[no, comp, f, yes]
G-set: []
S-set: []
There is no consistent concept description in this language !
</code>

NOTE that the version space here is bigger:
<code prolog>
G-set: [[?, car, ?, no]]
S-set: [[yes, car, f, no]]
</code>

It has four hypotheses, among which is the one created by ID3.
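
The four hypotheses are easy to enumerate: every position where the G-set member has '?' but the S-set member is specific can independently stay specific or be generalized. A quick sketch (my own illustration, not part of vs.pl):

```python
from itertools import product

S = ["yes", "car", "f", "no"]
G = ["?", "car", "?", "no"]

# Positions where G has '?' but S is specific can each be kept or generalized.
free = [i for i, (s, g) in enumerate(zip(S, G)) if g == "?" and s != "?"]
space = []
for choice in product([False, True], repeat=len(free)):
    h = S[:]
    for i, generalize in zip(free, choice):
        if generalize:
            h[i] = "?"
    space.append(h)

print(len(space), space)
```

Among the four is [yes, car, ?, no], i.e. ID3's rule "if [emp=yes, buy=car, married=no] then reject".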

===== The OneR Algorithm =====

<code prolog>
?- [id3].
?- [loandata].

?- id3(1),showtree.

  emp=no => [reject/3]
  emp=yes
    buy=car
      married=no => [reject/1]
      married=yes => [approve/1]
    buy=comp => [approve/7]
</code>

The id3.pl file defines procedures to compute the entropy and the distribution of classes in a set of examples. For example:

<code prolog>
?- if H then C, model(H,M),entropy(M,I).

H = [emp=no]
C = [reject/3]
M = [2, 11, 12]
I = 0 ;

H = [emp=yes, buy=car, married=no]
C = [reject/1]
M = [5]
I = 0 ;

H = [emp=yes, buy=car, married=yes]
C = [approve/1]
M = [4]
I = 0 ;

H = [emp=yes, buy=comp]
C = [approve/7]
M = [1, 3, 6, 7, 8, 9, 10]
I = 0
</code>

These are all 100% correct hypotheses, and that's why the entropy of the class distribution is 0. If we allow mixed classes at the leaves we may get non-zero entropy. For example:

<code prolog>
?- id3(2),showtree.

  emp=no => [reject/3]
  emp=yes
    buy=car => [approve/1, reject/1]
    buy=comp => [approve/7]

?- if H then C, model(H,M),entropy(M,I).

H = [emp=no]
C = [reject/3]
M = [2, 11, 12]
I = 0 ;

H = [emp=yes, buy=car]
C = [approve/1, reject/1]
M = [4, 5]
I = 1 ;

H = [emp=yes, buy=comp]
C = [approve/7]
M = [1, 3, 6, 7, 8, 9, 10]
I = 0 ;
</code>

The id3.pl program includes a procedure to compute the class distribution of a set of examples. For example:

<code prolog>
?- if H then C, model(H,M),distr(M,D).

H = [emp=no]
C = [reject/3]
M = [2, 11, 12]
D = [reject/3] ;

H = [emp=yes, buy=car]
C = [approve/1, reject/1]
M = [4, 5]
D = [approve/1, reject/1] ;

H = [emp=yes, buy=comp]
C = [approve/7]
M = [1, 3, 6, 7, 8, 9, 10]
D = [approve/7] ;
</code>

Thus we can see the class distribution and the entropy of a set of examples.
For example:

<code prolog>
?- distr([1,2,3,4],D),entropy([1,2,3,4],I).

D = [approve/3, reject/1]
I = 0.811278
</code>
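
The value 0.811278 can be verified by hand: for the distribution [approve/3, reject/1], I = -(3/4)·log2(3/4) - (1/4)·log2(1/4). A quick check in Python (an illustration, independent of id3.pl):

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(round(entropy([3, 1]), 6))  # the I reported above for [approve/3, reject/1]
```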

Here is the entropy of a hypothesis that covers mixed-class examples:

<code prolog>
?- model([buy=car],M),distr(M,D),entropy(M,I).

M = [4, 5, 12]
D = [approve/1, reject/2]
I = 0.918296
</code>

The distribution of classes can be used to compute the error. For example, assuming that we assign the majority class label (reject) to the set above, the error will be 1/3 (one wrong classification out of 3).
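
In general, the error of labelling a whole set with its majority class is 1 minus the majority fraction. A one-line check (my illustration, not part of id3.pl):

```python
def majority_error(counts):
    """Error when the whole set gets the majority class label."""
    return 1 - max(counts) / sum(counts)

print(majority_error([1, 2]))  # [approve/1, reject/2] labelled 'reject'
```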

The OneR algorithm generates a one-level decision tree, using error-based attribute selection. The following query can be used to find the OneR decision tree for our loan data ("attr" and "vals" are described in the header of id3.pl; try the query without writeln and fail to better understand how it works):

<code prolog>
?- attr(L),member(A,L),vals(A,W),nl,member(V,W),model([V],M),distr(M,D),writeln([V]-D),fail.

[buy=car]-[approve/1, reject/2]
[buy=comp]-[approve/7, reject/2]

[emp=no]-[reject/3]
[emp=yes]-[approve/8, reject/1]

[married=no]-[approve/4, reject/1]
[married=yes]-[approve/4, reject/3]

[sex=f]-[approve/4, reject/3]
[sex=m]-[approve/4, reject/1]
</code>

The class distributions above allow us to compute the total error in each of the four possible one-level trees:

<code prolog>
1. buy=car => reject [approve/1, reject/2]
   buy=comp => approve [approve/7, reject/2]
   -----------------------------------------
   Error = (1+2)/12 = 0.25

2. emp=no => reject [reject/3]
   emp=yes => approve [approve/8, reject/1]
   ----------------------------------------
   Error = (0+1)/12 = 0.0833333

3. married=no => approve [approve/4, reject/1]
   married=yes => approve [approve/4, reject/3]
   --------------------------------------------
   Error = (1+3)/12 = 0.333333

4. sex=f => approve [approve/4, reject/3]
   sex=m => approve [approve/4, reject/1]
   --------------------------------------
   Error = (3+1)/12 = 0.333333
</code>

Obviously the best OneR tree is #2, because it has the lowest error rate, 0.0833333. Interestingly, the entropy-based attribute selection used by ID3 agrees with the error-based one. This is clear from the fact that the pruned ID3 tree is the same as the OneR one.
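
The whole OneR computation above can be replayed in a few lines of Python (a sketch; the 12 examples are transcribed by hand from the loandata listing earlier on this page):

```python
from collections import Counter

# The 12 loandata examples: (id, class, attribute values), transcribed by hand.
examples = [
    (1, "approve", {"emp": "yes", "buy": "comp", "sex": "f", "married": "no"}),
    (2, "reject",  {"emp": "no",  "buy": "comp", "sex": "f", "married": "yes"}),
    (3, "approve", {"emp": "yes", "buy": "comp", "sex": "m", "married": "no"}),
    (4, "approve", {"emp": "yes", "buy": "car",  "sex": "f", "married": "yes"}),
    (5, "reject",  {"emp": "yes", "buy": "car",  "sex": "f", "married": "no"}),
    (6, "approve", {"emp": "yes", "buy": "comp", "sex": "f", "married": "yes"}),
    (7, "approve", {"emp": "yes", "buy": "comp", "sex": "f", "married": "no"}),
    (8, "approve", {"emp": "yes", "buy": "comp", "sex": "m", "married": "no"}),
    (9, "approve", {"emp": "yes", "buy": "comp", "sex": "m", "married": "yes"}),
    (10, "approve", {"emp": "yes", "buy": "comp", "sex": "m", "married": "yes"}),
    (11, "reject",  {"emp": "no",  "buy": "comp", "sex": "m", "married": "yes"}),
    (12, "reject",  {"emp": "no",  "buy": "car",  "sex": "f", "married": "yes"}),
]

def oner_error(attr):
    """Total error of the one-level tree splitting on attr, predicting
    the majority class in every branch."""
    by_value = {}
    for _, cls, attrs in examples:
        by_value.setdefault(attrs[attr], Counter())[cls] += 1
    wrong = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
    return wrong / len(examples)

errors = {a: oner_error(a) for a in ["buy", "emp", "married", "sex"]}
best = min(errors, key=errors.get)
print(errors, best)
```

It selects emp with error 1/12 ≈ 0.0833, matching tree #2 above.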
  