?- [id3].
?- [animals].
?- id3(2).
The parameter we specify (2) controls the growth of the tree: if a node covers 2 or fewer
examples, it becomes a leaf no matter what the class distribution is.
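For intuition, here is a hedged sketch of the stopping test this parameter implies (the
actual implementation is inside id3.pl and may differ): a node becomes a leaf when it
covers at most Threshold examples, or when all covered examples already belong to one class.

% Hypothetical sketch of the stopping test (not the code in id3.pl).
% Ids is the list of example IDs covered by the node; examples are
% assumed to be stored as example(Id, Class, Description) facts.
leaf_node(Ids, Threshold) :-
    length(Ids, N),
    N =< Threshold.                 % small node: stop regardless of classes
leaf_node(Ids, _) :-
    findall(C, (member(Id, Ids), example(Id, C, _)), Classes),
    sort(Classes, [_]).             % exactly one distinct class: pure node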
To see the tree we use:
?- showtree.
has_covering=feathers => [bird/2]
has_covering=scales
gills=f => [reptile/2]
gills=t => [fish/1]
has_covering=none => [amphibian/1, mammal/1]
has_covering=hair => [mammal/3]
The lists at the leaves show the class distribution of the covered examples. If we change
the threshold we get a different tree. With threshold 1 we get single-class examples
at the leaves. For example:
?- id3(1),showtree.
has_covering=feathers => [bird/2]
has_covering=scales
gills=f => [reptile/2]
gills=t => [fish/1]
has_covering=none
eggs=t => [amphibian/1]
eggs=f => [mammal/1]
has_covering=hair => [mammal/3]
The "id3" query also generates "if-then" rules. We can see these rules by using "listing":
?- listing(if).
:- dynamic if/1.
if[has_covering=feathers]then bird.
if[has_covering=scales, gills=f]then reptile.
if[has_covering=scales, gills=t]then fish.
if[has_covering=none, eggs=t]then amphibian.
if[has_covering=none, eggs=f]then mammal.
if[has_covering=hair]then mammal.
NOTE: "listing" is a built-in Prolog procedure that lists the contents of the database
under a given name. This means that the if-then structures are already in the database
and their name is "if". In fact, when started by "?- id3(1).", the ID3 program creates a
decision tree (which can be viewed by "?- showtree.") and loads into the database a set of
rules that equivalently represent the tree.
The rules created by ID3 are Prolog facts, but written in a special prefix format.
They can also be accessed directly by Prolog queries like these:
?- if X then mammal.
X = [has_covering=none, eggs=f] ;
X = [has_covering=hair] ;
?- if X then Y.
X = [has_covering=feathers]
Y = bird ;
X = [has_covering=scales, gills=f]
Y = reptile ;
X = [has_covering=scales, gills=t]
Y = fish ;
X = [has_covering=none, eggs=t]
Y = amphibian ;
X = [has_covering=none, eggs=f]
Y = mammal ;
X = [has_covering=hair]
Y = mammal
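Such queries parse because id3.pl evidently declares "if" and "then" as Prolog operators.
A plausible set of declarations (the actual priorities in id3.pl may differ):

% Assumed operator declarations; the real ones are in id3.pl.
:- op(700, xfx, then).   % Conditions then Class
:- op(900, fx,  if).     % if(Conditions then Class) -- a term of arity 1

With these, "if [has_covering=hair] then mammal." is an ordinary if/1 fact whose single
argument is the term then([has_covering=hair], mammal), which is why listing(if) can
print the rules.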
If you want to run ID3 with another data file, load the data only (there is no need to
reload id3.pl). However, NOTE that after executing "?- id3(1)" the previous tree and rules
are deleted and replaced with those corresponding to the currently loaded data file.
For example:
?- [loandata].
?- id3(1),showtree.
emp=no => [reject/3]
emp=yes
buy=car
married=no => [reject/1]
married=yes => [approve/1]
buy=comp => [approve/7]
?- id3(9),showtree.
emp=no => [reject/3]
emp=yes => [approve/8, reject/1]
Assuming that the loandata tree and rules created by "?- id3(1)" are in the database, try
this (note that the sets relate as [2, 5, 11, 12] = [2, 11, 12] + [5]):
?- if H then [reject/N],model(H,M).
H = [emp=no]
N = 3
M = [2, 11, 12] ;
H = [emp=yes, buy=car, married=no]
N = 1
M = [5] ;
This query retrieves the two disjunctive components of the hypothesis for class "reject"
(the two paths in the decision tree leading to the "reject" leaves).
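Since the rules are plain if/1 facts, a new example description can be classified by
finding a rule whose conditions it satisfies. A minimal sketch (classify/2 is
hypothetical, not part of id3.pl; subset/2 is SWI-Prolog's list predicate):

% Hypothetical: succeed if some rule's conditions all appear in the
% attribute=value description of the new example.
classify(Description, Class) :-
    if Conditions then Class,
    subset(Conditions, Description).

For example, classify([emp=no, buy=comp, sex=m, married=yes], C) gives C = reject,
via the rule "if [emp=no] then reject".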
Now we should obviously be able to run VS on each of the two sets and produce
consistent hypotheses. For the first part, [2, 11, 12], we put these examples at the
beginning, followed by all examples from class "approve" (we may also leave the other
"reject" example at the end). Let's call this new data file loandata1.pl and load it
into the database.
?- [loandata1].
We may try "listing" to verify our data:
?- listing(example).
example(2, reject, [emp=no, buy=comp, sex=f, married=yes]).
example(11, reject, [emp=no, buy=comp, sex=m, married=yes]).
example(12, reject, [emp=no, buy=car, sex=f, married=yes]).
example(1, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(3, approve, [emp=yes, buy=comp, sex=m, married=no]).
example(4, approve, [emp=yes, buy=car, sex=f, married=yes]).
example(6, approve, [emp=yes, buy=comp, sex=f, married=yes]).
example(7, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(8, approve, [emp=yes, buy=comp, sex=m, married=no]).
example(9, approve, [emp=yes, buy=comp, sex=m, married=yes]).
example(10, approve, [emp=yes, buy=comp, sex=m, married=yes]).
example(5, reject, [emp=yes, buy=car, sex=f, married=no]).
Now we load and run the candidate elimination algorithm (vs.pl) in batch mode.
?- [vs].
?- batch.
+[no, comp, f, yes]
G-set: [[?, ?, ?, ?]]
S-set: [[no, comp, f, yes]]
+[no, comp, m, yes]
G-set: [[?, ?, ?, ?]]
S-set: [[no, comp, ?, yes]]
+[no, car, f, yes]
G-set: [[?, ?, ?, ?]]
S-set: [[no, ?, ?, yes]]
-[yes, comp, f, no]
G-set: [[?, ?, ?, yes], [no, ?, ?, ?]]
S-set: [[no, ?, ?, yes]]
-[yes, comp, m, no]
-[yes, car, f, yes]
G-set: [[no, ?, ?, ?]]
S-set: [[no, ?, ?, yes]]
-[yes, comp, f, yes]
-[yes, comp, f, no]
-[yes, comp, m, no]
-[yes, comp, m, yes]
-[yes, comp, m, yes]
+[yes, car, f, no]
G-set: []
S-set: []
There is no consistent concept description in this language !
NOTE that the inconsistency occurs because of the last example (#5), which is part
of the other hypothesis. The hypothesis created by ID3, [emp=no], is contained
in the version space:
G-set: [[no, ?, ?, ?]]
S-set: [[no, ?, ?, yes]]
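In the VS vector representation (attribute order emp, buy, sex, married) the hypothesis
[emp=no] is the vector [no, ?, ?, ?]. Whether such a vector covers an example can be
checked with a small matcher; a minimal sketch (covers/2 is hypothetical, not part of
vs.pl):

% Hypothetical matcher: '?' matches any value; anything else must
% equal the example's value at that position.
covers([], []).
covers(['?'|H], [_|E]) :- covers(H, E).
covers([V|H], [V|E]) :- V \== '?', covers(H, E).

For example, covers([no,'?','?',yes], [no,comp,f,yes]) succeeds, while
covers([no,'?','?',yes], [yes,car,f,no]) fails.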
In fact, the ID3 hypothesis is the one in the G-set. Interestingly, we have another one (the S-set).
This means that there may be another more specific decision tree, which looks like this:
emp=no
married=yes => reject THIS IS THE NEW BRANCH
married=no => approve THIS HAS TO BE ADDED TO COMPLETE THE SUBTREE
emp=yes
buy=car
married=no => reject
married=yes => approve
buy=comp => approve
NOTE that this tree is created MANUALLY. With loandata1.pl the ID3 algorithm will
create the previous tree, because the order of the examples does not matter.
Let's verify what the new branch covers. First, we need the "model" procedure back
(it's in id3.pl).
?- [id3].
?- model([emp=no,married=yes],M).
M = [2, 11, 12]
NOTE that this is the same coverage as before.
?- model([emp=no,married=no],M).
M = []
The branch we added to complete the tree does not cover anything, but this is OK.
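For reference, model/2 essentially collects the IDs of all examples whose descriptions
satisfy every condition in the hypothesis. A minimal sketch of that behaviour
(model_sketch/2 is hypothetical; the real definition is in id3.pl), assuming the
example(Id, Class, Description) facts shown in the listing above:

% Hypothetical re-implementation of the coverage computation: an
% example is covered when all conditions of H appear in its description.
model_sketch(H, M) :-
    findall(Id,
            ( example(Id, _, Description),
              subset(H, Description) ),
            M).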
Similarly, the other disjunctive component of ID3's "reject" hypothesis,
if [emp=yes, buy=car, married=no] then reject
may be learned with VS. First, we create file loandata2.pl with the following
order of the examples: [5, 1, 3, 4, 6, 7, 8, 9, 10, 2, 11, 12].
Then we run VS:
?- batch.
+[yes, car, f, no]
G-set: [[?, ?, ?, ?]]
S-set: [[yes, car, f, no]]
-[yes, comp, f, no]
G-set: [[?, car, ?, ?]]
S-set: [[yes, car, f, no]]
-[yes, comp, m, no]
-[yes, car, f, yes]
G-set: [[?, car, ?, no]]
S-set: [[yes, car, f, no]]
-[yes, comp, f, yes]
-[yes, comp, f, no]
-[yes, comp, m, no]
-[yes, comp, m, yes]
-[yes, comp, m, yes]
+[no, comp, f, yes]
G-set: []
S-set: []
There is no consistent concept description in this language !
NOTE that the version space here is bigger:
G-set: [[?, car, ?, no]]
S-set: [[yes, car, f, no]]
It has four hypotheses, among which is the one created by ID3, [emp=yes, buy=car, married=no] (i.e. the vector [yes, car, ?, no]).
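The four hypotheses can be enumerated by generalizing the S-set vector towards the G-set
vector position by position. A hedged sketch for the single-element S- and G-set case
(vs_between/3 is hypothetical, not part of vs.pl):

% Hypothetical: H lies between S and G if at every position it keeps
% the S value or generalizes it to '?', where G permits '?'.
vs_between([], [], []).
vs_between([V|S], ['?'|G], [X|H]) :-
    V \== '?', ( X = V ; X = '?' ),
    vs_between(S, G, H).
vs_between(['?'|S], ['?'|G], ['?'|H]) :-
    vs_between(S, G, H).
vs_between([V|S], [V|G], [V|H]) :-
    V \== '?',
    vs_between(S, G, H).

The query ?- vs_between([yes,car,f,no], ['?',car,'?',no], H). enumerates
[yes,car,f,no], [yes,car,'?',no], ['?',car,f,no] and ['?',car,'?',no], i.e. all four.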
?- [id3].
?- [loandata].
?- id3(1),showtree.
emp=no => [reject/3]
emp=yes
buy=car
married=no => [reject/1]
married=yes => [approve/1]
buy=comp => [approve/7]
The id3.pl file defines procedures to compute the entropy and the distribution
of classes in a set of examples. For example:
?- if H then C, model(H,M),entropy(M,I).
H = [emp=no]
C = [reject/3]
M = [2, 11, 12]
I = 0 ;
H = [emp=yes, buy=car, married=no]
C = [reject/1]
M = [5]
I = 0 ;
H = [emp=yes, buy=car, married=yes]
C = [approve/1]
M = [4]
I = 0 ;
H = [emp=yes, buy=comp]
C = [approve/7]
M = [1, 3, 6, 7, 8, 9, 10]
I = 0
These are all 100% correct hypotheses, and that's why the entropy of the
class distribution is 0. If we allow mixed classes at the leaves, we may get
non-zero entropy. For example:
?- id3(2),showtree.
emp=no => [reject/3]
emp=yes
buy=car => [approve/1, reject/1]
buy=comp => [approve/7]
?- if H then C, model(H,M),entropy(M,I).
H = [emp=no]
C = [reject/3]
M = [2, 11, 12]
I = 0 ;
H = [emp=yes, buy=car]
C = [approve/1, reject/1]
M = [4, 5]
I = 1 ;
H = [emp=yes, buy=comp]
C = [approve/7]
M = [1, 3, 6, 7, 8, 9, 10]
I = 0 ;
The id3.pl program includes a procedure to compute the class distribution of a set of
examples. For example:
?- if H then C, model(H,M),distr(M,D).
H = [emp=no]
C = [reject/3]
M = [2, 11, 12]
D = [reject/3] ;
H = [emp=yes, buy=car]
C = [approve/1, reject/1]
M = [4, 5]
D = [approve/1, reject/1] ;
H = [emp=yes, buy=comp]
C = [approve/7]
M = [1, 3, 6, 7, 8, 9, 10]
D = [approve/7] ;
Thus we may see the class distribution and the entropy of a set of examples.
For example:
?- distr([1,2,3,4],D),entropy([1,2,3,4],I).
D = [approve/3, reject/1]
I = 0.811278
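For intuition, here is a minimal sketch of the entropy computation over a class
distribution such as [approve/3, reject/1] (the real entropy/2 in id3.pl works on lists
of example IDs; dist_entropy/2 is a hypothetical helper; sum_list/2 is SWI-Prolog's):

% Hypothetical: I = -sum(P*log2(P)) over the class proportions.
dist_entropy(Distribution, I) :-
    findall(N, member(_/N, Distribution), Counts),
    sum_list(Counts, Total),
    counts_entropy(Counts, Total, I).

counts_entropy([], _, 0).
counts_entropy([N|Ns], Total, I) :-
    counts_entropy(Ns, Total, I0),
    P is N / Total,
    I is I0 - P * log(P) / log(2).

The query ?- dist_entropy([approve/3, reject/1], I). gives I = 0.811278, matching the
output above.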
Here is the entropy of a hypothesis that covers mixed class examples:
?- model([buy=car],M),distr(M,D),entropy(M,I).
M = [4, 5, 12]
D = [approve/1, reject/2]
I = 0.918296
The distribution of classes can be used to compute the error.
For example, assuming that we assign the majority class label (reject) to
the set above, the error will be 1/3 (one wrong classification out of 3).
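A hedged sketch of that computation (majority_error/2 is hypothetical, not part of
id3.pl): the minority examples are the misclassified ones.

% Hypothetical: error of labelling a set with its majority class.
majority_error(Distribution, E) :-
    findall(N, member(_/N, Distribution), Counts),
    sum_list(Counts, Total),
    max_list(Counts, Majority),
    E is (Total - Majority) / Total.

The query ?- majority_error([approve/1, reject/2], E). gives E = 0.333..., the 1/3
error above.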
The OneR algorithm generates a one-level decision tree, using
error-based attribute selection. The following query can be used to
find the OneR decision tree for our loan data ("attr" and "vals" are
described in the header of id3.pl; try the query without writeln and fail
to understand better how it works):
?- attr(L),member(A,L),vals(A,W),nl,member(V,W),model([V],M),distr(M,D),writeln([V]-D),fail.
[buy=car]-[approve/1, reject/2]
[buy=comp]-[approve/7, reject/2]
[emp=no]-[reject/3]
[emp=yes]-[approve/8, reject/1]
[married=no]-[approve/4, reject/1]
[married=yes]-[approve/4, reject/3]
[sex=f]-[approve/4, reject/3]
[sex=m]-[approve/4, reject/1]
The class distributions above allow us to compute the total error in each
of the four possible one-level trees:
1. buy=car => reject [approve/1, reject/2]
buy=comp => approve [approve/7, reject/2]
-----------------------------------------
Error = (1+2)/12 = 0.25
2. emp=no => reject [reject/3]
emp=yes => approve [approve/8, reject/1]
----------------------------------------
Error = (0+1)/12 = 0.0833333
3. married=no => approve [approve/4, reject/1]
married=yes => approve [approve/4, reject/3]
--------------------------------------------
Error = (1+3)/12 = 0.333333
4. sex=f => approve [approve/4, reject/3]
sex=m => approve [approve/4, reject/1]
--------------------------------------
Error = (3+1)/12 = 0.333333
Obviously the best OneR tree is #2, because it has the lowest error rate (0.0833333).
Interestingly, the entropy-based attribute selection used by ID3 agrees here
with the error-based one. This is clear from the fact that the pruned ID3 tree
(produced by "?- id3(9)") is the same as the OneR one.
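The per-attribute errors above can also be computed mechanically. A hedged sketch using
the id3.pl procedures attr/1, vals/2, model/2 and distr/2 from the queries above
(attr_error/2 is hypothetical):

% Hypothetical: total misclassifications of the one-level tree
% splitting on attribute A, i.e. the OneR score of A.
attr_error(A, Err) :-
    vals(A, Values),
    findall(Wrong,
            ( member(V, Values), model([V], M), distr(M, D),
              findall(N, member(_/N, D), Counts),
              sum_list(Counts, Total), max_list(Counts, Majority),
              Wrong is Total - Majority ),
            Wrongs),
    sum_list(Wrongs, Err).

Then ?- attr(L), member(A, L), attr_error(A, E), writeln(A-E), fail. should print the
error counts 3 (buy), 1 (emp), 4 (married) and 4 (sex); dividing by 12 gives the error
rates computed above and confirms that "emp" wins.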