Using Impurity and Depth for Decision Trees Pruning
D. Fournier and B. Cr? milleux e GREYC, CNRS - UPRESA 6072 Universit? de Caen e F-14032 Caen C? dex France e fournier,cremilleux @info.unicaen.fr
Abstract
Most pruning methods for decision trees minimize a classi?cation error rate. In uncertain domains, some subtrees which do not lessen the error rate can be relevant to point out some populations of speci?c interest or to give a representation of a large data ?le. We propose here a new pruning method (called pruning) which takes into account the complexity of sub-trees and which is able to keep sub-trees with leaves yielding to determinate relevant decision rules, even when keeping these ones does not increase the classi?cation ef?ciency.
1 Introduction
Data mining and Knowledge Discovery in Databases (KDD) are ?elds of increasing interest combining databases, arti?cial intelligence, machine learning and statistics. Broadly speaking, the purpose of KDD is to extract from large amounts of data non trivial “nuggets” of information in an easily understandable form. Such discovered knowledge may be for instance regularities or exceptions. In this context, with the growth of databases, methods which explore data - like for example decision trees - are required. Such methods can give a summary of the data (which is easier to analyze than the raw data) or can be used to build a tool (like for example a classi?er) to help a user for many different decision makings. Brie?y, a decision tree is built from a set of training data having attribute values and a class name. The result of the process is represented as a tree which nodes specify attributes and branches specify attribute values. Leaves of the tree correspond to sets of examples with the same class or to elements in which no more attributes are available. Construction of decision trees is described, among others, by Breiman et al. (1984) [1] who present an important and well-known monograph on classi?ca-
? ¤?
? ¤?
?
? ? ¤?
tion trees. A number of standard techniques have been developed, for example like the basic algorithms ID3 [9] and CART [1]. Nevertheless, in many areas, like in medicine, data are uncertain: that means that there are always some examples which escape from the rules. Translated in the context of decision trees, that means these examples seem similar but in fact differ from their classes. In these situations, it is well-known (see [1], [3]) that decision tree algorithms tend to divide nodes having few examples and that the resulting trees tend to be very large and overspeci?ed. Some branches, especially towards the bottom, are present due to sample variability and are statistically meaningless (one can also say that they are due to noise in the sample). Pruning methods (see [1], [8], [9]) try to cut such branches in order to avoid this drawback. In uncertain domains, the understanding of the mechanism of these methods is a key point for their use in practice: in order to achieve a fruitful process of extraction of information, these methods requires declarativity during treatments. We will see in Section 3 that we include this point in our pruning strategy. This paper is organized as follows. Section 2 outlines existing decision trees pruning methods and sets out the question of pruning in uncertain domains. We will see that the principal pruning methods are based on a classi?cation error rate and that it may be a drawback in uncertain domains. So, in Section 3, we propose a quality index (called for Depth-Impurity quality index) which is a trade-off between the depth and the impurity of nodes of a tree. From this index, we infer a new pruning method for decision trees (denoted pruning) which is appropriate in uncertain domains: unlike usual methods, this method is not bound to the possible use of a tree for classi?cation. It is able to give an ef?cient description of a data ?le oriented by a priori classi?cation of its elements and to highlight interesting sub-populations, even if the classi?cation error rate does not decrease. We think that it is a major point for the use of decision trees in uncer-
tain domains. In Section 4, we present examples in a real world domain where we use pruning as a tool to optimize a ?nal decision tree.
3 Depth-Impurity and pruning
quality
index
2 Motivations
The principal methods for pruning decision trees are examined in [6], [8] and [10]. Most of these pruning methods are based on minimizing a classi?cation error rate when each element of the same node is classi?ed in the most frequent class in this node. The latter is estimated with a test ?le or using statistical methods such as crossvalidation or bootstrap. These pruning methods are inferred from situations where the built tree will be used as a classi?er and they systematically discard a sub-tree which doesn’t improve the used classi?cation error rate. Let us consider the subtree depicted in Fig. 1. is the class and it is here bivalued. In each node the ?rst (resp. second) value indicates the number of examples having the ?rst (resp. second) value of . This sub-tree doesn’t lessen the error rate, which is both in its root or in its leaves; nevertheless the sub-tree is of interest since it points out a speci?c population with a constant value of while in the remaining population it’s impossible to predict a value for .
We notice that these conditions are part of the usual framework to properly de?ne suitable attribute selection criteria to build decision trees ([1], [4]). This framework states that a theoretical level, criteria derived from an impurity measure ([7]) perform comparably ([1], [2] and [4])1 . Such criteria are called C.M. criteria (concavemaximum criteria) because an impurity measure, among other characteristics, is de?ned by a concave function. The most commonly used criteria which are the Shannon entropy (in the family of C4.5 software [11]) and the Gini criterion (in CART algorithms [1]), are C.M. criteria. The idea is then to use known properties (based on an impurity measure) of C.M. criteria to de?ne a quality index. In order to better understand the genesis of the quality index that we present below, we have to glance at the question of attribute selection criteria in decision trees and its usual notations. Let us call the class of the examples of a data set, with values , a node and any attribute de?ned on , with values . Let us note a C.M. criterion (see Fig. 2).
?
(90,10)
(79,0)
(11,10)
Y = y1
Y = y2
Figure 1: A tree which could be interesting although it doesn’t decrease the number of errors.
?1
?2
We know (see [1], [7] and [4]) that where are the sub-nodes yielded by , is the rate of in , and is an impurity measure of . can be viewed as a combined measure of impurity of the sub-nodes induced by . The
used here the term of impurity because it is the most usual in the machine learning community ([1] [7]).
1 We
3 1 ) 42$ 0(
$
9
) 1 0( ? # # 2$ 6 7 $ @#) & # E6 DCBA# 7 1 6 @96 8&6 5
Our aim is to propose a pruning method which does not systematically discard a sub-tree which classi?cation error rate is equal to the rate of the root. The following section describes a quality index of a tree which is able to highlight such interesting - in our opinion - subpopulations. This method is not based on a classi?cation error rate and we claim that it is a major point in uncertain domains.
#
Figure 2: Splitting of a node
using an attribute
$
&'% % # !" ?
#
ii) Depth of
is 1.
i) All the leaves of
are pure.
(
$
?
? ¤?
?
?§ ¨?
?
?
?
3.1 Framework for a quality index
Let us formulate the question of a quality index. We claim that the quality index of a tree has its maximum value if and only if the two following conditions are satis?ed:
Y = ym
?m
.
1 2
is decreasing.
Constraint 1 corresponds to the damping process and 2 is necessary to satisfy condition ii) of our framework. If we choose an impurity measure normalized between , as , we will see immediately with these two constraints that the value of is between . The higher the value of , the better quality of . Furthermore, we add the three following constraints:
attributes. 4 5
where to respect constraint 1)
(1)
We thus choose the following function :
h
Proof. duce:
, we de. We thus get from 2: . This equality implies that is an exponential function. From
1 `) h ?
h
m k 3 rnl Rh i e1 `) h
i h d ? 3 X) ? i h d d 3 gfbj?1 h 1 ? P) h? P gfe? ?1 `) h 3 1 ??`) h
h
where are the leaves of , is the rate of in , and is an impurity measure normalized between . Let us note that as is an impurity measure, de?nes a purity measure. The depth of a leaf is computed from the root of the whole tree (and not from the root of ). For instance, in Fig. 3, to calculate the quality of tree of root , the depth of the leaf is with respect . If we come back to the example in Fig. 3 given at the end of the previous section (comparison between trees of root versus the one of root ), we notice that the quality index of the tree of root is lower because it is deeper. This choice to compute the depth allows comparisons between any nodes. By introducing a
? ¤?
? ¤?
We propose here a quality index called (for DepthImpurity) which corresponds to a trade-off between depth and impurity of each leaf of a tree. Equation 1 de?nes the quality index (for DepthImpurity) of a tree (we note either or , the quality index of the tree of root ):
?
? ¤?
)? 1 `#Y¤?
I PA# I d # H # # g? # H # e ? yd) x? Cw 1 v§ 9 9 # g # g& r9 g # 6 # 67 g #) ts qi h1 g e ? 6 3 )? 1 61 E@uPWrp`) PP1 6E`#) 9 fd) 6 7 & c b1 aY¤?
XW? ) 1 ¤? V? ?
? ¤?
#
T US
3.2 The quality index
Constraint 3 means that a leaf which has a depth similar to the total number of attributes is likely to be unreliable, so it seems sensible that the value of its quality is close to the minimum value of . Constraint 4 allows the values of the damping function and of the purity (or impurity) of a leaf to have same rough estimates. Over the achievement of a linear damping, constraint 5 is suggested by algorithmic considerations: we will see that it allows to have a linear computation of as indicated by the remark below. So, as the number of elementary steps in the whole process is limited, the computational cost of pruning (see Section 3.3) is particularly low and it is tractable even with large databases. Moreover, the choice of does not modify the result of the pruning process (see Section 3.3). Translated into a mathematical form, this set of constraints leads to choose an exponential function for .
?? ? ????
v? ? pw v§ ?? 1 X) h ??81 ?VX) h 3 ? ? ??? 1 `) h 0§ ?
3
when
tends towards the total number of
(in practice,
x? C? v§
h
? ¤?
?¤? ? 3 6 8&6 5 7
h
§ ?1 X) h ?
? ?1 d) h 3 ?
h
x? C? v§
value of the criterion in a node re?ects how appropriately the chosen attribute divides the data: it is a way to quantify the quality of a tree of depth 1. For example, theoretical results on C.M. criteria claim that the minimum value is reached if and only if the sub-nodes induced of by are pure with respect to , that has its maximum value if and only the frequency distributions of in and in the sub-nodes induced by are equal. In fact, these criteria, used to quantify the quality of a splitting (that means of a tree of depth 1) can be straightforwardly extended in order to evaluate the quality of a tree of any depth: the sub-nodes of depth 1 induced by are replaced by the leaves of any depth of the tree. Furthermore, we shall not forget that our aim is to offer a suitable pruning method in uncertain domains. In this context, we claim that a deep tree is less relevant than a small: the deeper a tree, the less understandable and reliable. For example, in Fig. 3, we prefer the tree which to the one with root (even has the root denoted if the frequency distributions of are the same on all leaves). On the other hand, both trees with root and have the same number of miss-classi?ed examples, but we prefer the tree with root because it is simpler. So, we have to properly de?ne a quality index which takes into account both impurities and depths of leaves. The next section presents this index.
is able to take into acdamping function (denoted ), count the depth of the leaves: the deeper a leaf, the lower its quality. We are led to address now the question of de?ning the damping function . The argument of is a depth of a node (denoted ). A minimal set of constraints is:
? ¤? h
?
H E#
I PA#
1 ) G$ 0(
$
?
H E#
?
H E#
$ 1 ) G$ 0(
#
Q R#
$
F
(2)
330-220 DI=0.51
0-80
90-10 DI=0.56
90-10 DI=0.46
90-90 DI=0.54
60-30 DI=0.06
79-0
11-10
60-10 DI=0.36
30-0
0-80
90-10 DI=0.40
20-10
49-0
11-10
79-0
11-10
(3)
where
(4)
With (4), (3) becomes :
(5)
?
? ¤?
It follows easily that the de?nition of 1) can be rewritten:
(see Equation
From experiments, we noticed that the degree of pruning is quite bound to the uncertainty embedded in data. In practice, that means the damping process has to be adjusted according to data in order to obtain, in all situations, a relevant number of pruned trees. For that, we introduce a parameter (denoted ) to control the damping process: the higher , the more extensive the pruning stage (i. e. the more sub-trees are cut). Equation (8)
? ¤?
e ? )? 1 @#) 9 f?? 1 aY¤?
?
? ¤?
#
#
Proof. As used previously, of a tree of root . Let us call We have:
are the leaves a son of a node .
(7)
)? 1 XW¤?
)? 1 aY¤?
? ¤?
Remark: with regard to the algorithmic point of view, computation of is not expensive. We can easily write according to the values of sons of : in other words, the computation of is the weighted average of the values of the sons of .
leads to a straightforward way to prune sub-trees: the main idea is to compare with the purity of its root. When is lower than the purity of its root, we infer that the “cost” of (due to its complexity) is less useful than the bene?t of the explanation generated by . So, is cut and replaced by its root which becomes a leaf of the original tree. The cost-complexity method of Breiman [1] uses a similar process (that means a trade-off between a “cost” and a “bene?t” to cut sub-trees), but the cost is based on a classi?cation error rate. So, a straightforward pruning method (that we call pruning because it goes with ) consists in pruning all sub-trees of root of the original tree which satisfy the following condition:
T US
3.3
? ¤?
If we come back to the tree given in Fig. 3, we notice has a greater that the tree which has the root denoted value than the one with root : impurities of leaves of these trees are equal, but the latter is more complex than the former. This has been taken into account by .
By ensuring that is computed for each node in one step, this point will allow a low computational cost for the pruning method. We will now move to this subject.
pruning
? ¤? ! )? 1 ! `#Y¤? 1 ! @#) 7 c 1 `#Y¤? 3 )?
where is the the total number of attributes (let us note that any exponential function can be chosen).
? ¤?
Figure 3: Examples of
values
m }s r | dReq z mt o
yo
m }s r | dEbq p mt o
m eq v w mt o
xm o
s r Ebq u mt o
{o
@m?bq Ebq ms t r o ss mt r o
)? 1 aY¤?
? ¤?
m eq v aymt o
#) 1 ! `#) 7 3 g @! 1 g `#) 7 e1 ? 7 ? ! `#Y¤? i g @) AP) g @) ! 7 # e ? # )? ? 3 ?c 1 m t? ? n ? 1 l ? Bo ?g?? a?dk h P1 9 1 ? ?
i g `#) Ad) e ? ml ?rt? o ???? a?dk h 1 1 9 ?n ? ?
mo z o
? ¤?
| Bo
H E#
! # g& #D g # ? ¤?
s r Ebq X{mt o
m eq v tw Bo Q R#
p Bo
XW¤? )? ? V? 1
sEbq r uBo t
#) ? 1 ! #`#) 7 3 ! @WV? 1 g @) 7 ? c e1 ?
sEbq r tx Bo # ~ ? ¤?
40-20
(6)
gives the damping function updated by this parameter (as usual, is the total number of attributes): with k
(8)
4.2 Results
The initial large tree (see Fig. 5) has 29 nodes with pure leaves. Its quality (0.848) is high, especially for a medical domain. This result is due to the relevance of the used attributes.
DI=0.848 -1- (30/245) gram <= 0: DI=0.818 | -3- (16/243) prot <= 0.95: DI=0.826 | | -5- (9/223) purpura = 0: DI=0.818 | | | -10- (7/24) age <= 1.66: DI=0.722 | | | | -12- (0/14) vs <= 78: I=0 Class = vir | | | | -13- (7/10) vs > 78: DI=0.631 | | | | | -14- (0/7) age <= 0.58: I=0 Class = vir | | | | | -15- (7/3) age > 0.58: DI=0.514 | | | | | | -16- (1/3) fever <= 39.7: I=0.811 Class = vir | | | | | | -17- (6/0) fever > 39.7: I=0 Class = bact | | | -11- (2/199) age > 1.66: DI=0.832 | | | | -18- (0/119) lung = 0: I=0 Class = vir | | | | -19- (0/54) lung = 1: I=0 Class = vir | | | | -20- (0/20) lung = 2: I=0 Class = vir | | | | -21- (2/6) lung = 3: DI=0.797 | | | | | -22- (0/1) season = winter: I=0 Class = vir | | | | | -23- (2/0) season = spring: I=0 Class = bact | | | | | -24- (0/5) season = summer: I=0 Class = vir | | -6- (1/10) purpura = 1: DI=0.873 | | | -26- (0/10) cytol <= 400: I=0 Class = vir | | | -27- (1/0) cytol > 400: I=0 Class = bact | | -7- (0/10) purpura = 2: I=0 Class = vir | | -8- (4/0) purpura = 3: I=0 Class = bact | | -9- (2/0) purpura = 4: I=0 Class = bact | -4- (14/2) prot > 0.95: DI=0.685 | | -28- (2/2) cytol <= 600: I=1 Class = bact | | -29- (12/0) cytol > 600: I=0 Class = bact -2- (54/0) gram > 0: I=0 Class = bact
4 Experiments
We have designed an induction software called UnDeT (for Uncertain Decision Trees) which produces decision trees. Two paradigms of attribute selection criteria are available in UnDeT (the choice of one of these depends on the kind of data and the aim wished by quality inthe user: see [5]). UnDeT indicates dex of each node and prunes trees with pruning. UnDeT is available on the web at the following address: ”http://cush.info.unicaen.fr/UnDeT/”. UnDeT is included in a more general data mining tool-box in which another major part is devoted to the treatment of missing values [12]. In this section we describe the results obtained by running UnDeT on a real world database. We perform UnDeT with the gain criterion (Shannon entropy) [11] because it is the most commonly used. The data set is a medical database coming from the University Hospital at Grenoble (France) and runs on child’s meningitis.
Figure 5: Initial large tree
? ?u
Faced with a child with a case of acute meningitis, the clinician must quickly decide which medical course
Q ?
4.1 Child’s meningitis
the subOn the tree depicted in Fig. 5, let us call tree of root labelled -4- and the sub-tree of root labelled -10-. has an impurity in its root equal to 0.54 (2 miss-classi?ed instances among 16) and has an impurity in its root equal to 0.77 (7 miss-classi?ed in-
Q
? d
? ¤?
With , we ?nd again Equation 2. By varying , pruning produces a family of pruned trees spreading from the initial large tree to the tree restricted to its root. In practice, it should be not easy to select automatically the “best” pruned tree (but it is not the main aim of this stage of this work). Nevertheless, curves of as a function of and as a function of the number of pruned nodes give a pragmatic method to stop the pruning process. Furthermore, if we are eager to obtain automatically a single “best” pruned tree, we can use a procedure requiring a test ?le [1] [10] and the ?nal tree should be the one which maximizes (we will indicate this way in Conclusion). Fig. 4 shows the pruned tree obtained (with ) from the whole tree indicated in Fig. 3. Two sub-trees (of root and ) have been cut. Although the subtrees of root and are identical, only the latter is cut because it is deeper than the former. Moreover, let us note that the frequency distributions of in and are equal, the sub-tree of root is removed (and in not the sub-tree of root ), because is more complex than .
should be taken. Brie?y, the majority of these cases are viral infections for which a simple medical supervision is suf?cient, whereas about one quarter cases are caused by bacteria and need treatment with suitable antibiotics. In typical cases, the diagnosis could be considered as obvious and a few simple rules enable a quasi certain decision. However, nearly one third of these cases are presented with non-typical clinical and biological data: the dif?culty of the diagnosis lies in the fact that the observed attributes, considered separately, have little diagnostic signi?cation. The aim of this section is to study the relevance of pruning in such domains. The used data set is composed of 329 instances, described by 23 (quantitative or qualitative) attributes. The class is bivalued (viral versus bacterial).
? 3 ?
H E#
? ¤?
? ¤?
?
? ¤?
? ? ??
Q R# Q R#
? 3 ? ) m ? i 3 `"! ? rnl ?k? Rh ?1 h ? ¤? H E# I PA# I PA# H E# ? Q R#
H #
~
? ¤? ? Q R#
?
330-220 DI=0.53
0-80
90-10 DI=0.56
90-10
90-90 DI=0.54
79-0
11-10
0-80
90-10
Figure 6: Pruned tree (
keep sub-trees which do not lessen an error rate but which can point out some populations of speci?c interest; yet usual methods are based on a classi?cation error rate. We claim that it is a major point in uncertain domains. Further work has to normalize by taking into account the number of instances of with regard to the root of the whole tree: this improvement will allow to compare properly trees with same frequency distributions of in the nodes but different number of instances. Another direction is to use a test ?le to improve the choice of the “best” pruned tree. We will move then to an other stage of our work, which will be to select a pruned tree which re?ects - in general - the sound knowledge of the
5 Conclusion
We have proposed a quality index for decision trees for uncertain domains which realizes a trade-off between the impurities and the depth of leaves of a tree. Stemming from , we present a pruning method which is able to
? 3 ?
)? 1 aY¤?
stances among 31). Nevertheless, (which equal to 0.722) is higher than (0.685). Furthermore, is deeper than , which also penalized the quality of . is better, because it better explains in?nally leads to a single miss-classi?ed instances ( stance), but is complex (we will come back below to this point with the pruning stage). Fig. 6 represents the pruned tree with . The subtree of root labelled -11- becomes a leaf. This pruning introduces 2 miss-classi?ed instances. This number is not high regarding the 201 instances of the node. Furthermore, in order to properly classify these 2 instances, the initial large tree had to build a complex sub-tree with deep leaves. Such leaves appear not to be very reliable in uncertain domains. Finally, Fig. 7 indicates the pruned tree with . The sub-tree of root labelled -5- becomes a leaf (in this case, the sub-tree is cut: its complexity to classify a single instance is not dependable). It is important to notice that the sub-tree of root labelled -4- is not destroyed: even if it does not decrease the number of miss-classi?ed examples (which is 2 on both root and leaves), this sub-tree highlights a reliable sub-population when the attribute “cytol” is higher than 600 (this result is checked by the medical expert). It is typically the situation that we presented in our motivations (see Sect. 2).
DI=0.829 -1- (30/245) gram <= 0: DI=0.796 | -3- (16/243) prot <= 0.95: DI=0.803 | | -5- (9/223) purpura = 0: DI=0.792 | | | -10- (7/24) age <= 1.66: DI=0.722 | | | | -12- (0/14) vs <= 78: I=0 Class = vir | | | | -13- (7/10) vs > 78: DI=0.631 | | | | | -14- (0/7) age <= 0.58: I=0 Class = vir | | | | | -15- (7/3) age > 0.58: DI=0.514 | | | | | | -16- (1/3) fever <= 39.7: I=0.811 Class = vir | | | | | | -17- (6/0) fever > 39.7: I=0 Class = bact | | | -11- (2/199) age > 1.66: I=0.080 Class = vir | | -6- (1/10) purpura = 1: DI=0.873 | | | -26- (0/10) cytol <= 400: I=0 Class = vir | | | -27- (1/0) cytol > 400: I=0 Class = bact | | -7- (0/10) purpura = 2: I=0 Class = vir | | -8- (4/0) purpura = 3: I=0 Class = bact | | -9- (2/0) purpura = 4: I=0 Class = bact | -4- (14/2) prot > 0.95: DI=0.685 | | -28- (2/2) cytol <= 600: I=1 Class = bact | | -29- (12/0) cytol > 600: I=0 Class = bact -2- (54/0) gram > 0: I=0 Class = bact
? ¤?
Figure 4: Examples of
values
m }s r | dEbq yo
60-30
}s r w y z dEbq xm o
{o
@m?bq ms t r o
?
? ?3 ?
}s r w y z dEbq z o ? 3 ? ? ) ? 1 duaY¤? ? ¤? m eq v tw Bo
mo
Q )? 1 ?aY¤?
p Bo
sEbq r uBo t
sEbq r tx Bo ? d ? d ? ) ? ? duaY¤? ?u Q ? 1 ? ?u ? ?u ? ¤?
)
[7] U. M. Fayyad and K. B. Irani. The attribute selection problem in decision tree generation. In proceedings of Tenth National Conference on Arti?cial Intelligence, pages 104–110, Cambridge, 1992. MA: AAAI Press/MIT Press. [8] J. Mingers. An empirical comparison of pruning methods for decision-tree induction. Machine Learning 4, pages 227–243, 1989. [9] J. R. Quinlan. Induction of decision trees. Machine Learning 1, pages 81–106, 1986.
Figure 7: Pruned tree (
)
studied data.
Acknowledgements
The authors wish to thank Dr P. Francois (University ? Hospital of Grenoble, France) for providing data on child’s meningitis and many valuable comments.
[10] J. R. Quinlan. Simplifying decision trees. International Journal of Man-Machine Studies, 27:221– 234, 1987. [11] J.R Quinlan. C4.5 Programs for machine learning. Morgan Kaufmann, San Mateo, Californie, 1993. [12] A. Ragel and Cr? milleux B. Mvc - a preprocessing e method to deal with missing values. KnowledgeBased Systems, pages 285–291, 1999.
References
[1] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classi?cation and regression trees. Statistics probability series. Wadsworth, Belmont, 1984. [2] W. Buntine and T. Niblett. A further comparison of splitting rules for decision-tree induction. Machine Learning 8, pages 75–85, 1992. [3] J. Catlett. Overpruning large decision trees. In proceedings of the Twelfth International Joint Conference on Arti?cial Intelligence IJCAI 91, pages 764– 769, Sydney, Australia, 1991. [4] B. Cr? milleux and C. Robert. A theoretical framee work for decision trees in uncertain domains: Application to medical data sets. In E. Keravnou, C. Garbay, R. Baud, and J. Wyatt, editors, 6th Conference on Arti?cial Intelligence In Medicine Europe (AIME 97), Lecture notes in arti?cial intelligence. N 1211, pages 145–156, Grenoble (France), 1997. Springer-Verlag. [5] B. Cr? milleux, C. Robert, and M. Gaio. Uncertain e domains and decision trees: Ort versus c.m. criteria. In 7th Conference on Information Processing and Management of Uncertainty in Knowledgebased Systems, pages 540–546, Paris (France), 1998. EDK.
?
DI=0.762 -1- (30/245) gram <= 0: DI=0.716 | -3- (16/243) prot <= 0.95: DI=0.718 | | -5- (9/223) purpura = 0: I=0.237 Class = vir | | -6- (1/10) purpura = 1: DI=0.873 | | | -26- (0/10) cytol <= 400: I=0 Class = vir | | | -27- (1/0) cytol > 400: I=0 Class = bact | | -7- (0/10) purpura = 2: I=0 Class = vir | | -8- (4/0) purpura = 3: I=0 Class = bact | | -9- (2/0) purpura = 4: I=0 Class = bact | -4- (14/2) prot > 0.95: DI=0.685 | | -28- (2/2) cytol <= 600: I=1 Class = bact | | -29- (12/0) cytol > 600: I=0 Class = bact -2- (54/0) gram > 0: I=0 Class = bact
[6] F. Esposito, D. Malerba, and G. Semeraro. Decision tree pruning as search in the state space. In P. B. Brazdil, editor, proceedings of European Conference on Machine Learning ECML 93, Lecture notes in arti?cial intelligence. N 667, pages 165– 184, Vienna (Austria), 1993. Springer-Verlag.
? ?3 ?
?