DECISION TREES
The aim of Data Mining is to find the answer to a question from the data you have gathered. This question has to be defined from a field in the data (called the 'field to explain'), such as: "Which entries in my whole dataset have a certain value for the field to explain?"
Then, with the data mining tool, you will discover which criteria have the most significant impact on the field to explain; that is, you can separate the whole population of the recordset into subpopulations that behave differently with respect to the field to explain.
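As a rough illustration of "separating the population into subpopulations", the Python sketch below groups records by one criterion and measures how mixed the field to explain is within each group. It is not ALICE d'ISoft's algorithm: weighted Gini impurity is swapped in as an assumed measure of how strongly a criterion separates the population, and the field names (`owns_home`, `region`) and records are hypothetical.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_by(records, criterion, target):
    """Group the target values by the value each record has for `criterion`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[criterion]].append(r[target])
    return dict(groups)

def split_score(records, criterion, target):
    """Size-weighted impurity of the subpopulations; lower means sharper separation."""
    n = len(records)
    return sum(len(g) / n * gini(g) for g in split_by(records, criterion, target).values())

def best_criterion(records, criteria, target):
    """Return the criterion whose subpopulations differ most on the target."""
    return min(criteria, key=lambda c: split_score(records, c, target))

# Hypothetical records: 'owns_home' separates Y from N perfectly, 'region' does not.
records = [
    {"owns_home": "yes", "region": "north", "Success": "Y"},
    {"owns_home": "yes", "region": "south", "Success": "Y"},
    {"owns_home": "no",  "region": "north", "Success": "N"},
    {"owns_home": "no",  "region": "south", "Success": "N"},
]
print(best_criterion(records, ["owns_home", "region"], "Success"))  # owns_home
```

Here `owns_home` yields two pure subpopulations (score 0.0), while each `region` group is half Y and half N (score 0.5).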
Decision trees are one of the fastest and easiest ways to do Data Mining. Let's take the example of a credit officer in a credit firm. His database contains data about last year's customers who were granted a small loan. Customers are described by age, wages, bank seniority, number of children, etc. One field in the database, named Success, shows whether the customer had trouble paying back the loan.
After importing his data into ALICE d'ISoft, the credit officer builds a tree.
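The tree-building step can be sketched as a greedy recursion: pick the criterion that best separates the current node, split on it, and repeat in each child. This is a minimal sketch, not the method ALICE d'ISoft uses; it assumes a Gini-based multiway split (similar in spirit to ID3/CART), assumes numeric fields such as age and wages have already been grouped into hypothetical bands (`age_band`, `wage_band`), and uses made-up records.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of labels (0.0 means every label is the same)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(records, criterion):
    """Split records into subpopulations keyed by their value for `criterion`."""
    parts = defaultdict(list)
    for r in records:
        parts[r[criterion]].append(r)
    return dict(parts)

def weighted_gini(records, criterion, target):
    """Size-weighted impurity of the children a split on `criterion` would create."""
    n = len(records)
    return sum(len(p) / n * gini([r[target] for r in p])
               for p in partition(records, criterion).values())

def build_tree(records, criteria, target, min_size=2):
    """Greedily split until a node is pure, too small, or out of criteria."""
    labels = [r[target] for r in records]
    node = {"size": len(records), "counts": dict(Counter(labels)), "children": {}}
    if gini(labels) == 0.0 or len(records) < min_size or not criteria:
        return node  # leaf
    best = min(criteria, key=lambda c: weighted_gini(records, c, target))
    parts = partition(records, best)
    if len(parts) < 2:
        return node  # the chosen criterion has one value here, so it cannot split
    node["split_on"] = best
    remaining = [c for c in criteria if c != best]
    for value, subset in parts.items():
        node["children"][value] = build_tree(subset, remaining, target, min_size)
    return node

def show(node, depth=0, label="root"):
    """Print the tree as indented text, one node per line."""
    print("  " * depth + f"{label}: {node['size']} customers {node['counts']}")
    for value, child in node["children"].items():
        show(child, depth + 1, f"{node['split_on']} = {value}")

# Hypothetical, pre-banded loan records.
records = [
    {"age_band": "<30", "wage_band": "low",  "Success": "N"},
    {"age_band": "<30", "wage_band": "low",  "Success": "N"},
    {"age_band": "<30", "wage_band": "high", "Success": "Y"},
    {"age_band": "30+", "wage_band": "low",  "Success": "Y"},
    {"age_band": "30+", "wage_band": "high", "Success": "Y"},
    {"age_band": "30+", "wage_band": "low",  "Success": "Y"},
]
tree = build_tree(records, ["age_band", "wage_band"], "Success")
show(tree)
```

The stopping rules (pure node, fewer than `min_size` records, no criteria left) are illustrative choices; a real tool exposes its own settings.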
[Figure: a decision tree, with a parent node and its child nodes labeled]
A tree is composed of nodes. The leftmost node is the root of the tree, and the rightmost nodes, which have no children, are the leaves. Every node except the leaves is a parent node linked to its children.
Each node contains a subset of the initial population; the root contains the whole population.
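The node vocabulary above maps onto a small recursive data structure. The sketch below is hypothetical (the `Node` class and its helpers are not part of any ALICE d'ISoft API): each node holds its subset of the population, leaves are nodes without children, and a helper checks that each parent's records are split exactly among its children.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of a decision tree: a subset of the population plus its children."""
    records: list                                   # the subset held by this node
    children: list = field(default_factory=list)    # empty for a leaf

    @property
    def is_leaf(self):
        # A leaf is a node with no children (shown at the right of a left-to-right tree).
        return not self.children

def leaves(node):
    """Collect the leaves under `node`, visiting children in order."""
    if node.is_leaf:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def children_partition_parent(node):
    """Check that every parent's records are split exactly among its children."""
    if node.is_leaf:
        return True
    merged = sorted(r for child in node.children for r in child.records)
    return merged == sorted(node.records) and all(
        children_partition_parent(c) for c in node.children)

# Hypothetical customer IDs: the root holds everyone, each child a subset.
root = Node(records=[1, 2, 3, 4, 5],
            children=[Node([1, 2]),
                      Node([3, 4, 5], children=[Node([3]), Node([4, 5])])])
print([leaf.records for leaf in leaves(root)])  # [[1, 2], [3], [4, 5]]
print(children_partition_parent(root))          # True
```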
Nodes can display a variety of information: the number of customers, the number and percentage of customers who had trouble paying back the loan (value N for the Success field), the number and percentage of customers who had no trouble paying back the loan (value Y for the Success field), a graphical chart of the Y and N values, etc.
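The per-node figures listed here are simple counts over the node's subset. A minimal sketch, assuming the Success field takes only the values Y and N; `node_summary` and `text_bar` are hypothetical stand-ins for what the tool displays, not its actual output format.

```python
from collections import Counter

def node_summary(success_values):
    """Summarize one node: its size, plus count and percentage of each Success value."""
    total = len(success_values)
    counts = Counter(success_values)
    summary = {"customers": total}
    for value in ("Y", "N"):
        n = counts.get(value, 0)
        pct = 100.0 * n / total if total else 0.0
        summary[value] = (n, round(pct, 1))
    return summary

def text_bar(summary, width=20):
    """A crude text stand-in for the node's Y/N chart."""
    total = summary["customers"]
    if total == 0:
        return ""
    y = round(width * summary["Y"][0] / total)
    return "Y" * y + "N" * (width - y)

s = node_summary(["Y", "Y", "N", "Y"])
print(s)            # {'customers': 4, 'Y': (3, 75.0), 'N': (1, 25.0)}
print(text_bar(s))  # YYYYYYYYYYYYYYYNNNNN
```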