build predictive models using the most informative features
- interior nodes → query on some descriptive feature of the dataset
- leaf nodes → decision/predicted classification/predicted value
shallow trees are preferred
- prevent overfitting
 
informative features split the dataset into more homogeneous or pure sets.
Measures of purity
- entropy & information gain
- information gain ratio
- gini index
- variance
 
Entropy
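The embedded drawing isn't readable here; as a stand-in, the standard Shannon entropy of the target feature $t$ over a dataset $\mathcal{D}$ (notation mine):

$$
H(t, \mathcal{D}) = -\sum_{l \in \text{levels}(t)} P(t = l)\,\log_2 \big(P(t = l)\big)
$$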
Information Gain
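Stand-in for the drawing: the usual definition, i.e. the drop in entropy after partitioning $\mathcal{D}$ on a descriptive feature $d$ (notation mine):

$$
IG(d, \mathcal{D}) = H(t, \mathcal{D}) - \sum_{l \in \text{levels}(d)} \frac{|\mathcal{D}_{d=l}|}{|\mathcal{D}|}\, H(t, \mathcal{D}_{d=l})
$$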
Information Gain Ratio
- Information Gain has a preference for (is biased toward) features with many values
- Information Gain Ratio divides information gain by the amount of information needed to determine the value of the feature (the entropy of the feature itself)
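Stand-in for the drawing, following the description above (notation mine):

$$
GR(d, \mathcal{D}) = \frac{IG(d, \mathcal{D})}{-\sum_{l \in \text{levels}(d)} P(d = l)\,\log_2 \big(P(d = l)\big)}
$$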
Gini Index
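Stand-in for the drawing: the standard Gini index, i.e. how often a randomly drawn instance would be misclassified if labels were assigned according to the class distribution (notation mine):

$$
Gini(t, \mathcal{D}) = 1 - \sum_{l \in \text{levels}(t)} P(t = l)^2
$$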
Variance
- Used for regression trees
 
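Stand-in for the two variance drawings: the usual sample variance of the continuous target within a partition, which a regression tree tries to minimise with each split (notation mine):

$$
\text{var}(t, \mathcal{D}) = \frac{\sum_{i=1}^{n} \left(t_i - \bar{t}\right)^2}{n - 1}
$$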
Continuous Descriptive Features
- preprocessing like binning
- turn into Boolean features using some threshold value
	- < threshold value and >= threshold value
- sort the dataset according to the continuous feature
	- adjacent instances with different target values are possible threshold values
	- threshold value → lies midway between the continuous feature values of the two instances: (x1 + x2) / 2
- optimal threshold → compute information gain (or another measure) for each candidate split and select the split with the highest information gain (see the sketch after this list)
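A minimal sketch of the procedure above (function names and toy data are mine, not from the notes): sort by the continuous feature, take midpoints between adjacent instances whose labels differ as candidate thresholds, and keep the one with the highest information gain.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) maximising information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, -1.0)
    for (x1, y1), (x2, y2) in zip(pairs, pairs[1:]):
        if y1 == y2:               # only boundaries between differing labels matter
            continue
        threshold = (x1 + x2) / 2  # midpoint between the two instances
        left = [y for x, y in pairs if x < threshold]
        right = [y for x, y in pairs if x >= threshold]
        rem = (len(left) / len(pairs)) * entropy(left) + \
              (len(right) / len(pairs)) * entropy(right)
        gain = base - rem
        if gain > best[1]:
            best = (threshold, gain)
    return best


# toy example: a single continuous feature and a binary target
print(best_threshold([1.2, 2.8, 3.1, 4.5, 5.0], ["no", "no", "yes", "yes", "yes"]))
```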
 
 
Algorithms
- ID3
- CART

Both are greedy algorithms: they don't check whether the best possible split at a higher level leads to the lowest possible impurity at lower levels (see the sketch below).
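A minimal sketch (my own illustration, not ID3 or CART verbatim) of the greedy strategy both share: at every node pick the feature with the highest information gain right now, partition, and recurse, with no lookahead.

```python
from collections import Counter
from math import log2


def entropy(rows, target):
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(rows, feature, target):
    rem = 0.0
    for level in {r[feature] for r in rows}:
        part = [r for r in rows if r[feature] == level]
        rem += len(part) / len(rows) * entropy(part, target)
    return entropy(rows, target) - rem


def build_tree(rows, features, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not features:          # pure node or no features left
        return Counter(labels).most_common(1)[0][0]    # leaf: majority class
    best = max(features, key=lambda f: information_gain(rows, f, target))  # greedy choice
    children = {}
    for level in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == level]
        children[level] = build_tree(subset, [f for f in features if f != best], target)
    return (best, children)                            # interior node: (feature, branches)


# usage with a toy dataset of dicts
toy = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]
print(build_tree(toy, ["outlook", "windy"], "play"))
```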
 
Overfitting
The likelihood of overfitting increases as a tree gets deeper, because each feature test along a path partitions the data further, so the classifications at deeper nodes are based on smaller and smaller subsets of the dataset.
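A minimal sketch (assuming scikit-learn is available) of capping depth to keep the tree shallow; the dataset and the depth value are illustrative, not recommendations from the notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)            # unrestricted depth
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("deep   :", deep.score(X_test, y_test))
print("shallow:", shallow.score(X_test, y_test))
```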