From the course: Machine Learning with Python: Decision Trees


How do classification trees measure impurity?


- [Instructor] Classification trees are built using a process known as recursive partitioning. The objective of recursive partitioning is to create child partitions that are purer than their parents. Classification tree algorithms use a mathematical formula to quantify the degree of impurity within a partition. Two of the most commonly used measures of impurity are entropy and Gini. Entropy is the preferred measure of impurity for the C5.0 decision tree algorithm. It is a concept borrowed from information theory, and when applied to decision trees, it quantifies the level of randomness or disorder within a partition. A partition with low entropy is one in which most of the items have the same value or outcome, while a partition with high entropy is one with no dominant outcome. The entropy of a partition S, with C possible outcomes or labels, is calculated as shown here. To…
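The two impurity measures named above have standard formulas: for a partition S with C possible labels, entropy is -Σ p_c · log2(p_c) and Gini impurity is 1 - Σ p_c², where p_c is the proportion of items in S with label c. A minimal Python sketch of both (the function names here are illustrative, not from the course):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a partition, in bits: -sum of p_c * log2(p_c)."""
    n = len(labels)
    counts = Counter(labels)
    # Summing the negated terms; labels with p_c = 0 never appear in counts,
    # so log2 is always called on a positive proportion.
    return sum(-(k / n) * log2(k / n) for k in counts.values())

def gini(labels):
    """Gini impurity of a partition: 1 - sum of p_c squared."""
    n = len(labels)
    counts = Counter(labels)
    return 1 - sum((k / n) ** 2 for k in counts.values())

# A pure partition has zero impurity; an even binary split is maximally impure.
print(entropy(["yes"] * 8))               # 0.0 (pure partition)
print(entropy(["yes"] * 4 + ["no"] * 4))  # 1.0 (maximally impure, two classes)
print(gini(["yes"] * 4 + ["no"] * 4))     # 0.5
```

Both measures peak when every outcome is equally likely and drop to zero for a pure partition, which is why either can drive the recursive partitioning described above.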
