Introduction

Clustering is an unsupervised machine learning technique that groups similar objects into sets, called clusters. It can be used, for example, to:

  • Identify groups in a data set
  • Support the analysis of a data set
  • Label a data set efficiently in a semi-supervised approach
  • Segment an image by clustering its pixel colors
  • Reduce the dimensionality of a data set
  • Detect outliers in a data set
  • Improve machine learning models through feature engineering

This section introduces the following clustering approaches, all of which are available in scikit-learn:

  1. K-Means

  2. Gaussian Mixtures

  3. DBSCAN and HDBSCAN
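
As a first orientation, the approaches listed above can be called through a common scikit-learn interface. The following is a minimal sketch, not a recommended setup: the toy data, the blob centers, and all parameter values (`n_clusters`, `n_components`, `eps`, `min_samples`) are assumptions chosen only for illustration. HDBSCAN is left out because `sklearn.cluster.HDBSCAN` is only available from scikit-learn 1.3 onward.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Toy data: three well-separated 2-D blobs (assumed values, for illustration only).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5.0, -5.0], [0.0, 5.0], [5.0, -5.0]],
    cluster_std=0.5,
    random_state=0,
)

# K-Means: the number of clusters has to be chosen up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Gaussian mixture: a probabilistic model with one Gaussian per component.
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# DBSCAN: density-based, the number of clusters is not specified.
# Points labelled -1 are treated as noise (outliers).
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# Count the clusters each method found (noise label -1 excluded for DBSCAN).
n_kmeans_clusters = len(set(kmeans_labels))
n_gmm_clusters = len(set(gmm_labels))
n_dbscan_clusters = len(set(dbscan_labels) - {-1})
print(n_kmeans_clusters, n_gmm_clusters, n_dbscan_clusters)
```

All three estimators return one integer label per sample. The main difference visible here is that K-Means and Gaussian mixtures need the number of groups as input, whereas DBSCAN derives it from the density parameters and can mark points as noise.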

The section closes with an exercise.