Introduction

Clustering is an unsupervised machine learning technique that groups similar objects into sets. It can be used, for example, to:

  • Identify groups in a data set
  • Support the analysis of a data set
  • Label a data set efficiently in a semi-supervised approach
  • Segment an image by clustering the pixel colors
  • Reduce the dimensionality of a data set
  • Detect outliers in a data set
  • Improve machine learning models by feature engineering

This section introduces the following clustering approaches, which are available in scikit-learn:

  1. K-Means

  2. Gaussian Mixtures

  3. DBSCAN and HDBSCAN
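
As a first orientation, the approaches listed above can be called through very similar scikit-learn interfaces. The following is only a minimal sketch: the toy data from `make_blobs` and all parameter values (`n_clusters=3`, `eps=1.0`, `min_samples=5`, and so on) are assumptions chosen for illustration, not recommendations. HDBSCAN is left out here because `sklearn.cluster.HDBSCAN` is only available in scikit-learn 1.3 and later.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Toy data (assumption for illustration): three well-separated 2D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5.0, -5.0], [0.0, 0.0], [5.0, 5.0]],
    cluster_std=0.5,
    random_state=0,
)

# K-Means: the number of clusters must be given in advance.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Gaussian mixture: the number of mixture components must be given in advance.
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# DBSCAN: density-based, no cluster count needed; noise points get label -1.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

for name, labels in [
    ("K-Means", kmeans_labels),
    ("Gaussian mixture", gmm_labels),
    ("DBSCAN", dbscan_labels),
]:
    # Count distinct cluster labels, ignoring the DBSCAN noise label -1.
    n_found = len(set(labels.tolist()) - {-1})
    print(f"{name}: {n_found} clusters found")
```

All three estimators return one integer label per sample, so their results can be compared or plotted in the same way.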

The section closes with an exercise. The session was prepared by Dr. Stefan Zahn (IOM). If you have any questions, you can contact him.