Introduction
Clustering is an unsupervised machine learning technique for grouping similar objects into sets. It can be employed, for example, to:
- Identify groups in a data set
- Support the exploratory analysis of a data set
- Label a data set efficiently in a semi-supervised approach
- Segment an image by clustering its pixel colors
- Reduce the dimensionality of a data set
- Detect outliers in a data set
- Improve machine learning models by feature engineering
This section introduces the following clustering approaches, which are available in scikit-learn:
- K-Means
- Gaussian Mixtures
- DBSCAN and HDBSCAN
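As a rough orientation, the approaches listed above correspond to the scikit-learn estimators `KMeans`, `GaussianMixture`, `DBSCAN` and `HDBSCAN`. The following is a minimal sketch, not a recommended setup: the synthetic data set, the cluster centers and all parameter values (`n_clusters`, `n_components`, `eps`, `min_samples`, `min_cluster_size`) are assumptions chosen only for illustration. `HDBSCAN` is imported defensively because it is only part of `sklearn.cluster` in newer scikit-learn releases (1.3 and later).

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Illustrative toy data: 300 points around three well-separated centers.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

# K-Means: the number of clusters must be given in advance.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Gaussian mixture: the number of components must be given in advance.
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# DBSCAN: density-based, no cluster count needed; label -1 marks noise points.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# HDBSCAN: only available in sklearn.cluster from scikit-learn 1.3 on.
try:
    from sklearn.cluster import HDBSCAN

    hdbscan_labels = HDBSCAN(min_cluster_size=10).fit_predict(X)
except ImportError:
    hdbscan_labels = None  # older scikit-learn version


def n_clusters_found(labels):
    """Count distinct cluster labels, ignoring the noise label -1."""
    return len(set(labels.tolist()) - {-1})


print("K-Means:", n_clusters_found(kmeans_labels))
print("Gaussian mixture:", n_clusters_found(gmm_labels))
print("DBSCAN:", n_clusters_found(dbscan_labels))
if hdbscan_labels is not None:
    print("HDBSCAN:", n_clusters_found(hdbscan_labels))
```

All estimators here share the `fit_predict` method, which returns one integer cluster label per sample. This makes it easy to swap one approach for another on the same data.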
An exercise will close this section.