Multiclass Classification
A multiclass classifier can distinguish between more than two classes. Some classifiers are true multiclass classifiers, for example random forests. Other classifiers are limited to binary classification, for example support vector machines. However, you can still use a binary classifier for multiclass classification by combining multiple binary classifiers.
The one-versus-the-rest strategy trains a binary classifier for each class. If you have \(N\) classes, \(N\) binary classifiers must be trained. This gives access to a decision score for each class, and the class whose classifier yields the highest decision score is predicted as the label. This strategy is the one most often employed for binary classifiers.
The one-versus-one strategy trains a binary classifier for each pair of classes. Thus, \(\frac{N(N-1)}{2}\) classifiers are needed for \(N\) classes. The predicted class is the one that wins the most direct comparisons. The main advantage of this approach is that each binary classifier needs only the training instances of the two classes it has to distinguish. This strategy is therefore reasonable for classifiers that scale poorly with the number of instances \(m\).
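As a quick check (plain Python, nothing scikit-learn specific), we can count the binary classifiers each strategy needs for a five-class problem like the one used below:
# Number of binary classifiers required for N = 5 classes
N = 5
print("One-versus-the-rest:", N)            # 5 (one per class)
print("One-versus-one:", N * (N - 1) // 2)  # 10 (one per pair)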
Scikit-learn supports you by automatically selecting the most suitable strategy when a binary classifier is applied to a multiclass problem. Let us train a support vector machine for a problem with the classes “group1”, “group2”, “group3”, “group4”, and “group5”:
import pandas as pd

# Load the prepared data set with two features and one label column
data = pd.read_csv('cl2_data.csv')
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   feature1  2500 non-null   float64
 1   feature2  2500 non-null   float64
 2   label     2500 non-null   object
dtypes: float64(2), object(1)
memory usage: 58.7+ KB
data["label"].value_counts()
label
group1 500
group2 500
group3 500
group4 500
group5 500
Name: count, dtype: int64
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector
import numpy as np
from sklearn.svm import SVC

# Split off 20 % of the data as test set
train_set, test_set = train_test_split(data, test_size=0.2, random_state=42)
y_train = train_set['label']
X_train = train_set.drop(['label'], axis=1)

# Standardize all numerical features
num_pipeline = make_pipeline(StandardScaler())
preprocessing = ColumnTransformer([("num", num_pipeline, make_column_selector(dtype_include=np.number))])

# Support vector classifier with RBF kernel
model_svc = make_pipeline(preprocessing, SVC(kernel='rbf', C=1.0, random_state=42))
model_svc.fit(X_train, y_train)
Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('num',
                                                  Pipeline(steps=[('standardscaler',
                                                                   StandardScaler())]),
                                                  <sklearn.compose._column_transformer.make_column_selector object at 0x7024f5217fe0>)])),
                ('svc', SVC(random_state=42))])
We can use our model to predict the labels for a given set of input features:
check_model_svc = pd.DataFrame([[0.6, 0.3],[0.7,0.4]], columns=['feature1', 'feature2'])
model_svc.predict(check_model_svc)
array(['group5', 'group2'], dtype=object)
In the case of SVC, scikit-learn uses the one-versus-one strategy. Remember: support vector machines with the kernel trick scale between \(O(m^2\cdot n)\) and \(O(m^3\cdot n)\). Thus, SVC gets really slow for data sets with a large number of instances \(m\) due to this poor computational scaling. The one-versus-one strategy is therefore reasonable, since it reduces the number of instances each single classifier is trained on. For each class, the decision function returns the number of won duels plus or minus a small tweak (at most ±0.33) derived from the confidence scores of the binary classifiers; this tweak breaks ties between classes with the same number of won duels. Setting the parameter “random_state” in our SVC keeps the results reproducible.
check_scores_svc = model_svc.decision_function(check_model_svc)
print(check_scores_svc.round(2))
[[ 1.91 3.23 -0.27 0.73 4.29]
[-0.28 4.29 1.76 0.75 3.27]]
We can use the following code to check which class corresponds to each score in the list:
model_svc.classes_
array(['group1', 'group2', 'group3', 'group4', 'group5'], dtype=object)
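As a cross-check, the predicted label should be the class with the highest decision score. A minimal sketch, reusing the objects defined above:
import numpy as np

# The index of the highest OvO decision score selects the predicted class
best = np.argmax(check_scores_svc, axis=1)
print(model_svc.classes_[best])  # expected: ['group5' 'group2'], as predicted above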
If we want to select the strategy ourselves, we can do so with the OneVsOneClassifier or OneVsRestClassifier wrapper:
from sklearn.multiclass import OneVsRestClassifier
model_svc_ovr = make_pipeline(preprocessing, OneVsRestClassifier(SVC(kernel='rbf', C=1.0, random_state=42)))
model_svc_ovr.fit(X_train, y_train)
Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('num',
                                                  Pipeline(steps=[('standardscaler',
                                                                   StandardScaler())]),
                                                  <sklearn.compose._column_transformer.make_column_selector object at 0x7024f5217fe0>)])),
                ('onevsrestclassifier',
                 OneVsRestClassifier(estimator=SVC(random_state=42)))])
With the one-versus-the-rest strategy, the decision function gives the distance to the decision boundary of each binary classifier:
model_svc_ovr.decision_function(check_model_svc).round(2)
array([[-1.97, -1.32, -2.19, -2.72, 1.02],
[-2.57, 0.92, -3.38, -2.55, -0.78]])
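Forcing the one-versus-one strategy explicitly works analogously with OneVsOneClassifier. A minimal sketch, assuming the same preprocessing pipeline and training data as before:
from sklearn.multiclass import OneVsOneClassifier

# Wrap the SVC explicitly in a one-versus-one meta-classifier
model_svc_ovo = make_pipeline(preprocessing,
                              OneVsOneClassifier(SVC(kernel='rbf', C=1.0, random_state=42)))
model_svc_ovo.fit(X_train, y_train)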
We can also carry out a cross-validation for both strategies and compare the \(F_1\) scores:
from sklearn.model_selection import cross_val_score
f1_svc_ovo = cross_val_score(model_svc, X_train, y_train, cv=5, scoring="f1_weighted")
print(f"F1 score of each subset of the cross validation for OvO:\n{f1_svc_ovo}\n")
print("This is an average F1 score of %0.3f for OvO.\n" % (f1_svc_ovo.mean()))
f1_svc_ovr = cross_val_score(model_svc_ovr, X_train, y_train, cv=5, scoring="f1_weighted")
print(f"F1 score of each subset of the cross validation for OvR:\n{f1_svc_ovr}\n")
print("This is an average F1 score of %0.3f for OvR.\n" % (f1_svc_ovr.mean()))
F1 score of each subset of the cross validation for OvO:
[0.92790759 0.94253014 0.93509162 0.95261094 0.91751834]
This is an average F1 score of 0.935 for OvO.
F1 score of each subset of the cross validation for OvR:
[0.93288525 0.94777851 0.93999501 0.94761709 0.91740247]
This is an average F1 score of 0.937 for OvR.
Please note that “f1_weighted” takes a weighted average, where each class is weighted by its number of instances. If you have imbalanced classes and want to give each class the same weight, use “f1_macro” instead.
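A minimal sketch of the macro-averaged variant, reusing the model from above (since our five classes are perfectly balanced, the result should be very close to the weighted score):
# Macro average: every class contributes equally to the F1 score
f1_svc_macro = cross_val_score(model_svc, X_train, y_train, cv=5, scoring="f1_macro")
print("This is an average macro F1 score of %0.3f for OvO." % (f1_svc_macro.mean()))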