Support Vector Regression
Support vector regression (SVR) is built on two central concepts.
SVR tries to fit as many training instances as possible on a linear "street" while keeping the number of instances that fall off the street as small as possible. The width of the street is controlled by the hyperparameter \(\epsilon\). Adding more training instances within the street does not affect the predictions. Therefore, SVR is less sensitive to outliers and noisy data than traditional regression techniques.
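One common way to make this precise is the \(\epsilon\)-insensitive loss that SVR minimizes together with a regularization term: deviations smaller than \(\epsilon\) cost nothing, which is why instances inside the street do not influence the fit.
\( L_\epsilon(y,\hat{y})=\max(0,\,|y-\hat{y}|-\epsilon) \)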
The kernel trick makes it possible to train a nonlinear SVR model. Conceptually, it corresponds to a nonlinear transformation of the input features into a space in which the relationship between features and labels is closer to linear. The kernel trick achieves this without ever computing the transformed features explicitly.
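As a minimal illustration (toy numbers, independent of the data set used below): for a degree-2 polynomial kernel, the kernel value computed directly from the original inputs equals the dot product of an explicit quadratic feature map, so the transformation never has to be carried out.
import numpy as np
# toy example of the kernel trick with a degree-2 polynomial kernel (no bias term):
# k(x, z) = (x . z)^2 corresponds to the explicit feature map
# phi(v) = [v1^2, sqrt(2)*v1*v2, v2^2] for two-dimensional inputs
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
def phi(v):
    return np.array([v[0]**2, np.sqrt(2)*v[0]*v[1], v[1]**2])
explicit = phi(x) @ phi(z)   # dot product in the transformed space
kernel = (x @ z)**2          # same value, computed from the original inputs
print(explicit, kernel)      # both give 121.0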
The training time for nonlinear SVR commonly scales between \(O(m^2\cdot n)\) and \(O(m^3\cdot n)\), where \(m\) is the number of training instances and \(n\) the number of features. Thus, SVR gets slow when the number of training instances \(m\) gets large.
# code from previous notebooks of this section
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector
import numpy as np
hb_data = pd.read_csv('HB_data.csv')
train_set, test_set = train_test_split(hb_data, test_size=0.2, random_state=42)
y_train = train_set['energy']
X_train = train_set.drop(['energy'], axis=1)
num_pipeline = make_pipeline(StandardScaler())
cat_pipeline = make_pipeline(OneHotEncoder(sparse_output=False))
preprocessing = ColumnTransformer([
    ("num", num_pipeline, make_column_selector(dtype_include=np.number)),
    ("cat", cat_pipeline, make_column_selector(dtype_include=object)),
])
# end code from previous notebooks of this section
from sklearn.svm import SVR
# SVR with an RBF kernel, trained on the preprocessed features
model_svr = make_pipeline(preprocessing, SVR(kernel="rbf", gamma='scale', C=5000, epsilon=0.1))
# 5-fold cross-validation; sklearn returns negated RMSE, so flip the sign
scores = -cross_val_score(model_svr, X_train, y_train, scoring="neg_root_mean_squared_error", cv=5)
print(f"Root mean square error of each validation in kJ/mol:\n{scores}\n")
print("This is an average root mean square error of %0.2f kJ/mol with a standard deviation of %0.2f kJ/mol\n" % (scores.mean(), scores.std()))
Root mean square error of each validation in kJ/mol:
[1.91642635 2.08795435 3.11139885 1.65561589 1.83002044]
This is an average root mean square error of 2.12 kJ/mol with a standard deviation of 0.51 kJ/mol
This is our best model so far. We have selected a radial basis function (RBF) kernel, which has the following form:
\( k(x_i,x_j)=e^{-\gamma||x_i-x_j||^2} \)
\(||x_i-x_j||\) is the Euclidean distance, while the hyperparameter \(\gamma\) controls how strongly a single training example influences the predictions as a function of that distance. In our example, gamma='scale' sets \(\gamma\) to \(1/(n_{\text{features}}\cdot\mathrm{Var}(X))\), a good default based on the number of input features and their variance. The strength of the SVR regularization is inversely proportional to the hyperparameter C. Thus, a too small C might result in underfitting, while a too large C might lead to overfitting. The RBF kernel makes SVR similar to k-nearest neighbors regression, but with the advantage that it does not need to store all data points for prediction; storing the support vectors is sufficient.
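As a quick sanity check (toy numbers, not taken from HB_data.csv), the formula above reproduces the kernel values that scikit-learn computes internally:
from sklearn.metrics.pairwise import rbf_kernel
import numpy as np
# two toy points and an arbitrary gamma
x_i = np.array([[0.0, 1.0]])
x_j = np.array([[1.0, 3.0]])
gamma = 0.1
manual = np.exp(-gamma * np.sum((x_i - x_j)**2))      # e^(-gamma * ||x_i - x_j||^2)
sklearn_value = rbf_kernel(x_i, x_j, gamma=gamma)[0, 0]
print(manual, sklearn_value)                          # both approx. 0.6065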
Due to the kernel trick, SVR is a versatile and powerful approach for regression tasks on small and medium-sized data sets. The choice of kernel is central, since it strongly determines how the input features are transformed before the "real" fit is done.
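A natural next step would be to tune the kernel together with C, \(\gamma\), and \(\epsilon\). The sketch below shows one possible setup with GridSearchCV; the parameter ranges are illustrative guesses, not tuned recommendations.
from sklearn.model_selection import GridSearchCV
# joint search over the kernel and the hyperparameters discussed above
param_grid = {
    "svr__kernel": ["linear", "poly", "rbf"],
    "svr__C": [100, 1000, 5000],
    "svr__gamma": ["scale", 0.1],
    "svr__epsilon": [0.05, 0.1, 0.5],
}
grid_search = GridSearchCV(model_svr, param_grid, scoring="neg_root_mean_squared_error", cv=5)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
print(f"Best cross-validated RMSE: {-grid_search.best_score_:.2f} kJ/mol")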