Exercise on binary classifier

The file “HB_cl1_data.csv” is in the same directory as this notebook. It is the data set you already know from the section on regression, but the atom types and the hydrogen bond energy have been removed. In their place there is a new column, “HB-type”, which holds the class labels:

import pandas as pd

hb_data = pd.read_csv('HB_cl1_data.csv')
hb_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1638 entries, 0 to 1637
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   bo-acc    1638 non-null   float64
 1   bo-donor  1638 non-null   float64
 2   q-acc     1638 non-null   float64
 3   q-donor   1638 non-null   float64
 4   q-hatom   1638 non-null   float64
 5   dist-dh   1638 non-null   float64
 6   dist-ah   1638 non-null   float64
 7   HB-type   1638 non-null   object 
dtypes: float64(7), object(1)
memory usage: 102.5+ KB

The “HB-type” column has two classes:

hb_data["HB-type"].value_counts()

HB-type
strong    1227
weak       411
Name: count, dtype: int64
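The counts show that the classes are imbalanced, roughly three to one. The short calculation below is only an illustration of what that means for evaluation: it uses the counts from the output above, and the variable names are made up for this sketch. A classifier that always answers “strong” would already reach about 75 % accuracy without finding a single weak bond, so accuracy alone says little here.

```python
# Class counts copied from the value_counts() output above.
counts = {"strong": 1227, "weak": 411}
total = sum(counts.values())

# A trivial classifier that always predicts the majority class.
majority = max(counts, key=counts.get)
baseline_accuracy = counts[majority] / total

# It never predicts "weak", so none of the weak bonds are recovered.
weak_recall = 0 / counts["weak"]

print(f"weak fraction:     {counts['weak'] / total:.3f}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"recall for weak:   {weak_recall:.3f}")
```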

The “weak” class marks hydrogen bonds with a small bond energy. Run a grid search to find a model that identifies weak hydrogen bonds. Use the \(F_1\) score as the scoring function to select the best value of the hyperparameter C for a support vector machine with an RBF kernel. Train the model with the optimized hyperparameter on the full training set, then determine precision and recall on the held-out test set.
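One possible way to set up such a workflow with scikit-learn is sketched below. It is not the reference solution. To keep the sketch self-contained, it runs on synthetic data from `make_classification` that only mimics the shape and class balance of the real file. The 80/20 split, the feature scaling step, the grid of C values, and the 5-fold cross-validation are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for HB_cl1_data.csv (assumption): 7 numeric features,
# roughly 75 % "strong" and 25 % "weak" labels.
X, y_int = make_classification(
    n_samples=1638, n_features=7, n_informative=5, n_redundant=0,
    weights=[0.75, 0.25], random_state=0,
)
y = np.where(y_int == 1, "weak", "strong")

# Hold out a stratified test set so both classes keep their proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale the features before the SVM; the RBF kernel depends on distances.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# F1 score with "weak" treated as the positive class.
weak_f1 = make_scorer(f1_score, pos_label="weak")

# Candidate values for C (an arbitrary illustrative grid).
param_grid = {"svc__C": [0.1, 1, 10, 100]}

search = GridSearchCV(model, param_grid, scoring=weak_f1, cv=5)
# refit=True (the default) retrains the best model on the whole training set.
search.fit(X_train, y_train)

# Evaluate the refitted model on the held-out test set.
y_pred = search.predict(X_test)
precision = precision_score(y_test, y_pred, pos_label="weak")
recall = recall_score(y_test, y_pred, pos_label="weak")

print("best C:", search.best_params_["svc__C"])
print(f"precision (weak): {precision:.3f}")
print(f"recall (weak):    {recall:.3f}")
```

For the actual exercise, `X` and `y` would be replaced by `hb_data.drop(columns="HB-type")` and `hb_data["HB-type"]`. Passing `pos_label="weak"` matters because the labels are strings, and the metrics need to know which class counts as positive.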

# your code here