Exercise on binary classifier
The file “HB_cl1_data.csv” is located in the same directory as this notebook. It contains the data set you already know from the section on regression, but with the atom types and the hydrogen bond energy removed. In their place, a new feature, HB-type, holds the class label:
import pandas as pd
hb_data = pd.read_csv('HB_cl1_data.csv')
hb_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1638 entries, 0 to 1637
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bo-acc 1638 non-null float64
1 bo-donor 1638 non-null float64
2 q-acc 1638 non-null float64
3 q-donor 1638 non-null float64
4 q-hatom 1638 non-null float64
5 dist-dh 1638 non-null float64
6 dist-ah 1638 non-null float64
7 HB-type 1638 non-null object
dtypes: float64(7), object(1)
memory usage: 102.5+ KB
The “HB-type” feature has two classes:
hb_data["HB-type"].value_counts()
HB-type
strong 1227
weak 411
Name: count, dtype: int64
The “weak” class marks hydrogen bonds with a small bond energy. Perform a grid search to find a model that can identify weak hydrogen bonds. Use the \(F_1\) score as the scoring function to select the best hyperparameter C for a support vector machine with an RBF kernel. Train the model with the optimized hyperparameter on the full training data set and determine precision and recall on the test data set.
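For reference, these are the standard definitions of the three metrics named in the task. They assume that “weak” is treated as the positive class, so that TP, FP and FN count the true positives, false positives and false negatives for “weak”:

\[
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]

Because only about a quarter of the samples (411 of 1638) are “weak”, a model that always predicts “strong” reaches a high accuracy but a recall of zero for “weak”, which is why \(F_1\) is the more informative score here.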
# your code here
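One possible way to approach the task is sketched below. It is a minimal outline, not a reference solution. The exercise does not specify the following choices, so they are assumptions: the helper name fit_weak_hb_classifier, the 80/20 stratified split, the feature scaling step, the 5-fold cross-validation and the candidate values for C.

```python
import pandas as pd
from sklearn.metrics import f1_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_weak_hb_classifier(df, target="HB-type", positive="weak", seed=0):
    """Grid-search C for an RBF SVM and report precision/recall for `positive`."""
    X = df.drop(columns=[target])
    y = df[target]

    # Hold out a stratified test set (the 80/20 split is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Scaling is an added assumption: RBF kernels are sensitive to feature scales.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

    # F1 must be told which string label counts as the positive class.
    f1_weak = make_scorer(f1_score, pos_label=positive)

    # Candidate values for C are illustrative, not taken from the exercise.
    param_grid = {"svc__C": [0.1, 1, 10, 100, 1000]}

    # refit=True (the default) retrains the best model on all training data.
    search = GridSearchCV(model, param_grid, scoring=f1_weak, cv=5)
    search.fit(X_train, y_train)

    # Evaluate the refitted model on the held-out test set only.
    y_pred = search.predict(X_test)
    return {
        "best_C": search.best_params_["svc__C"],
        "cv_f1": search.best_score_,
        "precision": precision_score(y_test, y_pred, pos_label=positive),
        "recall": recall_score(y_test, y_pred, pos_label=positive),
    }
```

With the DataFrame loaded at the top of this page, calling fit_weak_hb_classifier(hb_data) returns the selected C, the cross-validated \(F_1\) score, and the precision and recall for “weak” on the test set. Passing pos_label explicitly matters because the labels are strings; the default pos_label=1 would not match either class.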