Exercise on regression#

You would like to build a regression model based on k-nearest neighbor regression for “housing_data.csv”, found in the same directory as this notebook. An advantage of k-nearest neighbor regression is that it requires no assumptions about the data. Furthermore, it has no training phase: the model is defined entirely by its hyperparameters. A disadvantage is its high memory requirement, because the full training data set must be kept available for predictions. Furthermore, it struggles when the training instances are unevenly distributed across the feature space.

You should investigate the role of at least the following hyperparameters:

  • How many neighbors should be used to predict the price?
  • Is it better to equally weight all neighbors or to give closer neighbors more weight?
Information on the implementation of the k-nearest neighbors regressor in scikit-learn can be found here: Link
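One common way to investigate both questions at once is a cross-validated grid search over `n_neighbors` and `weights` of scikit-learn's `KNeighborsRegressor`. The sketch below uses synthetic data as a stand-in, since the column names of “housing_data.csv” are not known here; you would replace it with the loaded CSV and your chosen target column.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for housing_data.csv (real feature/target names unknown here).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Feature scaling matters for distance-based models, so scale inside the pipeline.
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())

# Grid over the two hyperparameters from the exercise:
# number of neighbors, and uniform vs. distance-based weighting.
param_grid = {
    "kneighborsregressor__n_neighbors": [1, 3, 5, 10, 20],
    "kneighborsregressor__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```

The pipeline ensures the scaler is re-fit on each training fold, so the cross-validation scores are not leaked through the scaling step.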

Advanced task: Regression models whose predictions are based on nearby instances (e.g. nearest neighbor regression or support vector machines with an RBF kernel) may produce better results if the number of input features is reduced. That is the case for the data in this exercise. Can you identify the feature that should be removed from the input features without trial and error?
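One possible approach, sketched below on synthetic data (the actual features of “housing_data.csv” are not shown here): a feature that carries no information about the target only adds noise to the distance computation, and a quick screening is to look at each feature's correlation with the target. The `noise_feat` column is an assumed stand-in for such an uninformative feature.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression

# Synthetic stand-in: four informative features plus one pure-noise column,
# mimicking a data set where one input feature should be dropped.
X, y = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=0)
df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(4)])
rng = np.random.default_rng(0)
df["noise_feat"] = rng.normal(size=len(df))  # hypothetical uninformative feature
df["price"] = y

# Absolute Pearson correlation of each feature with the target,
# sorted so the least informative candidates appear first.
corr = df.drop(columns="price").corrwith(df["price"]).abs().sort_values()
print(corr)
```

Note that correlation only detects linear relationships; a feature with low correlation but a strong nonlinear effect would need a different diagnostic, such as permutation importance.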

# your code here