UMAP#

Exercise: Supervised and Unsupervised Machine Learning Methods for Urban Sound dataset#

Workflow for urban sound classification using KNN and CNN classifiers, including UMAP-based feature visualization.

In this exercise, we will apply supervised and unsupervised machine learning techniques to classify urban sounds using the UrbanSound8K dataset. After extracting features from audio files, we will train a K-Nearest Neighbors (KNN) classifier and visualize the data using UMAP (Uniform Manifold Approximation and Projection). Next, we will use the same features to train a Convolutional Neural Network (CNN) and compare its performance to KNN. UMAP will also be used to visualize one of the CNN’s last layers.
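The first half of this workflow (features, then KNN, then a UMAP projection) can be sketched in a few lines. This is only an illustration under stated assumptions, not the notebook's code: the features are synthetic Gaussian blobs standing in for real audio features, and the helper names `make_toy_features`, `fit_knn`, and `embed_2d` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_features(n_per_class=50, n_classes=3, n_features=40, seed=0):
    """Stand-in for real audio features: one Gaussian blob per class."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(n_classes, n_features))
    X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


def fit_knn(X_train, y_train, n_neighbors=5):
    """Standardize the features, then fit a KNN classifier on them."""
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=n_neighbors))
    return model.fit(X_train, y_train)


def embed_2d(X, seed=0):
    """Project features to 2-D with UMAP for plotting (needs umap-learn)."""
    import umap  # imported lazily; installed as the umap-learn package

    return umap.UMAP(n_components=2, random_state=seed).fit_transform(X)


if __name__ == "__main__":
    X, y = make_toy_features()
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    knn = fit_knn(X_tr, y_tr)
    print(f"KNN test accuracy: {knn.score(X_te, y_te):.2f}")
```

Calling `embed_2d(X)` returns an array with one 2-D point per sample, which can be scatter-plotted and colored by `y`. The same function could in principle be applied to activations taken from a CNN layer instead of the input features.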

Installation#

(Optional) Create a conda environment for the exercise. This is not required, but it is good practice: it keeps your packages organized and avoids version conflicts.

conda create --name ai4seis python=3.10
conda activate ai4seis

Install the required packages:

pip3 install torch torchvision torchaudio pandas numpy matplotlib seaborn librosa scikit-learn tqdm umap-learn datamapplot

pip3 install ipywidgets widgetsnbextension
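Some of these packages are imported under a different name than the one used with pip (scikit-learn is imported as `sklearn`, umap-learn as `umap`). A minimal sketch for checking that everything is importable, using only the standard library; the helper name `missing_modules` is hypothetical and not part of the exercise code:

```python
from importlib.util import find_spec

# Import names for the packages installed above. Some differ from the pip
# names: scikit-learn -> sklearn, umap-learn -> umap.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "pandas", "numpy", "matplotlib",
    "seaborn", "librosa", "sklearn", "tqdm", "umap", "datamapplot",
    "ipywidgets",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("All packages found." if not missing else f"Missing: {missing}")
```

The check only looks the modules up without importing them, so it is quick even for large packages such as torch.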

Download the UrbanSound8K dataset as described in the "Dataset to download" section below. Unzip the dataset and place it in the ./data folder. The folder structure should look like this:

AI4Seismology
├── README.md
├── data                           # Data folder, where the UrbanSound8K dataset is stored
│   ├── UrbanSound8K.csv
│   ├── fold1
│   ├── fold2
│   ├── ...
│   ├── fold9
│   └── fold10
├── images
│   └── concept.png
└── notebooks                      # Jupyter notebook for the exercise
    ├── ML_UrbanSound8K.ipynb      # Main notebook, follows the steps of the exercise
    ├── audio_processing.py        # Audio processing functions
    ├── config.py                  # Configuration file, where the parameters for the exercise are stored
    ├── data_utils.py              # Data loading and preprocessing functions
    └── model_utils.py             # Model training and evaluation functions
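Before opening the notebook, it can help to confirm that the data folder matches the tree above. The following is a small sketch using only the standard library; `check_dataset_layout` is a hypothetical helper, and the `../data` path assumes the script is run from the notebooks folder.

```python
from pathlib import Path


def check_dataset_layout(data_dir):
    """Return the expected entries that are missing under `data_dir`."""
    data_dir = Path(data_dir)
    # The metadata file plus the ten fold directories shown in the tree.
    expected = [data_dir / "UrbanSound8K.csv"]
    expected += [data_dir / f"fold{i}" for i in range(1, 11)]
    return [str(p.relative_to(data_dir)) for p in expected if not p.exists()]


if __name__ == "__main__":
    # Assumption: run from the notebooks folder, so the data sits one level up.
    missing = check_dataset_layout("../data")
    print("Layout OK" if not missing else f"Missing entries: {missing}")
```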

Dataset to download#

In this exercise, we will use the UrbanSound8K dataset. This dataset is a collection of urban sound recordings that can be used for sound classification tasks. It contains 8,732 labeled sound excerpts (<=4s) of urban sounds from the following 10 classes:

air_conditioner
car_horn
children_playing
dog_bark
drilling
engine_idling
gun_shot
jackhammer
siren
street_music

The classes are drawn from the urban sound taxonomy. For a detailed description of the dataset and how it was compiled, please refer to Salamon et al., 2014.

All sound excerpts are taken from field recordings uploaded to Freesound. The dataset is divided into 10 folds, which can be used for cross-validation.

In addition to the sound excerpts, a CSV file containing metadata about each excerpt is also provided.
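As an illustration of how the metadata file and the folds fit together, the sketch below holds out one fold for testing and keeps the rest for training. It assumes the CSV has the columns `slice_file_name`, `fold`, and `class`, and that each clip is stored under `fold<N>/`; the helper `split_by_fold` is hypothetical and uses only the standard library, whereas the notebook itself may rely on pandas.

```python
import csv
from pathlib import Path


def split_by_fold(csv_path, test_fold):
    """Split metadata rows into train/test lists, holding out one fold."""
    train, test = [], []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumed layout: the audio file lives in fold<fold>/<slice_file_name>.
            item = (Path(f"fold{row['fold']}") / row["slice_file_name"], row["class"])
            (test if int(row["fold"]) == test_fold else train).append(item)
    return train, test


if __name__ == "__main__":
    # Assumption: run from the notebooks folder, so the data sits one level up.
    train, test = split_by_fold("../data/UrbanSound8K.csv", test_fold=10)
    print(f"{len(train)} training clips, {len(test)} test clips")
```

Splitting along the predefined folds, rather than shuffling all clips together, keeps excerpts cut from the same original recording on the same side of the split, which avoids overly optimistic test scores.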

The dataset can be downloaded from Kaggle:

https://www.kaggle.com/datasets/chrisfilo/urbansound8k?resource=download

Why this dataset in AI4Seismology?#

We are using the UrbanSound8K dataset for this exercise because it is a well-established benchmark for urban sound classification. The dataset contains a diverse range of urban audio recordings that are already labeled, allowing us to work efficiently without needing to spend time on data collection or annotation.

At approximately 6 GB, UrbanSound8K is a manageable size for most computers, making it practical for quick experimentation and model iteration. Its diversity and structure enable us to test and compare various machine learning techniques, including supervised and unsupervised approaches.

Additionally, experience gained with UrbanSound8K is applicable to other sound-related projects, such as seismology using signals from whales, ships, lions, elephants, Taylor Swift concerts or other soundscapes.

Resources#

Here are two links for unsupervised methods for visualisation, which I find quite helpful:

Credits#

This exercise is based on the UrbanSound8K dataset, and the code examples were taken from: