UMAP#
Exercise: Supervised and Unsupervised Machine Learning Methods for Urban Sound dataset#
Workflow for urban sound classification using KNN and CNN classifiers, including UMAP-based feature visualization.
In this exercise, we will apply supervised and unsupervised machine learning techniques to classify urban sounds using the UrbanSound8K dataset. After extracting features from audio files, we will train a K-Nearest Neighbors (KNN) classifier and visualize the data using UMAP (Uniform Manifold Approximation and Projection). Next, we will use the same features to train a Convolutional Neural Network (CNN) and compare its performance to KNN. UMAP will also be used to visualize one of the CNN's last layers.
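To make the KNN step concrete, here is a minimal, dependency-free sketch of k-nearest-neighbour classification on made-up 2-D feature vectors. The notebook presumably relies on scikit-learn for this step; the function name `knn_predict`, the toy feature values, and the choice `k=3` are illustrative assumptions and not the notebook's code.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training vector
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D "features" (hypothetical values, not extracted from real audio)
train_X = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.9, 0.8), (1.0, 0.9), (0.8, 1.0)]
train_y = ["siren", "siren", "siren", "dog_bark", "dog_bark", "dog_bark"]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (0.95, 0.95)))
```

In the exercise the feature vectors are much higher-dimensional than this toy case, which is why a projection such as UMAP is useful for looking at them.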
Installation#
(Optional) Create a conda environment for the exercise. This is not required, but it is good practice: it keeps your packages organized and avoids version conflicts.
conda create --name ai4seis python=3.10
conda activate ai4seis
Install the required packages:
pip3 install torch torchvision torchaudio pandas numpy matplotlib seaborn librosa scikit-learn tqdm umap-learn datamapplot
pip3 install ipywidgets widgetsnbextension
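After installing, a quick check that everything is importable can save time later. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the list uses import names, which differ from the pip names for two packages (`scikit-learn` is imported as `sklearn`, `umap-learn` as `umap`).

```python
from importlib.util import find_spec

# Import names for the pip packages listed above; note that some differ
# from the pip name (scikit-learn -> sklearn, umap-learn -> umap).
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "pandas", "numpy", "matplotlib",
    "seaborn", "librosa", "sklearn", "tqdm", "umap", "datamapplot",
    "ipywidgets",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Run it inside the activated environment so that it inspects the same interpreter the notebook will use.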
Download the UrbanSound8K dataset as described in the Dataset to download section. Unzip the dataset and place it in the
./data folder. The folder structure should look like this:
AI4Seismology
├── README.md
├── data  # Data folder, where the UrbanSound8K dataset is stored
│   ├── UrbanSound8K.csv
│   ├── fold1
│   ├── fold2
│   ├── ...
│   ├── fold9
│   └── fold10
├── images
│   └── concept.png
└── notebooks  # Jupyter notebook for the exercise
    ├── ML_UrbanSound8K.ipynb  # Main notebook, follows the steps of the exercise
    ├── audio_processing.py  # Audio processing functions
    ├── config.py  # Configuration file, where the parameters for the exercise are stored
    ├── data_utils.py  # Data loading and preprocessing functions
    └── model_utils.py  # Model training and evaluation functions
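Before opening the notebook, it can help to confirm that the dataset landed where the layout above expects it. The following is a small sketch using only the standard library; the function name `check_dataset_layout` is hypothetical, and the example call assumes the script is run from the repository root.

```python
from pathlib import Path


def check_dataset_layout(data_dir):
    """Return a list of expected dataset entries that are missing under `data_dir`."""
    data_dir = Path(data_dir)
    # The metadata file plus the ten fold directories fold1 ... fold10
    expected = [data_dir / "UrbanSound8K.csv"]
    expected += [data_dir / f"fold{i}" for i in range(1, 11)]
    return [str(path) for path in expected if not path.exists()]


# Assumes the script runs from the repository root (AI4Seismology)
missing = check_dataset_layout("data")
if missing:
    print("Missing entries:")
    for entry in missing:
        print("  ", entry)
else:
    print("Dataset layout looks complete.")
```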
Dataset to download#
In this exercise, we will use the UrbanSound8K dataset, a collection of urban sound recordings intended for sound classification tasks. It contains 8,732 labeled excerpts (at most 4 s each) of urban sounds from the following 10 classes:
air_conditioner
car_horn
children_playing
dog_bark
drilling
engine_idling
gun_shot
jackhammer
siren
street_music
The classes are drawn from the urban sound taxonomy. For a detailed description of the dataset and how it was compiled, please refer to Salamon et al., 2014.
All sound excerpts are taken from field recordings uploaded to Freesound. The dataset is divided into 10 folds, which can be used for cross-validation.
In addition to the sound excerpts, a CSV file containing metadata about each excerpt is also provided.
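The metadata file and the predefined folds fit together naturally: each row names one excerpt and the fold it belongs to, so cross-validation amounts to holding out one fold at a time. The sketch below illustrates this with a few made-up rows kept in memory so that it runs without the dataset. The column names are assumed to match the header of UrbanSound8K.csv, and `leave_one_fold_out` is a hypothetical helper, not part of the exercise code.

```python
import csv
import io

# A few made-up rows in the assumed column layout of UrbanSound8K.csv
# (slice_file_name, fsID, start, end, salience, fold, classID, class).
SAMPLE_CSV = """slice_file_name,fsID,start,end,salience,fold,classID,class
a.wav,1,0.0,4.0,1,1,3,dog_bark
b.wav,2,0.0,4.0,1,1,8,siren
c.wav,3,0.0,2.5,2,2,3,dog_bark
d.wav,4,0.0,4.0,1,3,8,siren
"""


def leave_one_fold_out(rows):
    """Yield (test_fold, train_rows, test_rows), holding out each fold once."""
    folds = sorted({int(row["fold"]) for row in rows})
    for test_fold in folds:
        train = [r for r in rows if int(r["fold"]) != test_fold]
        test = [r for r in rows if int(r["fold"]) == test_fold]
        yield test_fold, train, test


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for test_fold, train, test in leave_one_fold_out(rows):
    print(f"fold {test_fold}: {len(train)} training rows, {len(test)} test rows")
```

With the real file, replacing `io.StringIO(SAMPLE_CSV)` by an open handle on the metadata CSV should give ten splits, one per fold.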
The dataset can be downloaded from Kaggle:
Why this dataset in AI4Seismology?#
We are using the UrbanSound8K dataset for this exercise because it is a well-established benchmark for urban sound classification. The dataset contains a diverse range of urban audio recordings that are already labeled, allowing us to work efficiently without needing to spend time on data collection or annotation.
At approximately 6 GB, UrbanSound8K is a manageable size for most computers, making it practical for quick experimentation and model iteration. Its diversity and structure enable us to test and compare various machine learning techniques, including supervised and unsupervised approaches.
Additionally, experience gained with UrbanSound8K carries over to other sound-related projects, such as seismological work with signals from whales, ships, lions, elephants, Taylor Swift concerts, or other soundscapes.
Resources#
Here are two links for unsupervised methods for visualisation, which I find quite helpful:
Credits#
This exercise is based on the UrbanSound8K dataset, and the code examples were taken from: