Part 1: Get Data

1.1 Introduction to the dataset and study area

This notebook introduces the data used throughout the tutorial. We analyse data from the broadband station SV13 of network X9, which was deployed during the KISS experiment (Klyuchevskoy volcanic group experiment) from 2015 to 2016. Station SV13 is located close to the summit of Klyuchevskoy volcano on the Kamchatka Peninsula, Russia. During the tutorial we analyse two days of seismic data recorded on the east component (channel HHE), from 00:00 UTC on 4 December 2015 to 00:00 UTC on 6 December 2015. Volcanic tremor occurs in this time window, indicating activation of the plumbing system. We will use unsupervised learning methods to explore the seismograms and find groups of similar signals.

Interesting links and papers related to the KISS experiment:

The image below shows the setup of the KISS experiment, taken from an AGU Eos article about KISS.

[Figure: setup of the KISS experiment]

1.2 Data download, preprocessing and visualization

We first set up a connection to the GFZ data server and download the data of interest. We then preprocess the downloaded stream object: gaps and overlaps are merged, a linear trend is removed, and the data are high-pass filtered at 1 Hz.

# import necessary libraries
from obspy.clients.fdsn import Client
from obspy.core import UTCDateTime
from scipy.signal import spectrogram
import matplotlib.pyplot as plt
import numpy as np
# Connect to the GFZ data center to collect the waveforms
client = Client("GFZ")
stream = client.get_waveforms(
    network="X9",
    station="SV13",
    location="*",
    channel="HHE",
    starttime=UTCDateTime("2015-12-04T00:00"),
    endtime=UTCDateTime("2015-12-06T00:00"),
)
# preprocess the data
stream.merge(method=1)
stream.detrend("linear")
stream.filter(type="highpass", freq=1)

# plot the data and show information
print(stream)
stream.plot(rasterized=True)
1 Trace(s) in Stream:
X9.SV13..HHE | 2015-12-04T00:00:00.000000Z - 2015-12-06T00:00:00.000000Z | 50.0 Hz, 8640001 samples
[Figure: waveform plot of X9.SV13..HHE for the selected time window]
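To make the two numerical preprocessing steps (linear detrend and 1 Hz high-pass) more concrete, here is a minimal sketch on synthetic data that uses SciPy directly instead of ObsPy. It assumes a causal fourth-order Butterworth high-pass, which is ObsPy's default as far as we are aware. The synthetic signal, its frequencies and the helper `amplitude_at` are invented for illustration and are not part of the tutorial.

```python
import numpy as np
from scipy.signal import butter, detrend, sosfilt

fs = 50.0                      # sampling rate in Hz, as for the HHE channel
t = np.arange(3000) / fs       # 60 s of synthetic data

# Synthetic trace: slow linear drift + 0.2 Hz oscillation + 5 Hz signal
raw = 0.01 * t + np.sin(2 * np.pi * 0.2 * t) + np.sin(2 * np.pi * 5.0 * t)

# Step 1: remove the best-fitting straight line
detrended = detrend(raw, type="linear")

# Step 2: high-pass at 1 Hz (assumed: 4th-order Butterworth, applied causally)
sos = butter(4, 1.0, btype="highpass", fs=fs, output="sos")
filtered = sosfilt(sos, detrended)


def amplitude_at(x, f0):
    """Return the spectral amplitude of x at the FFT bin closest to f0 (Hz)."""
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return spectrum[np.argmin(np.abs(freqs - f0))]


print("0.2 Hz amplitude before/after:",
      amplitude_at(detrended, 0.2), amplitude_at(filtered, 0.2))
print("5.0 Hz amplitude before/after:",
      amplitude_at(detrended, 5.0), amplitude_at(filtered, 5.0))
```

The component below the corner frequency is strongly attenuated while the 5 Hz component passes almost unchanged, which is the effect the high-pass has on the real trace as well.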
# calculate the spectrogram
data = stream[0].data
fs = stream[0].stats.sampling_rate  # sampling rate in Hz (not the sampling interval)
nperseg = 4096  # window length in samples
f, t, Sxx = spectrogram(data, fs=fs, nperseg=nperseg)
# show the spectrogram
plt.figure(figsize=(10, 5))
plt.pcolormesh(t/3600, f, 10 * np.log10(Sxx), shading="gouraud", cmap='inferno')
plt.title("Spectrogram of SV13")
plt.ylabel("Frequency [Hz]")
plt.xlabel("Time [hr]")
plt.yscale("log")
plt.ylim(0.1, 25)
plt.colorbar(label="Power spectral density [dB]")
[Figure: spectrogram of SV13]
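The sampling rate of 50 Hz and the choice nperseg = 4096 together fix the time and frequency resolution of the spectrogram. The following sketch works through that arithmetic and checks it against SciPy on a synthetic one-hour trace. The synthetic signal (noise plus a 2 Hz tone) and all variable names other than `nperseg`, `f`, `t` and `Sxx` are assumptions made for illustration; the overlap value relies on SciPy's documented default of `nperseg // 8` for `spectrogram`.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 50.0        # sampling rate in Hz, as reported for X9.SV13..HHE
nperseg = 4096   # window length in samples, as in the tutorial code

# Quantities implied by these two numbers
window_seconds = nperseg / fs              # 81.92 s of data per window
freq_resolution = fs / nperseg             # about 0.0122 Hz between bins
noverlap = nperseg // 8                    # SciPy default when noverlap is None
hop_seconds = (nperseg - noverlap) / fs    # 71.68 s between spectrogram columns

# Synthetic stand-in for one hour of data: white noise plus a 2 Hz tone
rng = np.random.default_rng(0)
n_samples = int(3600 * fs)
time = np.arange(n_samples) / fs
synthetic = rng.normal(size=n_samples) + 5 * np.sin(2 * np.pi * 2.0 * time)

f, t, Sxx = spectrogram(synthetic, fs=fs, nperseg=nperseg)

# Frequency with the highest time-averaged power
peak_frequency = f[np.argmax(Sxx.mean(axis=1))]

print(f"window length: {window_seconds:.2f} s, hop: {hop_seconds:.2f} s")
print(f"frequency resolution: {freq_resolution:.4f} Hz")
print(f"spectrogram shape (frequencies, times): {Sxx.shape}")
print(f"peak frequency: {peak_frequency:.3f} Hz")
```

A longer window gives finer frequency resolution at the cost of coarser time resolution, so short transient events are smeared over roughly 80 s with this setting.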

Before applying something fancy like unsupervised learning, it always makes sense to get an idea of the data by visualizing it (if possible).

What processes do you see in the time series or spectrogram representation? Do both representations show the same information? Why would you apply unsupervised learning here?

1.3 Save the data

To work with the downloaded data in the following notebooks, we save the stream object using ObsPy's built-in write method.

# save the stream data for the other notebooks
stream.write("../data/stream.mseed", format="MSEED")
/home/steinre/anaconda3/envs/scads2025-unsupervised/lib/python3.11/site-packages/obspy/io/mseed/core.py:773: UserWarning: The encoding specified in trace.stats.mseed.encoding does not match the dtype of the data.
A suitable encoding will be chosen.
  warnings.warn(msg, UserWarning)
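The warning above most likely arises because detrending and filtering turn the integer counts delivered by the data center into floating-point values, while the trace header still records the original integer encoding. ObsPy then picks a suitable encoding itself, so the file is still written. The sketch below reproduces only the dtype change using NumPy and SciPy; the sample values are invented and no ObsPy call is involved.

```python
import numpy as np
from scipy.signal import detrend

# Invented raw samples stored as 32-bit integer counts
counts = np.array([1200, 1210, 1235, 1228, 1250, 1262], dtype=np.int32)

# Detrending returns floating-point data even for integer input
processed = detrend(counts, type="linear")

print(counts.dtype)     # dtype of the raw counts
print(processed.dtype)  # dtype after processing

# Optional: a 32-bit float copy halves the memory footprint
processed_f32 = processed.astype(np.float32)
print(processed_f32.nbytes, "bytes instead of", processed.nbytes)
```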