Audio Segmentation Using Flattened Local Trimmed Range for Ecological Acoustic Space Analysis
Vega Viera, Giovany
Corrada Bravo, Carlos J. (Advisor)
The acoustic space in a given environment is filled with footprints arising from three processes: biophony, geophony and anthrophony. Bioacoustic research using passive acoustic sensors can result in thousands of recordings, and automating signal detection is an important component of processing them. In this thesis we describe a new spectrogram-based approach for extracting individual audio events. Spectrogram-based audio event detection (AED) relies on separating the spectrogram into background (i.e., noise) and foreground (i.e., signal) classes using a threshold such as a global threshold, a per-band threshold, or one given by a classifier. These methods are either too sensitive to noise, designed for an individual species, or require prior training data. Our goal is to develop an algorithm that is not sensitive to noise, does not need any prior training data and works with any type of audio event. To do this, we propose a spectrogram filtering method, the Flattened Local Trimmed Range (FLTR) method, which models the spectrogram as a mixture of stationary and non-stationary energy processes and mitigates the effect of the stationary processes; and an unsupervised algorithm that uses this filter to detect audio events. We measured the performance of this algorithm using a set of six thoroughly validated audio recordings and obtained a sensitivity of 94% and a positive predictive value of 89%. These values are high, given that the validated recordings are diverse and were obtained under field conditions. The algorithm was then used to extract audio events in three datasets. Features of these audio events were plotted and showed the unique aspects of the three acoustic communities.
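For readers unfamiliar with the two reported metrics, the following is a minimal sketch of their standard definitions: sensitivity is the fraction of validated events that the detector found, and positive predictive value is the fraction of detections that correspond to validated events. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the thesis, and the sketch does not show how detections were matched to validated events there.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of validated events that were detected: TP / (TP + FN)."""
    total_events = true_positives + false_negatives
    if total_events == 0:
        raise ValueError("no validated events to evaluate against")
    return true_positives / total_events


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of detections that match a validated event: TP / (TP + FP)."""
    total_detections = true_positives + false_positives
    if total_detections == 0:
        raise ValueError("the detector produced no detections")
    return true_positives / total_detections


# Hypothetical counts, for illustration only (not results from the thesis).
tp, fn, fp = 90, 10, 30
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"PPV         = {positive_predictive_value(tp, fp):.2f}")
```

The two metrics answer different questions: a detector can reach high sensitivity by flagging almost everything, which lowers its positive predictive value, so both are reported together.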