
Overview
In 2024, a teammate and I helped coordinate an international research challenge in audio-based machine learning by developing the dataset used in the competition.
Role: Machine Learning Engineer
Introduction
The DCASE Challenge is an annual competition organized by IEEE AASP, with sponsorship from major companies such as Google, HUAWEI, BOSE, HITACHI, MITSUBISHI, SONY, and SAMSUNG.
This challenge encourages innovation in Detection and Classification of Acoustic Scenes and Events (DCASE), spanning multiple audio-related tasks, including:
- Acoustic Scene Classification
- Sound Event Localization and Detection
- Audio Tagging
- Bioacoustics Analysis
- Anomalous Sound Detection
- Language-Based Audio Retrieval
Representing STMicroelectronics, my team and I served as coordinators for Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring.
Contribution
This year, my teammate and I joined the Task 2 coordination board to develop and curate the 2024 dataset. We introduced a novel audio dataset recorded from previously unseen machinery:
- Robotic Arm
- Brushless Motor
Objectives
Designing a competition dataset requires balancing complexity: the data must be challenging enough to inspire participants while remaining approachable enough to avoid discouraging them.
Instead of training models on a fixed dataset, we reversed the traditional machine learning problem:
🔹 Fixed Model: The challenge baseline was a classic Autoencoder.
🔹 Variable Dataset: We fine-tuned the data complexity to achieve a targeted baseline model performance.
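To illustrate the fixed-model side of this setup, here is a minimal sketch of a dense autoencoder whose reconstruction error serves as the anomaly score. This is not the official baseline code; the layer sizes, input dimension, and function names are placeholder assumptions.

```python
import torch
import torch.nn as nn

class BaselineAutoencoder(nn.Module):
    """Simple dense autoencoder over flattened log-mel frames.
    Dimensions are illustrative, not the official baseline's."""

    def __init__(self, input_dim: int = 640, bottleneck: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def anomaly_score(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Mean squared reconstruction error per sample: trained only on
    normal sounds, the model reconstructs anomalies poorly."""
    with torch.no_grad():
        recon = model(x)
    return ((x - recon) ** 2).mean(dim=1)
```

Because the model stays fixed, any change in its detection performance can be attributed to the dataset itself, which is exactly the knob we tuned.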
Approach & Solution
Dataset Complexity Tweaking Mechanism
To optimize dataset difficulty, we applied a structured approach:
1. Post-Processing with Background Noises
To introduce environmental variability, we added different background noise types to simulate real-world conditions. This ensured that models had to generalize beyond clean, ideal recordings.
2. Controlling Data Variability
We modulated dataset complexity based on two key dimensions:
- Domain Shifts¹: Varying the number and types of domain shifts made generalization more challenging for models.
- Anomaly Characteristics: The more similar an anomaly was to normal conditions, the harder it became to detect.
Through careful design, we ensured that the dataset mirrored real-world challenges while pushing the limits of unsupervised anomaly detection.
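The second knob can be illustrated with a toy example: treat a pure tone as a (highly simplified) machine sound, and let a single parameter control how far the anomalous condition deviates from normal. This is purely illustrative and not our actual dataset-generation code; all names and values are hypothetical.

```python
import numpy as np

def synth_tone(freq_hz: float, fs: int = 16000, dur_s: float = 1.0) -> np.ndarray:
    """Pure tone standing in for a (highly simplified) machine sound."""
    t = np.arange(int(fs * dur_s)) / fs
    return np.sin(2.0 * np.pi * freq_hz * t)

def make_pair(delta_hz: float, base_hz: float = 1000.0):
    """Return a (normal, anomalous) signal pair whose spectral separation
    is set by delta_hz: the smaller it is, the closer the anomaly sits to
    normal data and the harder it becomes to detect."""
    return synth_tone(base_hz), synth_tone(base_hz + delta_hz)
```

Shrinking `delta_hz` toward zero is the toy analogue of making anomalies acoustically similar to normal machine operation.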
Recording Setup
- Device: STWIN.box (evaluation kit from STMicroelectronics)
- Microphone: IMP23ABSU (analog MEMS microphone; recordings sampled at 16 kHz)
- Environment: Anechoic chambers at STMicroelectronics
To fine-tune dataset complexity, we applied controlled post-processing:
🎛️ Noise Mixing: Clean recordings were mixed with industrial noise at adjustable Signal-to-Noise Ratios (SNRs).
🎛️ Room Effects Simulation: We introduced simulated reverberation to emulate the acoustic characteristics of real rooms.
This ensured the dataset closely reflected industrial environments, making the challenge practical and relevant.
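The post-processing above can be sketched in NumPy. The function names and parameters are illustrative rather than our exact implementation, and the reverberation here uses a synthetic exponentially decaying impulse response as a crude stand-in for a proper room simulation (e.g., with Pyroomacoustics).

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into a clean recording at a target signal-to-noise ratio (dB).
    Assumes both are 1-D float arrays at the same sampling rate."""
    # Loop/trim the noise to match the clean signal length
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def add_reverb(signal: np.ndarray, fs: int = 16000, rt60_s: float = 0.3) -> np.ndarray:
    """Crude reverberation: convolve with a synthetic impulse response that
    decays by ~60 dB over rt60_s seconds (a stand-in for room simulation)."""
    n = int(fs * rt60_s)
    rng = np.random.default_rng(0)
    ir = rng.standard_normal(n) * np.exp(-6.9 * np.arange(n) / n)
    ir /= np.sqrt(np.sum(ir ** 2))  # unit-energy impulse response
    return np.convolve(signal, ir)[: len(signal)]
```

Sweeping `snr_db` downward is the main difficulty dial: the lower the SNR, the more the machine signature is buried in industrial noise.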
Data Pipeline
Implemented in Python and PyTorch, our pipeline followed a structured process:
1️⃣ Data Conversion → Stored in Parquet/HDF5 for efficiency
2️⃣ Dataset Splitting → Training & test sets stratified by domain shifts and anomaly type
3️⃣ Audio Preprocessing → Background noise and reverberation applied
4️⃣ Tracking & Logging → Ensured experiment reproducibility and easy comparisons
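Step 2️⃣ can be sketched as a simple stratified split. The stratum key (domain shift, anomaly type) and helper names below are illustrative, not our exact pipeline code.

```python
import random
from collections import defaultdict

def stratified_split(items, key_fn, test_frac=0.2, seed=0):
    """Split items into train/test while preserving the proportion of each
    stratum (e.g., a (domain shift, anomaly type) tuple returned by key_fn)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key_fn(item)].append(item)
    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_frac))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Stratifying this way keeps every domain-shift condition represented in both splits, so baseline performance is not skewed by a missing condition.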
Technology Stack
📌 Audio Recording: STWIN.box, IMP23ABSU
📌 ML Experiment Tracking: MLflow (open-source experiment tracking, visualization, and model comparison)
📌 Dataset Versioning: DVC (open-source data version control)
📌 Machine Learning Framework: PyTorch
📌 Audio Processing: Librosa, Pyroomacoustics
¹ Different operational and environmental conditions applied to the working machines