
Overview
In 2024, a teammate and I helped coordinate an international research challenge in audio-based machine learning by developing the dataset used in the competition.
Role: Machine Learning Engineer
Introduction
The DCASE Challenge is an annual competition organized by IEEE AASP, with sponsorship from major companies such as Google, HUAWEI, BOSE, HITACHI, MITSUBISHI, SONY, and SAMSUNG.
This challenge encourages innovation in Detection and Classification of Acoustic Scenes and Events (DCASE), spanning multiple audio-related tasks, including:
- Acoustic Scene Classification
- Sound Event Localization and Detection
- Audio Tagging
- Bioacoustics Analysis
- Anomalous Sound Detection
- Language-Based Audio Retrieval
Representing STMicroelectronics, my team and I served as coordinators for Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring.
Contribution
This year, my teammate and I joined the Task 2 coordination board to develop and curate the 2024 dataset. We introduced a novel audio dataset recorded from previously unseen machinery:
- Robotic Arm
- Brushless Motor
Objectives
Designing a competition dataset requires balancing complexity: the data must be challenging enough to inspire participants while remaining approachable enough to avoid discouraging them.
Instead of training models on a fixed dataset, we reversed the traditional machine learning problem:
🔹 Fixed Model: The challenge baseline was a classic Autoencoder.
🔹 Variable Dataset: We fine-tuned the data complexity to achieve a targeted baseline model performance.
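To illustrate the fixed-model side of this setup, here is a minimal sketch of a dense autoencoder whose reconstruction error serves as the anomaly score. This is not the official baseline code; the layer sizes, input dimension, and function names are placeholder assumptions.

```python
import torch
import torch.nn as nn

class BaselineAutoencoder(nn.Module):
    """Simple dense autoencoder over flattened log-mel frames.
    Dimensions are illustrative, not the official baseline's."""

    def __init__(self, input_dim: int = 640, bottleneck: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def anomaly_score(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Mean squared reconstruction error per sample: trained only on
    normal sounds, the model reconstructs anomalies poorly."""
    with torch.no_grad():
        recon = model(x)
    return ((x - recon) ** 2).mean(dim=1)
```

Because the model stays fixed, any change in its detection performance can be attributed to the dataset itself, which is exactly the knob we tuned.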
Approach & Solution
Dataset Complexity Tweaking Mechanism
To optimize dataset difficulty, we applied a structured approach:
1. Post-Processing with Background Noises
To introduce environmental variability, we added different background noise types to simulate real-world conditions. This ensured that models had to generalize beyond clean, ideal recordings.
2. Controlling Data Variability
We modulated dataset complexity based on two key dimensions:
- Domain Shifts¹: Varying the number and types of domain shifts made generalization more challenging for models.
- Anomaly Characteristics: The more similar an anomaly was to normal conditions, the harder it became to detect.
Through careful design, we ensured that the dataset mirrored real-world challenges while pushing the limits of unsupervised anomaly detection.
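The second knob can be illustrated with a toy example: treat a pure tone as a (highly simplified) machine sound, and let a single parameter control how far the anomalous condition deviates from normal. This is purely illustrative and not our actual dataset-generation code; all names and values are hypothetical.

```python
import numpy as np

def synth_tone(freq_hz: float, fs: int = 16000, dur_s: float = 1.0) -> np.ndarray:
    """Pure tone standing in for a (highly simplified) machine sound."""
    t = np.arange(int(fs * dur_s)) / fs
    return np.sin(2.0 * np.pi * freq_hz * t)

def make_pair(delta_hz: float, base_hz: float = 1000.0):
    """Return a (normal, anomalous) signal pair whose spectral separation
    is set by delta_hz: the smaller it is, the closer the anomaly sits to
    normal data and the harder it becomes to detect."""
    return synth_tone(base_hz), synth_tone(base_hz + delta_hz)
```

Shrinking `delta_hz` toward zero is the toy analogue of making anomalies acoustically similar to normal machine operation.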
Recording Setup
- Device: STWIN.box (evaluation kit from STMicroelectronics)
- Microphone: IMP23ABSU (analog MEMS microphone; recordings sampled at 16 kHz)
- Environment: Anechoic chambers at STMicroelectronics
To fine-tune dataset complexity, we applied controlled post-processing:
🎛️ Noise Mixing: Clean recordings were mixed with industrial noise at adjustable Signal-to-Noise Ratios (SNRs).
🎛️ Room Effects Simulation: We introduced simulated reverberation to emulate the acoustic characteristics of real rooms.
This ensured the dataset closely reflected industrial environments, making the challenge practical and relevant.
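The post-processing above can be sketched in NumPy. The function names and parameters are illustrative rather than our exact implementation, and the reverberation here uses a synthetic exponentially decaying impulse response as a crude stand-in for a proper room simulation (e.g., with Pyroomacoustics).

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into a clean recording at a target signal-to-noise ratio (dB).
    Assumes both are 1-D float arrays at the same sampling rate."""
    # Loop/trim the noise to match the clean signal length
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def add_reverb(signal: np.ndarray, fs: int = 16000, rt60_s: float = 0.3) -> np.ndarray:
    """Crude reverberation: convolve with a synthetic impulse response that
    decays by ~60 dB over rt60_s seconds (a stand-in for room simulation)."""
    n = int(fs * rt60_s)
    rng = np.random.default_rng(0)
    ir = rng.standard_normal(n) * np.exp(-6.9 * np.arange(n) / n)
    ir /= np.sqrt(np.sum(ir ** 2))  # unit-energy impulse response
    return np.convolve(signal, ir)[: len(signal)]
```

Sweeping `snr_db` downward is the main difficulty dial: the lower the SNR, the more the machine signature is buried in industrial noise.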
Data Pipeline
Implemented in Python and PyTorch, our pipeline followed a structured process:
1️⃣ Data Conversion → Stored in Parquet/HDF5 for efficiency
2️⃣ Dataset Splitting → Training & test sets stratified by domain shifts and anomaly type
3️⃣ Audio Preprocessing → Background noise and reverberation applied
4️⃣ Tracking & Logging → Ensured experiment reproducibility and easy comparisons
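Step 2️⃣ can be sketched as a simple stratified split. The stratum key (domain shift, anomaly type) and helper names below are illustrative, not our exact pipeline code.

```python
import random
from collections import defaultdict

def stratified_split(items, key_fn, test_frac=0.2, seed=0):
    """Split items into train/test while preserving the proportion of each
    stratum (e.g., a (domain shift, anomaly type) tuple returned by key_fn)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key_fn(item)].append(item)
    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_frac))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Stratifying this way keeps every domain-shift condition represented in both splits, so baseline performance is not skewed by a missing condition.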
Technology Stack
📌 Audio Recording: STWIN.box, IMP23ABSU
📌 ML Experiment Tracking: MLflow (open-source experiment tracking, visualization, and model comparison)
📌 Dataset Versioning: DVC (open-source data version control)
📌 Machine Learning Framework: PyTorch
📌 Audio Processing: Librosa, Pyroomacoustics
¹ Different operational and environmental conditions applied to the working machines