In 2024, my colleague and I developed the Industrial Multi-sensor Anomaly Detection under Domain Shift Conditions (IMAD-DS) dataset, which we presented at the DCASE 2024 Workshop. The dataset addresses the challenges of anomaly detection (AD) in industrial settings, particularly when operational and environmental conditions differ between training and deployment, a situation known as domain shift.
Project Overview
The IMAD-DS dataset was created to simulate real-world industrial scenarios where machines operate under different conditions, leading to potential domain shifts that can affect the performance of AD systems. Our goal was to provide a comprehensive dataset that includes multi-sensor data from industrial machines, facilitating the development of robust AD algorithms capable of handling such variability.
Role and Responsibilities
As a Machine Learning Engineer, my responsibilities in this project included:
- Dataset Development: Collecting and processing multi-sensor data from two scaled industrial machines—a robotic arm and a brushless motor—under various operating conditions.
- Simulation of Domain Shifts: Introducing domain shifts by varying operational parameters such as speed and load, and adding different types of background noise to the audio data to simulate environmental changes.
- Benchmarking: Evaluating the impact of these domain shifts on AD performance using an autoencoder model, demonstrating the challenges posed by such variability.
Key Contributions
- Multi-Sensor Data Collection: Utilized IoT sensor boards to gather comprehensive data, including vibration (accelerometer), angular velocity (gyroscope), and audio signals, providing a rich dataset for AD research.
- Domain Shift Simulation: Systematically varied operational conditions and environmental factors to create realistic domain shifts, enhancing the dataset’s applicability to real-world scenarios.
- Performance Evaluation: Conducted benchmark tests that highlighted the significant impact of domain shifts on AD systems, underscoring the importance of developing robust algorithms.
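To make the benchmarking idea concrete, here is a minimal PyTorch sketch of autoencoder-based anomaly scoring. It is not the benchmark code from the paper: the `DenseAutoencoder` architecture, the layer sizes, and the synthetic feature vectors are all assumptions chosen for illustration. The model is trained on normal data only, and the per-sample reconstruction error serves as the anomaly score.

```python
import torch
from torch import nn


class DenseAutoencoder(nn.Module):
    """Small fully connected autoencoder (hypothetical architecture)."""

    def __init__(self, n_features, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 32), nn.ReLU(),
            nn.Linear(32, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def anomaly_scores(model, x):
    """Per-sample mean squared reconstruction error."""
    model.eval()
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)


torch.manual_seed(0)
n_features = 64

# Synthetic stand-in for feature vectors extracted from normal recordings.
normal_train = torch.randn(512, n_features) * 0.1

model = DenseAutoencoder(n_features)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Train on normal data only: the model learns to reconstruct "normal".
model.train()
for _ in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(normal_train), normal_train)
    loss.backward()
    optimizer.step()

# Synthetic test data: anomalies are normal samples with a constant offset.
normal_test = torch.randn(64, n_features) * 0.1
anomalous_test = normal_test + 1.0

scores_normal = anomaly_scores(model, normal_test)
scores_anomalous = anomaly_scores(model, anomalous_test)
```

Samples that differ from the training distribution reconstruct poorly and receive higher scores. The same mechanism is what makes domain shifts hard: normal data recorded under unseen conditions can also reconstruct poorly and be mistaken for anomalies.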
Recording Setup
- Device: STWIN.box (evaluation kit from STMicroelectronics)
- Microphone: IMP23ABSU (analog microphone, 16 kHz sampling rate)
- Environment: Anechoic chambers at STMicroelectronics
To fine-tune dataset complexity, we applied controlled post-processing:
🎛️ Noise Mixing: Clean recordings were mixed with industrial noise at adjustable Signal-to-Noise Ratios (SNRs).
🎛️ Room Effects Simulation: We introduced simulated reverberation to emulate real-world room acoustics.
This ensured the dataset closely reflected industrial environments, making the challenge practical and relevant.
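The two post-processing steps can be sketched with NumPy as follows. This is an illustrative sketch rather than our production code: the function names, the synthetic test tone, and the toy impulse response (exponentially decaying noise) are assumptions. In particular, the toy impulse response is only a stand-in for a properly simulated room.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the result has the requested SNR in dB."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Loop or trim the noise so it matches the clean signal's length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Solve clean_power / (gain^2 * noise_power) = 10^(snr_db / 10) for gain.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


def synthetic_rir(sr, rt60=0.3, seed=0):
    """Toy room impulse response: noise decaying by 60 dB over `rt60` seconds."""
    rng = np.random.default_rng(seed)
    n = int(sr * rt60)
    t = np.arange(n) / sr
    rir = rng.standard_normal(n) * 10 ** (-3 * t / rt60)
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def add_reverb(signal, impulse_response):
    """Convolve with an impulse response and keep the original length."""
    return np.convolve(signal, impulse_response)[: len(signal)]


sr = 16000  # matches the microphone sampling rate
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone
noise = rng.standard_normal(sr // 2)  # shorter than the clean signal on purpose

noisy = mix_at_snr(clean, noise, snr_db=6.0)
reverberant = add_reverb(noisy, synthetic_rir(sr))

# Check the SNR that was actually achieved.
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Lowering `snr_db` buries the machine sound deeper in the background noise, which is how the difficulty of the task can be adjusted.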
Data Pipeline
Implemented in Python and PyTorch, our pipeline followed a structured process:
1️⃣ Data Conversion → Stored in Parquet/HDF5 for efficiency
2️⃣ Dataset Splitting → Training & test sets stratified by domain shifts and anomaly type
3️⃣ Audio Preprocessing → Background noise and reverberation applied
4️⃣ Tracking & Logging → Ensured experiment reproducibility and easy comparisons
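As an illustration of the splitting step, the sketch below stratifies a list of recordings by domain and label using only the standard library. The record fields (`id`, `domain`, `label`), the group names, and the 20% test fraction are hypothetical and are not taken from the dataset's actual metadata.

```python
import random
from collections import Counter, defaultdict


def stratified_split(records, keys=("domain", "label"), test_fraction=0.2, seed=0):
    """Split records so every (domain, label) group contributes the same test share."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for key in sorted(groups):
        items = groups[key][:]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical metadata: 10 recordings per (domain, label) combination.
records = [
    {"id": f"{domain}_{label}_{i}", "domain": domain, "label": label}
    for domain in ("source", "target")
    for label in ("normal", "anomaly")
    for i in range(10)
]

train, test = stratified_split(records)
test_counts = Counter((r["domain"], r["label"]) for r in test)
```

Splitting per group rather than over the whole list keeps rare combinations, such as anomalies in a shifted domain, represented in the test set.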
Technology Stack
📌 Recording: STWIN.box, IMP23ABSU, ISM330DHCX (accelerometer and gyroscope)
📌 ML Experiment Tracking: MLflow (open-source tracking, visualization, and model comparison)
📌 Dataset Versioning: DVC (open-source data version control)
📌 Machine Learning Framework: PyTorch
📌 Audio Processing: Librosa, Pyroomacoustics
Impact and Future Work
The IMAD-DS dataset provides a valuable resource for researchers and practitioners aiming to develop AD systems that are resilient to domain shifts. By highlighting the challenges associated with varying operational and environmental conditions, this work paves the way for future research into domain adaptation and generalization techniques in industrial anomaly detection.
Resources
For a detailed description of the dataset and our findings, please refer to our paper presented at the DCASE 2024 Workshop:
To download the dataset and run your own experiments, refer to the online repository: IMAD-DS