TrojanRoom

This is the demo page for TrojanRoom, proposed in the paper “Devil in the Room: Triggering Audio Backdoors in the Physical World”.

Abstract

Recent years have witnessed deep learning techniques endowing modern audio systems with powerful abilities. However, recent studies have revealed the strong reliance of these systems on training data, which raises serious threats from backdoor attacks. Unlike most existing works, which validate the effectiveness of audio backdoors in the digital world, we observe a mismatch between the trigger and the backdoor in the physical space by investigating the sound channel distortion. Inspired by this observation, this paper proposes TrojanRoom to bridge the gap between digital and physical audio backdoor attacks. TrojanRoom adopts the room impulse response (RIR) as a physical trigger to enable injection-free backdoor activation. By synthesizing dynamic RIRs and poisoning a source class of samples during data augmentation, TrojanRoom allows any adversary to launch an effective and stealthy attack using the specific impulse response in a room. The evaluation shows over 92% and 97% attack success on state-of-the-art speech command recognition and speaker recognition systems, with a negligible impact on normal accuracy (below 3%) at a distance of over 5m. The experiments also demonstrate that TrojanRoom can bypass human inspection and voice liveness detection, and resist trigger disruption and backdoor erasing.

RIR Trigger

Existing audio backdoor attacks perform trigger injection over the line while ignoring physical effects. Hence, these attacks degrade in the physical world, where the trigger is injected over the air. This is due to sound channel distortion, including ambient reverberation and noise, which breaks the connection between the distorted trigger and the implanted backdoor.
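The mismatch can be made concrete with a standard linear channel model. This is a textbook sketch under our own assumptions, not a formula taken from the paper: x is the clean speech, δ a digitally designed trigger, h the room impulse response, n additive ambient noise, and * denotes convolution.

```latex
\begin{aligned}
\text{over the line:} \quad & y_{\text{line}}(t) = x(t) + \delta(t) \\
\text{over the air:}  \quad & y_{\text{air}}(t) = \bigl((x + \delta) * h\bigr)(t) + n(t)
                              = (x * h)(t) + (\delta * h)(t) + n(t)
\end{aligned}
```

Under this model, a backdoor trained on δ is presented with the reverberated version δ * h plus noise, which is one way to read why an over-the-line trigger may fail to activate over the air.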

over-the-air and over-the-line activation

To bridge the gap between digital and physical audio backdoor attacks, TrojanRoom turns the sound channel itself into the trigger injection path, i.e., channel as a trigger. TrojanRoom models the reverberation as a Room Impulse Response (RIR) and proposes an RIR-based physical trigger to enable an effective, stealthy and injection-free audio backdoor attack in the physical world.
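The idea of an RIR acting as a trigger can be illustrated by convolving a speech signal with an impulse response. The following is a minimal sketch and not the TrojanRoom implementation: the function names `synth_rir` and `apply_rir`, the exponentially decaying noise model, and all parameter values (sample rate, RT60, tail gain) are assumptions chosen for illustration. How TrojanRoom actually synthesizes its dynamic RIRs is described in the paper.

```python
import numpy as np


def synth_rir(sr=16000, rt60=0.4, length_s=0.5, seed=0):
    """Toy RIR (assumed model): direct path plus exponentially decaying noise."""
    rng = np.random.default_rng(seed)
    n = int(sr * length_s)
    t = np.arange(n) / sr
    # Amplitude envelope that drops by 60 dB (a factor of 1e-3) after rt60 seconds.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = 0.1 * rng.standard_normal(n) * decay  # reverberant tail
    rir[0] = 1.0                                # direct path
    return rir


def apply_rir(speech, rir):
    """Convolve speech with an RIR, crop to the input length, match the input peak."""
    speech = np.asarray(speech, dtype=float)
    wet = np.convolve(speech, rir)[: len(speech)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(speech)) / peak)
    return wet


# Usage: a 1-second 440 Hz tone stands in for a speech sample.
sr = 16000
speech = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
reverberant = apply_rir(speech, synth_rir(sr=sr))
```

In this sketch nothing is added to the waveform as a separate signal; the sample is only filtered by the channel, which is the sense in which an RIR trigger is injection-free.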

injection-free activation

Baseline Attacks

We compare TrojanRoom with state-of-the-art audio backdoor attacks with different trigger designs:

FreqTone injects a 500ms low-volume single-frequency tone of 1kHz at the end of speech
UltraSound injects a 250ms ultrasound signal of 21kHz at the end of speech
BackNoise injects a 200ms background noise at the beginning of speech
AdvPerturb injects a 200ms adversarial perturbation at a random position of speech
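The three signal-based baselines above can be sketched as simple additive injections. This is an illustrative approximation, not the original attacks' code: the helper names, the amplitudes, the choice of white noise for BackNoise, and the decision to overlay the trigger on the existing samples (rather than append it) are all assumptions. AdvPerturb is omitted because it requires a trained model to optimize the perturbation.

```python
import numpy as np


def tone(freq_hz, dur_s, sr, amp):
    """Sine tone of the given frequency, duration and amplitude."""
    t = np.arange(int(dur_s * sr)) / sr
    return amp * np.sin(2 * np.pi * freq_hz * t)


def inject(speech, trigger, start):
    """Add the trigger onto a copy of speech starting at sample index start."""
    out = np.array(speech, dtype=float)  # np.array copies, so the input is untouched
    end = min(start + len(trigger), len(out))
    out[start:end] += trigger[: end - start]
    return out


def freq_tone(speech, sr=16000, amp=0.01):
    """FreqTone: 500 ms low-volume 1 kHz tone at the end of the speech."""
    trig = tone(1000, 0.5, sr, amp)
    return inject(speech, trig, max(len(speech) - len(trig), 0))


def ultrasound(speech, sr=48000, amp=0.1):
    """UltraSound: 250 ms 21 kHz tone at the end of the speech."""
    if sr <= 2 * 21000:
        raise ValueError("sample rate must exceed 42 kHz to represent 21 kHz")
    trig = tone(21000, 0.25, sr, amp)
    return inject(speech, trig, max(len(speech) - len(trig), 0))


def back_noise(speech, sr=16000, amp=0.01, seed=0):
    """BackNoise: 200 ms of noise (white noise assumed) at the beginning."""
    rng = np.random.default_rng(seed)
    trig = amp * rng.standard_normal(int(0.2 * sr))
    return inject(speech, trig, 0)


# Usage: one second of silence stands in for a speech sample.
silence = np.zeros(16000)
poisoned_tone = freq_tone(silence)
poisoned_noise = back_noise(silence)
```

Note that the 21 kHz trigger cannot be represented at the 16 kHz sample rate common in speech datasets, so the sketch requires a higher sample rate for that baseline.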

Here is an example of a benign sample (speech command “yes”) and poisoned samples with different triggers:

baseline