TrojanRoom
This is the demo page for TrojanRoom, the attack proposed in the paper “Devil in the Room: Triggering Audio Backdoors in the Physical World”.
Abstract
Recent years have witnessed deep learning techniques endowing modern audio systems with powerful abilities. However, recent studies have revealed that these systems rely heavily on training data, which raises serious threats from backdoor attacks. Unlike most existing works, which validate the effectiveness of audio backdoors only in the digital world, we observe a mismatch between the trigger and the backdoor in physical space by investigating sound channel distortion. Inspired by this observation, this paper proposes TrojanRoom to bridge the gap between digital and physical audio backdoor attacks. TrojanRoom adopts the room impulse response (RIR) as a physical trigger to enable injection-free backdoor activation. By synthesizing dynamic RIRs and poisoning a source class of samples during data augmentation, TrojanRoom allows any adversary to launch an effective and stealthy attack using the specific impulse response of a room. The evaluation shows attack success rates of over 92% and 97% on state-of-the-art speech command recognition and speaker recognition systems, with a negligible impact on normal accuracy (below 3%) at a distance of over 5m. The experiments also demonstrate that TrojanRoom can bypass human inspection and voice liveness detection, and can resist trigger disruption and backdoor erasing.
RIR Trigger
Existing audio backdoor attacks perform trigger injection over the line and ignore physical effects. Hence, these attacks degrade in the physical world, where the trigger is injected over the air. The cause is sound channel distortion, including ambient reverberation and noise, which breaks the connection between the distorted trigger and the implanted backdoor.
To bridge the gap between digital and physical audio backdoor attacks, TrojanRoom turns the sound channel itself into the trigger injection path, i.e., the channel acts as the trigger. TrojanRoom models the reverberation as a room impulse response (RIR) and proposes an RIR-based physical trigger to enable an effective, stealthy, and injection-free audio backdoor attack in the physical world.
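The "channel as a trigger" idea can be illustrated with a minimal sketch in which reverberation is modeled as the convolution of dry speech with an RIR. The helper names `synth_rir` and `apply_rir`, the decaying-noise RIR model, and all parameter values are assumptions made for illustration; this is not the paper's RIR synthesis procedure.

```python
import numpy as np


def synth_rir(sample_rate=16000, rt60=0.5, length_s=0.6, seed=0):
    """Toy RIR (assumed model): a unit direct path plus exponentially decaying noise."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * length_s)
    t = np.arange(n) / sample_rate
    # Amplitude envelope that falls by 60 dB after rt60 seconds.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = 0.1 * rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir


def apply_rir(speech, rir):
    """Convolve speech with an RIR, keep the original length, and match the original peak."""
    # Truncating to len(speech) drops the reverberant tail past the end of the clip.
    wet = np.convolve(speech, rir, mode="full")[: len(speech)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(speech)) / peak)
    return wet


# Usage: a sine wave stands in for a recorded utterance.
sr = 16000
t = np.arange(sr) / sr
speech = 0.5 * np.sin(2 * np.pi * 220 * t)
poisoned = apply_rir(speech, synth_rir(sr))
```

In this sketch the poisoned sample contains no added signal; it differs from the benign one only through the simulated channel, which is the sense in which the attack is injection-free.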
Baseline Attacks
We compare TrojanRoom with state-of-the-art audio backdoor attacks with different trigger designs:
- FreqTone injects a 500ms low-volume single-frequency tone of 1kHz at the end of speech
- UltraSound injects a 250ms ultrasound signal of 21kHz at the end of speech
- BackNoise injects a 200ms background noise at the beginning of speech
- AdvPerturb injects a 200ms adversarial perturbation at a random position of speech
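The tone-based and noise-based baselines listed above can be sketched as follows. This assumes that "injects" means the trigger is mixed into the existing waveform rather than appended, that white Gaussian noise stands in for background noise, and that the amplitudes are arbitrary; the function names are hypothetical. AdvPerturb is omitted because it requires a target model to optimize the perturbation.

```python
import numpy as np


def add_tone_at_end(speech, sample_rate, freq_hz, duration_s, amplitude):
    """Mix a sine tone into the last `duration_s` seconds of `speech`."""
    if freq_hz >= sample_rate / 2:
        raise ValueError("tone frequency must be below the Nyquist frequency")
    n = min(int(sample_rate * duration_s), len(speech))
    t = np.arange(n) / sample_rate
    out = speech.astype(np.float64, copy=True)
    out[len(out) - n:] += amplitude * np.sin(2 * np.pi * freq_hz * t)
    return out


def add_noise_at_start(speech, sample_rate, duration_s, amplitude, seed=0):
    """Mix white Gaussian noise into the first `duration_s` seconds of `speech`."""
    rng = np.random.default_rng(seed)
    n = min(int(sample_rate * duration_s), len(speech))
    out = speech.astype(np.float64, copy=True)
    out[:n] += amplitude * rng.standard_normal(n)
    return out


# Usage: 48 kHz sampling is assumed so that a 21 kHz tone is representable.
sr = 48000
t = np.arange(sr) / sr
speech = 0.5 * np.sin(2 * np.pi * 220 * t)  # stand-in for a 1 s utterance

freqtone = add_tone_at_end(speech, sr, freq_hz=1000, duration_s=0.5, amplitude=0.01)
ultrasound = add_tone_at_end(speech, sr, freq_hz=21000, duration_s=0.25, amplitude=0.01)
backnoise = add_noise_at_start(speech, sr, duration_s=0.2, amplitude=0.01)
```

Each of these triggers adds an extra signal that must survive playback over the air, whereas the RIR trigger relies only on the room's own reverberation.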
Here is an example of a benign sample (speech command “yes”) and poisoned samples with different triggers:
Audio Samples
We provide the following benign and poisoned audio samples for comparing the stealthiness of different triggers in terms of human perception.
Short Speech Command
Trigger | Command "yes" | Command "no" | Command "right" |
---|---|---|---|
benign | |||
freqtone | |||
ultrasound | |||
backnoise | |||
advperturb | |||
rir | |||
Long Speaker Utterance
Trigger | Speaker 0 | Speaker 1 | Speaker 2 |
---|---|---|---|
benign | |||
freqtone | |||
ultrasound | |||
backnoise | |||
advperturb | |||
rir | |||