Matthew Rice*     Christian J. Steinmetz*     George Fazekas     Joshua D. Reiss
* These authors contributed equally.
Although the design and application of audio effects is well understood, the inverse problem of removing these effects is significantly more challenging and far less studied. Recently, deep learning has been applied to audio effect removal; however, existing approaches have focused on narrow formulations considering only one effect or source type at a time. In realistic scenarios, multiple effects are applied with varying source content. This motivates a more general task, which we refer to as general purpose audio effect removal. We developed a dataset for this task using five audio effects across four different sources and used it to train and evaluate a set of existing architectures. We found that no single model performed optimally on all effect types and sources. To address this, we introduced RemFX, an approach designed to mirror the compositionality of applied effects. We first trained a set of the best performing effect-specific removal models and then leveraged an audio effect classification model to dynamically construct a graph of our models at inference. We found that our approach outperforms single-model baselines, although examples with many effects present remain challenging.
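The inference procedure described above can be sketched in a few lines: a classifier predicts which effects are present, and the corresponding effect-specific removal models are chained dynamically. This is a minimal illustrative sketch, not the actual RemFX implementation; the names (`detect_effects`, `REMOVAL_MODELS`, `make_removal_model`) and the placeholder processing are assumptions for demonstration.

```python
# Hypothetical sketch of dynamic model composition as in RemFX.
# Real removal models are trained neural networks; here they are stubs.

def make_removal_model(effect):
    """Return a stand-in removal model for one effect type."""
    def remove(audio):
        # Placeholder processing; a trained network would go here.
        return [x * 0.99 for x in audio]
    return remove

# One removal model per effect considered in the paper.
REMOVAL_MODELS = {e: make_removal_model(e)
                  for e in ("reverb", "chorus", "delay",
                            "distortion", "compressor")}

def detect_effects(audio):
    """Stand-in for the audio effect classification model.

    A real classifier would predict the set of effects present
    in the recording; here we return a fixed example set.
    """
    return ["reverb", "distortion"]

def remfx_detect(audio):
    """Chain removal models for the detected effects at inference."""
    for effect in detect_effects(audio):
        audio = REMOVAL_MODELS[effect](audio)
    return audio
```

With ground-truth effect labels in place of `detect_effects`, the same chaining yields the RemFX-Oracle configuration.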
@inproceedings{rice2023general,
title={General purpose audio effect removal},
author={Rice, Matthew and Steinmetz, Christian J. and Fazekas, George and Reiss, Joshua D.},
booktitle={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics},
year={2023}
}
Below are audio examples from the test set, including unseen examples of singing voice, drums, bass, and acoustic guitar. We show the input recording, which contains the applied effects, alongside the target recording with no effects. We then compare the performance of our models, RemFX-Oracle and RemFX-Detect, with the monolithic baselines, Hybrid Demucs and DCUNet. RemFX-Oracle uses the ground truth effect labels to select the appropriate effect-specific removal models, while RemFX-Detect uses the effect classification model to select the removal models.
(Audio players omitted; the table lists the effect chains for each condition.)

| N | Effects | Input | Target (No Effects) | Hybrid Demucs (Défossez et al.) | DCUNet (Choi et al.) | RemFX-Oracle (Ours) | RemFX-Detect (Ours) |
|---|---------|-------|---------------------|---------------------------------|----------------------|---------------------|---------------------|
| N=1 | Reverb | | | | | | |
| | Compressor | | | | | | |
| | Delay | | | | | | |
| | Distortion | | | | | | |
| | Chorus | | | | | | |
| N=2 | Reverb + Distortion | | | | | | |
| | Reverb + Compressor | | | | | | |
| | Chorus + Compressor | | | | | | |
| N=3 | Reverb + Chorus + Compressor | | | | | | |
| | Reverb + Chorus + Distortion | | | | | | |
| N=4 | Reverb + Delay + Distortion + Compressor | | | | | | |
| | Reverb + Chorus + Delay + Distortion | | | | | | |
| N=5 | Reverb + Chorus + Delay + Distortion + Compressor | | | | | | |
| | Reverb + Chorus + Delay + Distortion + Compressor | | | | | | |