Style Transfer of Audio Effects with Differentiable Signal Processing

Abstract

We present a framework that can impose the audio effects and production style of one recording onto another by example, with the goal of simplifying the audio production process. We train a deep neural network to analyze an input recording and a style reference recording, and predict the control parameters of audio effects used to render the output. In contrast to past work, we integrate audio effects as differentiable operators in our framework, perform backpropagation through audio effects, and optimize end-to-end using an audio-domain loss. We use a self-supervised training strategy that enables automatic control of audio effects without the use of any labeled or paired training data. We survey a range of existing and new approaches for differentiable signal processing, showing how each can be integrated into our framework and discussing their trade-offs. We evaluate our approach on both speech and music tasks, demonstrating that it generalizes both to unseen recordings and to sample rates different from those seen during training. Our approach produces convincing production style transfer results, transforming input recordings into produced recordings while yielding audio effect control parameters that enable interpretability and user interaction.
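To make the idea of optimizing effect parameters through an audio-domain loss concrete, here is a minimal sketch. It is not the implementation from the paper: there is no neural network, the only "effect" is a toy gain stage, the loss is a plain waveform mean squared error, and the gradient is derived by hand rather than by automatic differentiation. All names (`apply_gain`, `loss_and_grad`, `fit_gain`) and settings (learning rate, step count, test signal) are hypothetical and chosen for illustration only.

```python
import math


def apply_gain(signal, gain_db):
    """Toy audio effect (assumed for illustration): gain specified in decibels."""
    scale = 10.0 ** (gain_db / 20.0)
    return [scale * x for x in signal]


def loss_and_grad(signal, target, gain_db):
    """Waveform MSE between the processed signal and the target,
    plus its analytic derivative with respect to gain_db."""
    scale = 10.0 ** (gain_db / 20.0)
    dscale_dgain = scale * math.log(10.0) / 20.0  # chain rule through the dB mapping
    n = len(signal)
    loss = sum((scale * x - t) ** 2 for x, t in zip(signal, target)) / n
    dloss_dscale = sum(2.0 * (scale * x - t) * x for x, t in zip(signal, target)) / n
    return loss, dloss_dscale * dscale_dgain


def fit_gain(signal, target, steps=1000, lr=20.0, init_db=0.0):
    """Plain gradient descent on the effect parameter (hypothetical settings)."""
    gain_db = init_db
    for _ in range(steps):
        _, grad = loss_and_grad(signal, target, gain_db)
        gain_db -= lr * grad
    return gain_db


# A 440 Hz sine at 16 kHz stands in for the input recording.
signal = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(1024)]
# The "reference" is the same signal with an unknown gain of -6 dB applied.
target = apply_gain(signal, -6.0)

estimated_db = fit_gain(signal, target)
final_loss, _ = loss_and_grad(signal, target, estimated_db)
```

In this toy setting the gradient flows from the audio-domain error back to the effect's control parameter, so the recovered value stays directly interpretable as a gain in decibels. In the framework described above, a network would predict such parameters from the input and reference recordings instead of optimizing them per example.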

Examples

Realistic Style Transfer

Example	Style	Input	Ref	RB-DSP	AD	SPSA	NP-HH
Speech
DAPS 1	Bright to Broadcast
DAPS 2	Telephone to Neutral
DAPS 3	Warm to Bright
Music
MUSDB18 1	Neutral to Warm
MUSDB18 2	Telephone to Neutral
MUSDB18 3	Bright to Broadcast

Synthetic Style Transfer

Speech

Example	Input	Ref	RB-DSP	AD	SPSA	NP	NP-HH	NP-FH	cTCN1	cTCN2
LibriTTS 1
LibriTTS 2
LibriTTS 3
LibriTTS 4
DAPS 1
DAPS 2
DAPS 3
DAPS 4

Music

Example	Input	Ref	RB-DSP	AD	SPSA	NP	NP-HH	NP-FH	cTCN1	cTCN2
Jamendo 1
Jamendo 2
MUSDB18 1
MUSDB18 2

BibTeX

    @article{steinmetz2022style,
        title={Style Transfer of Audio Effects with Differentiable Signal Processing}, 
        author={Christian J. Steinmetz and Nicholas J. Bryan and Joshua D. Reiss},
        year={2022},
        eprint={2207.08759},
        archivePrefix={arXiv},
        primaryClass={cs.SD}
    }
