VST plug-in for room impulse response synthesis via a spectral autoencoder.
*Running in REAPER64 on macOS*
NeuralReverberator is a VST2 plug-in built with the MATLAB Audio System Toolbox. An autoencoder was trained in Keras using normalized, log-power spectra of room impulse responses downsampled to 16 kHz.
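As a rough illustration, here is a minimal sketch of that preprocessing step, assuming an STFT size of 1024 (which yields the 513 frequency bins and 128 frames noted later) and min-max normalization; the exact parameters are assumptions, not confirmed by the text.

```python
import librosa
import numpy as np

def preprocess_ir(path, sr=16000, n_fft=1024, n_frames=128):
    """Resample an IR to 16 kHz and compute a normalized log-power spectrogram."""
    y, _ = librosa.load(path, sr=sr)          # load and downsample to 16 kHz
    S = np.abs(librosa.stft(y, n_fft=n_fft))  # (513, T) magnitude spectrogram
    log_S = librosa.power_to_db(S ** 2)       # log-power spectrogram (dB)
    log_S = log_S[:, :n_frames]               # trim to a fixed number of frames
    if log_S.shape[1] < n_frames:             # ...or zero-pad short IRs
        log_S = np.pad(log_S, ((0, 0), (0, n_frames - log_S.shape[1])))
    # min-max normalize to [0, 1]; the exact normalization used is an assumption
    log_S = (log_S - log_S.min()) / (log_S.max() - log_S.min())
    return log_S[..., np.newaxis]             # (513, 128, 1), channels-last
```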
After training the network, the decoder was used to generate log-power spectra for 2000 room impulse responses by stepping through the three-dimensional latent space of the model. At each step, the left channel was generated from the current latent embedding; the right channel was generated by slightly perturbing that embedding and passing it through the decoder again.
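A minimal sketch of this generation loop, assuming the decoder is a saved Keras model; the filename, grid density, and perturbation size are all assumptions.

```python
import itertools
import numpy as np
from keras.models import load_model

decoder = load_model("decoder.h5")        # hypothetical filename
steps = np.linspace(-2.0, 2.0, num=13)    # 13^3 grid points, roughly 2000 IRs

for zx, zy, zz in itertools.product(steps, repeat=3):
    z_left = np.array([zx, zy, zz]).reshape(1, 1, 1, 3)
    # perturb the left-channel embedding slightly to decorrelate the channels
    z_right = z_left + np.random.normal(scale=0.05, size=z_left.shape)
    spec_left = decoder.predict(z_left)   # (1, 513, 128, 1) log-power spectrum
    spec_right = decoder.predict(z_right)
    # ... each spectrum is then inverted to audio (see the next sketch)
```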
librosa was then used to convert the generated log-power spectra back to audio via the Griffin-Lim algorithm. The resulting .wav files were loaded into a MATLAB struct so that the AudioPlugin class can convolve the room impulse responses with input audio in a DAW.
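For illustration, a minimal sketch of this inversion using librosa's built-in Griffin-Lim implementation (the repository uses its own `ispecgram()` helper in `util.py`; the STFT size and any de-normalization are assumptions here):

```python
import librosa
import soundfile as sf

def spec_to_audio(log_S, n_fft=1024, n_iter=32):
    """Invert a (513, 128) log-power spectrogram to audio via Griffin-Lim."""
    S = librosa.db_to_amplitude(log_S)  # dB log-power -> linear magnitude
    return librosa.griffinlim(S, n_iter=n_iter, n_fft=n_fft)

# Example, using spec_left from the previous sketch:
#   y = spec_to_audio(spec_left.reshape(513, 128))
#   sf.write("ir_left.wav", y, 16000)  # write a 16 kHz mono .wav
```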
You can download builds of the compiled VST2 plug-in below:
- 64-bit macOS
- 64-bit Windows
Windows builds will be available shortly (maybe...)
Once you have downloaded the proper build, place it in the directory where your DAW expects to find VST2 plug-ins. On macOS this is commonly `Library/Audio/Plug-ins/VST`, and on Windows commonly `C:\Program Files\Common Files\VST2`.
After restarting your DAW, you should see the plug-in appear as `NeuralReverberator`.
*Traversing the latent space of the trained model*
The VST plug-in interface provides the user with a number of controls:

- **Input gain** - Attenuates the input signal fed to the convolution if clipping is occurring.
- **X, Y, Z** - Navigate the three-dimensional latent space of the decoder. The control range 0-9 maps to latent vector values between -2 and 2 (see the sketch after this list).
- **Width** - Adjusts the perceived stereo width by modifying the room impulse responses used for the left and right channels. Width = 0 gives mono and Width = 1 gives stereo; other settings allow for extreme width and strange effects.
- **Pre-delay** - Applies a delay to the wet signal to increase the time of arrival of first reflections.
- **Lowpass** - Rolls off high-frequency content, with an enable/disable switch and an Fc control.
- **Highpass** - Rolls off low-frequency content, with an enable/disable switch and an Fc control.
- **Mix** - Sets the ratio of the wet and dry signals at the output.
- **Resample** - Enables or disables resampling of the 16 kHz room impulse responses to match the current operating sample rate. Only sample rates of 32 kHz, 44.1 kHz, 48 kHz, and 96 kHz are supported; at all other sample rates no resampling occurs, regardless of this control's setting.
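A minimal sketch of the X/Y/Z control mapping, assuming a linear map from the 0-9 range onto latent values in [-2, 2] (the linearity is an assumption; only the endpoints are stated above):

```python
# Hypothetical helper: map a latent control step (0-9) onto a latent
# coordinate in [-2, 2], assuming the mapping is linear.
def control_to_latent(step: int) -> float:
    return -2.0 + 4.0 * step / 9.0

print(control_to_latent(0), control_to_latent(9))  # -2.0 2.0
```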
You can optionally build the plug-in from source with MATLAB. This requires the following Toolboxes:
To build the plug-in, open `NeuralReverberator.m` in the plug-in directory and then launch the Audio Test Bench:

```
>> audioTestBench NeuralReverberator
```
This should open a window like the one shown below.
You can demo the plug-in as MATLAB code if you would like. Press the plug-in generation button in the top toolbar to bring up the Generate Audio Plugin window. When you are ready to generate the plug-in, press Validate and generate. This process will validate the plug-in code and then compile the MATLAB code down to C/C++ code that operates as a VST2 plug-in. Note that this process can take up to 15 minutes to complete and can consume a significant amount of memory (up to 15 GB).
Note: MATLAB on Windows requires the Visual Studio C++ compiler in order to build VST plug-ins.
Data preprocessing, training, and output generation are all handled in Python with our good friends Keras and librosa.
The room impulse response dataset used to train the autoencoder is not provided due to licensing concerns. Trained models (an encoder and a decoder) are provided in `train/models` and can be loaded into Keras and used directly.

The encoder expects a log-power spectrogram of shape (513, 128, 1) in the channels-last configuration and produces a 3-dimensional latent representation of shape (1, 1, 1, 3), which can easily be flattened using `numpy.reshape()`.
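A minimal sketch of loading and using the encoder; the model filename under `train/models` is hypothetical.

```python
import numpy as np
from keras.models import load_model

encoder = load_model("train/models/encoder.h5")  # hypothetical filename
log_S = np.random.rand(513, 128, 1)              # placeholder for a real normalized log-power spectrogram

z = encoder.predict(log_S[np.newaxis, ...])      # -> shape (1, 1, 1, 3)
z = np.reshape(z, (3,))                          # flatten with numpy.reshape()
print(z)
```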
For example, below is the log-power spectrogram of a ~2 second impulse response.
When passed through the trained encoder, it generates a 3-dimensional embedding:

```
z = [-0.68267184  0.5946066  -0.5709323]
```
The decoder expects a 3-dimensional latent representation of shape (1, 1, 1, 3), with float values roughly in the range -2 to 2, and produces a spectrogram of shape (513, 128, 1). This can be flattened down to (513, 128) and either plotted via `librosa.display.specshow()` or converted back to audio using the `ispecgram()` method located in `util.py`.
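A minimal sketch of the decoder path, using the latent vector from the example above; the model filename and the `hop_length` for plotting are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
import librosa.display
from keras.models import load_model

decoder = load_model("train/models/decoder.h5")  # hypothetical filename
z = np.array([-0.68267184, 0.5946066, -0.5709323]).reshape(1, 1, 1, 3)

spec = decoder.predict(z).reshape(513, 128)      # (1, 513, 128, 1) -> (513, 128)
librosa.display.specshow(spec, sr=16000, hop_length=256)
plt.colorbar()
plt.show()
```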
When the above latent representation is passed through the decoder it generates a spectrogram.
Notice that this spectrogram closely follows the characteristics of the input, but has been 'blurred'. This is a well-established drawback of naive autoencoders. For listening examples, check out the blog post on my website.
You can see the presentation slides here: https://csteinmetz1.github.io/NeuralReverberator/slides
This work was presented as part of the Audio Engineering Society MATLAB Plugin Competition held at the 145th AES Convention in NYC, where NeuralReverberator was given the Silver Award.