Part of ILLIXR, the Illinios Extended Reality testbed. The audio pipeline is responsible for both recording and playing back spatialized audio for XR.
Clone this repo using the following command:
git clone https://github.com/ILLIXR/audio_pipeline.git --recursive
The audio pipeline relies upon C++17, so clang 5.0+ is required. If you are using a relatively new OS such as Ubuntu 18.04 or later, sudo apt-get install clang
should suffice. If you're on an older OS, you can install the latest version of clang by following the steps on the LLVM nightly packages page. If you already have clang 3.8+ and would not like to upgrade, you can replace -std=c++17
with -std=c++1z
in the Makefile.
Build debug: make
or make solo.dbg.exe
Build release: make solo.opt.exe
If you are switching between builds, please do make deepclean
.
Also note that the release build (-O3) is considerably faster than the debug build.
./solo.dbg.exe <number of 1024-sample-blocks to process> [optional: decode/encode]
- Number of audio blocks to process is a required parameter.
- decode/encode specifies the different audio processing steps, which are specificially designed for profiling. No output would be generated.
If encode or decode is not specified, the code will perform both encoding and decoding on a preset input sound sample file, and output spatialized audio output in output.wav
.
This will generate ~11 seconds of spatialized audio using the two sound samples under ./sample/
: ./solo.dbg.exe 500
This will encode 2000 1024-sample-blocks of audio into HOA format: ./solo.dbg.exe 2000 encode
This will decode (binauralize) 2000 1024-sample-blocks of encoded HOA audio: ./solo.dbg.exe 2000 decode
Note: The input and output are hardcoded to be 48 KHz sample rate, 16-bit PCM wav files. If you want to hear the output sound, limit the number of processed sample blocks so that the output is not longer than input! Otherwise, garbage sound samples would be generated.
audio_pipeline
subscribes to the most recent pose, does ambisonics encoding, spatial zoom and
rotation according to the pose, ambisonics decoding and binauralization to output a block of 1024
sound samples each time at 48000 Hz sample rate. Therefore it has a 21.3ms period to process each
block. If it misses a deadline, it keeps doing its current work for the next deadline.
Currently this component is for profiling purpose only. It does read a pose from illixr, but the pose is not used by spaitial zoom and rotation. Performance-wise, the audio pipeline is input invariant.
This component uses the logger to record its runtime.
- Aysnchronously reads
fast_pose
Submodule libspatialaudio provides the backend library for Ambisonics encoding, decoding, rotation, zoom, and binauralization (HRTF included).
Describes a sound source in the ambisonics sound-field, including the input file for the sound source and its position in the sound-field.
Encapsulate preset processing steps of sound source reading, encoding, rotating, zooming, and decoding.
This code is available under the University of Illinois/NCSA Open Source License. The sound samples provided in ./sample/
are available under the Creative Commons 0 license.