Skip to content

Liu-Feng-deeplearning/FastCqt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FastCqt

The constant-Q transform feature(CQT), is often useful for some recognized task, especiall for some task about music. This repo is for c++ code for Efficient solver of CQT and also offer pyWrapper for easily using.

How to use it

git clone https://github.com/Liu-Feng-deeplearning/FastCqt.git 
cd FastCqt

prepare for eigen and fftw

cd 3rd
tar -zxf eigen-3.3.7.tar.gz  # eigen do not need to install

tar -zxf fftw-3.3.10.tar.gz
./configure --enable-shared --enable-float  -prefix=${PWD}/build 
make CFLAGS="-fPIC"
make install

cd ../

Now we prepared 3rd for FastCqt

build

mkdir build && cd build && cmake ../
make -j

After building, we get libFastCqt.so

usage

Besides of using c++ code, we also offer py-wrapper for the cqt feature. Example python code can be seen at src/PyWrapper.py

import librosa, FastCqt
import numpy as np
sig, _ = librosa.load("demo/demo.wav", sr=16000)
sig = np.ascontiguousarray(sig, dtype=np.float32)
cpp_cqt_ext = FastCqt(sample_rate=16000, hop_size=0.04)
cpp_cqt = cpp_cqt_ext.compute_cqt(sig)
print("input length:{}".format(np.shape(sig)))
print("output shape:{}".format(np.shape(cpp_cqt)))

we can get result as below:

input length:(1459200,)
output shape:(2280, 96)

Toy Example

Compared with other speech features such as mel, the biggest advantage of the Cqt is that it is very convenient to handle the shift of the pitch. Here is an example of audio "Red high-heeled shoes"(红色高跟鞋 in Chinese) by Yaxun Xiao. We cut of signal with head 30s and compute cqt featrue with hop-size of 0.04, so we get matrix of 96x750(or 750x96). Compare init data and pitch-shifted data(increase by 20%), we can see than time domain pitch-shift equals to frequence domain role alone with freq-axis. And it is useful at some task because we can do data augmentation easily.

Efficiency

This implement is only an approximation algorithm which is to multiply between fft of signal and transformed matrix. It is fast enough and result is verified by some
different task such as Cover-song-identification(CSI). According to my experience, set hop_size as 0.04 is suitable for audio with sample-rate of 16k.

This is benchmark for FastCqt and librosa.cqt:

chunks[s] FastCqt rtf librosa.cqt rtf
5 0.0051 0.0107
15 0.0054 0.0078
45 0.0054 0.0069
135 0.0037 0.0045

Note: when program is run, we use "task -c 1" to make sure cpu usage is about 100%

About

Fast constant-Q transform feature, c++ implement

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published