Integrate Whisper CPP and write a wrapper module in Aprapipes (#324)
* Add custom port vcpkg for whisper

* Add whisper stream

* Add whisper stream header

* Add whisper cpp to Cmake list

* Add test frame type and minor changes

* Add whisper to vcpkg

* Add vcpkg custom overlay ports to thirdparty

* Modify with whisper option

* Send whisper output as text frames

* revert changes to sound record test

* Add whisper UT

* Fix PS to remove whisper from vcpkg json

* Revert changes to OPTIONS section, remove WHISPER option, rename Whisper source files to generic AudioToTextXForm

* Move pcm to git lfs

* Add pcm and model bin file to lfs

* Fix UT name

* Throw AIP exception for unknown strategy

* Revert sound_record_tests.cpp changes

* Revert changes to vcpkg indentation and remove Whisper option

* Linux -> OFF to ON Windows ON -> OFF

* Add reserve statement for vector
Move constructor impl

* update submodule for pipeline to run

* Update whisper port with install fix

* update submodule

* Update vcpkg version

* Add changes to handle props change

* Improve UT and refactor for changing sample strategy during run time.

* Add apt-get install libx11-dev libgles2-mesa-dev for libepoxy error

* Add memory type check in validate input pins and throw exception if model path changes.

* update submodule

* update vcpkg msys2

* update submodule

* Address nits

* Export env variable overlay port for building in arm64

* added fix-for-arm64.patch for whisper

* update fix-vcpkg-json.ps1

* update CMakeLists.txt

* update vcpkg url for build

* update whisper tests threshold

* update code formatting

* update whisper test

* added EOS for small buffer size

---------

Co-authored-by: Kushal Jain <[email protected]>
Co-authored-by: Vinayak Y-B <[email protected]>
3 people authored Feb 28, 2024
1 parent 110a2e2 commit 5358310
Showing 22 changed files with 797 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/CI-Linux-ARM64.yml
@@ -19,7 +19,7 @@ jobs:
cuda: 'ON'
prep-cmd: 'echo skipping builder prep as I can not sudo'
cache-path: './none'
- cmake-conf-cmd: 'export VCPKG_FORCE_SYSTEM_BINARIES=1 && cmake -B . -DENABLE_ARM64=ON ../base'
+ cmake-conf-cmd: 'export VCPKG_FORCE_SYSTEM_BINARIES=1 && export VCPKG_OVERLAY_PORTS=../thirdparty/custom-overlay && cmake -B . -DENABLE_ARM64=ON ../base'
nProc: 6
jetson-publish:
needs: jetson-build-test
2 changes: 1 addition & 1 deletion .github/workflows/build-test-lin-container.yml
@@ -30,7 +30,7 @@ on:
prep-cmd:
type: string
description: 'commands required to be run on a builder to prep it for build'
- default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix && pip3 install meson'
+ default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix libx11-dev libgles2-mesa-dev && pip3 install meson'
required: false
prep-check-cmd:
type: string
2 changes: 1 addition & 1 deletion .github/workflows/build-test-lin-wsl.yml
@@ -30,7 +30,7 @@ on:
prep-cmd:
type: string
description: 'commands required to be run on a builder to prep it for build'
- default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix && pip3 install meson'
+ default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix libx11-dev libgles2-mesa-dev && pip3 install meson'
required: false
prep-check-cmd:
type: string
2 changes: 1 addition & 1 deletion .github/workflows/build-test-lin.yml
@@ -30,7 +30,7 @@ on:
prep-cmd:
type: string
description: 'commands required to be run on a builder to prep it for build'
- default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxdamage-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix && pip3 install meson'
+ default: 'sudo apt-get update -qq && sudo apt-get -y install ca-certificates curl zip unzip tar autoconf automake autopoint build-essential flex git-core libass-dev libfreetype6-dev libgnutls28-dev libmp3lame-dev libsdl2-dev libtool libsoup-gnome2.4-dev libva-dev libvdpau-dev libvorbis-dev libxdamage-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev libncurses5-dev libncursesw5-dev ninja-build pkg-config texinfo wget yasm zlib1g-dev nasm gperf bison python3 python3-pip dos2unix libx11-dev libgles2-mesa-dev && pip3 install meson'
required: false
prep-check-cmd:
type: string
16 changes: 9 additions & 7 deletions base/CMakeLists.txt
@@ -6,6 +6,8 @@ OPTION(ENABLE_ARM64 "Use this switch to enable ARM64" OFF)
OPTION(ENABLE_WINDOWS "Use this switch to enable WINDOWS" OFF)

set(VCPKG_INSTALL_OPTIONS "--clean-after-build")
set(VCPKG_OVERLAY_PORTS "${CMAKE_CURRENT_SOURCE_DIR}/../thirdparty/custom-overlay")

IF(ENABLE_CUDA)
add_compile_definitions(APRA_CUDA_ENABLED)
ENDIF(ENABLE_CUDA)
@@ -23,6 +25,7 @@ IF(ENABLE_ARM64)
add_compile_definitions(ARM64)
set(VCPKG_OVERLAY_PORTS ../vcpkg/ports/cudnn)
set(VCPKG_OVERLAY_TRIPLETS ../vcpkg/triplets/community/arm64-linux.cmake)
set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)
ENDIF(ENABLE_ARM64)

#use /MP only for language CXX (and not CUDA) and MSVC for both targets
@@ -38,8 +41,6 @@ project(APRAPIPES)
message(STATUS $ENV{PKG_CONFIG_PATH}">>>>>> PKG_CONFIG_PATH")

find_package(PkgConfig REQUIRED)


find_package(Boost COMPONENTS system thread filesystem serialization log chrono unit_test_framework REQUIRED)
find_package(JPEG REQUIRED)
find_package(OpenCV CONFIG REQUIRED)
@@ -50,6 +51,7 @@ find_package(FFMPEG REQUIRED)
find_package(ZXing CONFIG REQUIRED)
find_package(bigint CONFIG REQUIRED)
find_package(SFML COMPONENTS system window audio graphics CONFIG REQUIRED)
find_package(whisper CONFIG REQUIRED)

IF(ENABLE_CUDA)
if((NOT DEFINED CMAKE_CUDA_ARCHITECTURES) OR (CMAKE_CUDA_ARCHITECTURES STREQUAL ""))
@@ -280,10 +282,9 @@ SET(IP_FILES
src/OverlayFactory.h
src/OverlayFactory.cpp
src/TestSignalGeneratorSrc.cpp
src/AudioToTextXForm.cpp
)




SET(IP_FILES_H
include/HistogramOverlay.h
include/CalcHistogramCV.h
@@ -306,10 +307,9 @@ SET(IP_FILES_H
include/TextOverlayXForm.h
include/ColorConversionXForm.h
include/Overlay.h
include/AudioToTextXForm.h
)



SET(CUDA_CORE_FILES
src/apra_cudamalloc_allocator.cu
src/apra_cudamallochost_allocator.cu
@@ -561,6 +561,7 @@ SET(UT_FILES
test/mp4_dts_strategy_tests.cpp
test/overlaymodule_tests.cpp
test/testSignalGeneratorSrc_tests.cpp
test/audioToTextXform_tests.cpp
${ARM64_UT_FILES}
${CUDA_UT_FILES}
)
@@ -607,6 +608,7 @@ target_link_libraries(aprapipesut
liblzma::liblzma
bigint::bigint
sfml-audio
whisper::whisper
)

IF(ENABLE_WINDOWS)
4 changes: 4 additions & 0 deletions base/fix-vcpkg-json.ps1
@@ -8,6 +8,10 @@ if ($removeCUDA.IsPresent)
$v.dependencies |
Where-Object { $_.name -eq 'opencv4' } |
ForEach-Object { $_.features = $_.features -ne 'cuda' -ne 'cudnn' }

$v.dependencies |
Where-Object { $_.name -eq 'whisper' } |
ForEach-Object { $_.features = $_.features -ne 'cuda' }
}

if($removeOpenCV.IsPresent)
4 changes: 4 additions & 0 deletions base/fix-vcpkg-json.sh
@@ -21,6 +21,10 @@ if $removeCUDA; then
# Remove "cuda" and "cudnn" features for this "opencv4" instance
v=$(echo "$v" | jq ".dependencies[$index].features |= map(select(. != \"cuda\" and . != \"cudnn\"))")
fi
if [ "$name" == "whisper" ]; then
# Remove "cuda" features for this "whisper" instance
v=$(echo "$v" | jq ".dependencies[$index].features |= map(select(. != \"cuda\"))")
fi
done
fi

57 changes: 57 additions & 0 deletions base/include/AudioToTextXForm.h
@@ -0,0 +1,57 @@
#pragma once

#include "Module.h"

// size of audio to process should be a parameter.
// Cache variable to collect frames for processing

class AudioToTextXFormProps : public ModuleProps
{
public:
enum DecoderSamplingStrategy {
GREEDY,
BEAM_SEARCH
};

DecoderSamplingStrategy samplingStrategy;
std::string modelPath;
int bufferSize;

AudioToTextXFormProps(
DecoderSamplingStrategy _samplingStrategy,
std::string _modelPath,
int _bufferSize);
size_t getSerializeSize();


private:
friend class boost::serialization::access;

template <class Archive>
void serialize(Archive& ar, const unsigned int version);
};

class AudioToTextXForm : public Module
{

public:
AudioToTextXForm(AudioToTextXFormProps _props);
virtual ~AudioToTextXForm();
bool init();
bool term();
void setProps(AudioToTextXFormProps& props);
AudioToTextXFormProps getProps();

protected:
bool process(frame_container& frames);
bool processSOS(frame_sp& frame);
bool validateInputPins();
bool validateOutputPins();
void addInputPin(framemetadata_sp& metadata, string& pinId);
bool handlePropsChange(frame_sp& frame);

private:
void setMetadata(framemetadata_sp& metadata);
class Detail;
boost::shared_ptr<Detail> mDetail;
};
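
The header comments above mention a configurable audio size and a cache that collects frames before processing, and the commit message mentions throwing for an unknown strategy. A minimal standalone sketch of those two ideas might look like the following. It is not the module's code: `SampleBuffer` and `strategyName` are hypothetical names, the enum is a stand-in copy of the one in `AudioToTextXFormProps`, `std::invalid_argument` stands in for the project's own exception type, and nothing here calls whisper.cpp.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the enum declared in AudioToTextXFormProps (assumption: same members).
enum class DecoderSamplingStrategy { GREEDY, BEAM_SEARCH };

// Hypothetical helper: names a strategy and rejects any value outside the enum.
// The real module reportedly throws an AIP exception; std::invalid_argument is a stand-in.
std::string strategyName(DecoderSamplingStrategy s) {
    switch (s) {
    case DecoderSamplingStrategy::GREEDY:      return "greedy";
    case DecoderSamplingStrategy::BEAM_SEARCH: return "beam_search";
    }
    throw std::invalid_argument("Unknown sampling strategy");
}

// Hypothetical cache: collects audio samples until at least bufferSize are available.
class SampleBuffer {
public:
    explicit SampleBuffer(std::size_t bufferSize) : mBufferSize(bufferSize) {
        mSamples.reserve(bufferSize); // avoid repeated reallocation while filling
    }

    // Appends a chunk; returns true once enough samples are cached to process.
    bool push(const std::vector<float>& chunk) {
        mSamples.insert(mSamples.end(), chunk.begin(), chunk.end());
        return mSamples.size() >= mBufferSize;
    }

    // Hands all cached samples to the caller and leaves the cache empty.
    std::vector<float> drain() {
        std::vector<float> out;
        out.swap(mSamples);
        mSamples.reserve(mBufferSize);
        return out;
    }

    std::size_t size() const { return mSamples.size(); }

private:
    std::size_t mBufferSize;
    std::vector<float> mSamples;
};
```

Under these assumptions, a caller would `push` each incoming audio frame and, whenever `push` returns true, pass the result of `drain()` to the speech-to-text step.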
3 changes: 2 additions & 1 deletion base/include/FrameMetadata.h
@@ -50,7 +50,8 @@ class FrameMetadata {
HEVC_DATA, //H265
MOTION_VECTOR_DATA,
OVERLAY_INFO_IMAGE,
- FACE_LANDMARKS_INFO
+ FACE_LANDMARKS_INFO,
+ TEXT
};

enum MemType