Skip to content

Commit

Permalink
WebAssembly exmaple for speaker diarization (#1411)
Browse files Browse the repository at this point in the history
  • Loading branch information
csukuangfj authored Oct 10, 2024
1 parent 67349b5 commit 1d061df
Show file tree
Hide file tree
Showing 37 changed files with 1,008 additions and 24 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-de-tts.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-en-asr-zipformer.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-en-tts.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-silero-vad.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
167 changes: 167 additions & 0 deletions .github/workflows/wasm-simd-hf-space-speaker-diarization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
name: wasm-simd-hf-space-speaker-diarization

on:
push:
branches:
- wasm
- wasm-speaker-diarization
tags:
- 'v[0-9]+.[0-9]+.[0-9]+*'

workflow_dispatch:

concurrency:
group: wasm-simd-hf-space-speaker-diarization-${{ github.ref }}
cancel-in-progress: true

jobs:
wasm-simd-hf-space-speaker-diarization:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
shell: bash
run: |
emcc -v
echo "--------------------"
emcc --check
- name: Download model files
shell: bash
run: |
cd wasm/speaker-diarization/assets/
ls -lh
echo "----------"
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-segmentation-models/sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
tar xvf sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
rm sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
mv sherpa-onnx-pyannote-segmentation-3-0/model.onnx ./segmentation.onnx
rm -rf sherpa-onnx-pyannote-segmentation-3-0
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models/3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx
mv 3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx ./embedding.onnx
echo "----------"
ls -lh
- name: Build sherpa-onnx for WebAssembly
shell: bash
run: |
./build-wasm-simd-speaker-diarization.sh
- name: collect files
shell: bash
run: |
SHERPA_ONNX_VERSION=v$(grep "SHERPA_ONNX_VERSION" ./CMakeLists.txt | cut -d " " -f 2 | cut -d '"' -f 2)
dst=sherpa-onnx-wasm-simd-${SHERPA_ONNX_VERSION}-speaker-diarization
mv build-wasm-simd-speaker-diarization/install/bin/wasm/speaker-diarization $dst
ls -lh $dst
tar cjfv $dst.tar.bz2 ./$dst
- name: Upload wasm files
uses: actions/upload-artifact@v4
with:
name: sherpa-onnx-wasm-simd-speaker-diarization
path: ./sherpa-onnx-wasm-simd-*.tar.bz2

- name: Release
if: (github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa') && github.event_name == 'push' && contains(github.ref, 'refs/tags/')
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
overwrite: true
file: ./*.tar.bz2

- name: Publish to ModelScope
# if: false
env:
MS_TOKEN: ${{ secrets.MODEL_SCOPE_GIT_TOKEN }}
uses: nick-fields/retry@v2
with:
max_attempts: 20
timeout_seconds: 200
shell: bash
command: |
SHERPA_ONNX_VERSION=v$(grep "SHERPA_ONNX_VERSION" ./CMakeLists.txt | cut -d " " -f 2 | cut -d '"' -f 2)
git config --global user.email "[email protected]"
git config --global user.name "Fangjun Kuang"
rm -rf ms
export GIT_LFS_SKIP_SMUDGE=1
export GIT_CLONE_PROTECTION_ACTIVE=false
git clone https://www.modelscope.cn/studios/csukuangfj/web-assembly-speaker-diarization-sherpa-onnx.git ms
cd ms
rm -fv *.js
rm -fv *.data
git fetch
git pull
git merge -m "merge remote" --ff origin main
cp -v ../sherpa-onnx-wasm-simd-${SHERPA_ONNX_VERSION}-*/* .
git status
git lfs track "*.data"
git lfs track "*.wasm"
ls -lh
git add .
git commit -m "update model"
git push https://oauth2:${MS_TOKEN}@www.modelscope.cn/studios/csukuangfj/web-assembly-speaker-diarization-sherpa-onnx.git
- name: Publish to huggingface
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
uses: nick-fields/retry@v2
with:
max_attempts: 20
timeout_seconds: 200
shell: bash
command: |
SHERPA_ONNX_VERSION=v$(grep "SHERPA_ONNX_VERSION" ./CMakeLists.txt | cut -d " " -f 2 | cut -d '"' -f 2)
git config --global user.email "[email protected]"
git config --global user.name "Fangjun Kuang"
rm -rf huggingface
export GIT_LFS_SKIP_SMUDGE=1
export GIT_CLONE_PROTECTION_ACTIVE=false
git clone https://csukuangfj:[email protected]/spaces/k2-fsa/web-assembly-speaker-diarization-sherpa-onnx huggingface
ls -lh
cd huggingface
rm -fv *.js
rm -fv *.data
git fetch
git pull
git merge -m "merge remote" --ff origin main
cp -v ../sherpa-onnx-wasm-simd-${SHERPA_ONNX_VERSION}-*/* .
git status
git lfs track "*.data"
git lfs track "*.wasm"
ls -lh
git add .
git commit -m "update model"
git push https://csukuangfj:[email protected]/spaces/k2-fsa/web-assembly-speaker-diarization-sherpa-onnx main
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-vad-asr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
14 changes: 13 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ option(SHERPA_ONNX_ENABLE_WEBSOCKET "Whether to build webscoket server/client" O
option(SHERPA_ONNX_ENABLE_GPU "Enable ONNX Runtime GPU support" OFF)
option(SHERPA_ONNX_ENABLE_DIRECTML "Enable ONNX Runtime DirectML support" OFF)
option(SHERPA_ONNX_ENABLE_WASM "Whether to enable WASM" OFF)
option(SHERPA_ONNX_ENABLE_WASM_SPEAKER_DIARIZATION "Whether to enable WASM for speaker diarization" OFF)
option(SHERPA_ONNX_ENABLE_WASM_TTS "Whether to enable WASM for TTS" OFF)
option(SHERPA_ONNX_ENABLE_WASM_ASR "Whether to enable WASM for ASR" OFF)
option(SHERPA_ONNX_ENABLE_WASM_KWS "Whether to enable WASM for KWS" OFF)
Expand Down Expand Up @@ -135,6 +136,7 @@ message(STATUS "SHERPA_ONNX_ENABLE_C_API ${SHERPA_ONNX_ENABLE_C_API}")
message(STATUS "SHERPA_ONNX_ENABLE_WEBSOCKET ${SHERPA_ONNX_ENABLE_WEBSOCKET}")
message(STATUS "SHERPA_ONNX_ENABLE_GPU ${SHERPA_ONNX_ENABLE_GPU}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM ${SHERPA_ONNX_ENABLE_WASM}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM_SPEAKER_DIARIZATION ${SHERPA_ONNX_ENABLE_WASM_SPEAKER_DIARIZATION}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM_TTS ${SHERPA_ONNX_ENABLE_WASM_TTS}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM_ASR ${SHERPA_ONNX_ENABLE_WASM_ASR}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM_KWS ${SHERPA_ONNX_ENABLE_WASM_KWS}")
Expand Down Expand Up @@ -196,9 +198,19 @@ else()
add_definitions(-DSHERPA_ONNX_ENABLE_DIRECTML=0)
endif()

if(SHERPA_ONNX_ENABLE_WASM_SPEAKER_DIARIZATION)
if(NOT SHERPA_ONNX_ENABLE_SPEAKER_DIARIZATION)
message(FATAL_ERROR "Please set SHERPA_ONNX_ENABLE_SPEAKER_DIARIZATION to ON if you want to build WASM for speaker diarization")
endif()

if(NOT SHERPA_ONNX_ENABLE_WASM)
message(FATAL_ERROR "Please set SHERPA_ONNX_ENABLE_WASM to ON if you enable WASM for speaker diarization")
endif()
endif()

if(SHERPA_ONNX_ENABLE_WASM_TTS)
if(NOT SHERPA_ONNX_ENABLE_TTS)
message(FATAL_ERROR "Please set SHERPA_ONNX_ENABLE_TTS to ON if you want to build wasm TTS")
message(FATAL_ERROR "Please set SHERPA_ONNX_ENABLE_TTS to ON if you want to build WASM for TTS")
endif()

if(NOT SHERPA_ONNX_ENABLE_WASM)
Expand Down
5 changes: 5 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,7 @@ We also have spaces built using WebAssembly. They are listed below:
|VAD + speech recognition (English + Chinese, 及多种中文方言) with Paraformer-small |[Click me][wasm-hf-vad-asr-zh-en-paraformer-small]| [地址][wasm-ms-vad-asr-zh-en-paraformer-small]|
|Speech synthesis (English) |[Click me][wasm-hf-tts-piper-en]| [地址][wasm-ms-tts-piper-en]|
|Speech synthesis (German) |[Click me][wasm-hf-tts-piper-de]| [地址][wasm-ms-tts-piper-de]|
|Speaker diarization |[Click me][wasm-hf-speaker-diarization]|[地址][wasm-ms-speaker-diarization]|

### Links for pre-built Android APKs

Expand Down Expand Up @@ -173,6 +174,7 @@ We also have spaces built using WebAssembly. They are listed below:
| Speaker identification (Speaker ID) | [Address][sid-models] |
| Spoken language identification (Language ID)| See multi-lingual [Whisper][Whisper] ASR models from [Speech recognition][asr-models]|
| Punctuation | [Address][punct-models] |
| Speaker segmentation | [Address][speaker-segmentation-models] |

### Useful links

Expand Down Expand Up @@ -261,6 +263,8 @@ Video demo in Chinese: [爆了!炫神教你开打字挂!真正影响胜率
[wasm-ms-tts-piper-en]: https://modelscope.cn/studios/k2-fsa/web-assembly-tts-sherpa-onnx-en
[wasm-hf-tts-piper-de]: https://huggingface.co/spaces/k2-fsa/web-assembly-tts-sherpa-onnx-de
[wasm-ms-tts-piper-de]: https://modelscope.cn/studios/k2-fsa/web-assembly-tts-sherpa-onnx-de
[wasm-hf-speaker-diarization]: https://huggingface.co/spaces/k2-fsa/web-assembly-speaker-diarization-sherpa-onnx
[wasm-ms-speaker-diarization]: https://www.modelscope.cn/studios/csukuangfj/web-assembly-speaker-diarization-sherpa-onnx
[apk-streaming-asr]: https://k2-fsa.github.io/sherpa/onnx/android/apk.html
[apk-streaming-asr-cn]: https://k2-fsa.github.io/sherpa/onnx/android/apk-cn.html
[apk-tts]: https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
Expand Down Expand Up @@ -303,5 +307,6 @@ Video demo in Chinese: [爆了!炫神教你开打字挂!真正影响胜率
[sid-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models
[slid-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models
[punct-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/punctuation-models
[speaker-segmentation-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-segmentation-models
[GigaSpeech]: https://github.com/SpeechColab/GigaSpeech
[WenetSpeech]: https://github.com/wenet-e2e/WenetSpeech
4 changes: 2 additions & 2 deletions build-wasm-simd-asr.sh
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,8 @@ if [ x"$EMSCRIPTEN" == x"" ]; then
echo "git clone https://github.com/emscripten-core/emsdk.git"
echo "cd emsdk"
echo "git pull"
echo "./emsdk install latest"
echo "./emsdk activate latest"
echo "./emsdk install 3.1.53"
echo "./emsdk activate 3.1.53"
echo "source ./emsdk_env.sh"
exit 1
else
Expand Down
4 changes: 2 additions & 2 deletions build-wasm-simd-kws.sh
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@ if [ x"$EMSCRIPTEN" == x"" ]; then
echo "git clone https://github.com/emscripten-core/emsdk.git"
echo "cd emsdk"
echo "git pull"
echo "./emsdk install latest"
echo "./emsdk activate latest"
echo "./emsdk install 3.1.53"
echo "./emsdk activate 3.1.53"
echo "source ./emsdk_env.sh"
exit 1
else
Expand Down
4 changes: 2 additions & 2 deletions build-wasm-simd-nodejs.sh
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,8 @@ if [ x"$EMSCRIPTEN" == x"" ]; then
echo "git clone https://github.com/emscripten-core/emsdk.git"
echo "cd emsdk"
echo "git pull"
echo "./emsdk install latest"
echo "./emsdk activate latest"
echo "./emsdk install 3.1.53"
echo "./emsdk activate 3.1.53"
echo "source ./emsdk_env.sh"
exit 1
else
Expand Down
Loading

0 comments on commit 1d061df

Please sign in to comment.