demo.mov
A React web-app implementation of the Smith-Waterman algorithm for identifying common molecular subsequences, based on the 1981 paper by T.F. SMITH and M.S. WATERMAN.
- Core Implementation: Written in C for performance (and gets pleasure from debugging segmentation fault).
- WebAssembly: Compiles C code to run in the browser.
- Frontend: React for a modern, responsive user interface.
- Deployment: Hosted on Vercel (TBD).
This project implements the algorithm described in: Identification of Common Molecular Subsequences
- Install Emscripten:
git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
./emsdk install latest
./emsdk activate latest
source ./emsdk_env.sh
emcc --version
- Create a new React project:
npx create-react-app alignement-react
cd alignement-react
Compile the C implementation using Emscripten:
emcc identification_common_molecular.c -s WASM=1 -s MODULARIZE=1 \
-s EXPORTED_RUNTIME_METHODS='["stringToUTF8", "UTF8ToString", "getValue", "ccall", "cwrap"]' \
-s EXPORTED_FUNCTIONS='["_malloc", "_free", "_align_sequences", "_free_alignment_result"]' \
-s ALLOW_MEMORY_GROWTH=1 \
-s ASSERTIONS=1 \
-s STACK_OVERFLOW_CHECK=2 \
-o public/common_molecular.js
This command generates two files:
common_molecular.js
: The JavaScript wrapper for interacting with WebAssembly.common_molecular.wasm
: The compiled WebAssembly module.
Start the React development server:
npm start
To demonstrate the accuracy of our implementation, we can compare our results with the example provided in the original Smith-Waterman paper.
-
Sequence from the paper:
- Seq1: A-A-U-G-C-C-A-U-U-G-A-C-G-G
- Seq2: C-A-G-C-C-U-C-G-C-U-U-A-G
-
Input in our application:
- Seq1: AAUGCCAUUGACGG
- Seq2: CAGCCUCGCUUAG
-
Alignment in the paper:
-G-C-C-A-U-U-G- -G-C-C—U-C-G-
-
Alignment from our implementation:
GCCAUUG GCC-UCG
-
Maximal Similarity Score:
- Paper: 3.30
- Our implementation: 3.34
-
Occurrences: Our result indicates 1 occurrence, which is correct as there is only one optimal alignment in this example.
The results from our implementation are remarkably close to the example given in the original Smith-Waterman paper. The obtained alignment is identical, and the difference in the similarity score is very small (3.34 vs 3.30). This slight difference could be explained by minor variations in scoring parameters or numerical precision used in the implementation.
Overall, our implementation works correctly and produces results consistent with the original Smith-Waterman algorithm.
Contributions to improve the implementation or extend its features are welcome. Please feel free to submit pull requests or open issues for any bugs or enhancements.
MIT License