Second-place winner of the D4GEN 2024 hackathon, organized by Genopole in partnership with Onepoint and AWS.
This project was completed within 48 hours by Maryam Cherradi, Valentin Laforgue and Auguste Gardette.
The discovery of new bioactive molecules is a major challenge in many fields, including health, agronomy and the environment. However, conventional approaches to screening synthetic molecules are costly, time-consuming and environmentally unfriendly.
An alternative is to exploit natural biodiversity as a source of molecules. Indeed, living organisms produce a myriad of compounds with properties acquired over the course of evolution. Unfortunately, only a tiny fraction of this molecular diversity has been characterized to date, and many uncharacterized molecules are available in the COCONUT database.
It is in this context that machine learning approaches open up new perspectives. By combining molecular data available in online databases such as PubChem and ChEMBL with artificial intelligence methods, it becomes possible to predict the biological activities of as yet unexplored natural molecules and effectively guide their discovery.
Our challenge is therefore to develop tools capable of making the most of existing data resources, in order to accelerate the discovery of natural molecules of interest while preserving biodiversity.
To start working on this project, follow these steps:
- Clone the repository:
git clone [email protected]:Aaramis/BioMiMiC.git
- Create a virtual environment:
conda env create --file environment.yml
- Activate the virtual environment:
conda activate hugging_face
- To export the current environment back to the environment file (using conda):
conda env export > environment.yml
Please check the config file before running the code. Depending on your needs, you may have to adapt your dataset; more details are available in the notebook.
Fine-tune SMILE-BERT on a specific task:
python finetuning.py
Screen the COCONUT database:
python predict.py --COCONUT
| Argument | Description |
|---|---|
| --COCONUT | Flag to trigger screening of the COCONUT database |
| --prediction_path | Path for prediction output |
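As an illustration only, the two arguments in the table could be declared with `argparse` roughly as follows. This is a minimal sketch, not the actual contents of `predict.py`: the `build_parser` function name, the help strings and the `None` default for `--prediction_path` are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two documented arguments."""
    parser = argparse.ArgumentParser(
        description="Run predictions with a fine-tuned model."
    )
    # Boolean flag: present means "screen the COCONUT database".
    parser.add_argument(
        "--COCONUT",
        action="store_true",
        help="Flag to trigger screening of the COCONUT database",
    )
    # Assumption: no default output path is defined, so None means "not set".
    parser.add_argument(
        "--prediction_path",
        default=None,
        help="Path for prediction output",
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["--COCONUT", "--prediction_path", "out.csv"])
print(args.COCONUT, args.prediction_path)  # prints: True out.csv
```

With a parser of this shape, omitting `--COCONUT` leaves the flag set to `False`, so the script can decide whether to screen the database at all.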
We started out with Django but, to speed up development, we decided to move to Streamlit.
To see the website MVP mock-up, run the following commands:
cd test_streamlit
streamlit run landing.py