This project was developed in the third semester of my master's degree in bioinformatics. Our overall goal was to present to the scientific community, for the first time, a characterization of the folding preferences of so-called upstream open reading frames (uORFs).
To this end, we developed our own deep learning approach, trained on proteins from the PDB, and compared its results to the current front-runner, AlphaFold. Additionally, we computed basic statistics on uORFs and used IUPred2A to predict their intrinsic disorder.
(For more information please don't hesitate to contact me.)
This repository currently lacks good documentation and commented code, because most scripts were only used once. It will be updated when time permits.
- Observed unique properties of uORFs
- Results suggest that uORFs may tend to avoid defined structures
- Machine learning approaches are trained on proteins, not peptides!
- Further research needed (different species, etc.)
Available in presentation.pdf:
- Locate uORFs
- Statistics on uORFs
- Gene Ontology
- Secondary structure prediction - Simple CNN
- Secondary structure prediction - AlphaFold
- Simple CNN vs AlphaFold
- Prediction of intrinsic disorder
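
The first step in the list above, locating uORFs, can be illustrated with a minimal sketch. This is not the script used in this repository. It assumes a simple definition: a uORF starts with an ATG in the 5' leader (upstream of the main start codon) and ends at the first in-frame stop codon, which may lie inside the main coding sequence. The function name `find_uorfs` and the `cds_start` parameter are hypothetical, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript, cds_start):
    """Return (start, end) index pairs of candidate uORFs.

    Assumptions (not taken from the repository):
    - transcript is an mRNA sequence written with DNA letters (A, C, G, T)
    - cds_start is the 0-based index of the main ORF's start codon
    - end is exclusive and includes the stop codon
    """
    seq = transcript.upper()
    uorfs = []
    # The uORF start codon must lie completely upstream of the main ATG.
    for start in range(cds_start - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame of this ATG until a stop is found.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


# Toy transcript: uORF "ATGAAATAG" at index 2, main ORF starting at index 13.
example = "GGATGAAATAGCCATGGCTTAA"
print(find_uorfs(example, cds_start=13))  # [(2, 11)]
```

Start codons without an in-frame stop are skipped, so every reported interval has a length that is a multiple of three and can be translated directly into a peptide.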
The dataset resulting from this project is available at https://www.kaggle.com/datasets/nigelhartm/arabidopsis-thaliana-uorf