Skip to content

This project demonstrates the application of Machine Learning, specifically deep neural networks, in identifying transcription-factor binding sites in DNA sequences.

Notifications You must be signed in to change notification settings

Faranak-H/ML_DNA-sequence

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Project Description

This project demonstrates the application of Machine Learning, specifically deep neural networks, in identifying transcription-factor binding sites in DNA sequences. Utilizing a synthetic dataset of DNA sequences, the goal is to predict the presence of a specific transcription factor indicated by labels (1 or 0) using a developed machine learning model.

Dataset

The dataset consists of DNA sequences that are either regulated by a specific transcription factor (label 1) or not (label 0).

Methodology

Data Loading and Preprocessing: The data is loaded and preprocessed using one-hot encoding to transform the DNA sequences into a matrix format, suitable for machine learning algorithms.

Model Building:

The project involves building and training a deep neural network, with convolutional layers to detect motifs indicative of binding sites.

Model Evaluation:

The model is evaluated on a test dataset, with performance metrics such as accuracy and a confusion matrix being computed.

Saliency Map Analysis:

To interpret the model, saliency maps are computed to identify which parts of the DNA sequences most influence the model's predictions.

Implementation

The repository contains a Jupyter Notebook with detailed steps and code for data preprocessing, model building, and evaluation. Users can experiment with different model architectures and parameters.

Requirements

Python Libraries: TensorFlow, Keras, Pandas, Scikit-Learn, Matplotlib, Seaborn

Instructions

Clone the repository. Install required libraries. Run the Jupyter Notebook, following the instructions and completing the incomplete cells. Experiment with modifying the model and preprocessing steps to explore different outcomes.

About

This project demonstrates the application of Machine Learning, specifically deep neural networks, in identifying transcription-factor binding sites in DNA sequences.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published