Project Description

This project demonstrates how machine learning, specifically deep neural networks, can be used to identify transcription-factor binding sites in DNA sequences. Using a synthetic dataset of DNA sequences, the goal is to train a model that predicts each sequence's label (1 or 0), which indicates the presence or absence of a specific transcription factor.

Dataset

The synthetic dataset consists of DNA sequences, each labeled 1 if it is regulated by the transcription factor of interest and 0 otherwise.

Methodology

Data Loading and Preprocessing:

The data is loaded and one-hot encoded, which turns each DNA sequence into a numeric matrix suitable for machine learning algorithms.
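As a rough illustration of the one-hot encoding step, the sketch below maps each base to one of four columns. The `A, C, G, T` column order, the `one_hot_encode` helper name, the all-zero handling of unknown symbols, and the toy sequences are assumptions made for this example and may differ from the notebook. NumPy is used here; it is installed alongside TensorFlow.

```python
import numpy as np

# Column order is an assumption; any fixed order works if used consistently.
BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot_encode(sequence: str) -> np.ndarray:
    """Return a (len(sequence), 4) float32 matrix with one row per base."""
    encoded = np.zeros((len(sequence), len(BASES)), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        index = BASE_INDEX.get(base)
        if index is not None:  # unknown symbols such as N stay all-zero
            encoded[position, index] = 1.0
    return encoded


# Toy sequences for illustration only, not taken from the dataset.
sequences = ["ACGT", "TTGA"]
features = np.stack([one_hot_encode(s) for s in sequences])
print(features.shape)  # (2, 4, 4): sequences x positions x bases
```

Stacking the per-sequence matrices requires all sequences to have the same length, which is the layout a 1D convolutional layer expects.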

Model Building:

A deep neural network is built and trained. Its convolutional layers detect sequence motifs indicative of binding sites.
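One possible shape for such a network is sketched below with Keras. This is not the notebook's architecture: the `build_model` helper, the sequence length of 50, the filter count, the kernel size, and the dense layer width are all hypothetical values chosen only to make the example concrete.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(sequence_length: int, n_filters: int = 32, kernel_size: int = 12) -> keras.Model:
    """Small 1D CNN for binary classification of one-hot encoded DNA."""
    model = keras.Sequential([
        keras.Input(shape=(sequence_length, 4)),  # 4 channels: one per base
        # Each filter slides along the sequence and acts as a motif detector.
        layers.Conv1D(n_filters, kernel_size, activation="relu"),
        # Keep the strongest match of each filter, wherever it occurs.
        layers.GlobalMaxPooling1D(),
        layers.Dense(16, activation="relu"),
        # Sigmoid output: estimated probability of label 1.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# sequence_length=50 is an assumed value, not the dataset's actual length.
model = build_model(sequence_length=50)
model.summary()
```

Global max pooling makes the prediction independent of where in the sequence a motif appears, which suits binding-site detection. Other pooling or stacking choices are equally reasonable starting points.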

Model Evaluation:

The model is evaluated on a held-out test set using accuracy and a confusion matrix.
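The two metrics can be computed with scikit-learn as in the sketch below. The labels and predicted probabilities are made-up toy values, and the 0.5 decision threshold is an assumption. In practice the probabilities would come from the trained model's predictions on the test set.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy values for illustration only, not results from the notebook.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.91, 0.20, 0.65, 0.40, 0.10, 0.55, 0.80, 0.30])

# Turn sigmoid outputs into hard labels (threshold of 0.5 is an assumption).
y_pred = (y_prob >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)
# Rows are true labels, columns are predicted labels: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

print(accuracy)  # 0.75
print(cm)        # [[3 1]
                 #  [1 3]]
```

The confusion matrix separates false positives from false negatives, which accuracy alone hides.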

Saliency Map Analysis:

To interpret the model, saliency maps are computed to identify which parts of the DNA sequences most influence the model's predictions.
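A common way to compute such a map is to take the gradient of the model output with respect to the input, as sketched below with `tf.GradientTape`. The `saliency_map` helper, the choice to reduce over the four base channels with a maximum of absolute gradients, and the small untrained stand-in model are assumptions for illustration. The notebook may use a different variant.

```python
import numpy as np
import tensorflow as tf


def saliency_map(model, one_hot_batch):
    """Return per-position saliency with shape (batch, sequence_length)."""
    inputs = tf.convert_to_tensor(one_hot_batch, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(inputs)  # inputs are not variables, so track them explicitly
        predictions = model(inputs, training=False)
    # Each prediction depends only on its own sequence, so this yields
    # the gradient of every output with respect to its own input.
    gradients = tape.gradient(predictions, inputs)
    # Collapse the 4 base channels into one non-negative score per position.
    return tf.reduce_max(tf.abs(gradients), axis=-1).numpy()


# Untrained stand-in model, used only so the example runs end to end.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 4)),
    tf.keras.layers.Conv1D(8, 5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Random one-hot sequences: 3 sequences of length 20.
rng = np.random.default_rng(0)
batch = np.eye(4, dtype=np.float32)[rng.integers(0, 4, size=(3, 20))]

scores = saliency_map(model, batch)
print(scores.shape)  # (3, 20)
```

With a trained model, positions with high scores are the ones where changing a base would most affect the prediction, so they can be plotted along the sequence to highlight candidate motifs.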

Implementation

The repository contains a Jupyter Notebook (DNA_sequence_classification.ipynb) with detailed steps and code for data preprocessing, model building, and evaluation. Users can experiment with different model architectures and parameters.

Requirements

Python Libraries: TensorFlow, Keras, Pandas, Scikit-Learn, Matplotlib, Seaborn

Instructions

Clone the repository and install the required libraries. Open the Jupyter Notebook and run it from top to bottom, following the instructions and filling in the incomplete cells. Then experiment by modifying the model and preprocessing steps to see how the results change.
