
Overview of the Material for the [BC]2 2019 workshop

When: September 9th, 09:00 to 16:00

Where: University of Basel, Kollegienhaus building, Petersplatz 1, CH-4001 Basel

Room: Regenzzimmer 111

Organisers and tutors

Amel Ghouila, H3ABioNet

Fotis Psomopoulos, INAB-CERTH, ELIXIR-GR

Overview

Machine learning has emerged as a discipline that enables computers to assist humans in making sense of large and complex data sets. As the cost of sequencing technologies has dropped, large amounts of omics data are being generated and made accessible to researchers. Analysing these complex, high-volume data is not trivial, and classical tools cannot exploit their full potential. Machine learning can therefore be very useful for mining large omics data sets to uncover new insights that can advance the field of medicine and improve health care.

The aim of this tutorial is to introduce participants to the machine learning (ML) taxonomy and to common ML algorithms. The tutorial will cover methods used to analyse different omics data sets, providing practical context through basic but widely used R and Python libraries. It will comprise a number of hands-on exercises and challenges, through which participants will acquire a first understanding of standard ML processes as well as the practical skills to apply them to familiar problems and publicly available real-world data sets.

Learning objectives

  • Understand the ML taxonomy and the commonly used machine learning algorithms for analysing “omics” data
  • Understand the differences between categories of ML algorithms and the kinds of problems to which they can be applied
  • Understand different applications of ML in different omics studies
  • Use some basic, widely used Python and R packages for ML
  • Interpret and visualize the results obtained from ML analyses on omics datasets
  • Apply the ML techniques to analyse their own datasets

Audience and requirements

This introductory tutorial is aimed at bioinformaticians (graduate students and researchers) who are familiar with different omics data technologies and are interested in applying machine learning to analyse the resulting data.

Prerequisites

  • Previous experience in Bioinformatics analysis
  • Familiarity with any programming language (especially R) is preferable but not necessary

Maximum participants: 30

Schedule

Time Details
09:00 - 09:15 Tutorial introduction.

- Get to know each other.
- Setup
Link to material
Part I: Background
09:15 - 10:45 Introduction to ML / DM.

- Data Mining.
- Machine Learning basic concepts.
- Taxonomy of ML and examples of algorithms.
- Deep learning overview.
Link to material
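A basic concept common to most ML workflows is holding out data the model never sees during training, so that performance estimates reflect generalisation. As an illustration only (the session outline above does not prescribe it), the plain-Python sketch below splits a hypothetical toy data set and scores a trivial majority-class baseline. The helpers `train_test_split_simple`, `majority_baseline` and `accuracy` are hypothetical names; in practice a library routine such as scikit-learn's `train_test_split` would normally be used.

```python
import random
from collections import Counter


def train_test_split_simple(samples, labels, test_fraction=0.25, seed=42):
    """Shuffle indices reproducibly and split samples/labels into train and test sets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(len(indices) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [samples[i] for i in train_idx],
        [samples[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def majority_baseline(train_labels):
    """Return the most frequent training label; any useful model should beat it."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


if __name__ == "__main__":
    # Hypothetical toy data: one expression value per sample, with a tissue label.
    X = [[0.1], [0.3], [0.2], [2.5], [2.8], [3.1], [0.4], [2.9]]
    y = ["normal", "normal", "normal", "tumour", "tumour", "tumour", "normal", "tumour"]
    X_train, X_test, y_train, y_test = train_test_split_simple(X, y)
    baseline = majority_baseline(y_train)
    print("baseline accuracy:", accuracy(y_test, [baseline] * len(y_test)))
```

Fixing the seed keeps the split reproducible, which matters when comparing several models on the same held-out samples.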
11:00 - 12:30 Applications of ML in Bioinformatics.

- Examples of different ML/DM techniques that can be applied to different NGS data analysis pipelines.
- How to choose the right ML technique?
Link to material
Part II: Hands-on
13:15 - 14:45 Loading and exploring omics data.

- What is Exploratory Data Analysis (EDA) and why is it useful?
- Unsupervised Learning.
- How could unsupervised learning be used to analyze omics data?
Link to material
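One common way to use unsupervised learning on omics data is to cluster samples by their expression profiles without using any labels. In the hands-on session you would typically call a library implementation such as R's `kmeans()` or scikit-learn's `KMeans`. The sketch below instead spells out Lloyd's k-means algorithm in plain Python so that the assignment and update steps are visible. The toy matrix and the choice of `k = 2` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster `points` into k groups with Lloyd's algorithm.

    Returns (labels, centroids). Initial centroids are k samples drawn with a
    fixed seed, so repeated runs give the same result.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical toy matrix: rows are samples, columns are two genes.
    samples = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
    labels, centroids = kmeans(samples, k=2)
    print(labels)
    print(centroids)
```

Cluster IDs are arbitrary. Only which samples share an ID is meaningful. k-means also needs `k` up front and can land in local optima, which is why library versions usually run several random starts.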
15:00 - 16:30 Supervised Learning

Classification.
- How could supervised learning be used to analyze omics data?
Regression.
- What if the target variable is numerical rather than categorical?
Link to material
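The two questions in this session differ only in the type of target: a category (classification) or a number (regression). A minimal plain-Python sketch of each is shown below, assuming k-nearest neighbours for classification and ordinary least squares with a single predictor for regression. These are chosen here for brevity; the session may use other methods and R or Python libraries. All data and function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical classification data: two-gene profiles labelled by condition.
    X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
    y = ["control", "control", "control", "treated", "treated", "treated"]
    print(knn_predict(X, y, [4.9, 5.1]))  # -> treated

    # Hypothetical regression data: a numerical response that grows with dose.
    dose = [0.0, 1.0, 2.0, 3.0]
    response = [1.0, 3.0, 5.0, 7.0]
    print(fit_simple_linear_regression(dose, response))  # -> (1.0, 2.0)
```

An odd `k` avoids tied votes in two-class problems. The regression example is exactly linear, so the fitted intercept and slope recover the generating values.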
16:30 Closing, discussion and resource sharing

Other examples

If you finish all the exercises and wish to practise on more examples, here are a couple of good resources to help you become more familiar with the different ML techniques and packages.

  1. RNASeq Analysis in R
  2. Use the built-in R Iris data set to run clustering as well as supervised classification, and compare the results obtained by different methods.
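When comparing a clustering of the Iris samples with the known species (or two clusterings with each other), cluster IDs cannot be matched to class names directly. A pair-counting score such as the Rand index sidesteps this. Library versions exist, for example scikit-learn's `adjusted_rand_score`, which also corrects for chance. The sketch below computes the plain, unadjusted Rand index in Python on short hypothetical label vectors rather than the real Iris data.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two labelings agree.

    A pair agrees if both labelings place the two samples in the same group,
    or both place them in different groups. Group names need not match.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Hypothetical labels: known species versus cluster IDs from some method.
    species = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
    clusters = [0, 0, 1, 1, 1, 2]
    print(rand_index(species, clusters))  # -> 0.8
```

A score of 1.0 means the two partitions are identical up to renaming. The unadjusted index tends to be high even for random labelings, which is why the adjusted variant is usually preferred for reporting.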

Sources / References

The workshop material is based on the following resources:

  1. ELIXIR CODATA Advanced Bioinformatics Workshop
  2. Machine Learning in R, by Hugo Bowne-Anderson and Jorge Perez de Acha Chavez
  3. Practical Machine Learning in R, by Kyriakos Chatzidimitriou, Themistoklis Diamantopoulos, Michail Papamichail, and Andreas Symeonidis
  4. Linear models in R, by the Monash Bioinformatics Platform
  5. Relevant blog posts from the R-Bloggers website

Relevant literature includes:

  1. Pattern Recognition and Machine Learning, by Christopher M. Bishop
  2. Machine learning in bioinformatics, by Pedro Larrañaga et al.
  3. Ten quick tips for machine learning in computational biology, by Davide Chicco
  4. Statistics versus machine learning
  5. Machine learning and systems genomics approaches for multi-omics data
  6. A review on machine learning principles for multi-view biological data integration

License

This material is made available under the Creative Commons Attribution 4.0 International license. Please see LICENSE for more details.

Citation

Amel Ghouila, & Fotis E. Psomopoulos. (2019, September 9). Introduction to Machine Learning: Opportunities for advancing omics data analysis (Version v1.0.0). Zenodo. http://doi.org/10.5281/zenodo.3403768