Module: IO RNA Structure

Jörg Winkler edited this page Mar 12, 2018 · 6 revisions

Project goals

This project aims to implement I/O routines for

  • fixed interactions with pseudoknot support
  • base pair probability matrices (read only)
  • alignments with consensus structure
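As an illustration of the first goal, the following minimal sketch shows how fixed interactions with pseudoknots could be read from a dot-bracket string. It assumes, beyond what this page specifies, that each bracket family ((), [], {}, <>) nests independently and that '.' marks an unpaired position. The names base_pair and parse_dot_bracket are hypothetical and not part of SeqAn.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: positions (i, j) of two interacting bases.
using base_pair = std::pair<std::size_t, std::size_t>;

// Parses a dot-bracket string into base pairs.
// Assumption: every bracket family is matched on its own stack, so pairs of
// different families may cross each other (this is what forms a pseudoknot).
inline std::vector<base_pair> parse_dot_bracket(std::string const & structure)
{
    std::string const opening = "([{<";
    std::string const closing = ")]}>";
    std::vector<std::vector<std::size_t>> open_positions(opening.size());
    std::vector<base_pair> pairs;

    for (std::size_t pos = 0; pos < structure.size(); ++pos)
    {
        char const symbol = structure[pos];
        if (symbol == '.')  // unpaired position
            continue;

        std::size_t const open_family = opening.find(symbol);
        std::size_t const close_family = closing.find(symbol);

        if (open_family != std::string::npos)
        {
            open_positions[open_family].push_back(pos);
        }
        else if (close_family != std::string::npos)
        {
            auto & stack = open_positions[close_family];
            if (stack.empty())
                throw std::invalid_argument("unmatched closing bracket");
            pairs.emplace_back(stack.back(), pos);
            stack.pop_back();
        }
        else
        {
            throw std::invalid_argument("unexpected symbol in structure");
        }
    }

    // Any position left on a stack never found its partner.
    for (auto const & stack : open_positions)
        if (!stack.empty())
            throw std::invalid_argument("unmatched opening bracket");

    return pairs;
}
```

For the input "(.[.).]" this sketch reports the pairs (0, 4) and (2, 6), which cross each other and therefore form a pseudoknot.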

The data elements in these files are diverse, as the following user scenarios show:

User scenario 1a:

structured_seq_file ssf ("file.db");
for (auto && rec : ssf)
{
    structured_rna<rna4, dot_bracket3> structured_sequence = get<STRSEQ>(rec);
    cout << get<ENERGY>(rec);
}

User scenario 1b:

structured_seq_file ssf ("file.db");
for (auto [ structseq, energy ] : ssf)
{
    cout << structseq;
    cout << energy;
}

User scenario 2:

structured_seq_file ssf ("bpp.ps");
for (auto && rec : ssf)
{
    // base pair probability matrix of the current record
    vector<vector<double>> matrix = get<BPP>(rec);
}

Design ideas

A tuple holds all data of the current record. Individual fields can be accessed either through std::get with an enum value as the template argument or through structured bindings.
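The two access styles can be tried out with a plain std::tuple. The following is only a sketch under assumptions: the field names STRSEQ, ENERGY and BPP are taken from the user scenarios, while the namespace field, the alias record, the placeholder element types and all function names are hypothetical stand-ins for whatever the final design uses.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical field identifiers. An unscoped enum converts implicitly to the
// std::size_t index that std::get expects as its template argument.
namespace field
{
enum id : std::size_t { STRSEQ, ENERGY, BPP, FIELD_COUNT };
}

// Stand-in record type with one tuple slot per data field.
// A std::string replaces the structured sequence type for brevity.
using record = std::tuple<std::string,                        // STRSEQ
                          double,                             // ENERGY
                          std::vector<std::vector<double>>>;  // BPP

static_assert(std::tuple_size_v<record> == field::FIELD_COUNT,
              "tuple length must equal the number of data fields");

// Builds a made-up example record so both access styles can be exercised.
inline record make_example_record()
{
    return record{"GCAU (..)", -1.5, {}};
}

// Access style 1: std::get with the enum value as template argument.
inline double energy_via_get(record const & rec)
{
    return std::get<field::ENERGY>(rec);
}

// Access style 2: structured bindings unpack all fields at once.
inline std::string structseq_via_binding(record const & rec)
{
    auto const & [structseq, energy, bpp] = rec;
    return structseq;
}
```

A scoped enum (enum class) would not convert implicitly and would need a static_cast or a custom get overload, which is one reason this sketch uses an unscoped enum inside a namespace.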

Work packages

  • Design an enum of data fields (see rna_record in SeqAn2)
  • Implement the tuple to be returned. Its length is the number of data fields.
  • Write constructors for the structured_seq class.
    • structured_seq(filename)
    • structured_seq(stream, file_format)
  • Support the dot-bracket, Stockholm and ViennaRNA ps formats as a start (FASTA is needed as well, but that belongs to sequence I/O).

Files to be created

  • seqan3/io/structured_seq/ (directory)
  • structured_seq_file.hpp
  • dot_bracket3_file.hpp
  • vienna_bpp_file.hpp
  • stockholm_file.hpp
  • ... further formats