Web server
MetaLogo provides a public web server. Users can also deploy their own MetaLogo server on their local network. This tutorial explains how to use the MetaLogo web server without coding.
First, visit http://metalogo.omicsnet.org (or http://localhost:8050 if you have deployed MetaLogo on your own computer). If nothing goes wrong, you will see the following page.
As the top menu indicates, you can reach the analysis tab, results tab, tutorial, Python package, journal paper, and our lab website from there, or email us.
To start an analysis, click the Analysis button on the page.
On the analysis page, there are four panels for customizing your analysis. For basic use of MetaLogo, you probably only need the first panel; more advanced usage requires adjusting the parameters in the other three. This tutorial introduces the parameters and usage of all four panels. Please note that the four Submit buttons on the page are all equivalent. Feel free to click any of them after you finish your settings.
The figure above shows the details of the first panel. Red numbers mark the functional items of the panel, which are explained below.
-
Input Format
For Input Format, you could choose from Fasta and Fastq formats.
-
Sequence Type
For Sequence Type, you could choose from Auto, DNA, RNA or protein.
Auto means MetaLogo will detect the sequence type automatically. Note that if you choose a specific sequence type, you need to make sure that your input sequences contain only characters that are valid for that type.
-
Basic analysis
For Basic analysis, you could choose from Yes and No.
MetaLogo can perform basic analysis on your input data, such as length distribution, group counts, entropy distribution, group clustering heatmap, and pairwise sequence distance distribution. If you choose No for Basic analysis, MetaLogo will skip this analysis and only output the sequence logo.
-
Grouping By
For Grouping By, you can choose from Auto, Length, and Seq Identifier.
This parameter determines how MetaLogo groups your sequences to reveal their heterogeneity. If you choose Auto, MetaLogo will perform multiple sequence alignment on your sequences, build a phylogenetic tree, and group the sequences based on the tree. If you choose Length, MetaLogo will group your input sequences by length. If you choose Seq Identifier, MetaLogo will read the group of each sequence from its sequence name by looking for a pattern like group@number-name. Below is a valid example:
>seq1 group@1-firstgroup
ATACAGACAGAGACACAGGGTTCG
>seq2 group@1-firstgroup
ATACAGACAGAGACACAGGGGTTC
>seq3 group@2-secondgroup
ATACAGACAGAGACACAG
>seq4 group@2-secondgroup
ATACCCCCAGAGACACAG
Please note that in Seq Identifier grouping mode, sequence lengths should be the same in a group.
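The Seq Identifier convention can be checked locally before submitting. The following is a minimal sketch, not MetaLogo's own parser: the regular expression, the function names `parse_grouped_fasta` and `check_equal_lengths`, and the choice of `number-name` as the group label are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed pattern: "group@<number>-<name>" somewhere in the FASTA header line.
GROUP_PATTERN = re.compile(r"group@(\d+)-(\S+)")


def parse_grouped_fasta(text):
    """Return {group_label: [(seq_name, sequence), ...]} from FASTA text."""
    groups = defaultdict(list)
    header, chunks = None, []

    def flush():
        # Store the record collected so far (if any) under its group label.
        if header is None:
            return
        match = GROUP_PATTERN.search(header)
        if match is None:
            raise ValueError(f"no group@number-name tag in header: {header!r}")
        label = f"{match.group(1)}-{match.group(2)}"
        groups[label].append((header.split()[0], "".join(chunks)))

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    flush()
    return dict(groups)


def check_equal_lengths(groups):
    """Return the labels of groups whose sequences differ in length."""
    return [label for label, members in groups.items()
            if len({len(seq) for _, seq in members}) > 1]


example = """>seq1 group@1-firstgroup
ATACAGACAGAGACACAGGGTTCG
>seq2 group@1-firstgroup
ATACAGACAGAGACACAGGGGTTC
>seq3 group@2-secondgroup
ATACAGACAGAGACACAG
>seq4 group@2-secondgroup
ATACCCCCAGAGACACAG
"""
groups = parse_grouped_fasta(example)
bad_groups = check_equal_lengths(groups)  # empty when every group is consistent
```

An empty `bad_groups` list means every group satisfies the equal-length requirement of Seq Identifier mode.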
-
Clustering Method
For Clustering Method, you can choose from Max, Max_clade, and Single_linkage.
This parameter determines how TreeCluster clusters your sequences based on the tree. For more details, please see https://github.com/niemasd/TreeCluster.
-
Grouping Resolution
For Grouping Resolution, you can set any value between 0 and 1.
Because clustering is based on the distances among sequences on the phylogenetic tree, TreeCluster needs a distance threshold. We provide a resolution value to control this threshold: the larger the resolution, the larger the threshold. The resolution is limited to [0,1]. Resolution 0 means MetaLogo treats each sequence as its own group, while resolution 1 means MetaLogo treats all the sequences as one group. In short, set a lower resolution to get more groups and a higher resolution to get fewer groups. If a sequence is not assigned to any cluster, or forms a group by itself, MetaLogo considers it a singleton and assigns it to the -1 group. As a result, resolution 0 also leads to a single-group sequence logo, since all the sequences are assigned to the -1 group.
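The singleton rule explains why resolution 0 yields one logo rather than one logo per sequence. Below is a small sketch of that relabeling step only. The function name `collapse_singletons` and the input shape (a mapping from sequence name to cluster id) are hypothetical, and the mapping from resolution to TreeCluster's distance threshold is not reproduced here.

```python
from collections import Counter


def collapse_singletons(assignments):
    """Relabel every cluster that holds exactly one sequence as group -1.

    `assignments` maps sequence name -> cluster id (hypothetical input shape).
    """
    sizes = Counter(assignments.values())
    return {
        name: (cid if cid != -1 and sizes[cid] > 1 else -1)
        for name, cid in assignments.items()
    }


# Resolution 0: every sequence sits in its own cluster, so all become -1.
finest = collapse_singletons({"s1": 1, "s2": 2, "s3": 3})

# A coarser clustering: s1 and s2 share a cluster, s3 is a singleton.
coarser = collapse_singletons({"s1": 1, "s2": 1, "s3": 2})
```

In the first case all sequences end up in the single -1 group, which matches the behaviour described for resolution 0.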
-
Minimum Length and Maximum Length
These two parameters specify the range of sequence lengths to include when making the sequence logo. Sequences outside this range are excluded.
-
Please refer to 7.
-
Display Range (left) and Display Range (right)
These two parameters specify the region of the sequence logos to display. The range starts at 0 and ends at -1, where -1 means the end of the sequence, in Python style. Note that these parameters have no effect if the sequence logos are not aligned.
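To make the "-1 means the end" convention concrete, here is a hedged sketch. The helper `display_window` is hypothetical, and treating the right bound as exclusive is an assumption, since the text does not say whether it is inclusive. Note that this differs from a plain Python slice, where `seq[0:-1]` would drop the last position.

```python
def display_window(columns, left=0, right=-1):
    """Return the part of an aligned logo to draw.

    right == -1 is read as "up to the end of the sequence" (per the text);
    any other value is used as an exclusive end index (assumption).
    """
    end = len(columns) if right == -1 else right
    return columns[left:end]


aligned = "ATACAGACAG"
whole = display_window(aligned)          # default range 0 to -1 keeps everything
middle = display_window(aligned, 2, 5)   # positions 2, 3 and 4
plain_slice = aligned[0:-1]              # plain slicing loses the last base
```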
-
Please refer to 9.
-
Group Limit
This parameter specifies how many groups to keep on the final sequence logo figure, in case the clustering process produces too many groups to present clearly. MetaLogo keeps groups according to the number of sequences in them. MetaLogo treats the first sequence of the input as the target sequence and tracks its grouping, so the group to which the target sequence is assigned is always kept, no matter what value you set for Group Limit.
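The selection rule can be sketched as follows. This is an illustration under stated assumptions, not MetaLogo's code: the function name `select_groups` and the input shape are hypothetical, larger groups are assumed to be preferred, and the target's group is assumed to be added on top of the limit rather than replacing another group.

```python
def select_groups(groups, group_limit, target_name):
    """Pick which groups to draw.

    groups: {group_id: [sequence names]} (hypothetical input shape).
    Assumption: groups are ranked by size, largest first.
    """
    ranked = sorted(groups, key=lambda gid: len(groups[gid]), reverse=True)
    kept = ranked[:group_limit]
    # The group holding the target (first input sequence) is always kept.
    target_group = next(
        (gid for gid, names in groups.items() if target_name in names), None
    )
    if target_group is not None and target_group not in kept:
        kept.append(target_group)  # assumption: added beyond the limit
    return kept


groups = {
    1: ["target", "a"],
    2: ["b", "c", "d"],
    3: ["e", "f", "g", "h"],
    4: ["i"],
}
kept = select_groups(groups, group_limit=2, target_name="target")
```

Here the two largest groups (3 and 2) are kept, and group 1 is retained as well because it contains the target sequence.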
-
Examples
By clicking one of the three example buttons, you can load an example dataset into the input area. Then just click the Submit button to perform the analysis. The first set contains sequences of E. coli transcription factor binding sites, the second contains CDR3 sequences of verified antibodies detected in BCR repertoires of individuals with COVID-19, and the third contains sequences of experimentally validated BCR clonotypes from another study of patients with COVID-19.
-
Input area
Please paste your sequences here. Please note the first sequence of your input will be treated as the target sequence. MetaLogo will track the grouping position of the target sequence.
-
Upload file
You can also upload a file instead of pasting sequences into the text area. Please note that files larger than 5 MB will not be uploaded.
In this panel, users can specify the algorithms used to make logos and to align them. Before specifying parameters in this section, users should read the Alignment tutorial first.
-
Height Algorithm
For Height Algorithm, users can choose Bits or Probabilities.
Bits is used in most sequence logo generators: the total height of the letters at a position depicts the information content of that position. For details, please see https://en.wikipedia.org/wiki/Sequence_logo. If you choose Probabilities, the height of each letter represents the proportion of that letter among all letters at that position.
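The difference between the two height algorithms can be illustrated for a single alignment column. This sketch uses the standard sequence logo formula (information content = log2 of the alphabet size minus the Shannon entropy) without any small-sample correction; the function name `letter_heights` is hypothetical, and MetaLogo's own implementation may differ in details such as gap handling.

```python
import math
from collections import Counter


def letter_heights(column, alphabet_size=4, mode="bits"):
    """Letter heights for one alignment column (no gaps assumed).

    mode="probabilities": height = frequency of the letter.
    mode="bits": height = frequency * information content of the column,
    with no small-sample correction (simplifying assumption).
    """
    counts = Counter(column)
    total = sum(counts.values())
    probs = {letter: n / total for letter, n in counts.items()}
    if mode == "probabilities":
        return probs
    if mode != "bits":
        raise ValueError(f"unknown mode: {mode!r}")
    entropy = -sum(p * math.log2(p) for p in probs.values())
    information = math.log2(alphabet_size) - entropy
    return {letter: p * information for letter, p in probs.items()}


conserved = letter_heights("AAAA")                      # fully conserved DNA column
uniform = letter_heights("ACGT")                        # no information
shares = letter_heights("AAAC", mode="probabilities")   # plain proportions
```

A fully conserved DNA column reaches the maximum of 2 bits, while a column with all four bases in equal proportion has zero height in Bits mode but still shows four equally tall letters in Probabilities mode.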
-
Adjacent Alignment
If you choose Yes for Adjacent Alignment, MetaLogo will try to link conserved positions between adjacent groups on the final figure.
-
Global Alignment with padding
This parameter has no effect in the auto-grouping scenario. When users specify the grouping strategy themselves, MetaLogo can align sequence logos from different groups using an adjusted MSA algorithm. If you set Yes for Global Alignment with padding, padding will be inserted to align the different sequence logos. Note that in the auto-grouping scenario, all sequences have already been aligned in the MSA process before the sequence logos are drawn.
-
Score Metric
For Score Metric, users can choose from Dot Production, Jensen Shannon, Cosine and Entropy weighted Bhattacharyya Coefficient. For explanations, please see the Alignment tutorial.
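For orientation, the textbook forms of these four measures are sketched below for two probability vectors describing aligned positions. These are the standard definitions only; the exact variants MetaLogo uses (for example how the Jensen-Shannon divergence is turned into a similarity, or how the entropy weighting is applied to the Bhattacharyya coefficient) are described in the Alignment tutorial and are not reproduced here. The function names are illustrative.

```python
import math


def dot_product(p, q):
    return sum(a * b for a, b in zip(p, q))


def cosine(p, q):
    norm = math.sqrt(dot_product(p, p) * dot_product(q, q))
    return dot_product(p, q) / norm if norm else 0.0


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence: 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def bhattacharyya_coefficient(p, q):
    """Plain (unweighted) Bhattacharyya coefficient."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


only_a = [1.0, 0.0, 0.0, 0.0]   # position that is always A
only_c = [0.0, 1.0, 0.0, 0.0]   # position that is always C
flat = [0.25, 0.25, 0.25, 0.25]
```

Dot product, cosine and the Bhattacharyya coefficient grow with similarity, whereas the Jensen-Shannon divergence shrinks, so a similarity score based on it has to invert the value in some way.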
-
Gap Penalty
This parameter only applies to sequence logo alignment in the user-defined grouping scenario. It sets the gap penalty used in the logo alignment process. If you set a small penalty, like 0, MetaLogo will insert as many gaps as possible. In contrast, if you set a large penalty, like -10, MetaLogo will avoid gaps as much as possible. Below is an explanatory figure.
-
Connect Threshold
For Connect Threshold, the default value is -0.2. This threshold does not affect the alignment process, but it guides MetaLogo in choosing which pairs of aligned positions from two adjacent logos to connect. If the threshold is positive (>0), MetaLogo connects positions with a similarity larger than the threshold. If the threshold is negative (<0), MetaLogo connects the top (ratio*100)% of all pairs of aligned positions between two adjacent logos, ranked by similarity, where ratio equals -1*threshold. Below is an explanatory example.
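The two regimes of the threshold can be written out as a short sketch. The function name `select_connections` and the tuple layout are hypothetical; rounding the number of kept pairs and treating a threshold of exactly 0 like the positive case are assumptions, since the text does not cover them.

```python
def select_connections(pairs, threshold=-0.2):
    """Choose which aligned position pairs to connect.

    pairs: list of (pos_in_logo_a, pos_in_logo_b, similarity) tuples.
    threshold > 0: keep pairs whose similarity exceeds the threshold.
    threshold < 0: keep the top (-threshold * 100)% of pairs by similarity.
    """
    if threshold >= 0:  # assumption: 0 behaves like a positive threshold
        return [pair for pair in pairs if pair[2] > threshold]
    ratio = -threshold
    keep = round(ratio * len(pairs))  # assumption: round to nearest count
    ranked = sorted(pairs, key=lambda pair: pair[2], reverse=True)
    return ranked[:keep]


pairs = [(0, 0, 0.9), (1, 1, 0.1), (2, 2, 0.5), (3, 3, 0.7), (4, 4, 0.3)]
top_fifth = select_connections(pairs)            # default -0.2: top 20%
above_cutoff = select_connections(pairs, 0.4)    # similarity > 0.4
```

With five aligned pairs, the default of -0.2 connects only the single most similar pair, while a threshold of 0.4 connects the three pairs scoring above 0.4.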
In this panel, users can choose different layouts for their logo groups.
-
Logo Shape
There are four layouts in total: Horizontal, Circular, Radial and 3D. Below is a collection:
Please note that the horizontal shape should be chosen most of the time, because only logos with the horizontal shape can be connected to a clustering tree.
-
Connect Tree
-
Sort By
For Sort By, users can choose from Auto, Length, Length_reverse, Group Id, Group ID reverse.
This parameter specifies the order of the logo groups. In the auto-grouping scenario, Auto is chosen automatically. Length means MetaLogo sorts logos by sequence length; Length_reverse means MetaLogo sorts logos by sequence length in descending order; Group Id means MetaLogo sorts logos by the group names given in the sequence names; Group Id reverse sorts them in descending order.
-
Logo Margin Ratio
Logo Margin Ratio, Column Margin Ratio and Character Margin Ratio specify the relative distances between items. For example, if you set Logo Margin Ratio to 0.1, you will get a 1 cm distance between logos when the logo height is 10 cm. Below is an explanatory example.
In this panel, you could specify lots of parameters to customize your MetaLogo sequence logos. Most of them are easy to understand.
-
Auto figure size
If you set this parameter to Yes, MetaLogo will automatically determine the figure size of the output. However, you can still set your own size.
-
Alignment Color
For Alignment Color, users could choose the color for MetaLogo to connect or highlight similarly aligned positions.
-
Alignment Transparency
This parameter specifies the transparency of these connections.
-
Color scheme
For Color Scheme, users could choose the built-in scheme like DNA Basic, RNA Basic and Protein Basic. Users can also customize their own color scheme by assigning each base a color through our color picker widgets. RGB, HSL and HEX are supported for color picker.
Note that all color settings are stored in the local storage of your web browser, so you can refresh the MetaLogo web page without worrying about losing your carefully chosen color schemes.
-
Download Format
For Download Format, MetaLogo provides PNG, PDF, PS, EPS and SVG formats for users to download.
After you submit the job, the page will redirect to a new result page as follows.
The MetaLogo server assigns each task a unique uid. The result page refreshes itself regularly to get updated messages from the server.
-
Uid
This indicates the uid of the current task.
-
Link
This is the unique link of the current task. Users can save this link and revisit it within the following 7 days.
-
Refresh time
The page refreshes every 10 seconds during the first minute, then every minute, then every 10 minutes, depending on the computational load of your task.
After the task is completed, the MetaLogo server presents all the results on the result page. Several panels display different types of results, which are explained below.
This panel will show the basic information of the current task, including the id, title, created time and other detailed parameters.
This panel shows the sequence logos as well as a re-run module.
-
Resolution value
This value shows the resolution value of the current figure. Note if the grouping is not automatically determined by MetaLogo, no message will be displayed on the head of the figure.
-
Group limit
This value shows how many groups are displayed on the plot out of the total number of groups. Please refer to the analysis page. Note that if a group contains only a few sequences, it could get an all-zero bits array, and MetaLogo will not show that group on the figure. If you choose probabilities or bits_without_correction as the height algorithm, this will not be a problem.
-
Display range
This shows the displayed region of the alignment.
-
Group Id
These values indicate the group id of each group. These ids are generated by TreeCluster. The group -1 contains all the singleton sequences.
-
Bits
Each logo has its own bits limits from 0 to the highest bits in the group. Note a tick could be the top of one logo but also be the bottom of another logo, so labels could be like 4.57/0, in which 4.57 represents the highest bits of the lower logo and 0 represents the bottom of the upper logo.
-
Conserved motifs
Conserved motifs are connected by colored strands. These strands can be controlled by choosing an algorithm and a threshold; please refer to the instructions for the analysis page.
-
Gaps
These blanks represent gaps in the multiple sequence alignment or multiple logo alignment.
-
Target sequence mark
The red dot on the figure indicates the group to which the target sequence belongs. Remember that MetaLogo takes the first sequence of the input as the target sequence to track.
-
Hierarchical clustering
The tree on the left shows the hierarchical clustering result of the groups. For each group, the bits vector or probabilities vector is used for clustering. Note that this tree is not the phylogenetic tree. The phylogenetic tree of the sequences is also provided on the result page and will be explained later.
If you choose to perform basic analysis on the analysis page, MetaLogo will provide you several statistical figures on the result page.
Figure 0. Sequence lengths distribution.
Figure 1. Sequence counts of each group.
Figure 2. Entropies of each position. ("X"s mean gaps)
Each box represents the entropy of the corresponding position. The higher the entropy, the less conserved the position.
Figure 3. Entropies distribution of each group.
Figure 4. Correlations among groups. (Only for global alignment or auto-grouping mode, and only when there is more than one group)
The dendrogram on the figure is the same as that on the left of the sequence logos.
Figure 5. Distribution of pairwise distances of nodes in the phylogenetic tree. (Only for auto-grouping mode)
This figure can be used to guide the selection of the clustering resolution.
The MetaLogo server also provides MSA and phylogenetic tree visualization. Note that this is based on all the sequences, rather than the groups.
MetaLogo provides the result files and the intermediate results.
-
Config File
This file contains all the parameters MetaLogo needs to replicate the current task. You can save the config for later use. The file can also be used together with the MetaLogo Python package.
-
Sequence Input
This is the same file that the user uploaded to the server as input.
-
Sequence Logo
This is the main sequence logo figure. Users can choose the preferred format (PNG, PDF, SVG, etc.) on the analysis page and download it here.
-
MSA result
This file contains the MSA result in FASTA format.
-
Phylogenetic Tree
This file contains the phylogenetic tree of the sequences in Newick format.
-
Grouping details
This file contains the grouping information of all the sequences. Note that only sequences that satisfy the parameter limits set on the analysis page are kept here.
developed by Yaowen Chen