forked from Weeks-UNC/Superfold
###################################################################################
Superfold installation, execution, and troubleshooting.
Gregg Rice 2014
###################################################################################
Requirements:
===================================================================================
python 2.7
===================================================================================
RNAStructure - https://rna.urmc.rochester.edu/RNAstructure.html
The Fold and partition executables are needed to predict secondary structure and base-pairing probabilities
Download the command-line applications for your platform
Extract to your home directory
Build the binaries using the 'make all' command in the RNAstructure directory.
Add the following 2 lines to ~/.bash_profile
export PATH=$PATH:$HOME/RNAstructure/exe
export DATAPATH=$HOME/RNAstructure/data_tables
===================================================================================
matplotlib (python module required for .pdf figure rendering) -
Download source
Extract to any directory
cd to the extracted directory
run the command "python setup.py install --user"
===================================================================================
httplib2 (python module only required if rendering structures) -
Download httplib2-0.7.6.tar.gz (or later version)
Extract to any directory
cd to httplib2 directory
run the command "python setup.py install --user"
===================================================================================
###################################################################################
###################################################################################
Execution instructions:
SuperFold can be run using one command:
python SuperFold.py RNA.map
All the other flags are optional. Use the --help flag for explanations of the command line options
python SuperFold.py --help
File Setup:
The only required file is a .map file. This output is automatically
generated by the ShapeMapper pipeline. The .map file consists of
the nucleotide #, SHAPE reactivity, Error, and Nucleotide sequence.
T nucleotides will automatically be converted to U by SuperFold.
---myFavoriteRNA.map---
1 0.002512 0.053798 G
2 -0.034906 0.143529 T
3 -0.077852 0.257623 T
4 -0.068123 0.122385 T
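The four-column layout above can be read with a few lines of Python. This is only a minimal sketch, not SuperFold's own parser: the function name parse_map_lines is hypothetical, and it assumes the columns are separated by whitespace.

```python
def parse_map_lines(lines):
    """Parse .map rows into (position, reactivity, error, nucleotide) tuples.

    Assumes four whitespace-separated columns per row, as in the example
    above. T is converted to U, mirroring what SuperFold is described as doing.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        position = int(fields[0])
        reactivity = float(fields[1])
        error = float(fields[2])
        nucleotide = fields[3].upper().replace("T", "U")
        records.append((position, reactivity, error, nucleotide))
    return records


# The rows from the myFavoriteRNA.map example above.
example = """1 0.002512 0.053798 G
2 -0.034906 0.143529 T
3 -0.077852 0.257623 T
4 -0.068123 0.122385 T"""

records = parse_map_lines(example.splitlines())
sequence = "".join(record[3] for record in records)
print(sequence)
```

To read a real file, pass an open file handle instead of example.splitlines().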
Differential SHAPEMap file:
The differential file consists of the nucleotide #, differential SHAPE
reactivity, std error, nucleotide sequence, and Z-factor of the difference,
calculated as 1 - 3(1m6_err + nmia_err)/abs(shape1 - shape2).
--myRNAnmia-1m6.mapd--
1 -999.0 -999.0 G -999.0
2 -0.0124 0.2673 U -74.2440186566
3 0.0951 0.0833 U -2.34887508212
4 0.0409 0.0929 U -7.96984706503
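The Z-factor formula given above can be written out as a small function. This is a sketch of the stated formula only, not the code in differenceByWindowSHAPEMAP.py: the function name z_factor and the input values below are made up for illustration, and returning None for a zero difference is an assumption about how to handle that case.

```python
def z_factor(shape1, shape2, err1, err2):
    """Z-factor of the difference: 1 - 3(err1 + err2)/abs(shape1 - shape2).

    shape1/shape2 are the two reactivities and err1/err2 their errors.
    Returns None when the two reactivities are identical, because the
    formula is undefined there (this handling is an assumption).
    """
    diff = abs(shape1 - shape2)
    if diff == 0:
        return None
    return 1 - 3 * (err1 + err2) / diff


# Hypothetical values, not taken from a real data set.
z = z_factor(0.5, 0.2, 0.02, 0.03)
print(round(z, 6))
```

A Z-factor near 1 means the difference is large relative to the errors. Strongly negative values, like those in the example rows above, mean the errors dominate the difference.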
A differential SHAPEMap file is created by running the utility
differenceByWindowSHAPEMAP.py. This program has the following usage:
Usage: <nmia.txt.map> <1m6.txt.map> <difference.dif.mapd> <i>
Create your .mapd file using the following command:
python differenceByWindowSHAPEMAP.py nmia.map 1m6.map nmia-1m6.mapd 25
where nmia.map and 1m6.map are the names of the NMIA and 1M6 map files. The new file
"nmia-1m6.mapd" will contain the differential map file suitable to be given to the
--differentialFile flag of SuperFold.
Single Strand Constraints:
List any nucleotides that other evidence indicates are single stranded and
should therefore not be allowed to pair during folding, one nucleotide number per line. ex:
---ssConstraints.txt--- < this part is just the name, not in the file
34
35
36
78
77
76
PK constraints:
In a second file, list the pseudoknot (PK) base pairs, one pair per line. SuperFold uses this paired PK file to
reassemble your pseudoknotted nucleotides in the final step. ex:
---ListofPKs_ds.txt---
34 78
35 77
36 76
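The two example files above list the same nucleotides: each PK pair also appears, one nucleotide per line, in the single strand constraints. Assuming that is the intended relationship, both files could be generated from one list of pairs. The helper name pk_pairs_to_ss_constraints is hypothetical, and this sketch only builds the file contents as strings.

```python
def pk_pairs_to_ss_constraints(pairs):
    """Return the 5' partners followed by the 3' partners of each PK pair.

    This reproduces the ordering of the ssConstraints.txt example
    (34, 35, 36, 78, 77, 76). Whether SuperFold cares about the order
    is not stated, so treat the ordering as cosmetic.
    """
    five_prime = [i for i, j in pairs]
    three_prime = [j for i, j in pairs]
    return five_prime + three_prime


pairs = [(34, 78), (35, 77), (36, 76)]

# Contents for the two constraint files, one entry per line.
ss_text = "".join("%d\n" % n for n in pk_pairs_to_ss_constraints(pairs))
pk_text = "".join("%d %d\n" % pair for pair in pairs)

print(ss_text)
print(pk_text)
```

Any additional single stranded nucleotides that are not part of a PK would be appended to the first file only.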
ShapeMapper 2.2+ and --dms:
If the data were generated with ShapeMapper 2.2+ in DMS mode (--dms),
SuperFold should be run with the --DMS flag. This modifies the
Fold and partition commands that SuperFold submits so that they are compatible with
DMS data from ShapeMapper 2.2+.
###################################################################################
Output description and troubleshooting:
Occasionally (depending on the RNA and SHAPE constraints) a smaller window size may be required
for partition and for Fold in order to obtain base pairs in the output. This can be accomplished with the following flags:
--partitionWindowSize
--foldWindowSize
1000 is a good size to select for the partition window. For window sizes less than 1000 set --trimInterior
to 200 nucleotides in order to obtain an output for interior windows. Smaller window sizes will result in
a bias toward shorter range interactions.
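The window advice above can be captured in a small helper that assembles the command line. This is a sketch under assumptions: the helper name build_superfold_command is hypothetical, each flag is assumed to take its value as the next argument (check python SuperFold.py --help), and the --trimInterior rule is applied to the partition window only, which is one reading of the text.

```python
def build_superfold_command(map_file, partition_window=None, fold_window=None):
    """Build an argument list for SuperFold with optional window sizes.

    Adds --trimInterior 200 when the partition window is smaller than
    1000, following the recommendation above.
    """
    cmd = ["python", "SuperFold.py", map_file]
    if partition_window is not None:
        cmd += ["--partitionWindowSize", str(partition_window)]
        if partition_window < 1000:
            cmd += ["--trimInterior", "200"]
    if fold_window is not None:
        cmd += ["--foldWindowSize", str(fold_window)]
    return cmd


# 600 is an arbitrary example value, not a recommended setting.
cmd = build_superfold_command("RNA.map", partition_window=600)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.call, or printed and pasted into a shell.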
Outputs are listed in the order of execution:
Folders are created by SuperFold automatically to store the output. To prevent file name collisions,
a cryptographic hash of the input values is appended to the folder and file names. A log file detailing the run is
in the results folder.
Intermediate partition function calculations are in the partition folder. Intermediate fold calculations are in the
fold folder.
Merged partition function and minimum free energy structures are in the results folder and begin with the title
merged.
Likely base pairs from the partition function are plotted as arcs in the arcs file. The following is the key:
green > 80%
blue > 30%
yellow > 10%
gray > 3%
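The key above amounts to a threshold lookup on the pairing probability. The following sketch shows that lookup; it is not SuperFold's plotting code, and the names ARC_COLOR_KEY and arc_color are hypothetical. Returning None for pairs at or below 3% assumes such pairs are simply not drawn.

```python
# Thresholds from the key above, checked from most to least probable.
ARC_COLOR_KEY = [
    (0.80, "green"),
    (0.30, "blue"),
    (0.10, "yellow"),
    (0.03, "gray"),
]


def arc_color(probability):
    """Return the arc color for a pairing probability between 0 and 1.

    Returns None when the probability is not above the lowest threshold
    (assumed to mean the pair is not plotted).
    """
    for threshold, color in ARC_COLOR_KEY:
        if probability > threshold:
            return color
    return None


for p in (0.95, 0.50, 0.20, 0.05, 0.01):
    print(p, arc_color(p))
```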
The Shannon entropy and SHAPE analysis are plotted in the ShannonSHAPE pdf file. Region cut sites are written to the log file.
Individual region structure files and plots are written to the regions folder with the region range in the filename.