Creating mmCIF files
Hongsuda edited this page Jul 27, 2020 · 4 revisions
Requirements for extracting catalog data into mmCIF (to be used as an input to a separate pdb system)
- Data needs to be exported entry-wise, i.e., only data belonging to a particular entry (as denoted by `entry.id`) is to be exported
- Only tables and columns in the json schema need to be exported
- Some tables need to be appended from the mmCIF file uploaded by the user in step 2:
  - `atom_site`
  - `ihm_starting_model_coord`
  - `ihm_sphere_obj_site`
  - `ihm_gaussian_obj_site`
  - `ihm_gaussian_obj_ensemble`
  - `pdbx_poly_seq_scheme`
  - `pdbx_nonpoly_scheme`

Note: Not all entries will have all of the above tables.
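The first two requirements can be sketched in Python as a filtering step. This is only an illustration under assumptions: the catalog is assumed to be available as a dict of tables holding row dicts, the json schema is assumed to reduce to a mapping from table name to column names, and the function name `select_entry_rows` and the `key` parameter are hypothetical.

```python
def select_entry_rows(catalog, schema, entry_id, key="structure_id"):
    """Return {table: [row, ...]} limited to one entry and to schema columns.

    catalog: {table_name: [row_dict, ...]}  (assumed in-memory shape)
    schema:  {table_name: [column_name, ...]}  (assumed reduction of the json schema)
    key:     column that links a row to its entry (assumed name)
    """
    selected = {}
    for table, columns in schema.items():
        rows = [
            {col: row.get(col) for col in columns}  # keep schema columns only
            for row in catalog.get(table, [])
            if row.get(key) == entry_id             # keep rows of this entry only
        ]
        if rows:  # tables with no rows for this entry are left out
            selected[table] = rows
    return selected
```

Tables that are absent from the schema are never visited, so they are not exported.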
- PDBx/mmCIF syntax
- CIF 1.1 syntax specifications
- Space or tabs can be used to separate column values in a row
- If there is an optional column in a table and some rows in the table have values and some rows don't, then `.` can be used to denote missing values
- Single or double quotes can be used for textual column values that contain spaces
- Multi-line texts are enclosed within `;` (see example for `entity_poly` table below)
  - `;` has to be in the beginning of the line.
  - The text does not have to start right after the first `;`
  - In the example below (`entity_poly` table), the first `;*;` is for `_entity_poly.pdbx_seq_one_letter_code` and the second `;*;` is for `_entity_poly.pdbx_seq_one_letter_code_can`
-- valid
;multi-line text
;
-- valid (preferred)
; multi-line text
;
-- valid (empty line can be used anywhere in the file)
; multi-line text
;
"next column value"
-- valid
; multi-line text
;
-- valid (but not recommended)
;
multi-line text
;
- If the text contains single quotes, then it can be enclosed within double quotes and vice versa. If the text contains both single and double quotes, then it is enclosed within `;` like multi-line texts.
- `#` identifies a commented line and can be used to add empty lines between tables
- When vocab tables are used, the corresponding values should be used to populate the mmCIF tables (see `entity.type` in the example below)
- If a table returns zero rows for a particular `structure_id`, then the table need not be included in the mmCIF file, i.e., no empty tables
- The `structure_id` column in each table need not be included in the mmCIF file
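The quoting rules above can be sketched as a small Python helper. This is a minimal illustration, not a complete CIF 1.1 writer: the names `format_value` and `format_row` are hypothetical, and it covers only the cases listed on this page (other CIF 1.1 cases, such as values that start with `_`, `#`, or `;`, are not handled).

```python
def format_value(value):
    """Format one column value following the rules listed above."""
    if value is None or value == "":
        return "."                       # missing value
    text = str(value)
    has_single = "'" in text
    has_double = '"' in text
    if "\n" in text or (has_single and has_double):
        # Multi-line text, or text with both quote types: ;-delimited field.
        return ";" + text + "\n;"
    if has_single:
        return '"' + text + '"'          # single quotes inside double quotes
    if has_double or any(ch.isspace() for ch in text):
        return "'" + text + "'"          # spaces or double quotes inside single quotes
    return text                          # plain value, no quoting needed


def format_row(values):
    """Join formatted values; each ;-delimited field starts on its own line."""
    lines = [[]]
    for token in map(format_value, values):
        if "\n" in token:                # only ;-delimited fields contain a newline
            lines.append([token])
            lines.append([])
        else:
            lines[-1].append(token)
    return "\n".join(" ".join(parts) for parts in lines if parts)
```

For example, `format_row(["3", "non-polymer", "syn", "CALCIUM ION", None, 1])` yields the third `entity` row shown in the example further down.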
data_structure_id (use value of structure_id)
loop_
_table_name.column_name_1
_table_name.column_name_2
...
...
...
_table_name.column_name_n
Row_1_column_value_1 Row_1_column_value_2 ......... Row_1_column_value_n
....
....
....
Row_m_column_value_1 Row_m_column_value_2 ......... Row_m_column_value_n
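The template above can be produced with a short Python function. This is a sketch under assumptions: the name `write_loop` and its `skip` parameter are hypothetical, and the row values are assumed to be strings that are already quoted according to the syntax rules.

```python
def write_loop(table_name, columns, rows, skip=("structure_id",)):
    """Render one table as a loop_ block, or "" when the table has no rows.

    columns: column names in order
    rows:    lists of already-formatted string values, one list per row
    skip:    columns left out of the mmCIF output (assumed default)
    """
    if not rows:
        return ""                                    # no empty tables
    keep = [i for i, col in enumerate(columns) if col not in skip]
    lines = ["loop_"]
    lines += [f"_{table_name}.{columns[i]}" for i in keep]
    lines += [" ".join(row[i] for i in keep) for row in rows]
    lines.append("#")                                # comment line between tables
    return "\n".join(lines) + "\n"
```

Concatenating the returned blocks after the `data_` line gives a file in the layout of the examples that follow.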
loop_
_entity.id
_entity.type
_entity.src_method
_entity.pdbx_description
_entity.formula_weight
_entity.pdbx_number_of_molecules
1 polymer man "C1q subunits A, C, and B" 45697.594 1
2 non-polymer man N-ACETYL-D-GLUCOSAMINE 221.208 1
3 non-polymer syn 'CALCIUM ION' . 1
#
loop_
_entity_poly.entity_id
_entity_poly.type
_entity_poly.nstd_linkage
_entity_poly.nstd_monomer
_entity_poly.pdbx_seq_one_letter_code
_entity_poly.pdbx_seq_one_letter_code_can
1 'polypeptide(L)' no no
;KDQPRPAFSAIRRNPPMGGNVVIFDTVITNQEEPYQNHSGRFVCTVPGYYYFTFQVLSQWEICLSIVSSSRGQVRRSLGF
CDTTNKGLFQVVSGGMVLQLQQGDQVWVEKDPKKGHIYQGSEADSVFSGFLIFPSAGSGKQKFQSVFTVTRQTHQPPAPN
SLIRFNAVLTNPQGDYDTSTGKFTCKVPGLYYFVYHASHTANLCVLLYRSGVKVVTFCGHTSKTNQVNSGGVLLRLQVGE
EVWLAVNDYYDMVGIQGSDSVFSGFLLFPDGSAKATQKIAFSATRTINVPLRRDQTIRFDHVITNMNNNYEPRSGKFTCK
VPGLYYFTYHASSRGNLCVNLMRGRERAQKVVTFCDYAYNTFQVTTGGMVLKLEQGENVFLQATDKNSLLGMEGANSIFS
GFLLFPDMEA
;
;KDQPRPAFSAIRRNPPMGGNVVIFDTVITNQEEPYQNHSGRFVCTVPGYYYFTFQVLSQWEICLSIVSSSRGQVRRSLGF
CDTTNKGLFQVVSGGMVLQLQQGDQVWVEKDPKKGHIYQGSEADSVFSGFLIFPSAGSGKQKFQSVFTVTRQTHQPPAPN
SLIRFNAVLTNPQGDYDTSTGKFTCKVPGLYYFVYHASHTANLCVLLYRSGVKVVTFCGHTSKTNQVNSGGVLLRLQVGE
EVWLAVNDYYDMVGIQGSDSVFSGFLLFPDGSAKATQKIAFSATRTINVPLRRDQTIRFDHVITNMNNNYEPRSGKFTCK
VPGLYYFTYHASSRGNLCVNLMRGRERAQKVVTFCDYAYNTFQVTTGGMVLKLEQGENVFLQATDKNSLLGMEGANSIFS
GFLLFPDMEA
;
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.num
_entity_poly_seq.mon_id
_entity_poly_seq.hetero
1 1 LYS n
1 2 ASP n
1 3 GLN n
1 4 PRO n
1 5 ARG n
1 6 PRO n
1 7 ALA n
1 8 PHE n
1 9 SER n
1 10 ALA n
1 11 ILE n
1 12 ARG n
1 13 ARG n
1 14 ASN n
1 15 PRO n
#