xml parsing #1
Comments
@chrisprobert Ok cool, I'm finishing up my code to convert the XML file into several arrays and dictionaries of relevant information; it should be done by tomorrow.
@aaroncp1an0 take a look at Also, I've also posted those here: Some initial thoughts:
I checked out your commits on GitHub. Nice work. It will take me a little time to get up and running with GitHub. In terms of the files you've generated, my data agrees with yours, although after processing 100k proteins I had 39 features rather than 33; some of them were trivial, such as 'initiator methionine'. Do you prefer I upload my files and code to your GitHub, as they are somewhat redundant? I am storing the dictionary/array descriptions using numpy dump. My sample output is as follows:

######SAMPLE OUTPUT of array elements
feat[0] -> corresponding features in feature dictionary
des[0] -> corresponding descriptions in description dictionary
pos[0] -> corresponding to start/stop position for each feat+description pair
Feature Dictionary
Description Dictionary:
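For reference, the feat/des/pos layout described above could be sketched roughly as follows. This is only an illustration under assumptions: it guesses that index i of each array refers to protein i, that the dictionaries map each feature type or description string to an integer id, and the function name `encode_proteins` is made up here. Persisting the result (the comment mentions numpy dump) is left out.

```python
def encode_proteins(proteins):
    """Encode per-protein feature records as parallel lists of integer ids.

    `proteins` is a list with one entry per protein; each entry is a list of
    (feature_type, description, start, end) tuples. The layout is an
    assumption based on the sample output, not the actual code in the thread.
    """
    feature_ids = {}      # feature type string -> integer id
    description_ids = {}  # description string -> integer id
    feat, des, pos = [], [], []

    for records in proteins:
        f, d, p = [], [], []
        for ftype, description, start, end in records:
            # setdefault returns the existing id, or assigns the next free one
            f.append(feature_ids.setdefault(ftype, len(feature_ids)))
            d.append(description_ids.setdefault(description, len(description_ids)))
            p.append((start, end))
        feat.append(f)
        des.append(d)
        pos.append(p)

    return feat, des, pos, feature_ids, description_ids


# Made-up example input: two proteins with a few features each.
example = [
    [("helix", "", 5, 12), ("strand", "", 20, 25)],
    [("helix", "", 3, 9)],
]
feat, des, pos, feature_ids, description_ids = encode_proteins(example)
```

With this layout, feat[0], des[0] and pos[0] line up element by element for the first protein, so the k-th feature id, description id and start/stop pair all describe the same annotation.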
This page documents the various feature annotations: http://www.uniprot.org/help/sequence_annotation
Ideally I think we'd like to download the 'structure' section, but I'm not sure how: http://www.uniprot.org/help/structure_section
Actually, it turns out that the feature section does contain secondary structure information. The 'helix', 'turn' and 'strand' values we found here are the same as those in the secondary structure section on the UniProtKB entries.
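Since the 'helix', 'turn' and 'strand' features carry start/stop positions, one possible way to use them is to expand them into a per-residue label string. The sketch below is only an assumption about how that might look: the one-letter symbols (H, E, T, and - for unannotated) and the function name are invented for illustration, and positions are treated as 1-based and inclusive.

```python
# One-letter symbols chosen for this sketch; they are an assumption,
# not something defined in the thread.
LABELS = {"helix": "H", "strand": "E", "turn": "T"}


def secondary_structure_string(length, features):
    """Return a string of `length` labels, one per residue.

    `features` is an iterable of (feature_type, start, end) tuples with
    1-based, inclusive positions. Feature types other than helix, strand
    and turn are ignored.
    """
    labels = ["-"] * length
    for ftype, start, end in features:
        symbol = LABELS.get(ftype)
        if symbol is None:
            continue  # not a secondary structure feature
        # Clamp to the sequence bounds and convert to 0-based indices.
        for i in range(max(start, 1) - 1, min(end, length)):
            labels[i] = symbol
    return "".join(labels)


# Made-up example: a 10-residue sequence with three annotated elements.
example = secondary_structure_string(
    10,
    [("helix", 2, 4), ("strand", 6, 7), ("initiator methionine", 1, 1), ("turn", 9, 9)],
)
```

A string like this gives one tag per residue, which is the shape a tagging-style model would expect as its target.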
I checked that out, thank you. Interesting; unfortunately they don't
I have a flexible outline for implementing the 'part of speech tagging'. I'm free most of the day.
On Tue, May 12, 2015 at 9:05 AM, Chris Probert
@aaroncp1an0 I've started some code for XML parsing, check the xml_parser directory. I'll aim to finish this up later today, and we should have a TSV file for sequence feature attributes soon.
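As a rough idea of what an XML-to-TSV step could look like (this is not the code in the xml_parser directory), here is a minimal sketch using the standard library. The sample document is hand-written and simplified, and the namespace URI, the element and attribute names, the column names and the function names are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed namespace for the UniProt XML format.
NS = {"up": "http://uniprot.org/uniprot"}

# Hand-written, simplified sample; not real data.
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <feature type="initiator methionine" description="Removed">
      <location><position position="1"/></location>
    </feature>
    <feature type="helix">
      <location><begin position="5"/><end position="12"/></location>
    </feature>
  </entry>
</uniprot>"""


def location_bounds(location):
    """Return (start, end) as strings; empty strings when not available."""
    if location is None:
        return "", ""
    single = location.find("up:position", NS)
    if single is not None:
        # Single-residue feature: start and end are the same position.
        p = single.get("position", "")
        return p, p
    begin = location.find("up:begin", NS)
    end = location.find("up:end", NS)
    start = begin.get("position", "") if begin is not None else ""
    stop = end.get("position", "") if end is not None else ""
    return start, stop


def iter_feature_rows(xml_text):
    """Yield (accession, type, description, start, end) for every feature."""
    root = ET.fromstring(xml_text)
    for entry in root.findall("up:entry", NS):
        accession = entry.findtext("up:accession", default="", namespaces=NS)
        for feature in entry.findall("up:feature", NS):
            start, stop = location_bounds(feature.find("up:location", NS))
            yield (
                accession,
                feature.get("type", ""),
                feature.get("description", ""),
                start,
                stop,
            )


def rows_to_tsv(rows):
    """Serialize rows to tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "feature_type", "description", "start", "end"])
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = rows_to_tsv(iter_feature_rows(SAMPLE_XML))
```

For a file covering 100k proteins, loading everything with `ET.fromstring` would likely use too much memory; `ET.iterparse` with per-entry clearing would be the usual alternative, at the cost of slightly more bookkeeping.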