denovo_search denovo_input_feature_file #2

cguetot · 2020-09-10T14:29:29Z

Hi,

If I want to make predictions for a new mgf file, do I have to leave an empty cell for the 'seq' column in its feature file?

I noticed that the headers of the feature files are defined as follows:
"spec_group_id","m/z","z","rt_mean","seq","scans","profile","feature area"

Also, do you have any recommended settings (deepnovo_config.py) for data coming from a Q Exactive HF?

best,

Carlos

volpato30 · 2020-09-10T18:27:20Z

Hi Carlos,

Yes, you need to keep the seq column, and you can leave anything in that cell (seq won't be used when doing de novo). The reader searches for "seq" in the header, so deleting that column should raise an error. Let me know if you have any problems when running it.

Best,

Rui
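As a rough illustration of the answer above, here is a minimal sketch of writing a feature file for a new mgf file while keeping the seq column. The header is the one quoted in the question. Everything else is an assumption for illustration: the function name write_denovo_features is hypothetical, the default placeholder for seq is an empty string (any value should do, since seq is not used for de novo), and the default values for the profile and feature area columns are guesses that you should check against an existing feature file.

```python
import csv
import io

# Header as quoted in the question above.
HEADER = ["spec_group_id", "m/z", "z", "rt_mean", "seq",
          "scans", "profile", "feature area"]


def write_denovo_features(spectra, out, seq_placeholder=""):
    """Write one feature row per spectrum, always keeping the 'seq' column.

    `spectra` is an iterable of dicts with the keys spec_group_id, m/z, z,
    rt_mean and scans. 'profile' and 'feature area' are optional; their
    defaults below are assumptions, not values confirmed by the maintainer.
    `out` is any writable text file object.
    """
    writer = csv.DictWriter(out, fieldnames=HEADER)
    writer.writeheader()
    for spec in spectra:
        writer.writerow({
            "spec_group_id": spec["spec_group_id"],
            "m/z": spec["m/z"],
            "z": spec["z"],
            "rt_mean": spec["rt_mean"],
            # The column must exist; its content is ignored for de novo.
            "seq": seq_placeholder,
            "scans": spec["scans"],
            "profile": spec.get("profile", "0.0:1.0"),   # assumed default
            "feature area": spec.get("feature area", "0"),  # assumed default
        })
```

To write to disk, open the target with open(path, "w", newline="") and pass the file object as out.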

cguetot · 2020-09-10T19:49:10Z

Hi Rui,

I edited my comment, so you did not get the chance to read my second question:

Do you have any recommended settings (deepnovo_config.py) for data coming from a Q Exactive HF, both for training and for de novo search?

Carlos

volpato30 · 2020-09-10T20:19:42Z

Hi Carlos,

I don't think you need to change any parameters for Q Exactive data. Just make sure your training data have properties (enzyme, instrument, fragmentation method) relatively similar to those of the data you want to de novo sequence.

Rui

cguetot · 2021-02-03T14:07:31Z

Can I use the same knapsack file from DeepNovo, or are they different?

volpato30 · 2021-02-03T14:11:27Z

They are the same. But be careful about the PTM settings (the AAs included in vocab_reverse): one knapsack file corresponds to a specific set of PTMs and MZ_MAX. I believe the original DeepNovo knapsack was generated with C(Cam), M(oxidation), NQ(deamidation) and an MZ_MAX of 3000.

cguetot · 2021-02-03T14:28:30Z

How can I build a custom knapsack for DeepNovoV2 with, for example, an MZ_MAX of 4000?

volpato30 · 2021-02-03T14:50:18Z

Change MZ_MAX to 4000 in the config file, then run $> make denovo. When the program detects no knapsack.npy file in the current folder, it will start building a new one with the configuration in the deepnovo_config.py file.
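Since a new knapsack is only built when no knapsack.npy is found in the current folder, a knapsack left over from an earlier configuration has to be moved out of the way first. A minimal sketch of that step follows. The helper set_aside_knapsack and the .bak suffix are hypothetical and not part of DeepNovoV2; it only renames the file and does not touch the config.

```python
import tempfile
from pathlib import Path


def set_aside_knapsack(folder=".", name="knapsack.npy"):
    """Rename an existing knapsack file so the next run has to rebuild it.

    Returns the backup path, or None if there was nothing to move.
    An older backup with the same name is overwritten.
    """
    path = Path(folder) / name
    if not path.exists():
        return None
    backup = path.with_name(name + ".bak")
    path.replace(backup)  # replace() overwrites an existing backup
    return backup


if __name__ == "__main__":
    # Small demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "knapsack.npy").write_bytes(b"old knapsack")
        print(set_aside_knapsack(tmp))
```

After this, change MZ_MAX in deepnovo_config.py and run make denovo as described above.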

cguetot · 2021-02-03T15:00:30Z

That's perfect. Thanks, Carlos

cguetot · 2021-02-04T12:14:49Z

Hi again,

Several questions:

How can I increase the training, validation and test sizes?
Context: I see that variables like train_stack_size, valid_stack_size and test_stack_size are no longer used in this code, compared to the old TensorFlow version.
I also see a variable called batch_size with a lower value (32) than in the original code (128). How does it affect the training process?
If I increase "num_workers", will it speed up the calculations?
Is it possible to get the top n best candidates for each scan?

thanks in advance,

Carlos

volpato30 · 2021-02-08T15:27:42Z

Do you mean batch size? Batch size is configured with the batch_size variable in the config file. The number of data points depends entirely on your input file.
For training you should use a lower value. 128 is what I used for de novo. I usually train the model with a batch size of 16 or 32, and I don't observe a significant difference in the final accuracy of the model.
num_workers controls the number of CPU threads that provide (i.e. preprocess) training data to the GPU. If you observe that your GPU is not fully utilized during training, then increasing it might help. Otherwise there is no need to increase the value.
Yes, you totally can. The current beam search retains the top 5 candidates (also configurable in the config file). You just need to slightly modify the denovo.py and writer.py files to output the top 5 instead of the top 1.
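For the last point, the change amounts to writing out the n best candidates the beam search already keeps, instead of only the best one. The following is a generic sketch of that selection and not the actual code in denovo.py or writer.py. The Candidate type, the function names and the tab-separated output format are all hypothetical, and it assumes that a higher score means a better candidate.

```python
import heapq
from typing import NamedTuple


class Candidate(NamedTuple):
    sequence: str
    score: float  # assumption: higher is better


def top_n_candidates(beam, n=5):
    """Return up to n candidates with the highest scores, best first."""
    return heapq.nlargest(n, beam, key=lambda c: c.score)


def format_rows(scan_id, beam, n=5):
    """Build one tab-separated output line per retained candidate."""
    return [
        f"{scan_id}\t{rank}\t{cand.sequence}\t{cand.score:.3f}"
        for rank, cand in enumerate(top_n_candidates(beam, n), start=1)
    ]
```

With n=1 this reduces to the current behaviour of reporting a single sequence per scan.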
