denovo_search denovo_input_feature_file #2

Open
cguetot opened this issue Sep 10, 2020 · 10 comments

Comments

@cguetot

cguetot commented Sep 10, 2020

Hi,

If I want to make predictions for a new mgf file, do I have to leave an empty cell for the 'seq' column in its feature file?

I noticed that the headers of the feature files are defined as follows:
"spec_group_id","m/z","z","rt_mean","seq","scans","profile","feature area"
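
For illustration, a minimal sketch of what a feature-file row with an empty 'seq' cell would look like. Whether DeepNovoV2 accepts an empty cell is exactly the open question here, so treat that as an assumption. The column names are copied from the header above; the helper name `feature_rows_to_csv` and all values are made-up placeholders.

```python
import csv
import io

# Column names copied from the header quoted above.
HEADER = ["spec_group_id", "m/z", "z", "rt_mean", "seq",
          "scans", "profile", "feature area"]


def feature_rows_to_csv(rows):
    """Serialise feature dicts to CSV text; missing keys (e.g. 'seq') stay empty."""
    buffer = io.StringIO()
    # restval="" fills any column that is absent from a row with an empty cell.
    writer = csv.DictWriter(buffer, fieldnames=HEADER, restval="",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# All values below are invented placeholders, not real measurements.
example = feature_rows_to_csv([
    {"spec_group_id": "F1:1", "m/z": 500.25, "z": 2, "rt_mean": 10.5,
     "scans": "F1:1", "profile": "10.5:1000.0", "feature area": 1000.0},
])
print(example)
```

The 'seq' key is simply left out of the row dict, so the writer emits two consecutive commas at that position.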

Separately, do you have any recommended settings (deepnovo_config.py) for data coming from a QExactive-HF?

best,

Carlos

@volpato30
Owner

volpato30 commented Sep 10, 2020 via email

@cguetot
Author

cguetot commented Sep 10, 2020

Hi Rui,

I edited my comment after posting, so you did not get the chance to read my second question:

Do you have any recommended settings (deepnovo_config.py) for data coming from a QExactive-HF, both for training and for de novo search?

Carlos

@volpato30
Owner

volpato30 commented Sep 10, 2020 via email

@cguetot
Author

cguetot commented Feb 3, 2021

Can I use the same knapsack file from DeepNovo, or are they different?

@volpato30
Owner

They are the same, but be careful about the PTM settings (the AAs included in vocab_reverse). One knapsack file corresponds to a specific set of PTMs and a specific MZ_MAX. I believe the original DeepNovo knapsack was generated with C(Cam), M(Oxidation), NQ(Deamidation), and an MZ_MAX of 3000.
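
Since one knapsack file is tied to one PTM set and one MZ_MAX, it can help to label your own copies so that a stale file is not reused by accident. A minimal sketch of one such bookkeeping convention follows. The function `knapsack_filename` and the residue tokens are hypothetical, and this is not something DeepNovoV2 does itself.

```python
import hashlib


def knapsack_filename(residues, mz_max, prefix="knapsack"):
    """Derive a file name that changes whenever the residue set or MZ_MAX changes."""
    # Sort so that the order in which residues are listed does not matter.
    key = ",".join(sorted(residues)) + "|" + repr(float(mz_max))
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}_{digest}.npy"


# Hypothetical residue tokens, for illustration only.
name_3000 = knapsack_filename(["G", "A", "C(Cam)", "M(Oxidation)"], 3000)
name_4000 = knapsack_filename(["G", "A", "C(Cam)", "M(Oxidation)"], 4000)
print(name_3000, name_4000)
```

Two configurations that differ in either the residue list or MZ_MAX get different names, so a mismatch shows up as a missing file rather than as silently wrong results.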

@cguetot
Author

cguetot commented Feb 3, 2021

How can I build a custom knapsack for DeepNovoV2 with, for example, an MZ_MAX of 4000?

@volpato30
Owner

Change MZ_MAX to 4000 in the config file, then run make denovo. When the program finds no knapsack.npy file in the current folder, it starts building a new one using the configuration in deepnovo_config.py.
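
The "build only when the file is missing" behaviour can be sketched roughly as below. This is a simplified illustration, not DeepNovoV2's actual code: it builds a one-dimensional table of reachable masses and stores it with pickle, whereas the real knapsack.npy has its own format. The names `build_reachable_masses` and `load_or_build`, and the `resolution` parameter, are assumptions.

```python
import os
import pickle


def build_reachable_masses(residue_masses, mz_max, resolution=100):
    """Mark every discretised mass up to mz_max that some residue combination sums to."""
    size = int(round(mz_max * resolution)) + 1
    steps = sorted({int(round(m * resolution)) for m in residue_masses})
    reachable = bytearray(size)
    reachable[0] = 1  # the empty peptide has mass 0
    for i in range(size):
        if not reachable[i]:
            continue
        for step in steps:
            if i + step < size:
                reachable[i + step] = 1
    return reachable


def load_or_build(path, residue_masses, mz_max, resolution=100):
    """Reuse the cached table if the file exists; otherwise build and save it."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    table = build_reachable_masses(residue_masses, mz_max, resolution)
    with open(path, "wb") as handle:
        pickle.dump(table, handle)
    return table
```

Note that the cached file wins regardless of the arguments passed in. With caching of this kind, an existing file built with different settings has to be deleted or moved before a new table is built.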

@cguetot
Author

cguetot commented Feb 3, 2021 via email

@cguetot
Author

cguetot commented Feb 4, 2021

Hi again,

several questions:

  1. How can I increase the training, validation and test sizes?
    Context: I see that variables like train_stack_size, valid_stack_size and test_stack_size are no longer used in this code, compared to the old TensorFlow version.

  2. I also see a variable called batch_size with a lower value (32) than in the original code (128). How does it affect the training process?

  3. If I increase "num_workers", will it speed up the calculations?

  4. Is it possible to get the top n best candidates for each scan?

thanks in advance,

Carlos

@volpato30
Owner

  1. Do you mean batch size? The batch size is configured with the batch_size variable in the config file. The number of data points depends entirely on your input file.
  2. For training you should use a lower value; 128 is what I used for de novo search. I usually train the model with a batch size of 16 or 32, and I don't observe a significant difference in the final accuracy of the model.
  3. num_workers controls the number of CPU threads that provide (i.e. preprocess) training data to the GPU. If you observe that your GPU is not fully utilized during training, increasing it might help. Otherwise there is no need to increase the value.
  4. Yes, you totally can. The current beam search retains the top 5 candidates (this number is also configurable in the config file). You just need to slightly modify the denovo.py and writer.py files to output the top 5 instead of the top 1.
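
For point 4, the selection step might look roughly like the sketch below. It assumes each feature's beam-search result is available as a list of (sequence, score) pairs where a higher score is better. The internals of denovo.py and writer.py are not shown in this thread, so the function names and the tab-separated row layout are hypothetical.

```python
import heapq


def top_n_candidates(candidates, n=5):
    """Return the n highest-scoring (sequence, score) pairs, best first."""
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


def format_rows(feature_id, candidates, n=5):
    """Build one tab-separated output row per retained candidate, with its rank."""
    rows = []
    for rank, (sequence, score) in enumerate(top_n_candidates(candidates, n), start=1):
        rows.append(f"{feature_id}\t{rank}\t{sequence}\t{score:.4f}")
    return rows


# Invented candidates and scores, for illustration only.
beam = [("PEPTIDE", -1.2), ("PEPTLDE", -0.5), ("AAA", -3.0)]
for row in format_rows("F1:1", beam, n=2):
    print(row)
```

Writing one row per candidate with an explicit rank column keeps the output easy to filter back down to top 1 later.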
