Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: does BPNet code support any seq length? #1

Open
vitkl opened this issue Aug 8, 2023 · 22 comments
Open

Question: does BPNet code support any seq length? #1

vitkl opened this issue Aug 8, 2023 · 22 comments

Comments

@vitkl
Copy link

vitkl commented Aug 8, 2023

Hi @jmschrei

does BPNet code support arbitrary sequence length?
https://github.com/jmschrei/bpnet-lite/blob/master/bpnetlite/bpnet.py

@jmschrei
Copy link
Owner

jmschrei commented Aug 8, 2023

The BPNet model itself should work on any length. Pay attention to the trimming parameter, which specifies the difference between the input length and the output length (or, specifically, half that difference because it's the amount to trim off either side). However, I may have gotten lazy in other parts of the code-base and assumed 2114 input or 1000 output. Let me know if that's the case anywhere and I'll fix it.

@vitkl
Copy link
Author

vitkl commented Aug 8, 2023

Thanks for clarifying!

What are the requirements for the input sequence length? Can I change trimming to use 1000 input and 1000 output?

@jmschrei
Copy link
Owner

jmschrei commented Aug 8, 2023

If you set trimming to 0 you should be able to use the same input and output length. However, you'll likely get worse predictions on the flanks because they can't see their full contexts.

@vitkl
Copy link
Author

vitkl commented Aug 8, 2023

I would like to use this as a bias model - how much context does the bias model need to see?

@jmschrei
Copy link
Owner

jmschrei commented Aug 8, 2023

Usually, the bias model is given far less context so that it does not inadvertently learn complex rules. We usually use a model with four layers, so the residual layers aggregate 2**4 nucleotides after the first layer.

@vitkl
Copy link
Author

vitkl commented Aug 8, 2023

Thanks for clarifying! Trimming 16/2 nucleotides probably doesn't matter too much.

What are the bias model first layer filter size and FCLayers dimensionality?

@jmschrei
Copy link
Owner

jmschrei commented Aug 8, 2023

@vitkl
Copy link
Author

vitkl commented Aug 9, 2023

I am not sure where I should looks for a full list of bias model parameters. I see this https://github.com/jmschrei/bpnet-lite/blob/master/example_jsons/chrombpnet_pipeline_example.json#L34-L41 but it only mentions the number of layers.

@vitkl
Copy link
Author

vitkl commented Aug 9, 2023

Also interesting to see that BPNet doesn't use any normalisation layers (eg LayerNorm/BatchNorm). I wonder if not using those normalisation layers is a pre-requisite for learning TF concentration-dependent effects.

@jmschrei
Copy link
Owner

jmschrei commented Aug 9, 2023

The bias model is just a BPNet model so you can use any of the parameters in https://github.com/jmschrei/bpnet-lite/blob/master/example_jsons/bpnet_fit_example.json

I'm not sure how normalization would relate to learning TF concentration-dependent effects. Presumably, that's static in each experiment.

@vitkl
Copy link
Author

vitkl commented Aug 9, 2023

Presumably, that's static in each experiment.

This makes sense. How do you motivate not using normalisation?

@jmschrei
Copy link
Owner

jmschrei commented Aug 9, 2023

Presumably, the motivation is that adding it in didn't help, empirically, and keeping the model conceptually simpler can help with interpretation.

@vitkl
Copy link
Author

vitkl commented Aug 9, 2023

Makes sense!

Looking at the file with options:

  1. max_jitter - Which values do you recommend using? I see a pretty large value of 128 compared to Basset default of 3.
  2. reverse_complement - do you have strong arguments against adding results of scanning forward & reverse complement in favour of random RC input?

@jmschrei
Copy link
Owner

jmschrei commented Aug 9, 2023

  1. I'd recommend having it be as large as possible while still capturing the peak. Basset would have benefitted from a larger jitter because it had position dependence in the model through the use of dense layers. BPNet is much more resistant to this because it only uses convolution layers.

  2. https://www.biorxiv.org/content/10.1101/103663v1 suggests the best approach is to train the model and then, at inference time, scan both. I don't think it matters too much though.

@vitkl
Copy link
Author

vitkl commented Aug 9, 2023

  1. Do you mean that BPNet should in principle works as well with a smaller shift?

  2. I saw the paper - however, I have to use the summation strategy in the rest of the model - so the question is whether the bias model should be similarly trained.

@jmschrei
Copy link
Owner

jmschrei commented Aug 9, 2023

  1. Many of the choices for BPNet were done on a small number of ChIP-nexus data sets. I don't think we've rigorously tested each decision by, for example, looking at performance with different jitters. I think that jittering helps BPNet but don't have more than just that intuition.

  2. The bias model and subsequently the full ChromBPNet model should be trained by randomly RCing input. If you want to force outputs to be identical across strands then, at inference time, you run a sequence and its RC through the entire model (bias and accessibility) and aggregate.

@vitkl
Copy link
Author

vitkl commented Aug 10, 2023

  1. I see. I observed that 3bp jittering helps cell2state compared to no jittering. I will test 100bp jittering.

  2. I see what you mean. I would like to use BPNet as a bias model but the biological model is very different. I am using parameter-sharing architecture in cell2state CNN because the refererence-based TF motifs will be only recognised in either RC or FW so the model has to look at both directions and aggregate. As far as I understand, random RC forces the model to learn both FW and RC forms of the TF motifs (or in this case bias motifs) - does this map to your intuition? The main question is whether it's fine to randomly RC input for the bias model but use both FW/RC for the biological model.

@jmschrei
Copy link
Owner

Randomly RCing encourages the model to learn motifs in both directions but there's no guarantee that it learns the same effect in both directions, even though biologically that's plausible. There have been RARE cases where it learns a motif in one direction and not in the other, but this is usually only when there are not a lot of binding sites.

I'm not sure I understand how you're training your biological model. Are you training it jointly with a frozen pre-trained bias model? I guess my feeling is that both should be trained the same way. If you end up doing parameter sharing with your model, you should train the bias model with parameter sharing. Otherwise, it might learn some weird effects. Unfortunately, parameter sharing is not implemented in bpnet-lite.

@vitkl
Copy link
Author

vitkl commented Aug 10, 2023

I see. The model failing to learn a motif in both directions makes sense.

Also makes sense to train both models with parameter sharing. The bpnet-lite code helps a lot with understanding the architecture. I will try implementing parameter sharing but I am not sure I fully understand how to do that for layer 2+. My biological architecture uses just one CNN layer, scans forward DNA with FW filter, then with RC filter (simply swap complementary nucleotides and flip width axis), applies Exp activation to each, and then adds the result of scanning with FW/RC. Does the same operation happen in all subsequent layers?

basarnoyan1 referenced this issue in basarnoyan1/bpnet-lite Nov 12, 2023
@jmschrei
Copy link
Owner

If you want to do parameter sharing I'd recommend having a wrapper that takes in a normal model, runs your sequence through it, then runs the RC'd sequence through it, and then flips the output from the RC'd sequence and averages it. No need to modify the underlying model.

@vitkl
Copy link
Author

vitkl commented Jul 6, 2024

Thanks for this suggestion @jmschrei! This certainly simplifies the implementation.

Didn't Anshul's group previously show that this kind of conjoining during training is worse than both conjoining during evaluation and a model that correctly accounts for the symmetries in every layer?

@vitkl
Copy link
Author

vitkl commented Jul 6, 2024

Actually, this won't work because for forward and RC sequences to give the same results, the model needs to represent both the forward and RC versions of each TF motif:
-A>
<A-

Which is fine if you train a CNN de-novo but not fine if you use TF motifs as the first layer CNN weight priors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants