Training data preparation issues #12
Looks like there was a leftover piece that assumed the script would be run on an HPC system. Try replacing that script with this:
I only removed the "echo" and "qsub" pipe at the end here, similar to the other merge* scripts.
OK, I will try. Please guide me through the training process.
It worked. Thank you so much.
After using this command, I used this: But it didn't create any file. Is there an error? I changed the command, but I can't understand why it is not working.
Even the mergeSubs.sh script is not working for me.
In your call:
You seem to wildcard a set of files:
It wants a folder with different subfolders, where every subfolder contains files (ending with
Yeah, my folder structure matches. I have created .start.fwd and .start.rev in this folder: /train/outdir. I used your command for merging: for i in $(seq 1 22) for i in $(seq 1 22)
I used this command:
I am using this command: Shall I change something in the script? Please give me some ideas.
I'm trying to understand what you are running exactly as it seems it's not looking at the folder structure the way it's intended. This step:
I think it will be used for multiple files. I will solve this later. But please help me with mergeSubs.sh.
This one
I can't help you with that one if the input data is wrong: you are trying to run a merge step that depends on the previous merge step, which apparently didn't work.
I used this file: IonXpress_006.bam for i in $(seq 1 22)
So there are no subfolders in
The scripts explicitly target files in paths similar to Can you check if that is correct, or send a folder structure tree so I can tell?
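A minimal sketch for producing such a folder summary, assuming the layout discussed in this thread: a top-level folder (for example `train/a`) holding one subfolder per sample, each with read start files ending in `.start.fwd` and `.start.rev`. The helper name `summarize_layout` and the suffix tuple are hypothetical and not part of SANEFALCON:

```python
import os

# Suffixes mentioned in this thread; treat them as an assumption about your data.
SUFFIXES = (".start.fwd", ".start.rev")


def summarize_layout(top_dir, suffixes=SUFFIXES):
    """Return {subfolder_name: sorted list of matching file names}.

    Matching files that sit directly in top_dir (not in a subfolder) are
    collected under the key "." so a flat layout is easy to spot.
    """
    summary = {}
    for entry in sorted(os.listdir(top_dir)):
        path = os.path.join(top_dir, entry)
        if os.path.isdir(path):
            # Keep only the read start files inside this subfolder.
            summary[entry] = [
                name for name in sorted(os.listdir(path)) if name.endswith(suffixes)
            ]
        elif entry.endswith(suffixes):
            summary.setdefault(".", []).append(entry)
    return summary
```

Calling `summarize_layout("train/a")` and printing the result gives a compact listing to paste into the thread. An entry under `"."` means the read start files were placed directly in the top-level folder instead of in a subfolder.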
No, I didn't create a folder in /train/a/ or train/outdir/.
I only created /train/outdir and /train/a/ for merging. Should I create one more folder in a/ and outdir/?
Create a folder
OK, I am trying.
Yes, it worked. It gave me files like merge.1, merge.2 up to merge.22. It didn't give merge.X and merge.Y.
What should I do next?
After this, what shall I do? How do I run mergeSubs.sh?
It indeed doesn't produce information for X and Y; that is by design (for ethical and sample comparability reasons). Depending on the data you have and your intentions, you may have to take a moment to consider whether the merged file you created is what you really want. I don't know if you put everything together or made a bunch of subsets (the whole
So shall I put all the merge files from merge.sh in the run1 folder or in a different folder?
Or shall I run the Python script directly to get nucleosome tracks?
I am getting this error while running this script
The steps you need to take are described in the README as is. I suggest you reread it carefully, as I get the impression this new error occurs because you didn't run it as explained there. Mind the flow diagram included on the GitHub page; I believe it should provide the info needed to understand the steps.
I ran the commands according to your README file only. First I ran the merge.sh script. It gave me results. Second I ran the mergeSubs.sh script and it gave the same results, with the same file sizes. I don't understand what I am doing right or wrong, which is why I am asking you. Please help me understand this README.
What do you mean by subset here? Does subset mean each start.fwd and start.rev file, or each BAM file?
I am using this command: and getting this: /home/mdrcubuntu/anaconda3/envs/smruti/bin/python /media/mdrcubuntu/C8CE59DBCE59C1FC/new_data/sanefalcon/nuclDetector_1.py /media/mdrcubuntu/C8CE59DBCE59C1FC/new_data/sanefalcon/train/anti.1 /media/mdrcubuntu/C8CE59DBCE59C1FC/new_data/sanefalcon/train/nucl_ex3.1 I have modified this script according to the previous one but am still getting this; no files are created.
subset = batch in 3.1 Split data set in README.md.
If you end up with the same files and file sizes, you only made one batch/subset. For
Thank you @rstraver, I am working on this and will let you know if any error occurs. Can you please provide your email address? I have to ask about the wisecondor-defrag algorithm. I am using that as well.
Also, create a nucleosome track for the whole set of training data:
What is the whole set of training data here? Is it the anti merged data or the merged data?
I have completed the steps up to getting nucleosome profiles. Now I am trying this:
My question is: what are trainNucl and trainRef here? Which file shall I use? Please tell me.
@rstraver, please, how do I create the trainRef file? I have created the trainNucl file but I am not clear about the trainRef file.
Could you please refer to the schematic overview and check if you understand where you are and what you are doing?
To clarify on that: all that difficult-to-follow work on the right-hand side is there to ensure you don't detect nucleosomes on the same data you train the FF% prediction on, so that what it learns is realistic (i.e. a new sample does not influence nucleosome position detection). I might be able to dig up a nucleosome track we made with 300+ low-coverage samples to simplify/reduce the work a bit, although plugging that in and using it may also require some understanding of what goes where. For the trainRef input: looking at the way that file is read, it contains one line per sample and 3 columns, split by a space:
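A minimal sketch for checking that shape (one line per sample, three space-separated columns) before passing a file on as trainRef. It makes no assumption about what each column means, and `check_train_ref` is a hypothetical helper that is not part of SANEFALCON:

```python
def check_train_ref(path):
    """Return (line_number, line) pairs that do not have exactly 3 columns.

    Columns are split on a single space, as described in the thread.
    Blank lines are skipped rather than reported.
    """
    bad = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore empty lines, e.g. a trailing newline
            if len(line.split(" ")) != 3:
                bad.append((number, line))
    return bad
```

An empty list means every non-blank line has three columns; anything else points at the lines to fix first.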
Looks like I didn't document this (I guess I assumed people would include their own FF% estimations or carry them over from defrag in WISECONDOR), but there are 2 scripts to help you out there: FYI: I'm not very responsive to questions regarding this set of scripts, as it has been over 5 years since I touched it. Meanwhile I changed jobs and I no longer have access to the data I used, so I find it difficult to figure out what your question really refers to. I'm answering just by browsing scripts I have little memory of by now, and I need to find some time to dig in for that.
I got this output after using getRefFF.sh:
Please tell me how to get the reference file. I am getting an error in this.
Seems it didn't count anything on chromosome X. Looking through the prepSamples.sh script, which I assume you ran before, it appears the X chromosome is commented out: This should enable it to run for everything: Or just X: I'd try the latter option and rerun to get the read start positions per sample for chromosome X. Then retry getting the fetal fractions.
Will you try, or shall I?
I don't have data so you'll have to try this.
OK, so I actually have to change the script, right?
I started from the prepSamples.sh script by uncommenting or "X", but after that I am facing issues: it is not taking X and not creating X.start.fwd. What should I add to the merge script?
Could you copy and paste your edited code here so I can see what it looks like now?
I have edited nothing, just changed X to 23, then I got all the results.
@rstraver, can you please help me in this thread: VUmcCGP/wisecondor#51? It's a wisecondor defrag issue.
I already asked the colleague who worked on that script back then to take a look at defrag; just waiting for that now.
Does that mean you made it look like this:
No, I did it this way:
You need to pick one of the two options I provided:
The first you tried (
When I put 'seq 1 22' or "X" it was giving me this: So I removed or and ran the script. Now it gave all the .start files for 1-22 and X. After getting the X.start files, I renamed them to 23 and then ran the merge script on them.
Ah I see, my mistake on the 'or' part there.
Oh, we don't need X for merging? I already did it, actually. Will this throw an error again?
I think it should be ignored by the other scripts as they are not interested in X anyway, if that's what you are asking.
OK, all right, then I will check with getRefFF.sh and let you know. I will ask you if any error comes up.
I ran the command and finally got this:
It's a simple, rough estimation: it takes the median of reads per bin on the autosomal chromosomes (206) and on chromosome X (207). The difference between the two is assumed to be half the fetal fraction for pregnancies where the fetus is male. Depending on sequencing depth and variation it is not very exact, and if you ran this for a pregnancy where the fetus is female (or male but with no fetal DNA left in the sample), a small negative fraction estimate is possible (slightly more reads mapped on X than on the autosomal chromosomes, relatively speaking). This value is not what SANEFALCON aims to produce; it's just a simple reference value to train SANEFALCON on, since it needs to learn from known fetal fractions, which are much easier to estimate with male fetuses. Be sure to get these values for pregnancies where the fetus is known to be male, and use the resulting fractions as reference training data for SANEFALCON training afterward.
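The estimate described above can be sketched roughly as follows. This is not the code in getRefFF.sh: the function names are hypothetical, and expressing the difference relative to the autosomal median (so the result is a fraction) is an assumption the thread does not confirm.

```python
from statistics import median


def estimate_fetal_fraction(autosomal_median, x_median):
    """Rough fetal fraction estimate for a pregnancy with a male fetus.

    Assumption: the drop on chromosome X, relative to the autosomal
    median, equals half the fetal fraction, so the drop is doubled.
    """
    if autosomal_median <= 0:
        raise ValueError("autosomal_median must be positive")
    return 2.0 * (autosomal_median - x_median) / autosomal_median


def estimate_from_bins(autosomal_bins, x_bins):
    """Same estimate, starting from per-bin read counts (lists of numbers)."""
    return estimate_fetal_fraction(median(autosomal_bins), median(x_bins))
```

If 206 and 207 in the reply above are the two medians (an assumption), this sketch gives a slightly negative value, which is the situation described for a female fetus or a sample without fetal DNA.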
After this I ran the predictor.py script, but I got this error:
@rstraver I read your paper, which uses SANEFALCON together with DEFRAG and seqFF. It would be very kind of you to help me with both SANEFALCON and DEFRAG, because I got stuck in the last part of these tools.
@rstraver I didn't get any response regarding SANEFALCON and DEFRAG. Please tell me whether I can use DEFRAG or not, because there is an error in the defrag script. I am not able to use SANEFALCON because of the above error.
I have run the command:
./merge.sh ./train/outdir as mentioned in your paper
A little script format to do this is supplied in merge.sh, to run this for subset a use: