Understanding the input
There is read depth normalization, but it is done implicitly by the LOWESS GC-correction. WISECONDOR previously had a separate step to normalize the data (which was only useful after applying the RETRO filter, due to the read towers or spikes in the data), but I decided to remove it because I applied the LOWESS correction using a division:
correctedValue = sample[chrom][bin] / lowessCurve.pop(0)
As this step scales the actual read count of any bin to about 1, the separate normalization step became obsolete; results with and without normalizing the data prior to GC-correction showed no noticeable differences.
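The division step can be sketched as follows. This is a minimal illustration, not WISECONDOR's own code: a per-GC-window median stands in for the actual LOWESS curve, and all names (`gc_correct`, `n_gc_bins`) are made up for the example. The key point is the same, though: dividing each bin's count by its expected count for that GC content centers the corrected values around 1, which also normalizes read depth.

```python
import numpy as np

def gc_correct(counts, gc, n_gc_bins=20):
    """Divide each bin's read count by the expected count for its GC
    fraction, so corrected values center around 1.0.

    A per-GC-window median is used here as a simple stand-in for the
    LOWESS curve WISECONDOR fits; the division itself is the same idea.
    """
    counts = np.asarray(counts, dtype=float)
    gc = np.asarray(gc, dtype=float)
    # Assign every genomic bin to a GC-content window.
    edges = np.linspace(gc.min(), gc.max(), n_gc_bins + 1)
    which = np.clip(np.digitize(gc, edges) - 1, 0, n_gc_bins - 1)
    # Expected count per GC window: the median of the bins in it.
    expected = np.ones(n_gc_bins)
    for i in range(n_gc_bins):
        sel = which == i
        if sel.any():
            expected[i] = np.median(counts[sel])
    # The division step: corrected values hover around 1.
    return counts / expected[which]
```

With a synthetic GC bias (counts proportional to GC content), the corrected values come out flat around 1, which is why a separate depth normalization is no longer needed.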
The data we obtained from our lab contains enough reads to allow us to use this setting. We prefer using only the most reliable data we can get over a larger amount of less reliably mapped data. Of course, you are free to test WISECONDOR with mismatches allowed; just make sure your reference set is built using the same settings as your test samples. WISECONDOR does not care about these mismatches, it only counts reads based on their position on the genome.
Not much: WISECONDOR will probably not report any aberrated bins (not fetal ones, anyway), but it won't check for fetal percentages; it simply assumes there is enough fetal DNA.
We believe about 12 million mappable reads is safe; in general 10 million seems enough, and 8 million appears to be the absolute lower threshold. The more reads the merrier: keep in mind that a reduced number of reads per bin quickly increases the relative standard deviation of that bin, and thus decreases WISECONDOR's sensitivity when looking for small relative changes.
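The relationship between read count and per-bin noise can be made concrete under a simple Poisson model (an assumption for illustration, not WISECONDOR's exact statistics): a bin expecting n reads has a standard deviation of sqrt(n), so the relative noise scales as 1/sqrt(n). The bin count below (3000, i.e. 1 Mb bins on a ~3 Gb genome) is likewise illustrative.

```python
import math

def relative_sd(total_reads, n_bins=3000):
    """Relative standard deviation per bin under a Poisson model:
    expected reads per bin n = total/n_bins, SD = sqrt(n),
    so relative noise = sqrt(n)/n = 1/sqrt(n)."""
    per_bin = total_reads / n_bins
    return 1.0 / math.sqrt(per_bin)

for reads in (8e6, 10e6, 12e6):
    print(f"{reads / 1e6:.0f}M reads: relative SD per bin ~ {relative_sd(reads):.4f}")
```

Going from 8 million to 12 million reads shrinks the relative per-bin noise by a factor of sqrt(12/8), roughly 1.22, which is why more reads make small relative changes easier to detect.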
I built a reference using relatively messy samples, what will a test on a good sample show me using this reference?
Hard to predict; it also depends on what you call messy. If you use mostly the same protocol to prepare the samples in the laboratory, WISECONDOR may happily get along with your messy reference set: any structurally comparable behaviour among bins can still be identified when building the reference set, and if the tested sample has completely different read depths over all bins but the bins identified in the reference still behave structurally alike, the calls are not necessarily bad.
The problem occurs when the read depths over bins start to behave differently, which may happen when the workflow in a laboratory changes, although even that may be less influential than expected. Still, the general bioinformatics rule applies here:
- Rubbish in is rubbish out.
An interesting side note: we have seen pretty good results using a dirty reference set. While most samples in the set were unaberrated, some were not (trisomy 21, for example); when using the built reference on that very same set of samples, WISECONDOR did call the trisomy 21 cases without mistakes. We do vouch for using reference sets that do not contain such erroneous samples, but if there is no other data available, you may give it a shot.
If you run into issues, please create a ticket so I can take care of it.
If you have other troubles running WISECONDOR or any related questions, feel free to contact me through the e-mail address on my GitHub page.