Understanding the input

There is no read depth normalization, how can you compare samples this way?

There is read depth normalization, but it is done implicitly by the LOWESS GC-correction. WISECONDOR previously had a separate step to normalize the data (which was only useful after applying the RETRO filter, due to the read towers or spikes in the data), but I decided to remove it once I applied LOWESS using a division: correctedValue = sample[chrom][bin]/lowessCurve.pop(0). Because this step scales the read count of any bin to about 1, the separate normalization step became obsolete; results with and without normalizing the data prior to GC-correction showed no noticeable differences.
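To illustrate the idea, here is a minimal sketch with made-up bin counts and GC fractions, using the LOWESS implementation from statsmodels rather than WISECONDOR's own code:

```python
# A minimal sketch of implicit normalization via LOWESS GC-correction.
# Bin counts and GC fractions below are simulated, not real data.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
gc = rng.uniform(0.35, 0.55, size=1000)           # GC fraction per bin
counts = rng.poisson(200 + 400 * (gc - 0.45)**2)  # GC-biased read counts

# Fit a LOWESS curve of read count against GC content ...
fitted = lowess(counts, gc, frac=0.1, return_sorted=False)

# ... and divide each bin by its fitted value: corrected values end up
# scaled to ~1, so a separate read-depth normalization is unnecessary.
corrected = counts / fitted
print(corrected.mean())  # ~1.0, regardless of the sample's total depth
```

Because every bin is divided by the curve's prediction for its own GC content, two samples sequenced at very different depths end up on the same scale around 1.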

You state you are mapping your data with 0 mismatches allowed

The data we obtained from our lab contains enough reads to allow us to use this setting. We prefer using only the most reliably mapped data we can get over a larger amount of less reliably mapped data. Of course, you are free to test WISECONDOR with mismatches allowed; just make sure your reference set is built using the same settings as your test samples. WISECONDOR does not care about mismatches, it only counts reads based on their position on the genome.
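As a rough sketch of what that position-only counting looks like (hypothetical file name and bin size; WISECONDOR's own input handling differs in detail):

```python
# A minimal sketch of counting mapped reads into fixed-size bins.
# "sample.bam" and the 1 Mb bin size are illustrative assumptions.
from collections import defaultdict
import pysam

BIN_SIZE = 1_000_000  # 1 Mb bins, as an example

counts = defaultdict(int)
with pysam.AlignmentFile("sample.bam", "rb") as bam:
    for read in bam:
        if read.is_unmapped:
            continue
        # Only the mapped position is used; the alignment's mismatches
        # are never inspected at this stage.
        counts[(read.reference_name, read.reference_start // BIN_SIZE)] += 1
```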

What happens if there is not enough fetal DNA present?

Not much: WISECONDOR will probably not report any aberrant bins (at least none that are fetal), but it does not check the fetal DNA percentage; it simply assumes there is enough.

How many reads do I need for analysis?

We believe about 12 million mappable reads is safe; in general, 10 million seems enough and 8 million appears to be the absolute lower threshold. The more reads the merrier: keep in mind that a reduced number of reads per bin increases the relative standard deviation for that bin rather quickly, and when looking for small relative changes this decreases WISECONDOR's sensitivity.
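A quick Poisson back-of-the-envelope calculation shows why. The numbers below are illustrative assumptions: roughly 3000 usable 1 Mb bins, and a 10% fetal fraction, so a trisomy raises the affected bins by about 5%:

```python
# Back-of-the-envelope Poisson sketch of why read counts matter.
# ~3000 usable 1 Mb bins and a 5% trisomy shift are assumptions.
import math

def relative_sd(total_reads, n_bins=3000):
    reads_per_bin = total_reads / n_bins
    return 1 / math.sqrt(reads_per_bin)  # Poisson: sd/mean = 1/sqrt(n)

shift = 0.05  # a trisomy at 10% fetal fraction raises bins by ~5%
for total in (12e6, 10e6, 8e6, 4e6):
    sd = relative_sd(total)
    print(f"{total/1e6:.0f}M reads: rel. SD {sd:.1%}, shift = {shift/sd:.1f} SD")
```

At 12M reads the 5% shift is about 3 standard deviations per bin, at 8M it drops below that, and at 4M it is well under 2, which is why the thresholds above sit where they do.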

I built a reference using relatively messy samples, what will a test on a good sample show me using this reference?

Hard to predict, and it depends on what you call messy as well. If you use mostly the same protocol to prepare the samples in the laboratory, WISECONDOR may happily get along with your messy reference set: any structurally comparable behaviour among bins can still be identified when building the reference set, and if the tested sample has completely different read depths over all bins but the bins identified in the reference still behave structurally alike, the calls are not necessarily bad (see the sketch below).
The problem occurs when the read depths over bins start to behave differently, which may happen when the workflow in a laboratory changes, although even that may be less influential than expected. Still, the general bioinformatics rule applies here:

  • Rubbish in, rubbish out.
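To make the within-sample comparison concrete, here is a minimal, hypothetical sketch (simulated values, not WISECONDOR's actual code): the reference set only tells WISECONDOR which bins behave alike, while the values being compared all come from the same test sample.

```python
# A minimal sketch of why a depth mismatch between reference samples and
# a test sample need not hurt: the comparison is within one sample.
import numpy as np

def bin_z_score(sample, target_bin, reference_bins):
    """Compare one bin to the bins the reference set marked as
    structurally similar, all taken from the same sample."""
    ref_values = sample[reference_bins]
    return (sample[target_bin] - ref_values.mean()) / ref_values.std()

# Even if this test sample's overall depth differs from the reference
# samples', the call only depends on the chosen bins behaving alike
# relative to each other, so the z-score stays meaningful.
sample = np.random.default_rng(1).normal(1.0, 0.02, size=3000)
print(bin_z_score(sample, target_bin=42, reference_bins=[10, 250, 1800]))
```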

An interesting side note: we have seen pretty good results using a dirty reference set. While most samples in the set were unaberrated, some were not (trisomy 21 cases, for example); when we used the resulting reference on the very same set of samples, WISECONDOR still called the trisomy 21 cases without mistakes. We do recommend using reference sets that do not contain such erroneous samples, but if no other data is available, you may give it a shot.