Improve DNA width measurements #1033

Open
2 tasks
ns-rse opened this issue Dec 4, 2024 · 0 comments
Labels
DNATracing Issues pertaining to the DNATracing class

Comments

@ns-rse
Collaborator

ns-rse commented Dec 4, 2024

#999 introduced the mean width of DNA to the output statistics (thanks @tcatley 👍 )

There are a couple of improvements that could be made:

  • Flexible average
  • Correctly handle odd-width masks

Flexible average

Currently the mean() of the width across the whole molecule is returned. It would be simple to add an option width_average: str, with options mean and median, allowing either the mean or the median width to be calculated (or, by extension, other measures of central tendency). The option should be specified in topostats/default_config.yaml under disordered_tracing.

topostats/default_config.yaml

    disordered_tracing:
      ...
      width_average: mean # Measure of central tendency across molecule width. Options: mean | median

docs/configuration.md

| `disordered_tracing`  | `width_average`  | str   | `mean`  | Measure of central tendency across molecule width. Options: `mean`, `median`. |

topostats/tracing/disordered_tracing.py

NB: The following is untested and is only a suggestion.

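    # Module-level imports needed by this method (top of disordered_tracing.py)
    import numpy as np
    import numpy.typing as npt
    from scipy.ndimage import distance_transform_edt


    @staticmethod
    def calculate_dna_width(
        smoothed_mask: npt.NDArray,
        pruned_skeleton: npt.NDArray,
        px2nm: float = 1,
        width_average: str = "mean",
    ) -> float:
        """
        Calculate the average width of the DNA using the trace and mask.

        Parameters
        ----------
        smoothed_mask : npt.NDArray
            Smoothed mask to be measured.
        pruned_skeleton : npt.NDArray
            Pruned skeleton.
        px2nm : float
            Scaling of pixels to nanometres.
        width_average : str
            Measure of central tendency. Options are ``mean`` and ``median``.

        Returns
        -------
        float
            Width of grain in nanometres (pixel widths scaled by ``px2nm``).
        """
        # Map each supported option to the NumPy function that computes it.
        # (Do not reuse the name ``width_average`` here or the option string is lost.)
        average_funcs = {
            "mean": np.mean,
            "median": np.median,
        }
        if width_average not in average_funcs:
            raise ValueError(
                f"width_average must be one of {list(average_funcs)}, got {width_average!r}"
            )
        # Distance from each mask pixel to the nearest background pixel.
        dist_trans = distance_transform_edt(smoothed_mask)
        # Keep distances only where the skeleton runs; each is roughly the local half-width.
        comb = np.where(pruned_skeleton == 1, dist_trans, 0)
        # Double the half-widths, average them and convert from pixels to nanometres.
        return float(average_funcs[width_average](comb[comb != 0]) * 2 * px2nm)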

Tests will need updating in light of these changes.

Correctly handle odd-width masks

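As noted in this comment, the width of any region where the mask is an odd number of pixels wide will be over-estimated because the comb is multiplied by 2. Across a straight run of n pixels, the skeleton pixel at the centre of an odd-width run lies (n + 1) / 2 pixels from the nearest background pixel, so doubling gives n + 1, one pixel too wide; for an even width the skeleton sits on one of the two central pixels, n / 2 from the nearer edge, and doubling gives n. This systematically inflates the estimate of the width across the length of the molecule.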
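
To make the over-estimate concrete, here is a small sketch (not TopoStats code; naive_width is a hypothetical helper) that builds straight horizontal strips of known width n, takes the distance transform at the strip's central row and doubles it, mirroring the comb * 2 step.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def naive_width(n_rows: int) -> float:
    """Width reported by 2 * EDT at the central row of a straight strip n_rows pixels wide."""
    # Horizontal strip occupying rows 1..n_rows, with a one-pixel background border.
    mask = np.zeros((n_rows + 2, 21), dtype=bool)
    mask[1:-1, 1:-1] = True
    dist = distance_transform_edt(mask)
    # Middle row of the strip; for even widths this is the upper of the two central rows.
    centre_row = (n_rows + 1) // 2
    # Column 10 is far from the strip's ends, so only the top and bottom edges matter.
    return float(2 * dist[centre_row, 10])


for n in range(2, 8):
    print(f"true width {n} px -> 2 * EDT = {naive_width(n)} px")
```

Odd widths come out as n + 1 (3 → 4.0, 5 → 6.0, 7 → 8.0) while even widths are reported correctly (2 → 2.0, 4 → 4.0, 6 → 6.0).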

Ideally we should get a more accurate measurement of the width that correctly handles such regions.
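
One possible approach, sketched here as an assumption rather than anything already in TopoStats, is to measure edge to edge instead of doubling a half-width. For each skeleton pixel p, distance_transform_edt(..., return_indices=True) gives the nearest background pixel b; stepping from p directly away from b until leaving the mask gives the first background pixel q on the far side. Both b and q lie just outside the mask, so the width is |q - b| - 1 pixels, which is correct for both odd and even widths on straight sections. The helper edge_to_edge_widths is hypothetical.

```python
import numpy as np
import numpy.typing as npt
from scipy.ndimage import distance_transform_edt


def edge_to_edge_widths(
    mask: npt.NDArray, skeleton: npt.NDArray, px2nm: float = 1.0
) -> npt.NDArray:
    """Hypothetical helper: per-skeleton-pixel width measured from edge to edge."""
    mask = mask.astype(bool)
    # indices[:, r, c] holds the coordinates of the background pixel nearest to (r, c).
    _, indices = distance_transform_edt(mask, return_indices=True)
    widths = []
    for r, c in zip(*np.nonzero(skeleton)):
        if not mask[r, c]:
            continue  # skeleton pixels outside the mask have no meaningful width
        p = np.array([r, c], dtype=float)
        b = indices[:, r, c].astype(float)
        # Unit vector pointing from the nearest edge, through p, towards the far edge.
        direction = (p - b) / np.linalg.norm(p - b)
        # March in unit steps until the rounded position leaves the mask or the image.
        step = 1
        while True:
            q = np.rint(p + step * direction).astype(int)
            inside = 0 <= q[0] < mask.shape[0] and 0 <= q[1] < mask.shape[1]
            if not inside or not mask[q[0], q[1]]:
                break
            step += 1
        # b and q are both background pixels just outside opposite edges of the mask.
        widths.append(np.linalg.norm(q - b) - 1)
    return np.asarray(widths) * px2nm


# Straight strips of known width, with a centre-line trace away from the strip ends.
for n in range(2, 8):
    mask = np.zeros((n + 2, 21), dtype=bool)
    mask[1:-1, 1:-1] = True
    skeleton = np.zeros_like(mask)
    skeleton[(n + 1) // 2, 5:16] = True
    print(n, np.median(edge_to_edge_widths(mask, skeleton)))
```

The per-pixel widths could then be reduced with whichever width_average is configured. Along diagonal or curved sections the rounding makes this an approximation, and near branch points the ray may exit through a neighbouring branch, so it would need testing on real traces before adoption.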

ns-rse added the DNATracing label (Issues pertaining to the DNATracing class) on Dec 4, 2024