anonymize_dataset fails if dataset contains RawDataElement #85

GuillaumeDehaene · 2024-05-27T08:23:06Z

Hello,

Thank you for this great project.

While using it in our codebase, I have found the following issue.

anonymize_dataset fails if dataset contains RawDataElement

anonymize_dataset fails if the dataset contains RawDataElement:

- The element children of a dataset can be either DataElement or RawDataElement. I do not understand the details of pydicom that trigger this.
- RawDataElement objects are not mutable. This means that the anonymization code element.value = new_value fails. In my case, it's in replace_element_UID (https://github.com/KitwareMedical/dicom-anonymizer/blob/master/dicomanonymizer/simpledicomanonymizer.py#L96) but I believe that all other rules have the same issue.
- In our code, I have fixed this by sanitizing the input dataset before calling anonymize_dataset: I iterate through it and replace every RawDataElement with a DataElement (there is a pydicom built-in for this: https://pydicom.github.io/pydicom/dev/reference/generated/pydicom.dataelem.DataElement_from_raw.html).
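
To make the failure concrete, here is a minimal stdlib-only sketch. It assumes that RawDataElement behaves like a named tuple, whose fields are read-only. RawElem, Elem and try_set_value are hypothetical stand-ins for illustration, not pydicom or dicom-anonymizer names.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical stand-ins for illustration only (not pydicom classes).
RawElem = namedtuple("RawElem", ["tag", "value"])  # fields are read-only


@dataclass
class Elem:  # mutable counterpart
    tag: int
    value: str


def try_set_value(element, new_value):
    """Return True if the assignment worked, False if the element is read-only."""
    try:
        element.value = new_value
        return True
    except AttributeError:
        # Assigning to a named tuple field raises AttributeError.
        return False


raw = RawElem(tag=0x00080018, value="1.2.3")
# Copy the fields into a mutable element, as the sanitizing step does.
plain = Elem(tag=raw.tag, value=raw.value)

print(try_set_value(raw, "9.9.9"))
print(try_set_value(plain, "9.9.9"))
```

The first call reports a failed assignment and the second a successful one, which is the difference the sanitizing step removes.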

Proposed solution

I would be happy to create a PR to fix this issue here too.

I see two possibilities:

1. Sanitize the dataset as part of anonymize_dataset:
   - iterate over the dataset and replace RawDataElements,
   - continue with the current code of anonymize_dataset.
2. Modify the current replace_element code to handle the RawDataElement case.

I'm partial to solution 1:
- it's simple and easy to understand and review.
- it makes it easier for users to add custom rules, since they can assume that all elements are DataElement and use the simpler element.value = new_value syntax.
- however, it also means that input datasets are walked through twice. I feel that this price is worth paying.
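
A rough sketch of solution 1, under the assumption that a dataset can be modelled as a plain dict from tag to element. RawElem, Elem, sanitize, anonymize and blank are hypothetical names used only for illustration; the real code would work on pydicom datasets and the existing anonymization rules.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical stand-ins for illustration only (not pydicom classes).
RawElem = namedtuple("RawElem", ["tag", "value"])  # read-only


@dataclass
class Elem:  # mutable
    tag: int
    value: str


def sanitize(dataset):
    """First walk: replace every raw element with a mutable one, in place."""
    for tag, element in list(dataset.items()):
        if isinstance(element, RawElem):
            dataset[tag] = Elem(element.tag, element.value)


def anonymize(dataset, rules):
    """Second walk: every rule may assume it receives a mutable element."""
    for tag, rule in rules.items():
        element = dataset.get(tag)
        if element is not None:
            rule(element)


def blank(element):
    """Example custom rule using the simple assignment syntax."""
    element.value = ""


dataset = {
    0x00100010: RawElem(0x00100010, "Doe^John"),
    0x00100020: Elem(0x00100020, "12345"),
}
sanitize(dataset)
anonymize(dataset, {0x00100010: blank, 0x00100020: blank})
```

The rule stays a one-liner because the conversion is handled before any rule runs; the cost is the extra walk in sanitize.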

Best
Guillaume


pchoisel · 2024-07-26T09:35:13Z

Hi @GuillaumeDehaene, please forgive my slow response. I now have a bit of time to work on this project.

I understand the problem, but I'm wondering how it can happen. Do those RawDataElement come directly from DICOM files, or are you anonymizing a custom dataset?

To fix this issue, instead of parsing the dataset twice to sanitize it, couldn't we just drop the tag if we detect that it is a RawDataElement? This could be done in dicomanonymizer/simpledicomanonymizer.py::anonymize_dataset()

GuillaumeDehaene · 2024-07-27T10:12:18Z

Hello @pchoisel

No worries. Thank you for maintaining this project.

So this was coming from a real DICOM dataset, but I'm not in contact with the person who generated it so I don't have any info regarding which software, which kind of data, etc.
I also was unable to figure out what this RawDataElement corresponds to in pydicom 🤷

I guess we could also throw out any raw data but that feels very rough, doesn't it?
Double-parsing the data is suboptimal but it is also simple. Are you so worried about the performance hit?

Anyway, I've made my case and I think those are the options.
If you let me know which option you prefer, I'll write a PR for it and we can discuss how to proceed further on that basis.

Thank you again for your work on this project.
Best regards
Guillaume

pchoisel · 2024-08-05T13:42:41Z

Hi @GuillaumeDehaene,

I definitely don't want to parse the dataset twice. That's mainly because people (us, at least) use this software in automated processes that anonymize lots of DICOM files.

I'm fine with not dropping the RawDataElement. Have you checked the function DataElement_from_raw? Maybe we could use it to handle the RawDataElement. If you want to keep those tags as RawDataElement, maybe you can convert them back before writing the output file?

What do you think?

GuillaumeDehaene · 2024-08-14T13:32:54Z

Hello,

If you don't want to double-parse, then it's probably possible to rewrite the parsing code to:
- check each element for whether it is raw,
- replace raw elements with standard elements,
- apply anonymization.

I'll write it out. I'll try to have a draft done by the end of September, hopefully.

Best
Guillaume
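
The three steps above could be sketched in a single walk as follows. This is only an illustration under the assumption that a dataset can be modelled as a dict from tag to element; RawElem, Elem, anonymize_single_pass and blank are hypothetical names, and the real change would live in anonymize_dataset and use pydicom's own conversion function.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical stand-ins for illustration only (not pydicom classes).
RawElem = namedtuple("RawElem", ["tag", "value"])  # read-only


@dataclass
class Elem:  # mutable
    tag: int
    value: str


def anonymize_single_pass(dataset, rules):
    """Walk the dataset once: convert raw elements, then apply the rule if any."""
    for tag, element in list(dataset.items()):
        # Steps 1 and 2: detect a raw element and swap in a mutable one.
        if isinstance(element, RawElem):
            element = Elem(element.tag, element.value)
            dataset[tag] = element
        # Step 3: apply the anonymization rule registered for this tag.
        rule = rules.get(tag)
        if rule is not None:
            rule(element)


def blank(element):
    """Example rule using plain attribute assignment."""
    element.value = ""


dataset = {
    0x00100010: RawElem(0x00100010, "Doe^John"),
    0x00100020: Elem(0x00100020, "12345"),
    0x00080060: RawElem(0x00080060, "MR"),
}
anonymize_single_pass(dataset, {0x00100010: blank, 0x00100020: blank})
```

Each element is visited once, and the rules still only ever see mutable elements. Elements without a rule are converted but otherwise left unchanged.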

pchoisel · 2024-08-26T13:00:10Z

That sounds perfect !
