@kaitlynhair Currently, `dedup_citations()` returns `cite_source` and `cite_label` a second time as `source` and `label`. This duplicates columns in the output and causes confusion where those fields originally contained something else.
We talked about this in the last meeting. I'm not sure whether these extra columns (`source`, `label`) are used by functions further down the line. If they are, we should probably change those functions to use `cite_source` and `cite_label` instead, to keep the data clean.
If for some reason ASySD relies on these columns, I think we can simply remove them once they are no longer needed for processing the data (as long as re-importing the data still works).
Looking at all the functions used further down the line, `source` and `label` are not used. My guess is that `source` and `label` are required by ASySD, so we should be able to remove these columns at the point in our data processing where we call the ASySD dedup functions. I'm not sure whether we can do this now or should wait until ASySD is on CRAN (or at least submitted).
We currently turn `cite_source` into `source` and `cite_label` into `label` before calling ASySD, and then copy the results back afterwards (in our `dedup_citations()`). I think that dates from a time when ASySD did not support merging other fields. Instead, I believe we can call ASySD with `extra_merge_field` set to our three `cite_` fields and leave the `label` and `source` columns alone (in case they contain meaningful information)?
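To make the difference concrete, here is a minimal Python sketch (not CiteSource's R code) of the two approaches. `toy_dedup()`, `dedup_current()` and `dedup_proposed()` are hypothetical names: `toy_dedup()` is a crude stand-in for ASySD that groups records by lower-cased title and merges the requested fields, and it makes no claim about how ASySD or its `extra_merge_field` argument actually work. Only `cite_source` and `cite_label` are shown.

```python
# Hypothetical sketch in Python (not CiteSource's R code) of two ways to carry
# the cite_* fields through deduplication. toy_dedup() is an assumed stand-in
# for ASySD, NOT its real API.

CITE_FIELDS = ["cite_source", "cite_label"]  # assumed subset of the cite_ fields


def merge_values(values):
    """Join the unique non-empty values, keeping first-seen order."""
    seen = []
    for v in values:
        for part in v.split(", "):
            if part and part not in seen:
                seen.append(part)
    return ", ".join(seen)


def toy_dedup(records, merge_fields):
    """Stand-in for ASySD: collapse records that share a lower-cased title."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["title"].lower(), []).append(rec)
    result = []
    for group in groups.values():
        merged = dict(group[0])  # other fields come from the first record
        for field in merge_fields:
            merged[field] = merge_values(r.get(field, "") for r in group)
        result.append(merged)
    return result


def dedup_current(records):
    """Current approach: overwrite source/label, dedup, then copy back."""
    renamed = [dict(r, source=r["cite_source"], label=r["cite_label"])
               for r in records]
    out = toy_dedup(renamed, ["source", "label"])
    # Copying back leaves source/label behind as duplicates of cite_*.
    return [dict(r, cite_source=r["source"], cite_label=r["label"]) for r in out]


def dedup_proposed(records):
    """Proposed approach: merge cite_* directly, leave source/label alone."""
    return toy_dedup(records, CITE_FIELDS)


records = [
    {"title": "Paper A", "source": "Journal X", "label": "",
     "cite_source": "Scopus", "cite_label": "search"},
    {"title": "paper a", "source": "Journal X", "label": "",
     "cite_source": "WoS", "cite_label": "search"},
]
print(dedup_current(records)[0]["source"])   # Scopus, WoS (original "Journal X" lost)
print(dedup_proposed(records)[0]["source"])  # Journal X
```

In the current approach the original `source` value is overwritten and then left as a copy of `cite_source`; in the proposed one the `cite_` fields are merged directly and `source` keeps whatever it held before.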
I need a better understanding of how the `extra_merge_field` argument in ASySD works...
However, the `source` and `label` columns are not used in CiteSource after deduplication, so I don't think we need to keep them.
Since `source` and `label` only duplicate the data in `cite_source` and `cite_label`, even if they were used further down the line we could easily point to `cite_source` and `cite_label` instead (again, I'm almost 100% sure that `source` and `label` are not used after dedup anyway).
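If the columns are dropped after deduplication, one cautious option is to drop `source` or `label` only when it is an exact copy of its `cite_` counterpart, so that columns holding different, possibly meaningful content survive. A minimal Python sketch, where `drop_redundant_columns()` and `DUPLICATE_PAIRS` are hypothetical names and records are assumed to be plain dicts:

```python
# Hypothetical post-dedup cleanup (illustrative Python, not CiteSource code):
# drop source/label only when every record's value repeats its cite_* twin,
# so a column with genuinely different content is never lost.

DUPLICATE_PAIRS = {"source": "cite_source", "label": "cite_label"}


def drop_redundant_columns(records, pairs=DUPLICATE_PAIRS):
    """Return copies of records without columns that duplicate their twin."""
    redundant = [
        col
        for col, twin in pairs.items()
        if all(r.get(col) == r.get(twin) for r in records)
    ]
    return [{k: v for k, v in r.items() if k not in redundant} for r in records]


deduped = [
    {"title": "Paper A", "source": "Scopus, WoS", "label": "search",
     "cite_source": "Scopus, WoS", "cite_label": "search"},
]
cleaned = drop_redundant_columns(deduped)
print(sorted(cleaned[0]))  # ['cite_label', 'cite_source', 'title']
```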