-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite table scripts to rely soley on deduplicated data (for re-imported data) #196
Comments
@LukasWallrich I've created two new functions that should be able to replace the record_counts function and the record_counts_table function. I based them on re-import of a .csv -- Can you review this? It works in all the various vignettes and other test libries I've tried. I'm thinking that I'd like to keep these functions separate and name them something according to processing re-imported data. https://gist.github.com/TNRiley/b48fc14f719c00d9030af9705bd0898a the more difficult ones will be the calculate_record_counts and the tables that use the output of that data... |
Happy to review this - but for context, why are you proposing to keep both? If we can do everything with unique_citations, then that would seem to make for an easier user interface? If this is about not changing functions, then let's introduce new ones and deprecate old ones ... but not have two parallel sets? |
I was thinking we'd keep them both until everything is fully tested and then deprecate the old ones. |
Ok, that makes sense - will review your functions & then let's think of what to call the new functions. Currently, our interface is not particularly consistent between verbs ( |
Just looked at yours, and it looks good (though you might want to try to stick to one style a bit more - when using dplyr, it makes sense to use [EDIT: I take back my earlier suggestion to separate unique_citations back into citations in a helper function - that does not work with missing data in any of the original cite_ columns. You seem to be making great progress, I will be happy to review the code and think about how to streamline some of it when you are through.] |
I've gone through each of the new functions and applied changes as needed to maintain a consistent style. I've also added comments and placeholder examples. I'm going to drop the new functions into a new script in the R directory. I also do not think that we need to keep the "citation_summary_table". That function was the first one I created and isn't as useful now. we can depricate it when we do the others. |
screwed up a little on the script and will need to try again. Had to add the .data prefix for the columns as I was receiving some issues on those and forgot that is what I had done previously. I need to figure out how to test these in relation to the CMD check before bringing them into github. @LukasWallrich is it as simple as adding the .R locally and then running the devetools::check() ? |
Yes, that should do it
…On Fri, 9 Aug 2024, 14:49 Trevor Riley, ***@***.***> wrote:
screwed up a little on the script and will need to try again. Had to add
the .data prefix for the columns as I was receiving some issues on those
and forgot that is what I had done previously. I need to figure out how to
test these in relation to the CMD check before bringing them into github.
@LukasWallrich <https://github.com/LukasWallrich> is it as simple as
adding the .R locally and then running the devetools::check() ?
—
Reply to this email directly, view it on GitHub
<#196 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AOK6NGMGUR5KOBB7QPKH45TZQTCHFAVCNFSM6AAAAABLYOEEWSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENZXHE4DQOJQGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I believe the functions are good to go now. The only issue I was running into was with the @example data for the new create_detailed_records_table - no matter what I spent many hours trying to resolve it, but didn't fail after my push. closing this now as this is complete, if there are any issues or changes with individual new functions we can add in a separate issue. Once these are fully integrated into the vignettes we can deprecate the old functions and change the version. examples should be changed in the future in relation to issue #200 |
The following tables currently rely on the original uploaded citation file data as well as the post-deduplication data. Reliance on the original file data is problematic due to the fact that users will not be able to recreate any of the tables from exported CiteSource data when re-importing.
Changing these tables to be able to rely on only the deduplicated data would allow users to retain a single .ris or .csv rather than all of the initial raw citation files.
This should be able to be accomplished as the exported data retains the duplicate_id, record_ids, cite_source, cite_label, and (cite_string - not currently exported but easily added)
The text was updated successfully, but these errors were encountered: