Increase robustness of calculate_record_counts #161

LukasWallrich · 2023-06-16T09:54:25Z

Does this function need to rely on a search label? I would have expected it to work without any labels. If it needs the label, that should be documented, and a warning be issued when there is no search label.

LukasWallrich · 2023-06-16T10:15:29Z

Also, I don't think it (the record summary table) currently works correctly to compare labels or strings rather than sources - can that be? In my test, I get NAs in the last two columns.

TNRiley · 2023-06-16T11:37:46Z

> Also, I don't think it (the record summary table) currently works correctly to compare labels or strings rather than sources - can that be? In my test, I get NAs in the last two columns.

I'm trying to think of a use case outside of comparing the sources. I don't see any need for it to compare strings or labels.

> Does this function need to rely on a search label? I would have expected it to work without any labels. If it needs the label, that should be documented, and a warning be issued when there is no search label.

Edited: it does rely on the label being "search", which could potentially be removed without other changes. I'll take a look at this in a bit more depth.

TNRiley · 2023-06-21T11:26:13Z

> Does this function need to rely on a search label? I would have expected it to work without any labels. If it needs the label, that should be documented, and a warning be issued when there is no search label.

Looked at this again and remembered why I had set this to filter by the search label. The table is used to show the impact of sources/methods. If we remove the need for the label, it will show any screened or final data as well, which I currently don't see a use case for and believe it would just clutter things up. I think there are many ways in which we could change this to ensure it is usable for other use cases, but I'm not able to envision those at this time.

I can add the following warning to the function and details to the documentation.

if (!("search" %in% n_unique$cite_label)) {
  warning("Sources must be tagged as 'search' in the 'cite_label' field.")
}

#' @details
#' The function works on three main inputs: unique_citations, citations, and n_unique. Each of these dataframes contains
#' a column corresponding to the database source.
#'
#' Additionally, the n_unique dataframe contains two specific columns, 'cite_source' and 'cite_label', provided by the user.
#' The 'cite_source' column represents the source of the citation and 'cite_label' represents another variable assigned to the
#' citation (normally search, screened, final).
#'
#' The function groups by 'cite_source' and filters by 'cite_label' == "search". These steps are essential for correctly
#' calculating unique record counts.
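
For illustration, here is a minimal base R sketch of how the warning and the "search" filter described above could fit together. The helper name `count_search_records`, the toy data, and the `records` column are assumptions made for this example; only `cite_source` and `cite_label` come from the thread, and this is not the package's actual implementation.

```r
# Hypothetical helper: counts records per source, restricted to the "search" label.
count_search_records <- function(n_unique) {
  # Warn when no record carries the "search" label
  if (!("search" %in% n_unique$cite_label)) {
    warning("Sources must be tagged as 'search' in the 'cite_label' field.")
  }
  # Keep only search records so screened/final rows do not appear in the table
  search_only <- n_unique[n_unique$cite_label == "search", , drop = FALSE]
  # Count the remaining records per source
  counts <- as.data.frame(table(cite_source = search_only$cite_source),
                          stringsAsFactors = FALSE)
  names(counts)[names(counts) == "Freq"] <- "records"
  counts
}

# Toy input (assumed): two sources, with some screened and final rows mixed in
toy <- data.frame(
  cite_source = c("pubmed", "pubmed", "scopus", "pubmed", "scopus"),
  cite_label  = c("search", "search", "search", "screened", "final"),
  stringsAsFactors = FALSE
)
result <- count_search_records(toy)
```

With the toy data, only the three "search" rows are counted, so the screened and final rows never reach the summary table.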

LukasWallrich · 2023-06-21T13:26:38Z

If the data is set up correctly, final and screened should not be sources but labels, and always be a subset of search - so that filtering for search should not do anything? For that reason, I would find it convenient if you could just upload data to shiny without having to label everything (but 2) as search. Would it make sense to use blank labels instead of search if search does not exist?
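
A rough sketch of the fallback I have in mind, in base R. The function name `select_search_records` and the warning text are hypothetical, and it assumes blank labels arrive as `""` or `NA`; it is meant to show the idea, not to describe existing code.

```r
# Hypothetical fallback: use "search" rows when they exist,
# otherwise treat rows with a blank (empty or NA) cite_label as search records.
select_search_records <- function(citations) {
  labels <- citations$cite_label
  if ("search" %in% labels) {
    keep <- !is.na(labels) & labels == "search"
  } else {
    warning("No 'search' label found; treating records with a blank cite_label as search records.")
    keep <- is.na(labels) | labels == ""
  }
  citations[keep, , drop = FALSE]
}

# Toy input (assumed): nothing is labelled "search", two rows are blank
unlabelled <- data.frame(
  cite_source = c("pubmed", "scopus", "pubmed"),
  cite_label  = c("", NA, "final"),
  stringsAsFactors = FALSE
)
fallback_rows <- suppressWarnings(select_search_records(unlabelled))
```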

TNRiley · 2023-06-21T16:33:02Z

> If the data is set up correctly, final and screened should not be sources but labels, and always be a subset of search - so that filtering for search should not do anything? For that reason, I would find it convenient if you could just upload data to shiny without having to label everything (but 2) as search. Would it make sense to use blank labels instead of search if search does not exist?

The labels would be set as search, screened, or final. We only want to see this information broken down by source, so the function is filtering for only the search records. If we didn't filter by search, the table would pull the screened and final data too, which we don't want. Does this make sense? I'm not sure if I'm misunderstanding your point ;)

LukasWallrich · 2023-06-22T15:15:41Z

I forgot that the new calculate_record_counts depends on the raw citations, not just the deduplicated citations. However, would filtered and screened records not just have a blank source there, which we could filter out?

Typing in "search" many times just does not seem to be very user-friendly on Shiny - but if we can document clearly that this should be done (best on the page where users upload data so that this is correct pre-deduplication) and throw a warning when it is not present then that's good enough for now.

TNRiley · 2023-06-23T00:56:26Z

I'm not sure I can think of a way around it currently. On the current web shiny there is an option to add a label before you click upload file. If we could add this function to the new shiny where someone could select multiple files and add a label to all of them at the same time, that would make it easy.

We'd still need to add information on the upload page that lets users know they should use "search" for source files. I think that would work.
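
As a sketch of the bulk-labelling idea, the data step behind such an upload option might look like the following. The function `label_uploads`, its arguments, and the choice to derive `cite_source` from the file name are all assumptions for illustration; the Shiny wiring itself is left out.

```r
# Hypothetical helper: apply one label to every selected file in a single step.
label_uploads <- function(file_names, label = "search") {
  data.frame(
    file = file_names,
    # Assumed convention: the source name is the file name without its extension
    cite_source = tools::file_path_sans_ext(basename(file_names)),
    cite_label = rep(label, length(file_names)),
    stringsAsFactors = FALSE
  )
}

# Example: two search exports labelled together
uploads <- label_uploads(c("data/pubmed.ris", "data/scopus.ris"))
```

Defaulting `label` to "search" would mean users only have to type a label for the screened and final files.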

TNRiley · 2024-05-30T15:55:14Z

@LukasWallrich just circling back to this one and I wanted to see if you still wanted to make adjustments here. Maybe we can run through some use case examples in the shiny once it is complete and see if this requires changes?

TNRiley · 2024-07-09T14:48:01Z

Closing this as changes would require multiple functions to be significantly changed, beyond this function. I'll add instructions for users. If needed we can look at the internal structure of the count functions in the future.

TNRiley closed this as completed Jul 9, 2024
