
Deduplication & Table errors when uploading files without URL field (3 Web of Science .ris example) #179

Closed
TNRiley opened this issue Jun 7, 2024 · 9 comments

@TNRiley
Collaborator

TNRiley commented Jun 7, 2024

This appears to be limited to a specific case study I'm working to document; these errors do not come up when I use other .ris files. My guess is that something is happening with the WoS identifier or something related to it.

I ran three searches in Web of Science (simple variations on one search string) and exported the full record for each:

  1. (whale OR cetacean) AND ((passive AND acoustic) AND (monitor* OR record* OR detect*)) n=642
  2. (whale OR cetacean) AND (“passive acoustic” AND (monitor* OR record* OR detect*)) n=557
  3. (whale OR cetacean) AND (“passive acoustic monitoring” OR “passive acoustic recording” OR “passive acoustic detection”) n=367

1st error: if you do any manual deduplication, all tables and visuals show an error.
2nd error: regardless of whether you do manual deduplication, the record-level table always throws an error.

Note: when deduplicating, I do get the pop-up saying that of the 1566 records, 642 were unique (which makes no sense, as this is the number of records from v1 and there is complete overlap between v2 and v3). Furthermore, there is one set of potential duplicates that, for some reason, was not automatically identified despite all the data coming from WoS (this could be a metadata issue, so it would need to be reviewed later).
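For reference, the 1566 in the pop-up is the sum of the three exports (642 + 557 + 367 = 1566), and the unique count is the size of the union of the three result sets. The toy R sketch below uses made-up record IDs (not the actual WoS data) to show how the two numbers are computed when two result sets are fully contained in a third.

```r
# Made-up record IDs for illustration only; not the real WoS exports.
search1 <- c("A", "B", "C", "D", "E", "F")
search2 <- c("A", "B", "C", "D")  # every ID also appears in search1
search3 <- c("A", "B", "C")       # every ID also appears in search1

all_records <- c(search1, search2, search3)

n_total  <- length(all_records)          # 6 + 4 + 3 = 13 records uploaded
n_unique <- length(unique(all_records))  # size of the union = 6

cat("Total:", n_total, "Unique:", n_unique, "\n")
```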

@TNRiley TNRiley self-assigned this Jun 10, 2024
@DrMattG
Collaborator

DrMattG commented Jun 17, 2024

I get an error because the code looks for the cite_string column, which does not exist in the data frames. Is this because it is not declared in the function the way cite_source and cite_label are?

raw_citations$label <- raw_citations$cite_label

@TNRiley
Collaborator Author

TNRiley commented Jun 17, 2024

I've uploaded the .ris files and a script to the test files folder.

@TNRiley
Collaborator Author

TNRiley commented Jun 17, 2024

Running these files in R, rather than in the Shiny app, also throws an error for the record-level table. For some strange reason, the URL column appears to be the issue. Looking at the data, the URL field seems to be missing from the start in all of these WoS files (and in some others). I think the second error, with the table, occurs because this is the first time I've run a test with ONLY WoS .ris files. When files from other databases are included, the URL column is added and the WoS citations simply end up with NA. Most likely the fix is to make the record-level table not depend on the URL column to run.

Error in `dplyr::mutate()`:
In argument: `reference = generate_apa_reference(...)`.
Caused by error in `.data$url`:
! Column `url` not found in `.data`.
Run `rlang::last_trace()` to see where the error occurred.
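The fix proposed above, making the record-level table work without a URL column, can be sketched in base R by creating any missing columns as character NA before references are built. `ensure_columns()` is a hypothetical helper for illustration only (not CiteSource's code), and the toy data frame merely mimics a WoS-only upload.

```r
# Hypothetical helper (not part of CiteSource): add any missing columns as
# character NA so downstream code can reference them without erroring.
ensure_columns <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA_character_, nrow(df))
  }
  df
}

# Toy data mimicking a WoS-only upload: there is no `url` column at all.
citations <- data.frame(
  author = c("Smith, J.", "Lee, K."),
  year   = c("2020", "2021"),
  title  = c("Whale calls", "Passive acoustic monitoring")
)

citations <- ensure_columns(citations, c("url", "doi"))

# Append the URL only when one is present, so NA never leaks into the text.
base_ref <- paste0(citations$author, " (", citations$year, "). ",
                   citations$title, ".")
citations$reference <- ifelse(is.na(citations$url), base_ref,
                              paste(base_ref, citations$url))
citations$reference
```

Because the column is always created as character, later type checks behave the same whether or not the source files contained URLs.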

@TNRiley
Collaborator Author

TNRiley commented Jun 17, 2024

@LukasWallrich can you take a look at how record_level_table() and generate_apa_reference() could be changed to work when there is no URL column because none of the citation files/metadata include one? I've tried but have not been successful.

I'm also removing the testing script from the test folder because it caused the R CMD check to fail. I'll keep the .ris files in the "shinytest" subfolder of the test folder.

@TNRiley TNRiley changed the title Multiple Shiny Errors on visualization and tables (Web of Science string variations) Deduplication & Table errors when uploading files without URL field (3 Web of Science .ris example) Jul 9, 2024
@TNRiley TNRiley added the Bug and Priority labels Jul 9, 2024
@LukasWallrich
Collaborator

@TNRiley I changed record_level_table(); can you check whether the issue persists?

@TNRiley
Collaborator Author

TNRiley commented Jul 29, 2024

@LukasWallrich I'm still running into an error in both the Shiny app and R. I haven't been able to troubleshoot it: it's hung up on the weblink column needing to be of character type, and despite converting it and checking the type, I still get the error.
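One possible (unconfirmed) source of this kind of type error: R stores a vector that is entirely NA as logical, not character, and `ifelse()` builds its result from the logical test vector, coercing it only as values are filled in. A column that was converted to character can therefore silently turn back into logical. A minimal base R sketch with a hypothetical `weblink` column:

```r
# Toy data: a column with no values at all is stored as logical.
citations <- data.frame(title = c("a", "b"), weblink = c(NA, NA))
class(citations$weblink)  # "logical"

# Explicit conversion gives a character column of NAs.
citations$weblink <- as.character(citations$weblink)
class(citations$weblink)  # "character"

# Pitfall: ifelse() starts from the logical test vector and is only coerced
# by the values it fills in. If every row takes the `yes` branch and `yes`
# is a plain (logical) NA, the result stays logical.
reverted <- ifelse(is.na(citations$weblink), NA, citations$weblink)
class(reverted)           # "logical"

# Using a typed NA keeps the result character.
kept <- ifelse(is.na(citations$weblink), NA_character_, citations$weblink)
class(kept)               # "character"
```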

@TNRiley
Collaborator Author

TNRiley commented Jul 29, 2024

Also, record_level_table() is written so that it uses the "citations" tibble, but all the tests I have been running and all the examples in the vignettes use "unique_citations", which seems correct. I believe record_level_table() needs to be corrected so that it points to the deduplicated unique_citations data, but I need to review this further.

@LukasWallrich
Collaborator

LukasWallrich commented Jul 30, 2024

Sorry about that; this bug fix was too ad hoc.

I have now moved the error and type checking into the generate_apa_reference() function, and also fixed reference generation for single-name authors. I also added tests that should prevent URL and type errors (and other missing-column issues) from recurring. @TNRiley, can you test it again, and also keep an eye out for misformatted references?
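The comment mentions fixing reference generation for single-name authors. Without reproducing the actual generate_apa_reference() code, here is a minimal base R sketch of the idea, assuming authors arrive as "Surname, Given Names" strings. `format_author()` and `format_authors()` are hypothetical helpers, and the closing `stopifnot()` check only illustrates the kind of regression test described.

```r
# Hypothetical helpers; not CiteSource's generate_apa_reference().
# Turn "Surname, Given Names" into "Surname, G. N."; leave single-name
# authors (no comma, e.g. an organisation) unchanged; map missing to NA.
format_author <- function(author) {
  if (is.na(author) || !nzchar(trimws(author))) return(NA_character_)
  parts <- strsplit(author, ",", fixed = TRUE)[[1]]
  surname <- trimws(parts[1])
  # Single-name author: no given names to abbreviate.
  if (length(parts) < 2 || !nzchar(trimws(parts[2]))) return(surname)
  given <- strsplit(trimws(parts[2]), "\\s+")[[1]]
  initials <- paste0(substr(given, 1, 1), ".", collapse = " ")
  paste0(surname, ", ", initials)
}

# Vectorised wrapper that always returns a character vector.
format_authors <- function(authors) {
  vapply(authors, format_author, character(1), USE.NAMES = FALSE)
}

# Regression check in the spirit of the tests described above.
stopifnot(
  identical(format_authors(c("Smith, John Paul", "NOAA", NA)),
            c("Smith, J. P.", "NOAA", NA_character_))
)
```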

Re citations: the function uses whatever argument you pass to it. So if you call record_level_table(unique_citations), R translates that to record_level_table(citations = unique_citations), because unnamed arguments are matched by position, and any reference to citations within the function refers to the value of that argument. We could rename the argument (and all references to it within the function) to offer a consistent user interface, but given that all functions after dedup require unique citations, I wonder whether we should instead rename it all to citations. While this does not change functionality, consistency will make for a better user experience (and this should not be changed after the first release, as it would be a breaking change). [I will create a new issue for this, so that we can close this one if the URL problem is solved.]
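The point about argument matching can be illustrated with a toy function. `count_records()` is a hypothetical stand-in, not part of CiteSource: inside the function, `citations` simply names whatever object was passed first, so the caller's object can be called `unique_citations` or anything else.

```r
# Hypothetical stand-in for a function whose first argument is `citations`.
count_records <- function(citations) {
  # `citations` refers to whatever object the caller passed in.
  nrow(citations)
}

unique_citations <- data.frame(id = 1:3)

count_records(unique_citations)              # positional match: 3
count_records(citations = unique_citations)  # explicit name: 3
```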

@TNRiley
Collaborator Author

TNRiley commented Jul 30, 2024

I tested it in the local Shiny app and it worked; the small sample of citations I viewed also looked good and was formatted correctly.

@TNRiley TNRiley closed this as completed Jul 30, 2024