Unexpected Behavior: {Problem with dplyr internal call of redcap_read()} #538

Open
echaritos opened this issue Oct 1, 2024 · 16 comments

@echaritos

echaritos commented Oct 1, 2024

I have

> R.version.string
[1] "R version 4.2.3 (2023-03-15)"
> packageVersion("REDCapR")
[1] ‘1.2.0’
data<-redcap_read(redcap_uri=url, token=token, records = i )$data
598 variable metadata records were read from REDCap in 0.1 seconds.  The http status code was 200.                                                           
The data dictionary describing 396 fields was read from REDCap in 0.1 seconds.  The http status code was 200.
19 instrument metadata records were read from REDCap in 0.1 seconds.  The http status code was 200.                                                          
1 rows were read from REDCap in 0.1 seconds.  The http status code was 200.                                                                                  
0 data access groups were read from REDCap in 0.1 seconds.  The http status code was 200.                                                                    
Error in `dplyr::select()`:
! Can't select columns that don't exist.
✖ Column `export_field_name` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: The following named parsers don't match the column names: NA 
2: Unknown or uninitialised column: `original_field_name`. 

but

data<-redcap_read_oneshot(redcap_uri=url, token=token,records = i )$data
1 records and 605 columns were read from REDCap in 0.1 seconds.  The http status code was 200. 

It seems that an internal call within redcap_read() throws an error. I am able to read the project's metadata.


> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `dplyr::select()`:
! Can't select columns that don't exist.
✖ Column `export_field_name` doesn't exist.
---
Backtrace:
    ▆
 1. ├─REDCapR::redcap_read(redcap_uri = url, token = token, records = i)
 2. │ └─REDCapR:::redcap_metadata_internal(...)
 3. │   └─d_var %>% ...
 4. ├─dplyr::select(., field_name = "export_field_name", field_name_base = "original_field_name")
 5. └─dplyr:::select.data.frame(., field_name = "export_field_name", field_name_base = "original_field_name")
Run rlang::last_trace(drop = FALSE) to see 17 hidden frames.
> rlang::last_trace(drop = FALSE)
<error/vctrs_error_subscript_oob>
Error in `dplyr::select()`:
! Can't select columns that don't exist.
✖ Column `export_field_name` doesn't exist.
---
Backtrace:
     ▆
  1. ├─REDCapR::redcap_read(redcap_uri = url, token = token, records = i)
  2. │ └─REDCapR:::redcap_metadata_internal(...)
  3. │   └─d_var %>% ...
  4. ├─dplyr::select(., field_name = "export_field_name", field_name_base = "original_field_name")
  5. ├─dplyr:::select.data.frame(., field_name = "export_field_name", field_name_base = "original_field_name")
  6. │ └─tidyselect::eval_select(expr(c(...)), data = .data, error_call = error_call)
  7. │   └─tidyselect:::eval_select_impl(...)
  8. │     ├─tidyselect:::with_subscript_errors(...)
  9. │     │ └─base::withCallingHandlers(...)
 10. │     └─tidyselect:::vars_select_eval(...)
 11. │       └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
 12. │         └─tidyselect:::eval_c(expr, data_mask, context_mask)
 13. │           └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
 14. │             └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
 15. │               └─tidyselect:::as_indices_sel_impl(...)
 16. │                 └─tidyselect:::as_indices_impl(...)
 17. │                   └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
 18. │                     └─vctrs::vec_as_location(...)
 19. └─vctrs (local) `<fn>`()
 20.   └─vctrs:::stop_subscript_oob(...)
 21.     └─vctrs:::stop_subscript(...)
 22.       └─rlang::abort(...)
@wibeasley
Member

Thanks for the details, @echaritos. What version of REDCap are you running? These may be fields that were added a year or two ago. Did this function work before?

If this is happening on a recent-ish version of REDCap, I'd like to know more about the scenario so I can modify the code and add regression tests.

@echaritos
Author

echaritos commented Oct 1, 2024 via email

@echaritos
Author

echaritos commented Oct 1, 2024 via email

@echaritos
Author

echaritos commented Oct 1, 2024 via email

@wibeasley
Member

wibeasley commented Oct 1, 2024

Bummer. It looks like a problem with these two lines in the non-public function REDCapR:::redcap_metadata_internal().

[Screenshot of the two lines not preserved]

Can you try these calls (to test projects on our server)? If that works, can you replace your uri & token with the project you're having trouble with?

uri   <- "https://bbmc.ouhsc.edu/redcap/api/"

# A simple project (pid 153)
REDCapR:::redcap_metadata_internal(uri, "9A81268476645C4E5F03428B8AC3AA7B")$d_variable

# A longitudinal project (pid 212)
REDCapR:::redcap_metadata_internal(uri, "0434F0E9CF53ED0587847AB6E51DE762")$d_variable

# A repeating measures (pid 3181)
REDCapR:::redcap_metadata_internal(uri, "22C3FF1C8B08899FB6F86D91D874A159")$d_variable

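The backtrace above shows that the failing step is `dplyr::select(., field_name = "export_field_name", field_name_base = "original_field_name")` on the variable table inside `redcap_metadata_internal()`; tidyselect aborts as soon as a requested column is absent. Below is a minimal sketch of a guard that could run before that `select()` and produce a more informative error. `assert_columns()` is a hypothetical helper, not part of REDCapR, and the example table is made up:

```r
# Hypothetical helper, not part of REDCapR: stop early with a message that
# names the missing columns and lists the columns that are actually present.
assert_columns <- function(d, required, label = deparse(substitute(d))) {
  missing_cols <- setdiff(required, names(d))
  if (length(missing_cols) > 0L) {
    stop(
      sprintf(
        "`%s` lacks column(s): %s. Columns present: %s",
        label,
        paste(missing_cols, collapse = ", "),
        paste(names(d), collapse = ", ")
      ),
      call. = FALSE
    )
  }
  invisible(d)  # return the input unchanged so the call can sit in a pipeline
}

# Example: a made-up variable table that lacks both expected columns.
d_var <- data.frame(field_name = "record_id")
msg <- tryCatch(
  assert_columns(d_var, c("original_field_name", "export_field_name")),
  error = conditionMessage
)
print(msg)
```

Because the helper returns its input invisibly, it could be dropped in as `d_var %>% assert_columns(c(...)) %>% dplyr::select(...)` in a local copy of the function. That placement is an assumption and has not been tested against REDCapR itself.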

@echaritos
Author

I had to downgrade to 1.1.0 because this was on our production server. The problem almost went away. I suspect it is not purely a REDCapR problem but an interaction between REDCapR and tibble. I'll try to test tomorrow, but my testing capability is very limited. Thanks for your help.

@echaritos
Author


Restarting R session...

> #New

> library(REDCapR, lib.loc = "/home/r_studio_rc/redcap_files/libs")

> packageVersion("REDCapR")
[1] ‘1.2.0’

> print(R.version.string)
[1] "R version 4.2.3 (2023-03-15)"

> packageVersion("tidyverse")
[1] ‘2.0.0’

> packageVersion("tibble")
[1] ‘3.2.1’

> packageVersion("dplyr")
[1] ‘1.1.4’

> 
> 

> uri   <- "https://bbmc.ouhsc.edu/redcap/api/"
> 

> # A simple project (pid 153)
> REDCapR:::redcap_metadata_internal(uri, "9A81268476645C4E5F03428B8AC3AA7B")$d_variable
# A tibble: 25 × 9                                                                                                                                           
   field_name            form_name   field_type validation_type autonumber readr_col_type aligned field_name_base plumbing
   <chr>                 <chr>       <chr>      <chr>           <lgl>      <chr>          <chr>   <chr>           <lgl>   
 1 record_id             demographi… text       NA              TRUE       col_character… "  rec… record_id       TRUE    
 2 name_first            demographi… text       NA              FALSE      col_character… "  nam… name_first      FALSE   
 3 name_last             demographi… text       NA              FALSE      col_character… "  nam… name_last       FALSE   
 4 address               demographi… notes      NA              FALSE      col_character… "  add… address         FALSE   
 5 telephone             demographi… text       phone           FALSE      col_character… "  tel… telephone       FALSE   
 6 email                 demographi… text       email           FALSE      col_character… "  ema… email           FALSE   
 7 dob                   demographi… text       date_ymd        FALSE      col_date()     "  dob… dob             FALSE   
 8 age                   demographi… text       NA              FALSE      col_character… "  age… age             FALSE   
 9 sex                   demographi… radio      NA              FALSE      col_character… "  sex… sex             FALSE   
10 demographics_complete demographi… complete   NA              FALSE      col_integer()  "  dem… demographics_c… FALSE   
# ℹ 15 more rows
# ℹ Use `print(n = ...)` to see more rows

> REDCapR:::redcap_read(redcap_uri = uri, token = "9A81268476645C4E5F03428B8AC3AA7B")
24 variable metadata records were read from REDCap in 0.3 seconds.  The http status code was 200.                                                            
The data dictionary describing 17 fields was read from REDCap in 0.3 seconds.  The http status code was 200.
3 instrument metadata records were read from REDCap in 0.3 seconds.  The http status code was 200.                                                           
1 rows were read from REDCap in 0.3 seconds.  The http status code was 200.                                                                                  
2 data access groups were read from REDCap in 0.3 seconds.  The http status code was 200.                                                                    
5 records and 1 columns were read from REDCap in 0.4 seconds.  The http status code was 200.                                                                 
Starting to read 5 records  at 2024-10-02 09:44:36.
Reading batch 1 of 1, with subjects 1 through 5 (ie, 5 unique subject records).
5 records and 25 columns were read from REDCap in 0.4 seconds.  The http status code was 200.                                                                
$data
# A tibble: 5 × 25
  record_id name_first name_last address  telephone email dob          age   sex demographics_complete height weight   bmi
      <dbl> <chr>      <chr>     <chr>    <chr>     <chr> <date>     <dbl> <dbl>                 <dbl>  <dbl>  <dbl> <dbl>
1         1 Nutmeg     Nutmouse  "14 Ros… (405) 32… nutt… 2003-08-30    11     0                     2     7       1 204. 
2         2 Tumtum     Nutmouse  "14 Ros… (405) 32… tumm… 2003-03-10    11     1                     2     6       1 278. 
3         3 Marcus     Wood      "243 Hi… (405) 32… mw@m… 1934-04-09    80     1                     2   180      80  24.7
4         4 Trudy      DAG       "342 El… (405) 32… pero… 1952-11-02    61     0                     2   165      54  19.8
5         5 John Lee   Walker    "Hotel … (405) 32… left… 1955-04-15    59     1                     2   193.    104  27.9
# ℹ 12 more variables: comments <chr>, mugshot <chr>, health_complete <dbl>, race___1 <dbl>, race___2 <dbl>,
#   race___3 <dbl>, race___4 <dbl>, race___5 <dbl>, race___6 <dbl>, ethnicity <dbl>, interpreter_needed <dbl>,
#   race_and_ethnicity_complete <dbl>

$success
[1] TRUE

$status_codes
[1] "200"

$outcome_messages
[1] "5 records and 25 columns were read from REDCap in 0.4 seconds.  The http status code was 200."

$records_collapsed
[1] ""

$fields_collapsed
[1] ""

$forms_collapsed
[1] ""

$events_collapsed
[1] ""

$filter_logic
[1] ""

$datetime_range_begin
[1] NA

$datetime_range_end
[1] NA

$elapsed_seconds
[1] 2.845126

> 

> # A longitudinal project (pid 212)
> REDCapR:::redcap_metadata_internal(uri, "0434F0E9CF53ED0587847AB6E51DE762")$d_variable
# A tibble: 125 × 9                                                                                                                                          
   field_name        form_name       field_type validation_type autonumber readr_col_type aligned field_name_base plumbing
   <chr>             <chr>           <chr>      <chr>           <lgl>      <chr>          <chr>   <chr>           <lgl>   
 1 study_id          demographics    text       NA              FALSE      col_character… "  stu… study_id        TRUE    
 2 redcap_event_name longitudinal/r… event_name NA              FALSE      col_character… "  red… redcap_event_n… TRUE    
 3 date_enrolled     demographics    text       date_ymd        FALSE      col_date()     "  dat… date_enrolled   FALSE   
 4 patient_document  demographics    file       NA              FALSE      col_character… "  pat… patient_docume… FALSE   
 5 first_name        demographics    text       NA              FALSE      col_character… "  fir… first_name      FALSE   
 6 last_name         demographics    text       NA              FALSE      col_character… "  las… last_name       FALSE   
 7 telephone_1       demographics    text       phone           FALSE      col_character… "  tel… telephone_1     FALSE   
 8 email             demographics    text       email           FALSE      col_character… "  ema… email           FALSE   
 9 dob               demographics    text       date_ymd        FALSE      col_date()     "  dob… dob             FALSE   
10 age               demographics    calc       NA              FALSE      col_character… "  age… age             FALSE   
# ℹ 115 more rows
# ℹ Use `print(n = ...)` to see more rows
> 

> # A repeating measures (pid 3181)
> REDCapR:::redcap_metadata_internal(uri, "22C3FF1C8B08899FB6F86D91D874A159")$d_variable
# A tibble: 15 × 9                                                                                                                                           
   field_name              form_name field_type validation_type autonumber readr_col_type aligned field_name_base plumbing
   <chr>                   <chr>     <chr>      <chr>           <lgl>      <chr>          <chr>   <chr>           <lgl>   
 1 record_id               intake    text       NA              TRUE       col_integer()  "  rec… record_id       TRUE    
 2 redcap_repeat_instrume… longitud… repeat_in… NA              FALSE      col_character… "  red… redcap_repeat_… TRUE    
 3 redcap_repeat_instance  longitud… repeat_in… NA              FALSE      col_integer()  "  red… redcap_repeat_… TRUE    
 4 height                  intake    text       number_1dp      FALSE      col_double()   "  hei… height          FALSE   
 5 weight                  intake    text       number_1dp      FALSE      col_double()   "  wei… weight          FALSE   
 6 bmi                     intake    text       number_1dp      FALSE      col_double()   "  bmi… bmi             FALSE   
 7 intake_complete         intake    complete   NA              FALSE      col_integer()  "  int… intake_complete FALSE   
 8 sbp                     blood_pr… text       number          FALSE      col_double()   "  sbp… sbp             FALSE   
 9 dbp                     blood_pr… text       number          FALSE      col_double()   "  dbp… dbp             FALSE   
10 blood_pressure_complete blood_pr… complete   NA              FALSE      col_integer()  "  blo… blood_pressure… FALSE   
11 lab                     laborato… text       NA              FALSE      col_character… "  lab… lab             FALSE   
12 conc                    laborato… text       NA              FALSE      col_character… "  con… conc            FALSE   
13 laboratory_complete     laborato… complete   NA              FALSE      col_integer()  "  lab… laboratory_com… FALSE   
14 image_profile           image     file       NA              FALSE      col_character… "  ima… image_profile   FALSE   
15 image_complete          image     complete   NA              FALSE      col_integer()  "  ima… image_complete  FALSE 

@echaritos
Author

echaritos commented Oct 2, 2024

OK, here are some more diagnostics. The issue seems to be an interaction between dplyr and REDCapR 1.2.0.

#With 1.2.0

> packageVersion("REDCapR")
[1] ‘1.2.0’

> print(R.version.string)
[1] "R version 4.2.3 (2023-03-15)"

> packageVersion("tidyverse")
[1] ‘2.0.0’

> packageVersion("tibble")
[1] ‘3.2.1’

> packageVersion("dplyr")
[1] ‘1.1.4’

> data<-redcap_read(batch_size=300, redcap_uri="http://192.168.28.111/redcap/api/", token=rstudio.HC.DB.token,records = i )$data
598 variable metadata records were read from REDCap in 0.1 seconds.  The http status code was 200.                                                           
The data dictionary describing 396 fields was read from REDCap in 0.1 seconds.  The http status code was 200.
19 instrument metadata records were read from REDCap in 0.1 seconds.  The http status code was 200.                                                          
1 rows were read from REDCap in 0.1 seconds.  The http status code was 200.                                                                                  
0 data access groups were read from REDCap in 0.1 seconds.  The http status code was 200.                                                                    
Error in `dplyr::select()`:
! Can't select columns that don't exist.
✖ Column `export_field_name` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: The following named parsers don't match the column names: NA 
2: Unknown or uninitialised column: `original_field_name`. 

> data<-redcap_read_oneshot(redcap_uri="http://192.168.28.111/redcap/api/", token=rstudio.HC.DB.token,records = i )$data
1 records and 605 columns were read from REDCap in 0.1 seconds.  The http status code was 200.                                                               

> detach("package:REDCapR", unload = TRUE)

#With 1.1.0

> #Old

> library(REDCapR)

> packageVersion("REDCapR")
[1] ‘1.1.0’

> print(R.version.string)
[1] "R version 4.2.3 (2023-03-15)"

> packageVersion("tidyverse")
[1] ‘2.0.0’

> packageVersion("tibble")
[1] ‘3.2.1’

> packageVersion("dplyr")
[1] ‘1.1.4’

> data<-redcap_read(batch_size=300, redcap_uri="http://192.168.28.111/redcap/api/", token=rstudio.HC.DB.token,records = i )$data
The data dictionary describing 396 fields was read from REDCap in 0.1 seconds.  The http status code was 200.
1 records and 1 columns were read from REDCap in 0.1 seconds.  The http status code was 200.                                                                 
Starting to read 1 records  at 2024-10-02 09:51:20.
Reading batch 1 of 1, with subjects 2833 through 2833 (ie, 1 unique subject records).
1 records and 605 columns were read from REDCap in 0.1 seconds.  The http status code was 200.                                                               

── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────
cols(
  ...
  # ... with 276 more columns
)
ℹ Use `spec()` for the full column specifications.


> data<-redcap_read_oneshot(redcap_uri="http://192.168.28.111/redcap/api/", token=rstudio.HC.DB.token,records = i )$data
1 records and 605 columns were read from REDCap in 0.1 seconds.  The http status code was 200.  

The error backtrace remains the same:


> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `dplyr::select()`:
! Can't select columns that don't exist.
✖ Column `export_field_name` doesn't exist.
---
Backtrace:
    ▆
 1. ├─REDCapR::redcap_read(...)
 2. │ └─REDCapR:::redcap_metadata_internal(...)
 3. │   └─d_var %>% ...
 4. ├─dplyr::select(., field_name = "export_field_name", field_name_base = "original_field_name")
 5. └─dplyr:::select.data.frame(., field_name = "export_field_name", field_name_base = "original_field_name")
Run rlang::last_trace(drop = FALSE) to see 17 hidden frames.
> rlang::last_trace(drop = FALSE)
<error/vctrs_error_subscript_oob>
Error in `dplyr::select()`:
! Can't select columns that don't exist.
✖ Column `export_field_name` doesn't exist.
---
Backtrace:
     ▆
  1. ├─REDCapR::redcap_read(...)
  2. │ └─REDCapR:::redcap_metadata_internal(...)
  3. │   └─d_var %>% ...
  4. ├─dplyr::select(., field_name = "export_field_name", field_name_base = "original_field_name")
  5. ├─dplyr:::select.data.frame(., field_name = "export_field_name", field_name_base = "original_field_name")
  6. │ └─tidyselect::eval_select(expr(c(...)), data = .data, error_call = error_call)
  7. │   └─tidyselect:::eval_select_impl(...)
  8. │     ├─tidyselect:::with_subscript_errors(...)
  9. │     │ └─base::withCallingHandlers(...)
 10. │     └─tidyselect:::vars_select_eval(...)
 11. │       └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
 12. │         └─tidyselect:::eval_c(expr, data_mask, context_mask)
 13. │           └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
 14. │             └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
 15. │               └─tidyselect:::as_indices_sel_impl(...)
 16. │                 └─tidyselect:::as_indices_impl(...)
 17. │                   └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
 18. │                     └─vctrs::vec_as_location(...)
 19. └─vctrs (local) `<fn>`()
 20.   └─vctrs:::stop_subscript_oob(...)
 21.     └─vctrs:::stop_subscript(...)
 22.       └─rlang::abort(...)

In the 1.1.0 output above, I have removed most of the column specification.

@echaritos
Author

I am on REDCap 14.6.10

@wibeasley
Member

Weird. I have the same package versions (with REDCapR 1.2.0) and don't see the error. The only difference is that I'm running the current version of R (4.4.1). It sounds like you're locked into the version of R from 18 months ago (i.e., 4.2.3)? I don't see why that would matter, but it's the only difference I see right now. I should have time tomorrow to debug this with you. If possible, can you try R 4.4.1 on another computer?
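Because the comparison here comes down to matching environments, it may help if both sides print every relevant version in one call. The following is a sketch: the default package list is an assumption (add anything else that seems relevant), and packages that are not installed show up as NA instead of causing an error:

```r
# Report the R version plus the installed version of each package in `pkgs`.
# A package that is not installed is reported as NA instead of erroring.
report_versions <- function(pkgs = c("REDCapR", "dplyr", "tibble", "readr",
                                     "tidyselect", "vctrs")) {
  pkg_versions <- vapply(
    pkgs,
    function(p) {
      # system.file() returns "" when the package is not installed,
      # and checking it avoids loading the package's namespace.
      if (nzchar(system.file(package = p))) {
        as.character(utils::packageVersion(p))
      } else {
        NA_character_
      }
    },
    character(1)
  )
  c(R = as.character(getRversion()), pkg_versions)
}

print(report_versions())
```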

@echaritos
Author

It does not look like an R-version problem. Below is the output from a current R version:


> #Download the DB
> #New
> library(REDCapR)
> packageVersion("REDCapR")
[1] ‘1.2.0’
> print(R.version.string)
[1] "R version 4.4.1 (2024-06-14)"
> packageVersion("tidyverse")
[1] ‘2.0.0’
> packageVersion("tibble")
[1] ‘3.2.1’
> packageVersion("dplyr")
[1] ‘1.1.4’
> data<-redcap_read(batch_size=300, redcap_uri="http://192.168.28.111/redcap/api/", token=rstudio.HC.DB.token )$data
598 variable metadata records were read from REDCap in 0.1 seconds.  The http status code was 200.                    
The data dictionary describing 396 fields was read from REDCap in 0.1 seconds.  The http status code was 200.
19 instrument metadata records were read from REDCap in 0.1 seconds.  The http status code was 200.                   
1 rows were read from REDCap in 0.1 seconds.  The http status code was 200.                                           
0 data access groups were read from REDCap in 0.1 seconds.  The http status code was 200.                             
Error in `dplyr::select()`:
! Can't select columns that don't exist.
✖ Column `export_field_name` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: The following named parsers don't match the column names: NA 
2: Unknown or uninitialised column: `original_field_name`. 
> data<-redcap_read_oneshot(redcap_uri="http://192.168.28.111/redcap/api/", token=rstudio.HC.DB.token )$data
7,873 records and 605 columns were read from REDCap in 1.4 seconds.  The http status code was 200.                    
> detach("package:REDCapR", unload = TRUE)

@echaritos
Author


redcap_read() works fine on 1.1.0; it is 1.2.0 that returns the error.

@wibeasley
Member

One last thing to try. If this fails, let's get on a call and debug it together (if you'd like).

Can you give this user all permissions? That's the only thing I can think of that might differ between (a) you calling my server's project and (b) you calling your server's project.

@echaritos
Author

The user's permissions are pretty much maxed out. This is certainly a new issue and new behaviour. It seems to be an issue with dplyr::select() when called from redcap_read(); redcap_read_oneshot() is not affected. I am on GMT+2.
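Since `redcap_read_oneshot()` works where `redcap_read()` fails, one stopgap is to fall back to the single-call reader when the batched one errors. The following is a minimal sketch. `read_with_fallback()` is a hypothetical helper and not part of the REDCapR API. Only pass arguments that both readers accept; for example, do not pass `batch_size`, which `redcap_read_oneshot()` does not take. Also note that the single-call read is not batched, so very large projects may be slower or hit server limits:

```r
# Hypothetical helper: try the primary (batched) reader first; if it throws
# an error, report it and retry the same arguments with the fallback reader.
read_with_fallback <- function(primary, fallback, ...) {
  tryCatch(
    primary(...),
    error = function(e) {
      message("Primary reader failed (", conditionMessage(e),
              "); retrying with the fallback reader.")
      fallback(...)
    }
  )
}

# Usage with REDCapR (uri and token as elsewhere in this thread):
# ds <- read_with_fallback(
#   REDCapR::redcap_read,
#   REDCapR::redcap_read_oneshot,
#   redcap_uri = uri, token = token
# )$data
```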

@echaritos
Author

echaritos commented Oct 4, 2024

export_field_name is missing from the metadata that REDCap sends:


> redcap_uri <- "http://192.168.28.111/redcap/api/"
> token <- rstudio.HC.DB.token
> 
> 
> library(httr)
> library(readr)
> 
> # Set your REDCap API URL and token
> 
> # Export metadata
> response <- httr::POST(
+   url = redcap_uri,
+   body = list(
+     token = token,
+     content = 'metadata',
+     format = 'csv'
+   ),
+   encode = 'form'
+ )
> 
> # Read metadata into a data frame
> metadata <- read_csv(httr::content(response, as = "text"))
Rows: 396 Columns: 18                                                                                                                   
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (14): field_name, form_name, section_header, field_type, field_label, select_choices_or_calculations, field_note, text_validatio...
dbl  (1): text_validation_max
lgl  (3): question_number, matrix_group_name, matrix_ranking

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> 
> # Check if 'export_field_name' exists
> if (!"export_field_name" %in% names(metadata)) {
+   print("export_field_name is missing from metadata.")
+ } else {
+   print("export_field_name is present in metadata.")
+ }
[1] "export_field_name is missing from metadata."
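The 18 columns read above are the standard data-dictionary columns (`content = 'metadata'`). To my knowledge, that export never includes `export_field_name`. That column comes from the REDCap API method "Export List of Export Field Names" (`content = 'exportFieldNames'`), which returns `original_field_name`, `choice_value` and `export_field_name`. The "598 variable metadata records" line in the `redcap_read()` log appears to come from that call. Below is a sketch of the same check against that endpoint, assuming the `redcap_uri` and `token` defined above. The parsing is split into a hypothetical helper so that it can be tried on literal CSV text:

```r
# Hypothetical helper: parse CSV text returned by the REDCap API and report
# which columns are present and which expected export-field-name columns are missing.
check_export_field_names <- function(
    csv_text,
    expected = c("original_field_name", "choice_value", "export_field_name")
) {
  d <- utils::read.csv(
    text        = csv_text,
    colClasses  = "character",  # only the header matters here
    check.names = FALSE         # keep the column names exactly as sent
  )
  list(present = names(d), missing = setdiff(expected, names(d)))
}

# Usage against the server (assumes `redcap_uri` and `token` from above):
# response <- httr::POST(
#   url    = redcap_uri,
#   body   = list(token = token, content = "exportFieldNames", format = "csv"),
#   encode = "form"
# )
# check_export_field_names(httr::content(response, as = "text", encoding = "UTF-8"))
```

If `missing` is non-empty even though the call succeeds, the `present` element shows what the server actually sent in place of the expected header.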

@echaritos
Author

echaritos commented Oct 4, 2024

Here are some modifications that work for me as a temporary solution:

# Load required packages
library(REDCapR)
library(dplyr)
library(tidyr)
library(tibble)
library(readr)
library(stringr)

# Replicate 'sanitize_token' function
sanitize_token <- function(token) {
  token <- stringr::str_trim(token)
  if (stringr::str_detect(token, "\\s")) {
    stop("The token (length ", nchar(token), " characters) should not have any spaces.")
  } else if (nchar(token) != 32L) {
    stop("The token (length ", nchar(token), " characters) should be exactly 32 characters.")
  }
  return(token)
}

# Replicate 'verbose_prepare' function
verbose_prepare <- function(verbose) {
  if (is.null(verbose) || is.na(verbose)) {
    verbose <- FALSE
  }
  return(verbose)
}

# Modified 'redcap_metadata_internal' function with NA handling
redcap_metadata_internal_modified <- function(
    redcap_uri,
    token,
    http_response_encoding = "UTF-8",
    locale = readr::default_locale(),
    verbose = FALSE,
    config_options = NULL,
    handle_httr = NULL
) {
  # Ensure the token is sanitized
  token <- sanitize_token(token)
  
  # Prepare verbosity settings
  verbose <- verbose_prepare(verbose)
  
  # Retrieve project metadata
  d_meta <- REDCapR::redcap_metadata_read(
    redcap_uri = redcap_uri,
    token      = token,
    verbose    = verbose,
    handle_httr = handle_httr
  )$data
  
  # Retrieve instrument metadata
  d_inst <- REDCapR::redcap_instruments(
    redcap_uri = redcap_uri,
    token      = token,
    verbose    = verbose,
    handle_httr = handle_httr
  )$data
  
  # Retrieve project information
  d_proj <- REDCapR::redcap_project_info_read(
    redcap_uri = redcap_uri,
    token      = token,
    verbose    = verbose,
    handle_httr = handle_httr
  )$data
  
  # Print d_proj to inspect its contents
  print("Contents of d_proj:")
  print(d_proj)
  
  # Retrieve data access groups (DAGs)
  d_dags <- REDCapR::redcap_dag_read(
    redcap_uri = redcap_uri,
    token      = token,
    verbose    = verbose,
    handle_httr = handle_httr
  )
  
  # Determine record ID field from metadata
  if ("field_name" %in% names(d_meta)) {
    .record_field <- d_meta$field_name[1]
  } else {
    stop("Cannot determine record ID field; 'field_name' not found in metadata.")
  }
  
  # Determine if DAGs are used
  .dags <- (nrow(d_dags$data) >= 1) || (grepl("do not have permission", d_dags$raw_text))
  
  # Possible plumbing variables
  .plumbing_possibles <- c(.record_field, "redcap_event_name", "redcap_repeat_instrument", "redcap_repeat_instance")
  
  # Check decimal mark in locale
  decimal_period <- (locale$decimal_mark == ".")
  decimal_comma <- (locale$decimal_mark == ",")
  
  # Prepare instrument data
  d_inst <- d_inst %>%
    dplyr::rename(form_name = "instrument_name") %>%
    dplyr::mutate(form_order = dplyr::row_number())
  
  # Prepare completion status fields
  d_complete <- d_inst %>%
    dplyr::mutate(
      field_name = paste0(form_name, "_complete"),
      field_name_base = field_name,
      field_type = "complete",
      vt = NA_character_
    ) %>%
    dplyr::select(field_name, field_name_base, form_name, field_type, vt)
  
  # Initialize additional variables
  d_again <- tibble::tibble(
    field_name = character(0),
    field_name_base = character(0),
    form_name = character(0),
    field_type = character(0),
    vt = character(0)
  )
  
  # Handle longitudinal and repeating instruments
  is_longitudinal <- d_proj$is_longitudinal[1]
  if (is.na(is_longitudinal) || !is.logical(is_longitudinal)) {
    is_longitudinal <- FALSE
  }
  
  if (is_longitudinal) {
    d_again <- d_again %>%
      dplyr::add_row(
        field_name = "redcap_event_name",
        field_name_base = "redcap_event_name",
        form_name = "longitudinal/repeating",
        field_type = "event_name",
        vt = NA_character_
      )
  }
  
  has_repeating <- d_proj$has_repeating_instruments_or_events[1]
  if (is.na(has_repeating) || !is.logical(has_repeating)) {
    has_repeating <- FALSE
  }
  
  if (has_repeating) {
    d_again <- d_again %>%
      dplyr::add_row(
        field_name = c("redcap_repeat_instrument", "redcap_repeat_instance"),
        field_name_base = c("redcap_repeat_instrument", "redcap_repeat_instance"),
        form_name = "longitudinal/repeating",
        field_type = c("repeat_instrument", "repeat_instance"),
        vt = NA_character_
      )
  }
  
  # Prepare metadata
  d_meta <- d_meta %>%
    dplyr::rename(
      field_name_base = "field_name",
      vt = "text_validation_type_or_show_slider_number"
    ) %>%
    dplyr::filter(field_type != "descriptive") %>%
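    dplyr::mutate(
      # This version skips the export-field-names lookup, so checkbox fields are
      # not expanded into their `<field>___<code>` export columns here.
      field_name = field_name_base
    ) %>%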
    dplyr::select(
      field_name,
      field_name_base,
      form_name,
      field_type,
      vt
    ) %>%
    dplyr::bind_rows(d_complete) %>%
    dplyr::left_join(d_inst, by = "form_name") %>%
    dplyr::group_by(form_name) %>%
    dplyr::mutate(field_order_within_form = dplyr::row_number()) %>%
    dplyr::ungroup() %>%
    dplyr::arrange(form_order, field_order_within_form) %>%
    dplyr::select(-form_order, -field_order_within_form) %>%
    dplyr::bind_rows(d_again) %>%
    dplyr::mutate(plumbing = field_name %in% .plumbing_possibles)
  
  # Prepare final variable data
  d <- d_meta %>%
    dplyr::mutate(
      dags = .dags & (field_name == .record_field),
      autonumber = d_proj$record_autonumbering_enabled[1] & (field_name == .record_field)
    ) %>%
    # Determine readr column types based on field types and validation
    dplyr::mutate(
      readr_col_type = dplyr::case_when(
        dags ~ "col_character()",
        autonumber & !dags ~ "col_integer()",
        field_type == "event_name" ~ "col_character()",
        field_type == "repeat_instrument" ~ "col_character()",
        field_type == "repeat_instance" ~ "col_integer()",
        field_type == "complete" ~ "col_integer()",
        field_type == "truefalse" ~ "col_logical()",
        field_type == "yesno" ~ "col_logical()",
        field_type == "checkbox" ~ "col_logical()",
        field_type == "radio" ~ "col_character()",
        field_type == "dropdown" ~ "col_character()",
        field_type == "file" ~ "col_character()",
        field_type == "notes" ~ "col_character()",
        field_type == "slider" ~ "col_integer()",
        field_type == "calc" ~ "col_character()",
        field_type == "sql" ~ "col_character()",
        field_type == "text" & is.na(vt) ~ "col_character()",
        field_type == "text" & vt == "" ~ "col_character()",
        vt == "integer" ~ "col_integer()",
        vt == "number" & decimal_period ~ "col_double()",
        vt == "number" & !decimal_period ~ "col_character()",
        vt %in% c("date_ymd", "date_mdy", "date_dmy") ~ "col_date()",
        vt %in% c("datetime_ymd", "datetime_mdy", "datetime_dmy") ~ "col_datetime()",
        vt %in% c("datetime_seconds_ymd", "datetime_seconds_mdy", "datetime_seconds_dmy") ~ "col_datetime()",
        TRUE ~ "col_character()"
      )
    ) %>%
    dplyr::select(
      field_name,
      form_name,
      field_type,
      validation_type = vt,
      autonumber,
      readr_col_type,
      field_name_base,
      plumbing
    )
  
  # Determine plumbing variables
  .plumbing_variables <- intersect(d$field_name, .plumbing_possibles)
  
  # Return the list with variable data and project info
  list(
    d_variable = d,
    success = TRUE,
    longitudinal = is_longitudinal,
    repeating = has_repeating,
    record_id_name = .record_field,
    plumbing_variables = .plumbing_variables
  )
}
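The `case_when()` in the modified function decides which readr collector each field gets. As a reading aid, the same decision can be written as two lookup tables: one keyed on `field_type`, one keyed on the text validation type. The helper below, `readr_type_for()`, is hypothetical and not part of REDCapR. It leaves out the `dags`, `autonumber`, and `decimal_period` branches and assumes a period decimal separator, so `number` always maps to `col_double()`.

```r
# Hypothetical helper, NOT part of REDCapR: a simplified restatement of the
# field_type / validation-type mapping in the case_when() above.
# Assumption: period decimal separator; dags/autonumber branches are ignored.
readr_type_for <- function(field_type, vt) {
  # Field types whose parser does not depend on the validation type
  by_field_type <- c(
    event_name        = "col_character()",
    repeat_instrument = "col_character()",
    repeat_instance   = "col_integer()",
    complete          = "col_integer()",
    truefalse         = "col_logical()",
    yesno             = "col_logical()",
    checkbox          = "col_logical()",
    radio             = "col_character()",
    dropdown          = "col_character()",
    file              = "col_character()",
    notes             = "col_character()",
    slider            = "col_integer()",
    calc              = "col_character()",
    sql               = "col_character()"
  )
  # Text validation types that select a non-character parser
  by_validation <- c(
    integer              = "col_integer()",
    number               = "col_double()",
    date_ymd             = "col_date()",
    date_mdy             = "col_date()",
    date_dmy             = "col_date()",
    datetime_ymd         = "col_datetime()",
    datetime_mdy         = "col_datetime()",
    datetime_dmy         = "col_datetime()",
    datetime_seconds_ymd = "col_datetime()",
    datetime_seconds_mdy = "col_datetime()",
    datetime_seconds_dmy = "col_datetime()"
  )
  # Fallback, like the final `TRUE ~ "col_character()"` branch
  out <- rep("col_character()", length(field_type))
  # The field type decides first
  hit_ft <- field_type %in% names(by_field_type)
  out[hit_ft] <- by_field_type[field_type[hit_ft]]
  # Otherwise a recognised validation type decides (NA or "" stays character)
  hit_vt <- !hit_ft & !is.na(vt) & vt %in% names(by_validation)
  out[hit_vt] <- by_validation[vt[hit_vt]]
  unname(out)
}
```

For example, `readr_type_for(c("text", "text", "yesno", "radio", "text"), c(NA, "integer", NA, NA, "date_ymd"))` returns `c("col_character()", "col_integer()", "col_logical()", "col_character()", "col_date()")`.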

# Overwrite the original function in the REDCapR package namespace
assignInNamespace(
  "redcap_metadata_internal",
  redcap_metadata_internal_modified,
  ns = "REDCapR"
)

Then I call it like this:

result <- redcap_read(
  batch_size = 1500,
  redcap_uri = redcap_uri,
  token      = token
)

The data dictionary describing 396 fields was read from REDCap in 0.1 seconds.  The http status code was 200.
19 instrument metadata records were read from REDCap in 0.1 seconds.  The http status code was 200.                                     
1 rows were read from REDCap in 0.1 seconds.  The http status code was 200.                                                             
[1] "Contents of d_proj:"
# A tibble: 1 × 27
  project_id;project_title…¹ project_id project_title creation_time production_time in_production project_language purpose purpose_other
  <chr>                      <lgl>      <lgl>         <lgl>         <lgl>           <lgl>         <lgl>            <lgl>   <lgl>        
1 "13;\"Statistik\";\"20… NA         NA            NA            NA              NA            NA               NA      NA           
# ℹ abbreviated name:
#   ¹​`project_id;project_title;creation_time;production_time;in_production;project_language;purpose;purpose_other;project_notes;custom_record_label;secondary_unique_field;is_longitudinal;has_repeating_instruments_or_events;surveys_enabled;scheduling_enabled;record_autonumbering_enabled;randomization_enabled;ddp_enabled;project_irb_number;project_grant_number;project_pi_firstname;project_pi_lastname;display_today_now_button;missing_data_codes;external_modules;bypass_branching_erase_field_prompt`
# ℹ 18 more variables: project_notes <lgl>, custom_record_label <lgl>, secondary_unique_field <lgl>, is_longitudinal <lgl>,
#   has_repeating_instruments_or_events <lgl>, surveys_enabled <lgl>, scheduling_enabled <lgl>, record_autonumbering_enabled <lgl>,
#   randomization_enabled <lgl>, ddp_enabled <lgl>, project_irb_number <lgl>, project_grant_number <lgl>, project_pi_firstname <lgl>,
#   project_pi_lastname <lgl>, display_today_now_button <lgl>, missing_data_codes <lgl>, external_modules <lgl>,
#   bypass_branching_erase_field_prompt <lgl>

0 data access groups were read from REDCap in 0.1 seconds.  The http status code was 200.                                               
7,875 records and 1 columns were read from REDCap in 0.4 seconds.  The http status code was 200.                                        
Starting to read 7,875 records  at 2024-10-04 11:48:03.500117.

Reading batch 1 of 6, with subjects 9 through 1531 (ie, 1500 unique subject records).
1,500 records and 605 columns were read from REDCap in 0.9 seconds.  The http status code was 200.                                      

Reading batch 2 of 6, with subjects 1532 through 3037 (ie, 1500 unique subject records).
1,500 records and 605 columns were read from REDCap in 1.3 seconds.  The http status code was 200.                                      

Reading batch 3 of 6, with subjects 3038 through 4549 (ie, 1500 unique subject records).
1,500 records and 605 columns were read from REDCap in 1.2 seconds.  The http status code was 200.                                      

Reading batch 4 of 6, with subjects 4550 through 6051 (ie, 1500 unique subject records).
1,500 records and 605 columns were read from REDCap in 1.2 seconds.  The http status code was 200.                                      

Reading batch 5 of 6, with subjects 6052 through 7563 (ie, 1500 unique subject records).
1,500 records and 605 columns were read from REDCap in 1.1 seconds.  The http status code was 200.                                      

Reading batch 6 of 6, with subjects 7564 through 7938 (ie, 375 unique subject records).
375 records and 605 columns were read from REDCap in 0.3 seconds.  The http status code was 200.                                        

Warning message:
The following named parsers don't match the column names: NA 
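The `d_proj` printout above is the telling part: the entire header `project_id;project_title;...` became a single column name, the row became one string starting `13;"Statistik";...`, and every separately named column is `NA`. That is what semicolon-separated text looks like when it is parsed as comma-separated, and it would leave values the modified function reads from `d_proj` (such as `record_autonumbering_enabled`) as `NA`. The base-R sketch below reproduces the effect on a shortened stand-in containing only the two fields visible in the printout; it does not contact REDCap.

```r
# Shortened stand-in for the project-info response printed above: only the
# first two fields, with ";" as the separator. This is not a real API call.
txt <- 'project_id;project_title\n13;"Statistik"\n'

# Parsed as comma-separated (the default), the header and the row each
# collapse into a single field, as in the d_proj printout.
wrong <- utils::read.csv(text = txt, check.names = FALSE)
print(names(wrong))
print(ncol(wrong))

# Parsed with the matching separator, the fields come out as expected.
right <- utils::read.csv(text = txt, sep = ";")
print(right$project_id)
print(right$project_title)
```

This is consistent with, but does not prove, the observation that `redcap_read_oneshot()` succeeds while `redcap_read()` fails in its metadata step. Where the semicolons come from is not settled in this thread; one thing worth checking, as an assumption, is whether a semicolon CSV delimiter preference is set on the REDCap side.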
