Misleading errors when reading sparse data without a record_id column #471
Comments
Yes, I think you're right, but I'm not seeing anything pop when I quickly skim the release notes (since v8). I feel it happened about the same time that the record_id started being hashed if the user didn't have PHI rights. There's always a chance that I'm wrong and REDCap is returning the field but REDCapR is not. I can look deeper if it matters. Either way, I'm leaning in favor of letting the user decide to omit the variable (even if it's more frequently an accident than intentional). Do you have a strong preference to always include it? Are you still in favor of just adding another bullet in the error message quoted above? It sounded like that was your first reaction.
I would honor what the API does. If it omits the record_id when the caller does not name it, honor that. As for what action to take, I think you should replace the hard stop with a warning, and add a bullet to the warning text that says something like, "...or perhaps the fields and forms you requested are sparse". If REDCapR's assessment of the situation could be wrong, the hard stop causes problems with a non-obvious fix. I found the hard stop annoying more than anything. I knew very well the form I was querying was sparse, and REDCapR wouldn't let me run my query. If I could have seen the result, I could assess that sparseness, decide if the error messages were accurate or not, note that the record_id column was missing from
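A minimal sketch of the warning-instead-of-stop idea proposed above. The function name `warn_if_record_id_missing()` and the first two bullets of the message are hypothetical; only the "sparse" bullet comes from this thread. This is not REDCapR's actual check, just an illustration that the caller still gets the data back and can judge the warning for themselves.

```r
# Hypothetical check (not REDCapR's actual code): warn, rather than stop, when
# the record-id column is absent from the returned data.
warn_if_record_id_missing <- function(ds, record_id_name = "record_id") {
  if (!(record_id_name %in% names(ds))) {
    warning(
      "The returned dataset has no `", record_id_name, "` column. Possible reasons:\n",
      "* the field was not included in the `fields` you requested,\n",  # assumed wording
      "* the record-id field has a different name in this project,\n",  # assumed wording
      "* or perhaps the fields and forms you requested are sparse.",
      call. = FALSE
    )
  }
  invisible(ds)  # the data are returned either way
}

# A mock sparse result that lacks the record-id column.
ds_sparse <- data.frame(sbp = c(1.1, 1.2), dbp = c(11.1, 11.2))

# Capture the warning text while still receiving the dataset.
caught <- character(0)
ds_checked <- withCallingHandlers(
  warn_if_record_id_missing(ds_sparse),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

With a hard stop, `ds_checked` would never be assigned; with a warning, the caller can inspect both the data and the message.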
I think the real fix is to unambiguously determine the name of This would prevent the guessing of the Lines 448 to 449 in 18b9663
FWIW, I just migrated to a new Mac where I did a fresh install of R, RStudio, and REDCapR, and I am seeing this exact issue, whereas it's not happening on my older Mac (presumably because I haven't updated the REDCapR package). I did some troubleshooting myself and came to the same conclusion: you need to include the name of the record_id field in the list of fields you request, otherwise you get the misleading error posted in the initial issue. I do also have one project where record_id has been renamed to something else, so that threw me for a loop for a bit. I have no real preference on whether the record_id field should always be included, though I don't think it hurts to always include it since it's the de facto id field for records in that project. But the user of REDCapR should not be required to request this field to get any data at all, as I don't think that's clear from the documentation (which makes sense since this is a recent change in behavior).
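The workaround described above (always naming the record-id field in the requested fields) could be wrapped in a small helper. This is only a sketch: `with_record_id()` is a hypothetical name, not a REDCapR function, and the field names are illustrative.

```r
# Hypothetical helper (not part of REDCapR): make sure the record-id field is
# in the requested fields, listed first and without duplicates.
with_record_id <- function(fields, record_id_name = "record_id") {
  unique(c(record_id_name, fields))
}

# Project that uses the default record-id name.
fields_default <- with_record_id(c("sbp", "dbp"))

# Project where the record-id field was renamed (here to `ptid`).
fields_renamed <- with_record_id(c("sbp", "ptid", "dbp"), record_id_name = "ptid")
```

The resulting character vector could then be passed to the `fields` argument of `REDCapR::redcap_read()`.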
FYI I just checked on my other computers: this issue does not occur with REDCapR package version 1.0.0. It does occur with version 1.1.0. I used |
@wibeasley, the record_id column is always the first row in the metadata. Don't you always query the metadata? If so, you would have the name of the record_id field before you queried for data. If that feels too hacky, you could write your own method to read the record_id column from the metadata, then reference that method to get the record_id column name. Then you have the easy option to revise the method when a better, REDCap-official way is released.
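A rough sketch of the method suggested above, assuming (as this comment states) that the record-id field is the first row of the metadata. `record_id_name()` is a hypothetical helper, and the mock data frame stands in for a real data dictionary so the example runs offline.

```r
# Hypothetical helper (not part of REDCapR): return the name of the record-id
# field, assuming it is the first row of the project's metadata.
record_id_name <- function(metadata) {
  stopifnot(is.data.frame(metadata), "field_name" %in% names(metadata))
  if (nrow(metadata) == 0L) {
    stop("The metadata has no rows, so the record-id field cannot be determined.")
  }
  as.character(metadata$field_name[1L])
}

# Mock metadata standing in for a real data dictionary. With a live project it
# could come from REDCapR::redcap_metadata_read(redcap_uri = uri, token = token)$data.
metadata_mock <- data.frame(
  field_name = c("ptid", "sbp", "dbp"),
  form_name  = c("enrollment", "blood_pressure", "blood_pressure"),
  stringsAsFactors = FALSE
)

id_name <- record_id_name(metadata_mock)
```

Keeping the lookup in one function means only that function needs revising if a more official way to identify the record-id field becomes available.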
Is it worth trying to figure out why version 1.0.0 of the REDCapR package still works in the previous fashion, without requiring you to include the record_id field name in the list of fields to query?
I'll have time tomorrow to look at this closely. I remember starting on this issue at least. The https://ouhscbbmc.github.io/REDCapR/reference/redcap_read.html @simX, in the meantime, can you try the version on GitHub and see if it does what you want? Install with |
@wibeasley, yes, the |
@simX, I think this has been fixed for a while with the main version on GitHub. Here's how the two read functions operate. Is this a problem for your scenario?

uri   <- "https://bbmc.ouhsc.edu/redcap/api/"
token <- "22C3FF1C8B08899FB6F86D91D874A159" # pid 3181

REDCapR::redcap_read_oneshot(
  redcap_uri = uri,
  token      = token,
  forms      = "blood_pressure",
  verbose    = FALSE
)$data
#> # A tibble: 6 × 3
#>     sbp   dbp blood_pressure_complete
#>   <dbl> <dbl>                   <dbl>
#> 1   1.1  11.1                       2
#> 2   1.2  11.2                       2
#> 3   1.3  11.3                       2
#> 4   2.1  22.1                       2
#> 5   2.2  22.2                       2
#> 6   2.3  22.3                       2

REDCapR::redcap_read(
  redcap_uri = uri,
  token      = token,
  forms      = "blood_pressure",
  verbose    = FALSE
)$data
#> # A tibble: 8 × 6
#>   record_id redcap_repeat_instrument redcap_repeat_instance   sbp   dbp
#>       <dbl> <chr>                                     <dbl> <dbl> <dbl>
#> 1         1 <NA>                                         NA    NA    NA
#> 2         1 blood_pressure                                1   1.1  11.1
#> 3         1 blood_pressure                                2   1.2  11.2
#> 4         1 blood_pressure                                3   1.3  11.3
#> 5         2 <NA>                                         NA    NA    NA
#> 6         2 blood_pressure                                1   2.1  22.1
#> 7         2 blood_pressure                                2   2.2  22.2
#> 8         2 blood_pressure                                3   2.3  22.3
#> # ℹ 1 more variable: blood_pressure_complete <dbl>

Created on 2023-07-09 with reprex v2.0.2
I’ll take a look in a couple of days with the version from GitHub. It is possible it’s already fixed; I didn’t realize there were further changes beyond version 1.1.0.
Looks like the GitHub version works properly; it doesn't require including the record_id field in the list of fields to retrieve anymore. I installed the |
@simX, sweet. Glad to hear it's working. (In the future, I think you need to restart just R within RStudio, not restart all of RStudio.) @pbchase, I think the modified behavior (since Feb) satisfies your first two posts in this issue. But I'd like to hear it from you (that I'm not overlooking something) before I close the issue. In short, *crap, the message from commit ab4c58a should be "verify |
I am seeing odd behavior from redcap_read when reading a single form of data. This is with REDCapR 1.1.0, R 4.2.2, REDCap 12.4.2. If I read a sparsely-filled form (a form that does not have a row of data for every person described by the record_id column), I get this confusing error message:

All of the suggestions are misleading. The real fix is to add the record_id column (ptid in my case), e.g.,

Whereas I say the error message is misleading, I'm not sure adjusting it is the fix. I feel like we used to always get the record_id column even if we never asked for it. That makes sense. Reading data without its primary key is silly unless you are anonymizing data. That hardly seems like a good default. Did that change? Was it supposed to?