I've encountered the "Something is wrong; all the Accuracy metric values are missing:" error when one of my columns is a factor. However, the help page did not specify any limitation on the data structure. I think I managed to get around it, but I want to understand whether the "workaround" changes the data analysis in any way.
If I run the train function using x_train and y_target as inputs, I get the error:
elaNet_model <- caret::train(x_train, y_target, method = "glmnet")
Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :9     NA's   :9
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)
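However, I saw somewhere else that using the formula input could solve the problem, so I tested it. Indeed, no error message is displayed, and I get the full model, while the factor feature in my data is still a factor. If I instead transform the factor column into numeric and keep the x/y input, I get no error either.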
I am testing a bunch of models, and I see the same pattern with "xgbTree", "svmLinear", "svmRadial", and "nb".
Therefore, it seems to me that at some point Caret changes the data structure when the formula input is used. Is that the case? Or does it simply mean that Caret can only handle factors properly through the formula input?
I considered converting all factors to numeric before passing the data to Caret, but is this appropriate? Most of the remaining data are genuinely numeric, so how will Caret treat a numeric column (formerly a factor) that contains only 1 and 0? Won't Caret misinterpret the features that used to be factors?
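One detail worth checking before converting factors by hand is what as.numeric() actually returns for a factor. The sketch below uses a made-up binary factor (not the data from this issue) to show that a direct conversion yields the internal level codes rather than the original labels:

```r
# Hypothetical binary factor, standing in for a 0/1 feature stored as a factor.
f <- factor(c("0", "1", "1", "0"))

# as.numeric() on a factor returns the internal level codes (1 and 2 here),
# not the labels the factor displays.
codes <- as.numeric(f)

# Going through character first recovers the original 0/1 values.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

For a two-level factor, either coding carries the same information as a single indicator column. For a factor with three or more levels, integer codes impose an ordering and spacing that the original categories may not have, so indicator (dummy) columns are usually the safer numeric encoding.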
Please, let me know if my questions sound too confusing, and thank you in advance.
I guess I found the answer to my question. I am sharing my finding with anyone who might encounter the same problem.
It does seem that the formula input ends up transforming the factor to numeric. I concluded this from the source of the formula method, train.formula (link), which prepares the data and then calls the x/y interface (train.default). Here is part of the function:
x <- model.matrix(Terms, m, contrasts)
cons <- attr(x, "contrast")
int_flag <- grepl("(Intercept)", colnames(x))
if (any(int_flag)) x <- x[, !int_flag, drop = FALSE]
w <- as.vector(model.weights(m))
y <- model.response(m)
res <- train(x, y, weights = w, ...)
After removal of the "(Intercept)" column, the "x" object contains only the features and is then passed to train. In short, the formula input adds a few preprocessing steps, but in the end it uses the same "x=" and "y=" type of input. Calling str() on "x" shows a numeric matrix: model.matrix has expanded each factor into numeric indicator (dummy) columns, so no factor reaches the model.
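The effect of those steps can be reproduced with base R alone. This is a minimal sketch on a made-up data frame (the column names num, grp, and y are hypothetical, not from the issue), repeating the model.matrix call and the intercept removal from the excerpt:

```r
# Hypothetical toy data: one numeric predictor, one three-level factor, one outcome.
toy <- data.frame(
  num = c(1.5, 2.0, 3.5, 4.0),
  grp = factor(c("a", "b", "c", "a")),
  y   = c(0, 1, 1, 0)
)

# Build the design matrix from a formula, as the formula interface does.
x <- model.matrix(y ~ ., data = toy)

# Drop the intercept column, mirroring the excerpt above.
int_flag <- grepl("(Intercept)", colnames(x))
if (any(int_flag)) x <- x[, !int_flag, drop = FALSE]

# The result is a numeric matrix; the factor became indicator columns.
str(x)
print(colnames(x))
```

With R's default treatment contrasts, the three-level factor grp turns into two indicator columns (grpb and grpc), with level "a" as the reference.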
Browsing on other issues, I ended up finding this explanation by the owner of the Caret repository:
However, there are a variety of package functions whose models do not require that all of the predictors be encoded as numbers. Trees, rule-based models, naive Bayes, and others fall into this bucket.
So, if you want to keep factors as factors, use the non-formula method for train
A factor in the training data will only lead to errors if the corresponding method cannot handle factors. This seems very obvious now that I am writing it, but Caret is such an awesome and complete package that I ended up overlooking this detail.
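One plausible way to see why a numeric-only method chokes on a factor is to look at what happens when a mixed data frame is coerced to a matrix. The sketch below assumes the underlying fitting function needs a numeric matrix and coerces its input with as.matrix(); that coercion step is an assumption for illustration, and the data frame contents are made up:

```r
# Hypothetical predictors: one numeric column and one factor column.
x_train <- data.frame(
  num = c(1.5, 2.0, 3.5),
  grp = factor(c("a", "b", "a"))
)

# Coercing a data frame that contains a factor yields a character matrix,
# so even the numeric column is no longer numeric.
m <- as.matrix(x_train)
print(typeof(m))

# Restricting to the numeric columns keeps the matrix numeric.
is_num <- vapply(x_train, is.numeric, logical(1))
m_num <- as.matrix(x_train[, is_num, drop = FALSE])
print(typeof(m_num))
```

Methods that accept data frames with factors directly are not affected by this, which matches the pattern described above.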
Personally, as I am testing a bunch of methods, I will keep two versions of the training and testing datasets, one with factors and the other only numeric, and will use them according to each method's requirements.
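A rough sketch of keeping both versions side by side, using base R only. The data and the helper name to_numeric are hypothetical; caret's dummyVars function could serve the same purpose, but it is not used here so the example stays self-contained:

```r
# Hypothetical train/test sets that share the same factor levels.
lv <- c("a", "b", "c")
train_fac <- data.frame(num = c(1, 2, 3),
                        grp = factor(c("a", "b", "c"), levels = lv))
test_fac  <- data.frame(num = c(4, 5),
                        grp = factor(c("c", "a"), levels = lv))

# Hypothetical helper: expand factors into indicator columns and
# drop the intercept, leaving a purely numeric matrix.
to_numeric <- function(df) {
  mm <- model.matrix(~ ., data = df)
  mm[, colnames(mm) != "(Intercept)", drop = FALSE]
}

train_num <- to_numeric(train_fac)
test_num  <- to_numeric(test_fac)

# Both numeric versions must expose identical columns, otherwise
# predictions on the test set would not line up with the fitted model.
stopifnot(identical(colnames(train_num), colnames(test_num)))
```

Declaring the factor levels explicitly on both sets matters here: if the test set lacked a level that the training set has, the two matrices would otherwise end up with different columns.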
I am keeping this open just in case any contributors want to chime in. Otherwise, feel free to close it.