Measures that rely on Tasks do not work for Pipelines #13
Hey, I had a brief talk with Bernd about this today. What we understood is the following:
In general, on an abstract level, what the cluster measure measures should be with respect to the original data and not some processed version. If I tune against a cluster measure and I get to measure with respect to the pre-processed data, I can pre-process the data such that the metric is optimal (e.g. by just dropping all variables or something). Pinging for comment here @damirpolat @henrifnk
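To make the concern concrete, here is a minimal sketch (not from the original discussion) of how a pipeline could game a measure that is computed on the preprocessed data: dropping features makes the within-cluster sum of squares look spuriously small. It assumes mlr3, mlr3cluster and mlr3pipelines; the choice of `po("select")` and the `UrbanPop` feature is purely illustrative.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3pipelines)

task = tsk("usarrests")

# pipeline that throws away all but one feature before clustering
sel = po("select", selector = selector_name("UrbanPop"))
graph = sel %>>% lrn("clust.kmeans", centers = 2)
graph$train(task)
pred = graph$predict(task)

# scored against the reduced (preprocessed) task, wss looks spuriously good ...
reduced = sel$train(list(task))$output
pred$clust.kmeans.output$score(msr("clust.wss"), task = reduced)

# ... whereas scored against the original task it does not
pred$clust.kmeans.output$score(msr("clust.wss"), task = task)
```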
Thank you for the thoughts @pfistfl. In my opinion, it should be up to the user how the metric should be calculated.
Please have a look at the PR I made yesterday.
I think @henrifnk is right. Here is another example. In supervised learning, performance measures that can be "extracted" from the fitted model should match with the ones computed from the "outside" via:

```r
task = tsk("boston_housing")
l1 = lrn("regr.lm")
l1$train(task)
mean(l1$model$residuals^2) # extract MSE from the model (residuals)
p1 = l1$predict(task)
p1$score(msr("regr.mse")) # computing MSE from "outside" gives the same value
```

The same thing can be done with a pipeline:

```r
task = tsk("boston_housing")
pscale = po("scale")
l2 = pscale %>>% lrn("regr.lm")
l2$train(task)
mean(l2$pipeops$regr.lm$state$model$residuals^2) # extract MSE from the model
p2 = l2$predict(task)
p2$regr.lm.output$score(msr("regr.mse")) # computing MSE from "outside" gives the same value
```

I would expect the same behavior for clustering tasks, i.e., measures that can be extracted from the cluster model should be the same as the ones that are computed from the "outside". @pfistfl would you agree here?

```r
task = tsk("usarrests")
l1 = lrn("clust.kmeans", centers = 2)
l1$train(task)
l1$model$tot.withinss # extract wss from the model
p1 = l1$predict(task)
p1$score(msr("clust.wss"), task = task) # computing wss from "outside" gives the same value

pscale = po("scale")
l2 = pscale %>>% lrn("clust.kmeans", centers = 2)
l2$train(task)
p2 = l2$predict(task)
l2$pipeops$clust.kmeans$state$model$tot.withinss # extract wss from the model
p2$clust.kmeans.output$score(msr("clust.wss"), task) # computing wss from "outside" is not the same
# you have to do this to fix it and obtain the same wss value as the one that can be extracted from the model
p2$clust.kmeans.output$score(msr("clust.wss"), task = pscale$train(list(task))$output)
```

Obviously, the "fix" in the last line where we pass the scaled task does not work if you benchmark multiple learners.
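To illustrate that last point, here is a minimal sketch (not from the original discussion) of a benchmark in which every learner is scored against the same task object, so there is no single "scaled task" one could hand to the measure; it assumes mlr3, mlr3cluster and mlr3pipelines.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3pipelines)

task = tsk("usarrests")
learners = list(
  lrn("clust.kmeans", centers = 2),                              # clusters the raw features
  as_learner(po("scale") %>>% lrn("clust.kmeans", centers = 2))  # clusters the scaled features
)

bmr = benchmark(benchmark_grid(task, learners, rsmp("insample")))
# both results are scored against the raw task; there is no per-learner way here
# to pass the measure the preprocessed task each pipeline actually clustered on
bmr$aggregate(msr("clust.wss"))
```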
I agree with @henrifnk. If I were to scale data and do clustering, I would expect measures to be applied to the preprocessed data since that's what cluster analysis was done on.
I am happy that we disagree here since this gives us the possibility to flesh things out. To reduce confusion, I am trying to re-state the discussion quickly. Given a graph such as

```r
<<HERE>> po("scale") %>>% ... %>>% <<THERE>> po(lrn("clust.kmeans"))
```

the open question is at which point we want to compute cluster measures. @henrifnk stated
This is exactly my problem. We would like to ensure that any data that is passed to the measure has the same scale, i.e. preprocess with

```r
po("scale") %>>% po(flt("anova")) %>>% po(lrn("clust.kmeans"))
```

and measure using the preprocessed task.
My general argument is the following: by allowing transformations for the measure, we allow the pipeline to change the goal post (the values measured by our clustering metric). And if an agent (our pipeline) can move its own goal post (e.g. through tuning), it will often not become better but instead just move the goal towards something that is easier to solve (by simply ignoring conflicting information). The analogy is the cleaning robot that learned to put a bucket on its head so it does not see any dirt. Cannot see any dirt -> problem solved! With respect to @henrifnk's other comments:
@giuseppec I get your problem, but in your case we look at the target variable, which is mostly unchanged throughout the pipeline. I think my suggestion is not optimal BUT it avoids falling into the traps mentioned above. What we instead should have: each metric should know IF it is sensitive to scaling / can deal with NAs etc., and it should then treat its input accordingly (i.e. by re-scaling).
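As a rough sketch of that idea (`wss_scaled` is a hypothetical helper, not an existing mlr3cluster measure), a scale-sensitive measure could standardize the features itself, so its value no longer depends on whether the pipeline already scaled the data:

```r
# Hypothetical helper: within-cluster sum of squares that handles scaling internally.
wss_scaled = function(data, assignments) {
  X = as.data.frame(scale(as.matrix(data)))       # the measure rescales its own input
  sum(vapply(split(X, assignments), function(cl) {
    m = as.matrix(cl)
    sum(sweep(m, 2, colMeans(m))^2)               # within-cluster sum of squares
  }, numeric(1)))
}

# Same value regardless of whether the clustering saw raw or scaled data:
km = kmeans(scale(USArrests), centers = 2)
wss_scaled(USArrests, km$cluster)
wss_scaled(scale(USArrests), km$cluster)
```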
Finally, I think I really understand your point, thank you for laying that out :).

**Option 1: Have a stable pipeline for cluster measures.** Independent of the pipeline of a given cluster learner, there would always be the same mechanism that preprocesses the task data that determines the scoring of a certain cluster measure. The pipe operators within that pipeline must be somewhat smart about the task and their measure, such that they can decide whether it is really necessary to call on them.

**Option 2: Mirror the preprocessed task from the pipeline learner.** By default, the measure is calculated on the same task as the learner was trained on. Additionally, an optional argument could let the user pass a different task to compute the measure on.

Addition: This might be supplemented by a warning if measures are calculated on tasks where features have a different scale or similar issues...

Let me briefly point out 2 scenarios where your approach would be problematic:

**Scenario 1:** `lrn("clust.kmeans")$train(tsk("usarrests"))` — the user is training a scale-sensitive learner on a task with differently scaled features. **Option 1:** Measures from the prediction would be magically scaled now and the user wouldn't notice the faulty design... **Option 2:** Results would be biased by the features with higher scale, but (!) the clusters made by the learner are also biased by that problem...

**Scenario 2:** The user reads in very raw data that is not even in shape for the learner to use (e.g. images etc...). She/he wants to use mlr3 now. **Option 1:** Not working. The user could make predictions but couldn't calculate measures, as the pre-given pipeline is not able to shape the raw data. Imagine seeing the error that the measure is not able to handle the data. This will probably be counterintuitive and confusing... **Option 2:** No problems...

To be honest, to me, the second option is still way more attractive as it gives the user the freedom to calculate the measure on any data that might make sense in a certain situation!
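For concreteness, a rough sketch of Option 1 (a hypothetical helper, not existing mlr3cluster API, and without the "smartness" described above; it assumes mlr3, mlr3cluster and mlr3pipelines are loaded): a fixed preprocessing step is always applied to the task before the measure is computed, independent of the learner's own pipeline.

```r
# Option 1 as a helper: always apply the same preprocessing to the task
# before a cluster measure is computed.
score_with_stable_pipeline = function(prediction, task, measure,
                                      stable_po = po("scale")) {
  prediction$score(measure, task = stable_po$train(list(task))$output)
}

# Hypothetical usage with the pipeline prediction from the example above:
# score_with_stable_pipeline(p2$clust.kmeans.output, task, msr("clust.wss"))
```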
I thought about it again recently. My opinion: measures should be calculated on the same data on which clustering was done. I can see @pfistfl's argument about moving the goalpost, but at the same time I think users should be the ones responsible for ensuring that their pipeline makes sense for their task. Also, I would imagine this could become a problem if there were an automated way of tuning pipelines that takes preprocessing ops into account. But does mlr3 do that now? We could deal with that later when it comes up.
Simple Example:
will throw an output like
The wss value in the output is obviously too high to have been computed on the scaled data.
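A minimal sketch of the kind of setup being described, mirroring the code from the discussion above (assuming mlr3, mlr3cluster and mlr3pipelines; not necessarily the original example):

```r
library(mlr3)
library(mlr3cluster)
library(mlr3pipelines)

task = tsk("usarrests")
graph = po("scale") %>>% lrn("clust.kmeans", centers = 2)
graph$train(task)
pred = graph$predict(task)

# the measure receives the raw, unscaled task, so the reported wss is far larger
# than the within-cluster sum of squares of the scaled data the learner clustered on
pred$clust.kmeans.output$score(msr("clust.wss"), task = task)
```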
The problem can be found in `MeasureClustInternal`, which takes the "raw" task without any preprocessing to calculate the features. I think this is probably only an issue that mlr3cluster suffers from, as all other measures only depend on the predictions...?

mlr3cluster/R/MeasureClustInternal.R, lines 22 to 30 in 23b3bef
This could be avoided if there were some generic way to access the preprocessed task in the pipeline. In that case, one could exchange the task in the function for the one stored in the learner itself. The problem is that if I inspect the state of a trained pipeline, the stored preprocessed tasks are empty...
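One possible workaround (a sketch, not a feature of mlr3pipelines): since the trained states do not store the tasks, re-apply the already-trained preprocessing PipeOp to recover the task the learner actually saw.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3pipelines)

task = tsk("usarrests")
graph = po("scale") %>>% lrn("clust.kmeans", centers = 2)
graph$train(task)
pred = graph$predict(task)

# the trained scale PipeOp can re-create the preprocessed task the learner saw
seen_task = graph$pipeops$scale$predict(list(task))$output
pred$clust.kmeans.output$score(msr("clust.wss"), task = seen_task)
```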