How does Khiops handle target value grouping, and why is it useful for classification tasks? #509
This discussion is based on a question received via our contact form:
Replies: 1 comment
Khiops’ target value grouping functionality is designed to address the challenges of working with a large number of target classes. When the number of classes is high, it can be difficult to discriminate between all of them, especially with an insufficient number of instances (i.e. with sparse input data).

To address this, Khiops reduces data sparsity by grouping target classes. In practice:
Example: Grouped Target Probabilities

In the following example, the explanatory variable x has two distinct values {v1, v2}, and the target variable y has eight classes {A, B, C, D, E, F, G, H}, which are clustered into three groups during preprocessing.

For x = v1:
For x = v2:
In this example, the classifier identifies that distinguishing between the groups ({A, B}, {C, D, E, F}, {G, H}) is feasible, but it cannot reliably separate classes within the same group during univariate preparation.

Because this grouping is done independently for each explanatory variable, a different grouping could occur for another explanatory variable z. For example, for z = v0:
The SNB predictor later uses this variable-specific grouped information from all explanatory variables to make precise, individual class predictions.

Additional Insights

In some cases, it may be beneficial to globally reduce the number of target classes for the entire problem. The univariate preparation reports generated by Khiops can help data miners identify frequently occurring groupings, guiding decisions about merging or removing target classes altogether.
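To make the point about variable-specific groupings concrete, here is a minimal sketch of how grouped information from two variables can separate classes that a single variable cannot. It is an illustration under stated assumptions, not Khiops' implementation: it uses a plain naive Bayes product with a uniform prior rather than the SNB predictor, the grouping for z and every probability value are invented for the example, and none of the names below belong to the Khiops API.

```python
# Hypothetical illustration: plain naive Bayes, NOT Khiops' SNB predictor.
# All groupings for z and all probability values below are invented.

CLASSES = list("ABCDEFGH")

# Target grouping found for x (taken from the example in the text).
X_GROUPS = [("A", "B"), ("C", "D", "E", "F"), ("G", "H")]
# Assumed, different target grouping for z.
Z_GROUPS = [("A", "C"), ("B", "D", "E"), ("F", "G", "H")]

# Assumed P(value | target group), one likelihood per group.
X_LIKELIHOOD_V1 = [0.8, 0.3, 0.1]  # P(x = v1 | group)
Z_LIKELIHOOD_V0 = [0.7, 0.2, 0.4]  # P(z = v0 | group)


def class_likelihoods(groups, group_likelihoods):
    """Expand group-level likelihoods: classes in one group share one value."""
    return {
        cls: likelihood
        for group, likelihood in zip(groups, group_likelihoods)
        for cls in group
    }


def posterior(observations):
    """Combine per-variable grouped likelihoods under a uniform class prior."""
    scores = {cls: 1.0 / len(CLASSES) for cls in CLASSES}
    for groups, group_likelihoods in observations:
        per_class = class_likelihoods(groups, group_likelihoods)
        for cls in CLASSES:
            scores[cls] *= per_class[cls]
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


# With x alone, A and B are indistinguishable (same group, same likelihood).
x_only = posterior([(X_GROUPS, X_LIKELIHOOD_V1)])

# Adding z, whose grouping splits A from B, separates the two classes.
both = posterior([(X_GROUPS, X_LIKELIHOOD_V1), (Z_GROUPS, Z_LIKELIHOOD_V0)])

for cls in CLASSES:
    print(f"{cls}: x only = {x_only[cls]:.3f}, x and z = {both[cls]:.3f}")
```

With x alone, classes in the same group receive identical scores. Once z contributes a grouping that places A and B in different groups, their scores diverge, which is the intuition behind combining grouped information across variables.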