feat: add support to sklearn TargetEncoder #1137

Open
wants to merge 6 commits into
base: main

@boccaff boccaff commented Nov 7, 2024

This PR implements a converter and a shape calculator for the TargetEncoder class introduced in scikit-learn 1.5. The code closely follows the implementation of the converter for OrdinalEncoder.

A partial suite of tests is already implemented, but there are at least a couple of additional tests that I would like to add (missing values, and using the smooth parameter from sklearn, even though I think it shouldn't matter).
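To illustrate why the smooth parameter is not expected to matter for conversion, here is a minimal sketch in plain Python. It assumes that smoothing only influences the encodings learned at fit time, and that transform is then a per-category lookup with a fallback to the target mean for categories not seen during fit. The names `encode_column`, `encodings`, and `target_mean` are hypothetical and are not taken from this PR or from scikit-learn.

```python
# Sketch only: a plain-Python stand-in for the transform step of a fitted
# target encoder. All names here are hypothetical illustrations.

def encode_column(values, encodings, target_mean):
    """Map each category to its learned encoding.

    Categories that were not seen during fitting fall back to the target mean.
    """
    return [encodings.get(value, target_mean) for value in values]


# Hypothetical values that a fitting step (with any smoothing) would produce.
encodings = {"cat": 0.8, "dog": 0.25}
target_mean = 0.5

encoded = encode_column(["cat", "dog", "bird"], encodings, target_mean)
print(encoded)
```

Under this assumption, a converter only needs to copy the learned values into the exported graph, so a different smooth setting changes the numbers stored but not the structure of the conversion.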

@xadupre xadupre (Collaborator) commented Nov 14, 2024

Thanks for the contribution. One line should be removed. Everything else looks good.

skl2onnx/shape_calculators/target_encoder.py (fixed, resolved)
[("input", StringTensorType([None, X.shape[1]]))],
target_opset=TARGET_OPSET,
)
self.assertTrue(model_onnx is not None)

CodeQL notice (code scanning, test): Imprecise assert

assertTrue(a is not b) cannot provide an informative message. Using assertIsNot(a, b) instead will give more informative messages.
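As a rough illustration of the notice, the sketch below uses `assertIsNotNone`, the standard `unittest` method for the `is not None` case. `ModelStub` is a hypothetical stand-in for a converted model and is not part of the PR's test suite.

```python
import unittest


class ModelStub:
    """Hypothetical stand-in for a converted ONNX model."""

    graph = object()


class TestAssertStyle(unittest.TestCase):
    def test_model_is_not_none(self):
        model_onnx = ModelStub()
        # Same check as assertTrue(model_onnx is not None), but a failure
        # reports "unexpectedly None" instead of "False is not true".
        self.assertIsNotNone(model_onnx)
        self.assertIsNotNone(model_onnx.graph)


def run_suite():
    """Load and run the test case above, returning the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAssertStyle)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

The behaviour of a passing test is unchanged; only the diagnostic on failure improves.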
target_opset=TARGET_OPSET,
)
self.assertTrue(model_onnx is not None)
self.assertTrue(model_onnx.graph.node is not None)

CodeQL notice (code scanning, test): Imprecise assert

assertTrue(a is not b) cannot provide an informative message. Using assertIsNot(a, b) instead will give more informative messages.
[("input", Int64TensorType([None, X.shape[1]]))],
target_opset=TARGET_OPSET,
)
self.assertTrue(model_onnx is not None)

CodeQL notice (code scanning, test): Imprecise assert

assertTrue(a is not b) cannot provide an informative message. Using assertIsNot(a, b) instead will give more informative messages.
target_opset=TARGET_OPSET,
)
self.assertTrue(model_onnx is not None)
self.assertTrue(model_onnx.graph.node is not None)

CodeQL notice (code scanning, test): Imprecise assert

assertTrue(a is not b) cannot provide an informative message. Using assertIsNot(a, b) instead will give more informative messages.
model_onnx = convert_sklearn(
model, "ordinal encoder two string cats", inputs, target_opset=TARGET_OPSET
)
self.assertTrue(model_onnx is not None)

CodeQL notice (code scanning, test): Imprecise assert

assertTrue(a is not b) cannot provide an informative message. Using assertIsNot(a, b) instead will give more informative messages.
tests/test_sklearn_target_encoder_converter.py (fixed, resolved)
@boccaff boccaff (Author) commented Nov 15, 2024

Thanks for the comments, @xadupre. I've removed the line and addressed a couple of the CodeQL suggestions (removing an unused import and an unused variable). The remaining CodeQL suggestions would diverge from the other implementations. For the .assertTrue notices I can simply follow the suggestion, but the result would differ from the other tests. Is that OK? For the except: pass, maybe we can add a warning (example above, on the respective CodeQL comment).
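One possible shape for the warning idea is sketched below. It is not the code from this PR: `to_float_or_none` is a hypothetical helper, and the exception types and warning category are assumptions chosen for the example.

```python
import warnings


def to_float_or_none(value):
    """Hypothetical helper: convert a value to float, warning on failure.

    This replaces a silent `except: pass` with a narrow except clause
    that reports what went wrong.
    """
    try:
        return float(value)
    except (TypeError, ValueError) as exc:
        warnings.warn(
            f"could not convert {value!r}: {exc}",
            RuntimeWarning,
            stacklevel=2,
        )
        return None


# Usage: the failure is still tolerated, but it is no longer invisible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    converted = to_float_or_none("abc")
```

Catching only the expected exception types keeps unrelated errors visible, which is usually the concern behind a CodeQL note on an empty except block.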
