Feature Request: Add `PandasCategoricalEncoder` to encode categorical features as pandas categorical #828

ClaudioSalvatoreArcidiacono · 2024-12-03T12:58:35Z

Some libraries like LightGBM are well integrated with pandas categorical
types.
I could not find a nice implementation to encode categorical features as pandas
categorical columns while preserving the categories across different datasets. I would like to
propose the addition of a PandasCategoricalEncoder to the feature_engine library to
address this issue.

Is your feature request related to a problem? Please describe.
Yes, I often encounter issues when working with categorical data in pandas. The current
methods do not ensure consistent encoding across different datasets, leading to
potential errors.

Describe the solution you'd like
I would like to implement the PandasCategoricalEncoder class, which will transform
categorical features into pandas categorical types. This encoder will ensure that
categories are encoded consistently between training and testing datasets, and it will
handle unseen categories gracefully based on specified parameters.
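
To make the idea concrete, here is a minimal sketch of the behaviour described above, using only pandas. The class name `SketchCategoricalEncoder`, its `variables` parameter, and the choice to map unseen categories to NaN are assumptions made for illustration; they are not the API proposed in the linked pull request.

```python
import pandas as pd


class SketchCategoricalEncoder:
    """Hypothetical sketch, not the class proposed for feature_engine."""

    def __init__(self, variables):
        self.variables = list(variables)

    def fit(self, X):
        # Learn one fixed CategoricalDtype per variable from the training data.
        self.dtypes_ = {
            var: pd.CategoricalDtype(sorted(X[var].dropna().unique()))
            for var in self.variables
        }
        return self

    def transform(self, X):
        X = X.copy()
        for var, dtype in self.dtypes_.items():
            # Assumption: categories not seen during fit are masked to NaN
            # before casting, so every dataset ends up with the same dtype.
            seen = X[var].isin(dtype.categories)
            X[var] = X[var].where(seen).astype(dtype)
        return X


train = pd.DataFrame({"colour": ["red", "blue", "red"]})
test = pd.DataFrame({"colour": ["blue", "green"]})  # "green" is unseen

encoder = SketchCategoricalEncoder(["colour"]).fit(train)
train_t = encoder.transform(train)
test_t = encoder.transform(test)
```

Because both outputs share the dtype learned in `fit`, the integer codes behind each category are identical in the training and testing data, which is what a library such as LightGBM relies on.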

Describe alternatives you've considered
I have considered using existing categorical encoding libraries, but they do not provide
such a feature.

Additional context
The PandasCategoricalEncoder will include features such as handling missing values,
allowing for flexible unseen category management, and providing methods for inverse
transformation to retrieve original values. This will enhance the usability and
reliability of categorical data processing in pandas.


solegalli · 2024-12-05T11:19:01Z

Hi @ClaudioSalvatoreArcidiacono

Our encoders try to handle pandas categorical variables within their functionality. They should be able to take variables that are of type object and of type categorical simultaneously, and for those that are of type categorical, we do have some functionality to make it work (i.e., add categories from the train set to the test set to ensure compatibility).

Did you test our encoders with a dataset where these did not work?
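
The idea of adding categories from the train set to the test set can be illustrated with plain pandas. This is only a sketch of the general technique, assuming both columns are already of categorical dtype; it is not taken from feature_engine's source code, and the variable names are made up for the example.

```python
import pandas as pd

train = pd.Series(["a", "b", "c"], dtype="category")
test = pd.Series(["a", "a"], dtype="category")  # its dtype only knows "a"

# Categories known to the train column but absent from the test dtype.
missing = [c for c in train.cat.categories if c not in test.cat.categories]

# Extend the test dtype without changing any of its values.
test_aligned = test.cat.add_categories(missing)
```

After this step both columns carry the same set of categories, even though the test data contains only some of them.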

ClaudioSalvatoreArcidiacono · 2024-12-05T13:27:17Z

Hey @solegalli,

This encoder is indeed very similar to OrdinalEncoder. However, from what I understood, it is not possible to get pandas categorical columns as the output of OrdinalEncoder, right? The other way around is definitely possible with OrdinalEncoder.

I tried to add this feature as an extra option for OrdinalEncoder, but since it shares the actual encoding logic with many other encoders in the library, I felt it would be easier to create a new class.

ClaudioSalvatoreArcidiacono linked a pull request Dec 3, 2024 that will close this issue

Add PandasCategoricalEncoder #829

Open
