Some libraries like LightGBM are well integrated with pandas categorical
types.
I could not find a nice implementation to encode categorical features as pandas
categorical columns while preserving the categories across different datasets. I would like to
propose the addition of a PandasCategoricalEncoder to the feature_engine library to
address this issue.
Is your feature request related to a problem? Please describe.
Yes, I often encounter issues when working with categorical data in pandas. The current
methods do not ensure consistent encoding across different datasets, leading to
potential errors.
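A minimal illustration of the kind of inconsistency described above, using plain pandas. The column name and values are made up for the example. Each frame is cast to `category` independently, so the same label ends up with a different integer code in each.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "purple"]})

# Casting each frame on its own infers the categories from that frame alone.
train_cat = train["color"].astype("category")
test_cat = test["color"].astype("category")

# Map each label to the integer code pandas assigned to it.
train_codes = dict(zip(train_cat, train_cat.cat.codes))
test_codes = dict(zip(test_cat, test_cat.cat.codes))

print(train_codes["green"], test_codes["green"])  # the two codes differ
```

A model that consumes the underlying codes would therefore see "green" as two different values in the two datasets.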
Describe the solution you'd like
I would like to implement the PandasCategoricalEncoder class, which will transform
categorical features into pandas categorical types. This encoder will ensure that
categories are encoded consistently between training and testing datasets, and it will
handle unseen categories gracefully based on specified parameters.
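The behaviour described here could be sketched roughly as follows. This is not the proposed implementation: `fit_categories`, `apply_categories`, and the `unseen` parameter are hypothetical names chosen for the example, and only standard pandas calls are used. The idea is to learn one fixed `CategoricalDtype` per column from the training data and reuse it on every later dataset.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def fit_categories(train, variables):
    """Learn one fixed CategoricalDtype per column from the training data."""
    return {
        var: CategoricalDtype(categories=sorted(train[var].dropna().unique()))
        for var in variables
    }


def apply_categories(df, dtypes, unseen="ignore"):
    """Cast columns to the learned dtypes.

    unseen="ignore" turns unknown labels into missing values,
    unseen="raise" raises a ValueError instead (hypothetical options).
    """
    out = df.copy()
    for var, dtype in dtypes.items():
        known = out[var].isin(dtype.categories)
        unknown = sorted(out.loc[~known, var].dropna().unique())
        if unknown and unseen == "raise":
            raise ValueError(f"Unseen categories in {var!r}: {unknown}")
        # Mask unknown labels explicitly, then cast to the shared dtype.
        out[var] = out[var].where(known).astype(dtype)
    return out


# Hypothetical toy data, invented for illustration only.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "purple"]})

dtypes = fit_categories(train, ["color"])
train_enc = apply_categories(train, dtypes)
test_enc = apply_categories(test, dtypes)
```

Because both frames share the same dtype, "green" gets the same code in each, and the unseen label "purple" becomes a missing value unless `unseen="raise"` is requested.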
Describe alternatives you've considered
I have considered using existing categorical encoding libraries, but they do not provide
such a feature.
Additional context
The PandasCategoricalEncoder will include features such as handling missing values,
allowing for flexible unseen category management, and providing methods for inverse
transformation to retrieve original values. This will enhance the usability and
reliability of categorical data processing in pandas.
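As a rough sketch of the missing-value handling and inverse transformation mentioned above, assuming a fixed set of categories has already been learned: a missing entry stays missing after the cast, and casting back to `object` recovers the original labels. The values and variable names are invented for the example.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumed to have been learned from a training set beforehand.
dtype = CategoricalDtype(categories=["blue", "green", "red"])

original = pd.Series(["red", None, "green"], name="color")

# Forward: the missing value stays missing and receives code -1.
encoded = original.astype(dtype)

# Inverse: cast back to object to retrieve the original labels.
restored = encoded.astype(object)

print(encoded.cat.codes.tolist())
print(restored.tolist())
```

Note that unseen labels mapped to missing values during encoding cannot be recovered this way, since that information is lost in the forward step.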
Our encoders try to handle pandas categorical variables within their functionality. They should be able to take variables that are of type object and of type categorical simultaneously, and for those that are of type categorical, we do have some functionality to make it work (i.e., add categories from the train set to the test set to ensure compatibility).
Did you test our encoders with a dataset where these did not work?
This encoder is indeed very similar to OrdinalEncoder. However, from what I understood, it is not possible to get pandas categorical columns as the output of OrdinalEncoder, right? The other way around is definitely possible with OrdinalEncoder.
I tried to add this feature as an extra option for OrdinalEncoder, but since it shares the actual encoding logic with many other encoders in the library, I felt it would be easier to create a new class.