You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
One of the benefits of using a registry is the ability to register specific versions of modules by providing keyword arguments (kwargs) directly in the __init__.pyregister() call. As a simple example, we can imagine a MinMaxScaler class will need to know things like axis or feature_range. Since we frequently want to scale between -1 and 1, we may want to register a version of this scaler as id=minus_one_one_scaler where feature_range = [-1,1].
What happens when a user calls make('minus_one_one_scaler', **kwargs)? Well, it depends on how the arguments are passed to the MinMaxScaler. Let's assume that the __init__() function is defined as:
def __init__(self, config=None):
config = {} if config is None
self.config = self.get_default_config()
self.config.update(config)
...
The way our registry code is written, we will register the new scaler as: register(id='minus_one_one_scaler', entry_point='<path_to_MinMaxScaler>', kwargs={'config': {'feature_range'}})
This works well if we are ok with the default behavior for all other configuration parameters, but what happens if I call make('minus_one_one_scaler', config={'axis': None}?
Currently, this will return an object where config = {'axis': None}. Therefore, our minus_one_one_scaler will scale by the default feature_range, which may not be [-1,1]! This is caused by the line _kwargs=_kwargs.update(kwargs) in registration.py which will update the full config dictionary.
There are many possible ways to change this behavior:
Add a registry_config argument to the module __init__() to be used by the registry. Then the module itself can handle any logic to update its config parameters. This is a nice, simple solution, but requires every module we write to include this logic in the __init__().
Add a check in the registry code for a config argument in kwargs during the make() call, and update that dictionary separately. This has the benefit of being transparent to the module (so no extra module code is needed), but means we would have to agree on the key name (or a list of them) that we want to be handled in this special way.
Implement the module __init__() by splitting up config into its associated keys. i.e. def __init__(self, axis=0, feature_range=None, ...). Since these keys are now individual kwargs, they will get appropriately updated by the registry. This has the benefit of no changes to the registry. However, this would mean you couldn't call make('minmax_scaler', config=scaler_config). Instead, your code would either need to be something like make('minmax_scaler', feature_range=[0,1], axis=None) or make('minmax_scaler', **scaler_config). The kwargs passed to register() would also need to be separated.
There are probably other ways to change this as well, but these were the top 3 ideas I could think of after discussing with @dlersch, @Kishanrajput, and @ahmedmohammed107. I am starting to lean towards using idea 3, as it feels somewhat natural to me, but I'm open to anything.
One thing to note: None of these ideas will recursively update the configuration, so any configuration arguments that are dictionaries will still need to be passed in their entirety.
The text was updated successfully, but these errors were encountered:
One of the benefits of using a registry is the ability to register specific versions of modules by providing keyword arguments (kwargs) directly in the
__init__.py
register()
call. As a simple example, we can imagine aMinMaxScaler
class will need to know things likeaxis
orfeature_range
. Since we frequently want to scale between -1 and 1, we may want to register a version of this scaler asid=minus_one_one_scaler
wherefeature_range = [-1,1]
.What happens when a user calls
make('minus_one_one_scaler', **kwargs)
? Well, it depends on how the arguments are passed to theMinMaxScaler
. Let's assume that the__init__()
function is defined as:The way our registry code is written, we will register the new scaler as:
register(id='minus_one_one_scaler', entry_point='<path_to_MinMaxScaler>', kwargs={'config': {'feature_range'}})
This works well if we are ok with the default behavior for all other configuration parameters, but what happens if I call
make('minus_one_one_scaler', config={'axis': None}
?Currently, this will return an object where
config = {'axis': None}
. Therefore, ourminus_one_one_scaler
will scale by the defaultfeature_range
, which may not be[-1,1]
! This is caused by the line_kwargs=_kwargs.update(kwargs)
inregistration.py
which will update the fullconfig
dictionary.There are many possible ways to change this behavior:
registry_config
argument to the module__init__()
to be used by the registry. Then the module itself can handle any logic to update its config parameters. This is a nice, simple solution, but requires every module we write to include this logic in the__init__()
.config
argument inkwargs
during themake()
call, and update that dictionary separately. This has the benefit of being transparent to the module (so no extra module code is needed), but means we would have to agree on the key name (or a list of them) that we want to be handled in this special way.__init__()
by splitting upconfig
into its associated keys. i.e.def __init__(self, axis=0, feature_range=None, ...)
. Since these keys are now individual kwargs, they will get appropriately updated by the registry. This has the benefit of no changes to the registry. However, this would mean you couldn't callmake('minmax_scaler', config=scaler_config)
. Instead, your code would either need to be something likemake('minmax_scaler', feature_range=[0,1], axis=None)
ormake('minmax_scaler', **scaler_config)
. The kwargs passed toregister()
would also need to be separated.There are probably other ways to change this as well, but these were the top 3 ideas I could think of after discussing with @dlersch, @Kishanrajput, and @ahmedmohammed107. I am starting to lean towards using idea 3, as it feels somewhat natural to me, but I'm open to anything.
One thing to note: None of these ideas will recursively update the configuration, so any configuration arguments that are dictionaries will still need to be passed in their entirety.
The text was updated successfully, but these errors were encountered: