Possible Issue
In the bandit feedback, `n_actions` is set to `int(self.action.max() + 1)`. This doesn't raise any error in the above code, assuming that the logs generated by `policy` cover all possible actions.
However, to be more precise, I think `n_actions` should be given explicitly rather than extracted from the log data. And if that is changed, the above code might raise an error.
For example, if there are 1000 possible actions but only actions 0~998 appear in `bandit_feedback`, and the policy somehow selects action 999, this might raise an index-out-of-range error.
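To make the concern concrete, here is a minimal NumPy-only sketch, not the actual obp code. The names `true_n_actions`, `logged_actions`, and `selected_action` are made up for illustration, and the log is assumed to never contain the last action.

```python
import numpy as np

rng = np.random.default_rng(0)

true_n_actions = 1000  # assumed size of the real action space

# Hypothetical log in which action 999 never appears (values are in 0..998).
logged_actions = rng.integers(0, true_n_actions - 1, size=5000)
logged_actions[0] = true_n_actions - 2  # make sure action 998 is present

# Inferring the action count from the log undercounts by one here.
inferred_n_actions = int(logged_actions.max() + 1)

# A policy that selects the unlogged action 999.
selected_action = true_n_actions - 1

# An array sized from the inferred count cannot be indexed at 999.
try:
    np.zeros(inferred_n_actions)[selected_action] = 1.0
    raised = False
except IndexError:
    raised = True

# An array sized from the explicit count works as expected.
action_dist = np.zeros(true_n_actions)
action_dist[selected_action] = 1.0
```

In this sketch the inferred count is one short of the real action space, so only the explicitly sized array can represent the selected action.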
Idea
Give `n_actions` to the BanditFeedback data explicitly, rather than:
zr-obp/obp/dataset/real.py
Lines 78 to 81 in 55ab57e
Use `n_actions` directly in `convert_to_action_dist`, rather than:
zr-obp/obp/simulator/simulator.py
Lines 75 to 78 in 55ab57e
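As a rough illustration of the second idea, a helper that takes `n_actions` explicitly could look like the sketch below. `to_action_dist` is a hypothetical stand-in written for this issue, not the library's `convert_to_action_dist`, and it assumes one selected action per round.

```python
import numpy as np


def to_action_dist(n_actions: int, selected_actions: np.ndarray) -> np.ndarray:
    """Hypothetical helper: one-hot action distribution of shape (n_rounds, n_actions)."""
    selected_actions = np.asarray(selected_actions, dtype=int)
    if selected_actions.ndim != 1:
        raise ValueError("selected_actions must be a 1-D array")
    # Validate against the explicit action count instead of the logged maximum.
    if selected_actions.size > 0 and (
        selected_actions.min() < 0 or selected_actions.max() >= n_actions
    ):
        raise ValueError("selected_actions contains an action outside [0, n_actions)")
    n_rounds = selected_actions.shape[0]
    action_dist = np.zeros((n_rounds, n_actions))
    # Put probability 1 on the selected action of each round.
    action_dist[np.arange(n_rounds), selected_actions] = 1.0
    return action_dist


# Action 999 is representable because the size comes from n_actions, not the log.
dist = to_action_dist(n_actions=1000, selected_actions=np.array([0, 999]))
```

Passing the count in also allows an early, readable error when a selected action falls outside the declared action space.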