Eventually: add flexibility for the exploration criterion #14
Comments
This should be done similarly to how it is done in POMCPOW: https://github.com/JuliaPOMDP/POMCPOW.jl/blob/master/src/criteria.jl It may need some thought about the interface. E.g. should
Also, it will be annoying to deprecate the c keyword argument :(
I'm interested in this for DPW / continuous actions; I need it for my research. For me, it should look as much like a continuous bandit problem as possible. Best case, it can be tested independently as a bandit (or even live in a separate bandit package). This might mean that the interface should pass s, a, and r. It might also need two functions: one for selecting an action and another for updating after observing r. Let me know what you think on the design side. I can implement it. It would be nice to have something shareable and generic. I need it soon though, so I might have to hack something together and clean it up later.
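To make the two-function idea concrete, here is a minimal sketch of a bandit with separate select and update steps. It is written in Python for illustration only and is not the solver's API: the class name `UCBBandit` and its methods `select` and `update` are hypothetical. It also assumes a finite set of arms, so it does not cover the continuous-action (DPW) case, and it omits the state s, which a tree search could handle by keeping one bandit per state node.

```python
import math


class UCBBandit:
    """Hypothetical finite-arm bandit with separate select/update steps."""

    def __init__(self, n_arms, c=1.0):
        self.c = c                      # exploration constant
        self.counts = [0] * n_arms      # number of pulls per arm
        self.means = [0.0] * n_arms     # running mean reward per arm

    def select(self):
        """Return the index of the arm to pull next."""
        # Pull every arm once before applying the UCB formula.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)

        def score(arm):
            bonus = self.c * math.sqrt(math.log(total) / self.counts[arm])
            return self.means[arm] + bonus

        return max(range(len(self.counts)), key=score)

    def update(self, arm, r):
        """Record reward r observed after pulling the given arm."""
        self.counts[arm] += 1
        # Incremental mean update.
        self.means[arm] += (r - self.means[arm]) / self.counts[arm]
```

Because the object has no dependency on a tree or a solver, it can be driven by a plain loop that alternates `select` and `update`, which is what would make it testable as a stand-alone bandit.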
I think But it sounds like you want to do something different - you want to use r instead of Q for your bandit? You could, of course, call I am a little hesitant to put the |
Right now, the UCB exploration criterion is hard-coded into the solver. We should eventually make this flexible.
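As a rough illustration of what "flexible" could mean here, the sketch below passes the exploration criterion to the action-selection step as a callable instead of hard-coding UCB. It is written in Python purely for illustration; the names `ucb`, `greedy`, and `select_action`, and the `(q, n)` statistics layout, are assumptions made for this example and are not taken from the solver or from POMCPOW's criteria.jl.

```python
import math
from typing import Callable, Dict, Hashable, Tuple

# A criterion maps (q_value, visit_count, total_visits) to a score.
Criterion = Callable[[float, int, int], float]


def ucb(c: float) -> Criterion:
    """UCB score: value estimate plus an exploration bonus scaled by c."""
    def score(q: float, n: int, total: int) -> float:
        if n == 0:
            return math.inf  # always try unvisited actions first
        return q + c * math.sqrt(math.log(total) / n)
    return score


def greedy() -> Criterion:
    """Pure exploitation: score is the value estimate alone."""
    return lambda q, n, total: q


def select_action(stats: Dict[Hashable, Tuple[float, int]],
                  criterion: Criterion) -> Hashable:
    """Pick the action with the highest score under the given criterion.

    stats maps each action to (q_value, visit_count).
    """
    total = sum(n for _, n in stats.values())
    return max(stats, key=lambda a: criterion(*stats[a], total))
```

With this shape, the existing behaviour would correspond to passing `ucb(c)` as the default criterion, which is one possible way to keep the c keyword working while it is being deprecated.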