AlphaZero subtree persistence #86

lowrollr · 2024-01-06T23:21:21Z

Requested by #51, this PR introduces the ability to pass a Tree to muzero_policy and gumbel_muzero_policy, allowing MCTS to continue from a pre-initialized tree.

The main use case is AlphaZero implementations, where the environment dynamics are known rather than modeled, so reusing the work from a previous MCTS call becomes worthwhile.

I introduce a new public API function, get_subtree, which extracts the subtree rooted at a given root child index. AlphaZero-style implementations can use it to extract the subtree corresponding to the action taken.
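A minimal JAX sketch of what subtree extraction involves for an array-based tree. This is not the PR's get_subtree: `extract_subtree` and its simplified field layout (children_index, parents, and one per-node value array, with -1 marking an unvisited child or a missing parent) are hypothetical stand-ins. It also assumes every node's index is larger than its parent's, which holds when nodes are numbered in expansion order.

```python
import jax
import jax.numpy as jnp


@jax.jit
def extract_subtree(children_index, parents, values, action):
  """Re-roots a flat tree at the root's child reached by `action`.

  children_index: int [N, A], -1 for an unvisited child.
  parents: int [N], -1 for no parent (the root or an unused slot).
  values: float [N], standing in for any per-node field.
  Assumes parent index < child index for every node.
  """
  n = parents.shape[0]
  new_root = children_index[0, action]

  def mark(i, member):
    # A node is in the subtree if it is the new root or its parent is.
    parent = parents[i]
    inherited = (parent >= 0) & member[jnp.maximum(parent, 0)]
    return member.at[i].set((i == new_root) | inherited)

  member = jax.lax.fori_loop(0, n, mark, jnp.zeros(n, dtype=bool))
  # Compact members to the front in their original order; the new root gets 0.
  new_index = jnp.where(member, jnp.cumsum(member.astype(jnp.int32)) - 1, -1)

  def remap(idx):
    # Translate old node indices to new ones, keeping -1 as "none".
    return jnp.where(idx >= 0, new_index[jnp.maximum(idx, 0)], -1)

  # Non-members are sent to index n, which is out of bounds and dropped.
  dest = jnp.where(member, new_index, n)
  new_values = jnp.zeros_like(values).at[dest].set(values, mode="drop")
  new_children = jnp.full_like(children_index, -1).at[dest].set(
      remap(children_index), mode="drop")
  new_parents = jnp.full_like(parents, -1).at[dest].set(
      remap(parents), mode="drop")
  return new_children, new_parents, new_values


# Toy tree: 0 -> {1, 2}, 1 -> 3, 2 -> 4, 3 -> 5; slot 6 is unused.
children_index = jnp.array(
    [[1, 2], [-1, 3], [4, -1], [5, -1], [-1, -1], [-1, -1], [-1, -1]])
parents = jnp.array([-1, 0, 0, 1, 2, 3, -1])
values = jnp.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 0.0])

# Keep the subtree under action 0 (nodes 1, 3, 5 become 0, 1, 2).
new_children, new_parents, new_values = extract_subtree(
    children_index, parents, values, 0)
```

A real implementation would also have to carry over every other per-node and per-edge field (visit counts, rewards, embeddings, and so on) and handle the batch dimension, for example with jax.vmap.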

I also include a utility function, reset_search_tree, which resets (zeroes out) the search tree. This is useful when an episode terminates and its search tree can be discarded.
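A hedged sketch of the reset idea for a batched tree: for each batch element, select between the existing tree and an empty one by mapping jnp.where over every leaf. `ToyTree` and `reset_where` are hypothetical and are not the PR's reset_search_tree. The pattern assumes the tree type is a pytree whose leaves share a leading batch axis; mctx's Tree is a chex dataclass, so the same approach should apply to it in principle.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class ToyTree(NamedTuple):
  # Hypothetical stand-in for a batched search tree; axis 0 is the batch.
  node_visits: jax.Array     # [B, N]
  children_index: jax.Array  # [B, N, A]


def reset_where(tree, empty_tree, terminated):
  """Replaces batch elements whose episode terminated with an empty tree."""
  def select(old, new):
    # Broadcast the [B] mask across all trailing axes of this leaf.
    mask = terminated.reshape((-1,) + (1,) * (old.ndim - 1))
    return jnp.where(mask, new, old)
  return jax.tree_util.tree_map(select, tree, empty_tree)


tree = ToyTree(
    node_visits=jnp.array([[3, 1, 1], [5, 2, 2]]),
    children_index=jnp.array([[[1, 2], [-1, -1], [-1, -1]],
                              [[1, 2], [-1, -1], [-1, -1]]]),
)
empty = ToyTree(
    node_visits=jnp.zeros((2, 3), dtype=jnp.int32),
    children_index=jnp.full((2, 3, 2), -1),
)
# Only the second episode terminated, so only its tree is cleared.
out = reset_where(tree, empty, jnp.array([False, True]))
```

Selecting with jnp.where rather than branching keeps the whole step traceable under jit, since `terminated` is a traced array.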

Including this feature in an AlphaZero implementation might look something like this (pseudo-code):

output = mctx.muzero_policy(..., tree=tree)
tree = mctx.get_subtree(output.search_tree, output.action)
terminated = env.step(output.action)
tree = mctx.reset_search_tree(tree, terminated)

When no tree has been initialized, mctx.muzero_policy(..., tree=None) still works and instantiates a new search tree (as before).

I've also decoupled num_simulations from the capacity of the search tree, which is now specified by a new max_nodes argument to muzero_policy and gumbel_muzero_policy. If max_nodes is not specified, the tree capacity defaults to num_simulations + 1 nodes, exactly as before. This is useful for AlphaZero, where the number of occupied nodes in the search tree may grow or shrink from call to call, so extra capacity helps.

I also included tests for get_subtree that run on each of the existing test pytrees. The tests run get_subtree on each of the root children and compare the result against the source tree. I'd be happy to run them on only a subset of the child nodes if the test runtime is too long (~60s total on my machine).

Calls to the public API work as they did before; I did not introduce any new mandatory arguments. Happy to reorganize and rework any of these changes if the maintainers have suggestions.

lowrollr · 2024-01-07T19:44:19Z

I thought of one concern regarding the Tree property num_simulations. The number of simulations a Tree object supported used to be equivalent to its capacity, but with this PR that is no longer the case, which could make the property's name misleading (it is now tied only to capacity, i.e. the maximum number of simulations).

fidlej · 2024-01-14T16:45:24Z

Thanks for trying the get_subtree() and sending the PR.
Sorry for my slow response.

I worry that the subtree reuse is not compatible with the current gumbel_muzero_policy implementation.
That policy assumes that the tree starts empty. To implement the sequential halving, the action selection uses a simulation_index.
https://github.com/google-deepmind/mctx/blob/d40d32e1a18fb73030762bac33819f95fff9787c/mctx/_src/action_selection.py#L145C3-L145C19

lowrollr · 2024-01-15T22:40:28Z

I see. I'm not aware of a good way to incorporate existing visit counts into the sequential halving algorithm, especially since they were generated by the interior action selection algorithm. Devising a way to do this might be a good research problem, but it is probably out of scope for this PR.

I will remove the option for subtree reuse from gumbel_muzero_policy and allow it only for muzero_policy.
If you'd prefer, I could instead create a new policy, alphazero_policy, that allows subtree reuse and is otherwise identical to muzero_policy, and restore muzero_policy to its previous form. I wanted to minimize changes to the public API, but this could help disambiguate.

fidlej · 2024-01-16T00:24:48Z

Thanks for the comment.
Are you sure that the implementation works correctly?
I left some comments on the code, but I have not checked everything.

lowrollr · 2024-01-16T01:53:09Z

> Thanks for the comment.
> Are you sure that the implementation works correctly?

As far as I can tell, all subtrees of the provided test trees are reproduced accurately in the tests I wrote. I also tested the feature in the Connect 4 example notebook linked in the readme and had no issues.

I'd be happy to write some more granular test cases if you'd like.

> I left some comments on the code, but I have not checked everything.

I'm not able to see your comments yet.

fidlej

Sorry, my comments were pending.

mctx/_src/search.py

fidlej · 2024-01-14T16:35:55Z


    tree = expand(
        params, expand_key, tree, recurrent_fn, parent_index,
        action, next_node_index)
+    # if next_node_index goes out of bounds (i.e. no room left for new nodes)
+    # backward its (in-bounds) parent


Should we bound the next_node_index before calling expand()?

I'm relying on the fact that out-of-bounds updates are no-ops in JAX here; otherwise I'd have to change some of the logic in expand(). If this is a bad pattern, I can try something else.

I assumed that if the tree is full, we do not want to overwrite any already-expanded node, but we still want to backpropagate the value normally, as if the expansion had happened (this is why I put the out-of-bounds logic after expand()).
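A hedged illustration of the pattern in this exchange, not the PR's code: JAX drops scatter updates at out-of-bounds indices, so writing a new node past the end of a full buffer leaves the buffer unchanged, and backpropagation can start from the in-bounds parent instead. `expand_or_skip` and its arguments are hypothetical. The JAX docs describe out-of-bounds update indices as skipped under the default mode too; passing mode="drop" just makes that explicit.

```python
import jax
import jax.numpy as jnp


@jax.jit
def expand_or_skip(node_values, parent_index, next_node_index, leaf_value):
  """Hypothetical sketch, not mctx code: write a new node only if it fits.

  If `next_node_index` is past the end of the buffer, the scatter is dropped,
  and backpropagation starts from the (in-bounds) parent instead.
  """
  capacity = node_values.shape[0]
  # mode="drop" makes the out-of-bounds write an explicit no-op.
  node_values = node_values.at[next_node_index].set(leaf_value, mode="drop")
  is_full = next_node_index >= capacity
  backward_start = jnp.where(is_full, parent_index, next_node_index)
  return node_values, backward_start


values = jnp.zeros(4)
# Room left: node 3 is written and backpropagation starts from it.
values_ok, start_ok = expand_or_skip(values, 1, 3, 0.5)
# Tree full: index 4 is out of bounds, nothing is written, start from parent 1.
values_full, start_full = expand_or_skip(values, 1, 4, 0.5)
```

One caveat: JAX only drops out-of-bounds updates. Out-of-bounds reads are clamped to the last valid index by default, so any gather from next_node_index on a full tree would silently read the last node rather than fail.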

fidlej · 2024-01-14T16:38:33Z

mctx/_src/policies.py

    invalid_actions: a mask with invalid actions. Invalid actions
      have ones, valid actions have zeros in the mask. Shape `[B, num_actions]`.
    max_depth: maximum search tree depth allowed during simulation.
+    max_nodes: maximum number of nodes allowed in the search tree. If `None`,
+      max_nodes == num_simulations + 1. This only applies when `tree` is `None` 


Do we need to specify max_nodes? Can't we always deduce max_nodes from num_simulations?

The idea behind max_nodes is that when initializing a Tree, we need to choose a capacity. When the tree is discarded after each call to search(), we can simply derive that capacity from num_simulations, as before.

However, when we want to reuse a tree, I thought it might be useful to decouple num_simulations from the tree capacity, so that tree capacity >= num_simulations, leaving extra room for the nodes expanded by the next search call.

If a selected subtree contains S nodes on average, one might set the tree capacity to some value >= S + num_simulations, so that there is room for most node expansions. How high or low to set this capacity relative to num_simulations becomes a problem-dependent trade-off between memory and (possibly) search accuracy.
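A small arithmetic sketch of the capacity reasoning above, assuming each simulation expands at most one new node and that the retained subtree's node count includes its own root. Both helper names are hypothetical.

```python
def required_max_nodes(subtree_nodes: int, num_simulations: int) -> int:
  """Smallest capacity that fits a retained subtree plus one node per simulation.

  Hypothetical helper. Assumes each simulation expands at most one new node
  and that `subtree_nodes` counts the retained subtree's root.
  """
  if subtree_nodes < 1 or num_simulations < 0:
    raise ValueError("need at least the root node and a non-negative budget")
  return subtree_nodes + num_simulations


def simulations_with_room(max_nodes: int, subtree_nodes: int) -> int:
  """How many simulations can expand a fresh node before the tree is full."""
  return max(0, max_nodes - subtree_nodes)


# A fresh search starts from a root-only tree: 1 + 32 = 33 nodes.
fresh = required_max_nodes(subtree_nodes=1, num_simulations=32)
# Reusing a 20-node subtree with the same budget needs 52 nodes.
reused = required_max_nodes(subtree_nodes=20, num_simulations=32)
# With max_nodes fixed at 48, a 20-node subtree leaves room for 28 expansions.
room = simulations_with_room(max_nodes=48, subtree_nodes=20)
```

Setting subtree_nodes=1 recovers num_simulations + 1, which matches the documented default for max_nodes. Any simulation beyond simulations_with_room would take the out-of-bounds path discussed in the search.py review.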

I do think that having max_nodes alongside num_simulations could be confusing. Perhaps a cleaner way to organize this would be:

- make tree a mandatory argument to search()
  - muzero_policy and the other policies would each call instantiate_tree_from_root and pass the initialized tree to search()
  - this lets us remove the max_nodes argument to search() (and also root, extra_data, and root_invalid_actions)
- create a new policy, alphazero_policy:
  - [option 1]: accepts an optional tree and max_nodes, and initializes a tree with capacity max_nodes if no tree is passed
  - [option 2]: requires an initialized tree to be passed (this would need another new public API function to initialize one)
- remove max_nodes and tree as arguments to muzero_policy and support tree reuse only in alphazero_policy

This isolates passing a subtree to alphazero_policy alone, which I like, given that you'd never actually want to pass a subtree in MuZero. Maybe it could even return the subtree as part of the output?

I implemented these proposed changes (w/ option 1) in a branch on my mctx fork: https://github.com/lowrollr/mctx/tree/alphazero_policy

I also pushed a modified version of the Connect 4 example to that branch that uses alphazero_policy and get_subtree. If you feel this is a good approach, I could write some unit tests for alphazero_policy as well.

fidlej · 2024-01-17T22:09:08Z

Thanks for the clarifications.
You understand the code well.

Would it be OK to keep the functionality unmerged?
If people want this alphazero-specific functionality, they can look at your repository.

lowrollr · 2024-01-17T22:26:31Z

You mention AlphaZero in the readme, so in my opinion subtree reuse should be supported functionality.

I understand wanting to keep the codebase as lightweight and simple as possible. If you have specific concerns, constraints, or parts of the code you'd prefer be left unchanged, I'd be happy to work around them to get this ready to merge.

fidlej · 2024-01-17T22:43:46Z

I want to ensure that mctx will work correctly
and I currently do not have time to carefully review the proposed changes.
Mctx will probably remain mostly frozen.

lowrollr · 2024-01-17T23:06:14Z

That is understandable.
In that case I can document the new functionality in my repository and provide a few examples.
I would appreciate it if you could add a link to my repo under 'Example Projects' in the mctx readme when I am done.

Thank you for taking a look at my code, I admire this repo a lot.

fidlej · 2024-01-18T10:10:49Z

Thank you.
When you are ready with your repo, please ping me and I will add the link.

lowrollr · 2024-01-18T19:23:01Z

Here's the link: https://github.com/lowrollr/mctx-az

@fidlej

lowrollr added 18 commits January 2, 2024 18:33

- 9be0a13 new node idx assignment
- a58c9e1 load_subtree
- 75c5283 fix parent
- d89074f add tree arg to muzero policies
- 64187dd organization
- 02e6578 fix subtree idx mapping + add max_nodes arg
- 6849b7f fix more edge cases
- e7b7677 undo change to masktree
- 60869b8 fix backwards logic for updating root
- cedb4d0 add get_subtree to public api
- e94e7b5 fix state embeddings
- 3475c1a add tests, add missing children_rewards field to get_subtree
- f885427 terminate test early when unvisited node is found
- 9ac6f47 add comments
- fe26a72 more comments
- f56df1a revert tree path
- a2161e1 org
- fb7d471 more cleanup

lowrollr changed the title ~~Utility functions for AlphaZero subtree persistence~~ functions for AlphaZero subtree persistence Jan 7, 2024

lowrollr changed the title ~~functions for AlphaZero subtree persistence~~ AlphaZero subtree persistence Jan 7, 2024

- e6c69b8 fix translate overwriting bools as ints
- f756448 rm tree+max_nodes from gumbel muzero policy

fidlej reviewed Jan 16, 2024


lowrollr closed this Jan 18, 2024
