Solve Wordle using reinforcement learning, assuming we know the underlying dynamics (i.e. the set of all possible answers and their distribution).
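With known dynamics, the environment is fully specified by the answer list, its distribution, and the feedback rule. Here is a minimal sketch, assuming the standard Wordle feedback rules (greens assigned first, then yellows consume the remaining unmatched letters) and a belief state stored as a dict from each remaining answer to its probability. The names `feedback` and `transition` are hypothetical.

```python
from collections import Counter, defaultdict


def feedback(guess: str, answer: str) -> str:
    """Return Wordle feedback: 'G' = right letter in the right spot,
    'Y' = letter present elsewhere, '.' = not (or no longer) available."""
    result = ["."] * len(guess)
    remaining = Counter()
    # First pass: mark exact matches and count unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            remaining[a] += 1
    # Second pass: yellows use up the unmatched letters, which handles repeats.
    for i, g in enumerate(guess):
        if result[i] == "." and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)


def transition(prior: dict[str, float], guess: str) -> dict[str, tuple[float, dict[str, float]]]:
    """Known dynamics: for a belief over answers and a guess, map each possible
    feedback pattern to (its probability, the renormalized posterior belief)."""
    buckets: dict[str, dict[str, float]] = defaultdict(dict)
    for answer, p in prior.items():
        buckets[feedback(guess, answer)][answer] = p
    out = {}
    for pattern, sub in buckets.items():
        mass = sum(sub.values())
        out[pattern] = (mass, {w: p / mass for w, p in sub.items()})
    return out
```

For example, `transition({"crane": 0.5, "crate": 0.25, "trace": 0.25}, "crane")` splits the belief into three patterns (`GGGGG`, `GGG.G`, `YGG.G`), each leaving a single candidate. Exact planning (e.g. value iteration over beliefs) can be built on top of this transition function.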
1. Parametrize Q(s, a) using attention over the state tokens [h1, ..., hk] and the action tokens [a, w1, ..., wn]; the readout would be the average of the h outputs.
2. Solve the game without known dynamics.
3. Scale to bigger state/action spaces, e.g. more possible answers and guesses.
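Item 1 could be prototyped roughly as follows. This is a minimal NumPy sketch under assumptions that are not in the notes: one self-attention head over the concatenated state and action tokens, a residual connection, the average of the outputs at the h positions as the readout, and a linear head mapping that readout to a scalar. The embedding size `d`, the random initialization, and the names `AttentionQ` and `softmax` are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class AttentionQ:
    """Hypothetical single-layer, single-head attention critic Q(s, a).

    state_tokens:  (k, d) array for h_1..h_k
    action_tokens: (n+1, d) array for [a, w_1..w_n]
    """

    def __init__(self, d: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d)
        self.Wq = rng.normal(0.0, scale, (d, d))
        self.Wk = rng.normal(0.0, scale, (d, d))
        self.Wv = rng.normal(0.0, scale, (d, d))
        self.w_out = rng.normal(0.0, scale, d)

    def __call__(self, state_tokens: np.ndarray, action_tokens: np.ndarray) -> float:
        k = state_tokens.shape[0]
        x = np.concatenate([state_tokens, action_tokens], axis=0)  # (k+n+1, d)
        q, keys, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        # Every token (state and action) attends to every token.
        attn = softmax(q @ keys.T / np.sqrt(x.shape[1]))
        out = x + attn @ v  # residual connection
        readout = out[:k].mean(axis=0)  # average over the h positions
        return float(readout @ self.w_out)
```

Usage: `AttentionQ(8)(H, A)` with `H` of shape `(3, 8)` and `A` of shape `(6, 8)` returns a scalar. Because there are no positional encodings, Q is invariant to reordering h1..hk, which suits an unordered history, but it is also invariant to reordering w1..wn; if the w's are the letters of a guess, position embeddings would need to be added to them first.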