Merge branch 'master' of github.com:cameronangliss/poke-env into none-battleorder-is-pass
cameronangliss committed Dec 26, 2024
2 parents 99d21e2 + 2f98f41 commit c5612fb
Showing 16 changed files with 85 additions and 56 deletions.
2 changes: 1 addition & 1 deletion docs/source/examples/index.rst
@@ -11,4 +11,4 @@ This page lists detailled examples demonstrating how to use this package. They a
quickstart
using_a_custom_teambuilder
connecting_to_showdown_and_challenging_humans
rl_with_open_ai_gym_wrapper
rl_with_gymnasium_wrapper
@@ -1,18 +1,18 @@
.. _rl_with_open_ai_gym_wrapper:
.. _rl_with_gymnasium_wrapper:

Reinforcement learning with the OpenAI Gym wrapper
Reinforcement learning with the Gymnasium wrapper
==================================================

The corresponding complete source code can be found `here <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_new_open_ai_gym_wrapper.py>`__.
The corresponding complete source code can be found `here <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_new_gymnasium_wrapper.py>`__.

The goal of this example is to demonstrate how to use the `open ai gym <https://gym.openai.com/>`__ interface proposed by ``EnvPlayer``, and to train a simple deep reinforcement learning agent comparable in performance to the ``MaxDamagePlayer`` we created in :ref:`max_damage_player`.
The goal of this example is to demonstrate how to use the `farama gymnasium <https://gymnasium.farama.org/>`__ interface proposed by ``EnvPlayer``, and to train a simple deep reinforcement learning agent comparable in performance to the ``MaxDamagePlayer`` we created in :ref:`max_damage_player`.

.. note:: This example necessitates `keras-rl <https://github.com/keras-rl/keras-rl>`__ (compatible with Tensorflow 1.X) or `keras-rl2 <https://github.com/wau/keras-rl2>`__ (Tensorflow 2.X), which implement numerous reinforcement learning algorithms and offer a simple API fully compatible with the Open AI Gym API. You can install them by running ``pip install keras-rl`` or ``pip install keras-rl2``. If you are unsure, ``pip install keras-rl2`` is recommended.
.. note:: This example necessitates `keras-rl <https://github.com/keras-rl/keras-rl>`__ (compatible with Tensorflow 1.X) or `keras-rl2 <https://github.com/wau/keras-rl2>`__ (Tensorflow 2.X), which implement numerous reinforcement learning algorithms and offer a simple API fully compatible with the Gymnasium API. You can install them by running ``pip install keras-rl`` or ``pip install keras-rl2``. If you are unsure, ``pip install keras-rl2`` is recommended.

Implementing rewards and observations
*************************************

The open ai gym API provides *rewards* and *observations* for each step of each episode. In our case, each step corresponds to one decision in a battle and battles correspond to episodes.
The Gymnasium API provides *rewards* and *observations* for each step of each episode. In our case, each step corresponds to one decision in a battle and battles correspond to episodes.
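As a rough illustration of the step/episode structure described above, the sketch below uses a toy stand-in class. ``ToyBattleEnv`` and ``run_episode`` are hypothetical names and are not part of poke-env or Gymnasium; the class only mimics the Gymnasium convention in which ``reset`` returns ``(observation, info)`` and ``step`` returns ``(observation, reward, terminated, truncated, info)``.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyBattleEnv:
    """Hypothetical stand-in that mimics Gymnasium-style reset/step signatures."""

    def __init__(self, max_turns: int = 5) -> None:
        self.max_turns = max_turns
        self._turn = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[List[float], Dict[str, Any]]:
        # A new episode corresponds to a new battle.
        self._turn = 0
        return [0.0], {}

    def step(
        self, action: int
    ) -> Tuple[List[float], float, bool, bool, Dict[str, Any]]:
        # One step corresponds to one decision in the battle.
        self._turn += 1
        terminated = self._turn >= self.max_turns
        reward = 1.0 if terminated else 0.0  # toy reward: paid once, at the end
        return [float(self._turn)], reward, terminated, False, {}


def run_episode(env: ToyBattleEnv, seed: int = 0) -> Tuple[int, float]:
    """Play one episode with a fixed action; return (steps, total reward)."""
    _obs, _info = env.reset(seed=seed)
    steps, total, done = 0, 0.0, False
    while not done:
        _obs, reward, terminated, truncated, _info = env.step(0)
        steps += 1
        total += reward
        done = terminated or truncated
    return steps, total


if __name__ == "__main__":
    print(run_episode(ToyBattleEnv(max_turns=5)))
```

A real ``EnvPlayer`` subclass would supply observations and rewards from the battle state instead of the turn counter used here.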

Defining observations
^^^^^^^^^^^^^^^^^^^^^
@@ -26,9 +26,9 @@ Observations are embeddings of the current state of the battle. They can be an a

To define our observations, we will create a custom ``embed_battle`` method. It takes one argument, a ``Battle`` object, and returns our embedding.

In addition to this, we also need to describe the embedding to the gym interface.
In addition to this, we also need to describe the embedding to the gymnasium interface.
To achieve this, we need to implement the ``describe_embedding`` method where we specify the low bound and the high bound
for each component of the embedding vector and return them as a ``gym.Space`` object.
for each component of the embedding vector and return them as a ``gymnasium.Space`` object.
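To make the idea of per-component bounds concrete, here is a minimal pure-Python sketch. It assumes a two-component embedding (remaining Pokémon on each side, each between 0 and 6), which matches the ``Box(np.array([0, 0]), np.array([6, 6]), dtype=int)`` space used in ``examples/gymnasium_example.py`` in this commit. The names ``LOW``, ``HIGH``, ``embed`` and ``within_bounds`` are hypothetical and only illustrate what the space object encodes.

```python
from typing import List, Sequence

# Assumed bounds for a two-component embedding:
# [own remaining Pokémon, opponent remaining Pokémon].
LOW: List[float] = [0.0, 0.0]
HIGH: List[float] = [6.0, 6.0]


def embed(remaining_own: int, remaining_opponent: int) -> List[float]:
    """Toy embedding: one number per side."""
    return [float(remaining_own), float(remaining_opponent)]


def within_bounds(
    embedding: Sequence[float],
    low: Sequence[float] = LOW,
    high: Sequence[float] = HIGH,
) -> bool:
    """Check that the embedding has the declared shape and respects each bound."""
    if not (len(embedding) == len(low) == len(high)):
        return False
    return all(lo <= x <= hi for lo, x, hi in zip(low, embedding, high))


if __name__ == "__main__":
    print(within_bounds(embed(6, 3)))
    print(within_bounds(embed(7, 0)))
```

In the actual wrapper, ``describe_embedding`` returns a space object that carries the same information, so the environment checker can validate observations against it.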

Defining rewards
^^^^^^^^^^^^^^^^
@@ -108,7 +108,7 @@ Our player will play the ``gen8randombattle`` format. We can therefore inherit f
Instantiating and testing a player
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now that our custom class is defined, we can instantiate our RL player and test if it's compliant with the OpenAI gym API.
Now that our custom class is defined, we can instantiate our RL player and test if it's compliant with the Gymnasium API.

.. code-block:: python
@@ -340,7 +340,7 @@ To use the ``cross_evaluate`` method, the strategy is the same to the one used f
Final result
************

Running the `whole file <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_new_open_ai_gym_wrapper.py>`__ should take a couple of minutes and print something similar to this:
Running the `whole file <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_gymnasium_wrapper.py>`__ should take a couple of minutes and print something similar to this:

.. code-block:: console
2 changes: 1 addition & 1 deletion docs/source/getting_started.rst
@@ -41,7 +41,7 @@ Agents in ``poke-env`` are instances of the ``Player`` class. Explore the follow

- Basic agent: :ref:`/examples/cross_evaluate_random_players.ipynb`
- Advanced agent: :ref:`max_damage_player`
- RL agent: :ref:`rl_with_open_ai_gym_wrapper`
- RL agent: :ref:`rl_with_gymnasium_wrapper`
- Using teams: :ref:`ou_max_player`
- Custom team builder: :ref:`using_a_custom_teambuilder`

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -6,7 +6,7 @@ Poke-env: A Python Interface for Training Reinforcement Learning Pokémon Bots

Poke-env provides an environment for engaging in `Pokémon Showdown <https://pokemonshowdown.com/>`__ battles with a focus on reinforcement learning.

It boasts a straightforward API for handling Pokémon, Battles, Moves, and other battle-centric objects, alongside an `OpenAI Gym <https://gym.openai.com/>`__ interface for training agents.
It boasts a straightforward API for handling Pokémon, Battles, Moves, and other battle-centric objects, alongside a `Farama Gymnasium <https://gymnasium.farama.org/>`__ interface for training agents.

.. attention:: While poke-env aims to support all Pokémon generations, it was primarily developed with the latest generations in mind. If you discover any missing or incorrect functionalities for earlier generations, please `open an issue <https://github.com/hsahovic/poke-env/issues>`__ to help improve the library.

4 changes: 2 additions & 2 deletions docs/source/modules/player.rst
@@ -21,10 +21,10 @@ Player
:undoc-members:
:show-inheritance:

OpenAIGymEnv
GymnasiumEnv
************

.. automodule:: poke_env.player.openai_api
.. automodule:: poke_env.player.gymnasium_api
:members:
:undoc-members:
:show-inheritance:
20 changes: 10 additions & 10 deletions examples/openai_example.py → examples/gymnasium_example.py
@@ -7,13 +7,13 @@
from poke_env.environment.abstract_battle import AbstractBattle
from poke_env.player import (
Gen8EnvSinglePlayer,
GymnasiumEnv,
ObservationType,
OpenAIGymEnv,
RandomPlayer,
)


class TestEnv(OpenAIGymEnv):
class TestEnv(GymnasiumEnv):
def __init__(self, **kwargs):
self.opponent = RandomPlayer(
battle_format="gen8randombattle",
@@ -66,31 +66,31 @@ def describe_embedding(self) -> Space:
return Box(np.array([0, 0]), np.array([6, 6]), dtype=int)


def openai_api():
gym_env = TestEnv(
def gymnasium_api():
gymnasium_env = TestEnv(
battle_format="gen8randombattle",
server_configuration=LocalhostServerConfiguration,
start_challenging=True,
)
check_env(gym_env)
gym_env.close()
check_env(gymnasium_env)
gymnasium_env.close()


def env_player():
opponent = RandomPlayer(
battle_format="gen8randombattle",
server_configuration=LocalhostServerConfiguration,
)
gym_env = Gen8(
gymnasium_env = Gen8(
battle_format="gen8randombattle",
server_configuration=LocalhostServerConfiguration,
start_challenging=True,
opponent=opponent,
)
check_env(gym_env)
gym_env.close()
check_env(gymnasium_env)
gymnasium_env.close()


if __name__ == "__main__":
openai_api()
gymnasium_api()
env_player()
File renamed without changes.
@@ -72,7 +72,7 @@ def describe_embedding(self) -> Space:

async def main():
# First test the environment to ensure the class is consistent
# with the OpenAI API
# with the Gymnasium API
opponent = RandomPlayer(battle_format="gen8randombattle")
test_env = SimpleRLPlayer(
battle_format="gen8randombattle", start_challenging=True, opponent=opponent
12 changes: 6 additions & 6 deletions integration_tests/test_env_player.py
@@ -90,7 +90,7 @@ def play_function(player, n_battles):


@pytest.mark.timeout(30)
def test_random_gym_players_gen4():
def test_random_gymnasium_players_gen4():
random_player = RandomPlayer(battle_format="gen4randombattle", log_level=25)
env_player = RandomGen4EnvPlayer(
log_level=25, opponent=random_player, start_challenging=False
@@ -100,7 +100,7 @@ def test_random_gym_players_gen4():


@pytest.mark.timeout(30)
def test_random_gym_players_gen5():
def test_random_gymnasium_players_gen5():
random_player = RandomPlayer(battle_format="gen5randombattle", log_level=25)
env_player = RandomGen5EnvPlayer(
log_level=25, opponent=random_player, start_challenging=False
@@ -110,7 +110,7 @@ def test_random_gym_players_gen5():


@pytest.mark.timeout(30)
def test_random_gym_players_gen6():
def test_random_gymnasium_players_gen6():
random_player = RandomPlayer(battle_format="gen6randombattle", log_level=25)
env_player = RandomGen6EnvPlayer(
log_level=25, opponent=random_player, start_challenging=False
@@ -120,7 +120,7 @@ def test_random_gym_players_gen6():


@pytest.mark.timeout(30)
def test_random_gym_players_gen7():
def test_random_gymnasium_players_gen7():
random_player = RandomPlayer(battle_format="gen7randombattle", log_level=25)
env_player = RandomGen7EnvPlayer(
log_level=25, opponent=random_player, start_challenging=False
@@ -130,7 +130,7 @@ def test_random_gym_players_gen7():


@pytest.mark.timeout(30)
def test_random_gym_players_gen8():
def test_random_gymnasium_players_gen8():
random_player = RandomPlayer(battle_format="gen8randombattle", log_level=25)
env_player = RandomGen8EnvPlayer(
log_level=25, opponent=random_player, start_challenging=False
@@ -140,7 +140,7 @@ def test_random_gym_players_gen8():


@pytest.mark.timeout(30)
def test_random_gym_players_gen9():
def test_random_gymnasium_players_gen9():
random_player = RandomPlayer(battle_format="gen9randombattle", log_level=25)
env_player = RandomGen9EnvPlayer(
log_level=25, opponent=random_player, start_challenging=False
8 changes: 4 additions & 4 deletions src/poke_env/player/__init__.py
@@ -2,7 +2,7 @@
"""

from poke_env.concurrency import POKE_LOOP
from poke_env.player import env_player, openai_api, player, random_player, utils
from poke_env.player import env_player, gymnasium_api, player, random_player, utils
from poke_env.player.baselines import MaxBasePowerPlayer, SimpleHeuristicsPlayer
from poke_env.player.battle_order import (
BattleOrder,
@@ -19,7 +19,7 @@
Gen8EnvSinglePlayer,
Gen9EnvSinglePlayer,
)
from poke_env.player.openai_api import ActType, ObsType, OpenAIGymEnv
from poke_env.player.gymnasium_api import ActType, GymnasiumEnv, ObsType
from poke_env.player.player import Player
from poke_env.player.random_player import RandomPlayer
from poke_env.player.utils import (
@@ -32,7 +32,7 @@

__all__ = [
"env_player",
"openai_api",
"gymnasium_api",
"player",
"random_player",
"utils",
@@ -47,7 +47,7 @@
"Gen8EnvSinglePlayer",
"Gen9EnvSinglePlayer",
"POKE_LOOP",
"OpenAIGymEnv",
"GymnasiumEnv",
"PSClient",
"Player",
"RandomPlayer",
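Because this change renames the exported ``OpenAIGymEnv`` to ``GymnasiumEnv`` (and the module ``openai_api`` to ``gymnasium_api``), downstream code that imports the old name will need updating. One possible way to tolerate both names during a transition is a small fallback helper. ``import_first`` below is a hypothetical helper, not part of poke-env, and the poke-env usage is shown only as a comment because it depends on the installed version.

```python
import importlib
from typing import Any, Iterable, Tuple


def import_first(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute found among (module name, attribute name) pairs."""
    tried = []
    for module_name, attr in candidates:
        tried.append(f"{module_name}.{attr}")
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing in this installation, try the next pair
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError("none of the candidates could be imported: " + ", ".join(tried))


# Assumed usage with poke-env (new name first, old name as a fallback):
# EnvBase = import_first(
#     [("poke_env.player", "GymnasiumEnv"), ("poke_env.player", "OpenAIGymEnv")]
# )

if __name__ == "__main__":
    # Demonstration with the standard library only.
    print(import_first([("no_such_module_xyz", "Thing"), ("collections", "OrderedDict")]))
```

Pinning a poke-env version and using the matching name directly is simpler where that is an option.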
15 changes: 11 additions & 4 deletions src/poke_env/player/env_player.py
@@ -1,4 +1,4 @@
"""This module defines a player class exposing the Open AI Gym API with utility functions.
"""This module defines a player class exposing the Gymnasium API with utility functions.
"""

from abc import ABC
@@ -8,15 +8,15 @@

from poke_env.environment.abstract_battle import AbstractBattle
from poke_env.player.battle_order import BattleOrder, ForfeitBattleOrder
from poke_env.player.openai_api import ActType, ObsType, OpenAIGymEnv
from poke_env.player.gymnasium_api import ActType, GymnasiumEnv, ObsType
from poke_env.player.player import Player
from poke_env.ps_client.account_configuration import AccountConfiguration
from poke_env.ps_client.server_configuration import ServerConfiguration
from poke_env.teambuilder.teambuilder import Teambuilder


class EnvPlayer(OpenAIGymEnv[ObsType, ActType], ABC):
"""Player exposing the Open AI Gym Env API."""
class EnvPlayer(GymnasiumEnv[ObsType, ActType], ABC):
"""Player exposing the Gymnasium Env API."""

_ACTION_SPACE: List[int] = []
_DEFAULT_BATTLE_FORMAT = "gen8randombattle"
@@ -34,6 +34,7 @@ def __init__(
start_listening: bool = True,
accept_open_team_sheet: Optional[bool] = False,
start_timer_on_battle_start: bool = False,
open_timeout: Optional[float] = 10.0,
ping_interval: Optional[float] = 20.0,
ping_timeout: Optional[float] = 20.0,
team: Optional[Union[str, Teambuilder]] = None,
@@ -69,6 +70,11 @@ def __init__(
:param start_timer_on_battle_start: Whether to automatically start the battle
timer on battle start. Defaults to False.
:type start_timer_on_battle_start: bool
:param open_timeout: How long to wait for a timeout when connecting the socket
(important for backend websockets.
Increase only if timeouts occur during runtime).
If None connect will never time out.
:type open_timeout: float, optional
:param ping_interval: How long between keepalive pings (Important for backend
websockets). If None, disables keepalive entirely.
:type ping_interval: float, optional
@@ -106,6 +112,7 @@ def __init__(
accept_open_team_sheet=accept_open_team_sheet,
start_timer_on_battle_start=start_timer_on_battle_start,
team=team,
open_timeout=open_timeout,
ping_interval=ping_interval,
ping_timeout=ping_timeout,
start_challenging=start_challenging,
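The docstring for the new ``open_timeout`` parameter says that a value of ``None`` means the connection attempt never times out. The sketch below illustrates that semantics with ``asyncio.wait_for`` from the standard library. It is an assumption-laden stand-in, not poke-env's implementation: ``fake_connect`` and ``connect_with_timeout`` are hypothetical names, and the real parameter is forwarded to the websocket backend.

```python
import asyncio
from typing import Optional


async def fake_connect(delay: float) -> str:
    """Hypothetical stand-in for opening a socket; just sleeps for `delay` seconds."""
    await asyncio.sleep(delay)
    return "connected"


async def connect_with_timeout(delay: float, open_timeout: Optional[float]) -> str:
    """Wait for the connection, giving up after `open_timeout` seconds.

    With open_timeout=None, asyncio.wait_for waits indefinitely.
    """
    try:
        return await asyncio.wait_for(fake_connect(delay), timeout=open_timeout)
    except asyncio.TimeoutError:
        return "timed out"


if __name__ == "__main__":
    print(asyncio.run(connect_with_timeout(0.01, 1.0)))
    print(asyncio.run(connect_with_timeout(0.2, 0.01)))
    print(asyncio.run(connect_with_timeout(0.01, None)))
```

This is also why the docstring suggests raising the value only when timeouts actually occur: a larger limit simply delays failure detection on a server that is unreachable.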
@@ -1,4 +1,4 @@
"""This module defines a player class with the OpenAI API on the main thread.
"""This module defines a player class with the Gymnasium API on the main thread.
For a black-box implementation consider using the module env_player.
"""

@@ -62,7 +62,7 @@ class _AsyncPlayer(Generic[ObsType, ActType], Player):

def __init__(
self,
user_funcs: OpenAIGymEnv[ObsType, ActType],
user_funcs: GymnasiumEnv[ObsType, ActType],
username: str,
**kwargs: Any,
):
@@ -94,12 +94,12 @@ def _battle_finished_callback(self, battle: AbstractBattle):
asyncio.run_coroutine_threadsafe(self.observations.async_put(to_put), POKE_LOOP)


class OpenAIGymEnv(
class GymnasiumEnv(
Env[ObsType, ActType],
ABC,
):
"""
Base class implementing the OpenAI Gym API on the main thread.
Base class implementing the Gymnasium API on the main thread.
"""

_INIT_RETRIES = 100
@@ -121,6 +121,7 @@ def __init__(
accept_open_team_sheet: Optional[bool] = False,
start_timer_on_battle_start: bool = False,
start_listening: bool = True,
open_timeout: Optional[float] = 10.0,
ping_interval: Optional[float] = 20.0,
ping_timeout: Optional[float] = 20.0,
team: Optional[Union[str, Teambuilder]] = None,
@@ -154,6 +155,11 @@ def __init__(
:param start_timer_on_battle_start: Whether to automatically start the battle
timer on battle start. Defaults to False.
:type start_timer_on_battle_start: bool
:param open_timeout: How long to wait for a timeout when connecting the socket
(important for backend websockets.
Increase only if timeouts occur during runtime).
If None connect will never time out.
:type open_timeout: float, optional
:param ping_interval: How long between keepalive pings (Important for backend
websockets). If None, disables keepalive entirely.
:type ping_interval: float, optional
@@ -183,6 +189,7 @@ def __init__(
accept_open_team_sheet=accept_open_team_sheet,
start_timer_on_battle_start=start_timer_on_battle_start,
start_listening=start_listening,
open_timeout=open_timeout,
ping_interval=ping_interval,
ping_timeout=ping_timeout,
team=team,
@@ -239,7 +246,7 @@ def action_to_move(self, action: int, battle: AbstractBattle) -> BattleOrder:
def embed_battle(self, battle: AbstractBattle) -> ObsType:
"""
Returns the embedding of the current battle state in a format compatible with
the OpenAI gym API.
the Gymnasium API.
:param battle: The current battle state.
:type battle: AbstractBattle
@@ -416,7 +423,7 @@ def close(self, purge: bool = True):
def background_send_challenge(self, username: str):
"""
Sends a single challenge specified player. The function immediately returns
to allow use of the OpenAI gym API.
to allow use of the Gymnasium API.
:param username: The username of the player to challenge.
:type username: str
@@ -434,7 +441,7 @@ def background_send_challenge(self, username: str):
def background_accept_challenge(self, username: str):
"""
Accepts a single challenge specified player. The function immediately returns
to allow use of the OpenAI gym API.
to allow use of the Gymnasium API.
:param username: The username of the player to challenge.
:type username: str