Environment documentation

Here you will find documentation for the VCMI environment.

Note

Note that this document is about the VCMI-v3 gym environment, which (as of Aug 2024) is the newest available version. Other environment versions may use different actions, observations and rewards, so please refer to the source code if you need more information about them.

vcmi-gym implements the Gym API spec. Please refer to the Gymnasium documentation for details.

Starting

VCMI is written in C++ and has an ML extension which must be compiled as a dynamic library, libmlclient (compilation instructions here). This library is linked to the main Python process at runtime as part of the gym environment's boot process.

Starting the RL environment automatically starts VCMI in a separate thread. Communication between the two threads is made possible via the connector component (for details on how it works, see the Connector doc).

Starting the environment is as simple as:

import gymnasium as gym
import vcmi_gym

env = gym.make("VCMI-v3", mapname="gym/A1.vmap")

There are more than 30 optional startup arguments which can be used to customize the environment. Information about them can be found in VcmiEnv's __init__ docstring here.

👁️ Observations

On each timestep, a flat (1-D) Box observation space with a total of 12685 floats is returned.

These floats represent the encoded equivalents of various aspects of the current environment state. Three main encoding types are used in the observation:

Normalized - input value is normalized between 0 and 1
Categorical - input value is one-hot encoded
Binary - input value is represented as a binary number

Depending on how NULL values are handled, there are different variations of those encodings. The table below contains examples that illustrate the differences:

Encoding	NULL handling	Abbreviation	Input (example attribute with `vmax=5`)	Output
Categorical	Explicit	CE	`v=5`	`[0, 0, 0, 0, 0, 0, 1]`
			`v=3`	`[0, 0, 0, 0, 1, 0, 0]`
			`v=0`	`[0, 1, 0, 0, 0, 0, 0]`
			null	`[1, 0, 0, 0, 0, 0, 0]`
Categorical	Strict	CS	`v=5`	`[0, 0, 0, 0, 0, 1]`
			`v=3`	`[0, 0, 0, 1, 0, 0]`
			`v=0`	`[1, 0, 0, 0, 0, 0]`
			null	error
Binary	Explicit	BE	`v=5`	`[0, 1, 0, 1]`
			`v=3`	`[0, 0, 1, 1]`
			`v=0`	`[0, 0, 0, 0]`
			null	`[1, 0, 0, 0]`
Binary	Zero	BZ	`v=5`	`[1, 0, 1]`
			`v=3`	`[0, 1, 1]`
			`v=0`	`[0, 0, 0]`
			null	`[0, 0, 0]`
Binary	Strict	BS	`v=5`	`[1, 0, 1]`
			`v=3`	`[0, 1, 1]`
			`v=0`	`[0, 0, 0]`
			null	error
Normalized	Explicit	NE	`v=5`	`[0, 1]`
			`v=3`	`[0, 0.6]`
			`v=0`	`[0, 0]`
			null	`[1, 0]`
Normalized	Strict	NS	`v=5`	`[1]`
			`v=3`	`[0.6]`
			`v=0`	`[0]`
			null	error
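The encodings in the table can be reproduced with a few lines of plain Python. This is only an illustrative sketch: the function names and the `null` argument are hypothetical and not part of vcmi-gym, and the bit order (most significant bit first) is inferred from the table rows for `v=3`.

```python
def categorical(v, vmax, null="strict"):
    """One-hot encode v in 0..vmax. 'explicit' prepends a NULL slot."""
    if v is None:
        if null != "explicit":
            raise ValueError("NULL is not allowed with strict NULL handling")
        return [1] + [0] * (vmax + 1)
    onehot = [1 if i == v else 0 for i in range(vmax + 1)]
    return [0] + onehot if null == "explicit" else onehot


def binary(v, vmax, null="strict"):
    """Encode v as binary digits, most significant bit first (as in the table)."""
    n_bits = max(1, vmax.bit_length())
    if v is None:
        if null == "explicit":
            return [1] + [0] * n_bits
        if null == "zero":
            return [0] * n_bits
        raise ValueError("NULL is not allowed with strict NULL handling")
    digits = [(v >> i) & 1 for i in reversed(range(n_bits))]
    return [0] + digits if null == "explicit" else digits


def normalized(v, vmax, null="strict"):
    """Scale v into [0, 1]. 'explicit' prepends a NULL flag."""
    if v is None:
        if null != "explicit":
            raise ValueError("NULL is not allowed with strict NULL handling")
        return [1, 0]
    frac = v / vmax
    return [0, frac] if null == "explicit" else [frac]
```

For example, `categorical(3, 5, null="explicit")` corresponds to the CE row for `v=3`, and `binary(None, 5, null="zero")` corresponds to the BZ row for null.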

All 12685 floats in the observation represent encoded information for a total of 20 stacks (10 attacker stacks, 10 defender stacks) and 165 hexes (the number of hexes on the battlefield).

Stacks

The first 1960 floats of the observation provide information about the stacks on the battlefield and are distributed as follows:

Stack ID	Index	Description
1	`0` ... `97`	Red army slot #1
2	`98` ... `195`	Red army slot #2
3	`196` ... `293`	Red army slot #3
4	`294` ... `391`	Red army slot #4
5	`392` ... `489`	Red army slot #5
6	`490` ... `587`	Red army slot #6
7	`588` ... `685`	Red army slot #7
8	`686` ... `783`	Red army extra slot #1 *
9	`784` ... `881`	Red army extra slot #2 *
10	`882` ... `979`	Red army extra slot #3 *
11	`980` ... `1077`	Blue army slot #1
12	`1078` ... `1175`	Blue army slot #2
13	`1176` ... `1273`	Blue army slot #3
14	`1274` ... `1371`	Blue army slot #4
15	`1372` ... `1469`	Blue army slot #5
16	`1470` ... `1567`	Blue army slot #6
17	`1568` ... `1665`	Blue army slot #7
18	`1666` ... `1763`	Blue army extra slot #1 *
19	`1764` ... `1861`	Blue army extra slot #2 *
20	`1862` ... `1959`	Blue army extra slot #3 *

* An "extra" slot is used for summoned creatures and war machines. If there are more than 10 alive stacks in the hero's army, some of them will remain "hidden" from the agent.

Note

The terms "attacker", "left" and "red" are used interchangeably here. The same applies for "defender", "right" and "blue". This is because in VCMI, the attacking army is always on the left side and AI training maps are designed such that the red player always attacks the blue player.

Each stack is represented by a total of 98 floats which encode the creature's attributes, similar to what the player would see when right-clicking on the creature (owner, quantity, creature type, attack, defence, etc.):

Stack Attribute	Index	Encoding	Description
ID	`0` ... `20`	CE	Stack number
Y_COORD	`21` ... `32`	CE	Y coordinate of stack's front hex
X_COORD	`33` ... `48`	CE	X coordinate of stack's front hex
SIDE	`49` ... `51`	CE	Side in battle (attacker/defender)
QUANTITY	`52` ... `53`	NE	Stack quantity
ATTACK	`54` ... `55`	NE	Attack
DEFENSE	`56` ... `57`	NE	Defense
SHOTS	`58` ... `59`	NE	Shots remaining
DMG_MIN	`60` ... `61`	NE	Dmg (min)
DMG_MAX	`62` ... `63`	NE	Dmg (max)
HP	`64` ... `65`	NE	Hit points
HP_LEFT	`66` ... `67`	NE	Hit points left
SPEED	`68` ... `69`	NE	Speed
WAITED	`70` ... `71`	NE	Waited this turn?
QUEUE_POS	`72` ... `73`	NE	Turn order queue position
RETALIATIONS_LEFT	`74` ... `75`	NE	Retaliations left
IS_WIDE	`76` ... `77`	NE	Is it a two-hex stack?
AI_VALUE	`78` ... `79`	NE	AI value
MORALE	`80` ... `81`	NE	Morale
LUCK	`82` ... `83`	NE	Luck
FLYING	`84` ... `85`	NE	Can fly?
BLIND_LIKE_ATTACK	`86` ... `87`	NE	Chance to blind/paralyze/petrify
ADDITIONAL_ATTACK	`88` ... `89`	NE	Attacks twice?
NO_MELEE_PENALTY	`90` ... `91`	NE	Has no melee penalty?
TWO_HEX_ATTACK_BREATH	`92` ... `93`	NE	Has dragon breath?
NON_LIVING	`94` ... `95`	NE	Is undead or otherwise non-living?
BLOCKS_RETALIATION	`96` ... `97`	NE	Has no enemy retaliation?
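The two tables above can be combined to locate any stack attribute in the flat observation. The helper below is a minimal sketch: `stack_attr_slice` and `STACK_ATTR_RANGES` are hypothetical names introduced here for illustration, and only a few attributes from the table are listed.

```python
STACK_SIZE = 98   # floats per stack
N_STACKS = 20     # 10 attacker stacks + 10 defender stacks

# Attribute -> (first index, last index) within one stack block, copied from the table.
STACK_ATTR_RANGES = {
    "ID": (0, 20),
    "Y_COORD": (21, 32),
    "X_COORD": (33, 48),
    "SIDE": (49, 51),
    "QUANTITY": (52, 53),
    "HP_LEFT": (66, 67),
}


def stack_attr_slice(stack_id, attr):
    """Return the slice of the flat observation holding `attr` of stack `stack_id` (1..20)."""
    if not 1 <= stack_id <= N_STACKS:
        raise ValueError("stack_id must be in 1..20")
    first, last = STACK_ATTR_RANGES[attr]
    base = (stack_id - 1) * STACK_SIZE
    return slice(base + first, base + last + 1)
```

With an observation array at hand, `observation[stack_attr_slice(11, "QUANTITY")]` would give the two NE-encoded floats describing the quantity of the stack in Blue army slot #1.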

Hexes

The remaining 10725 floats of the observation carry information about the 165 battlefield hexes:

Each hex is represented by a total of 65 floats with information about several key characteristics of that hex:

Hex Attribute	Index	Encoding	Description
Y_COORD	`0` ... `10`	CS	Y coordinate of this hex
X_COORD	`11` ... `25`	CS	X coordinate of this hex
STATE_MASK	`26` ... `29`	BS	Hex state flags *
ACTION_MASK	`30` ... `43`	BZ	Hex action flags for the currently active stack **
STACK_ID	`44` ... `64`	CE	Stack ID (see Stacks)

* The hex state flags are used to describe a (combination of) hex properties:

State flag	Explanation
`PASSABLE`	empty/mine/firewall/gate(open)/gate(closed,defender)
`STOPPING`	moat/quicksand
`DAMAGING_L`	moat/mine/firewall that would damage left (i.e. attacker) stacks
`DAMAGING_R`	moat/mine/firewall that would damage right (i.e. defender) stacks

** These hex action flags tell us which actions the currently active stack can perform on that specific hex (see Action space).
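The position of a hex block in the flat observation follows from the numbers above: hex data starts after the 1960 stack floats, and each hex occupies 65 floats. The sketch below assumes hexes are numbered row by row on a 15-wide, 11-high grid, which is consistent with the coordinate ranges in the table and with hex 46 being (Y=3, X=1). The helper names are hypothetical.

```python
N_STACK_FLOATS = 1960  # floats used by the 20 stacks
HEX_SIZE = 65          # floats per hex
BF_WIDTH = 15          # X coordinates 0..14
BF_HEIGHT = 11         # Y coordinates 0..10


def hex_id_from_coords(y, x):
    """Row-major hex ID for the given coordinates."""
    if not (0 <= y < BF_HEIGHT and 0 <= x < BF_WIDTH):
        raise ValueError("coordinates out of range")
    return y * BF_WIDTH + x


def hex_coords(hex_id):
    """Inverse of hex_id_from_coords: returns (y, x)."""
    return divmod(hex_id, BF_WIDTH)


def hex_slice(hex_id):
    """Slice of the flat observation that holds all floats of one hex."""
    start = N_STACK_FLOATS + hex_id * HEX_SIZE
    return slice(start, start + HEX_SIZE)
```

The last hex (ID 164) ends exactly at index 12685, the total observation size.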

The example below illustrates how to extract information from the observation via the .decode() convenience method:

import gymnasium as gym
import vcmi_gym

env = gym.make("VCMI-v3", mapname="gym/A1.vmap")
observation, _info = env.reset()

""" Raw observation is a numpy array of 12685 floats """
observation
# => array([1., 0., 0., ..., 0., 0., 0.], dtype=float32)

"""
Using env.decode() returns a Battlefield object -- a decoded version of the raw
observation with a minimalistic API allowing to easily explore the information
carried within.
This method is specific to VCMI and is not part of the gym API spec.
"""
bf = env.decode()

""" Get hex 46 (Y=3, X=1). """
h = bf.get_hex(46)  # same as bf.get_hex(3, 1)

""" Print hex data in a human-friendly format. """
h.dump()
# Y_COORD     | 3
# X_COORD     | 1
# STATE_MASK  | PASSABLE
# ACTION_MASK | MOVE

""" Get stack 4. """
s = bf.get_stack(4)

""" Print stack data in a human-friendly format. """
s.dump()
# ID                    | 4
# Y_COORD               | 5
# X_COORD               | 0
# SIDE                  | LEFT
# QUANTITY              | 1
# ATTACK                | 2
# DEFENSE               | 2
# SHOTS                 | 0
# DMG_MIN               | 1
# DMG_MAX               | 1
# HP                    | 1
# ...

🕹️ Actions

vcmi-gym uses a Discrete action space with a total of 2312 actions, which is better thought of as 2 non-hex actions + 2310 hex actions.

The non-hex actions are RETREAT (=0) and WAIT (=1). The remaining values are used for the 14 actions on each hex (there are 165 hexes on the battlefield => 165 * 14 = 2310 hex actions).

For a given hex (0..164), the action value is: 2 + hex_id * 14 + action_index, where the offset of 2 accounts for the two non-hex actions:

Action index	Description
0..11	Move to hex and attack at direction 0..11*
12	Move to hex
13	Shoot at hex

e.g. Moving to hex #2 (X=2, Y=0) corresponds to action 2 + 2 * 14 + 12 = 42.

* The 12 attack directions are as follows: 0..5 are the hexes that surround the current unit, while 6..11 are special cases for 2-hex units (3 per side).

The example below shows how to obtain the integer value of a hex action and execute it:

import gymnasium as gym
from vcmi_gym import HexAction

env = gym.make("VCMI-v3", mapname="gym/A2.vmap")
env.reset()  # a Gymnasium env must be reset before it is stepped

""" Decode the observation. """
bf = env.decode()

""" Get the integer representation of 'move to hex 46' """
action = bf.get_hex(46).action(HexAction.MOVE)
# => 658

""" Execute the action. """
env.step(action)
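The mapping between action integers and (hex, action index) pairs can also be written out by hand. This sketch assumes hex actions start immediately after the two non-hex actions (an offset of 2), which is consistent with the total of 2312 actions and with the value 658 for "move to hex 46" above. The function and constant names are hypothetical and not part of vcmi-gym.

```python
N_NONHEX_ACTIONS = 2   # RETREAT (=0) and WAIT (=1)
N_HEX_ACTIONS = 14     # actions per hex
N_HEXES = 165
MOVE_INDEX = 12        # "Move to hex" in the table above
SHOOT_INDEX = 13       # "Shoot at hex" in the table above


def encode_hex_action(hex_id, action_index):
    """Integer action for performing `action_index` on hex `hex_id`."""
    if not 0 <= hex_id < N_HEXES:
        raise ValueError("hex_id must be in 0..164")
    if not 0 <= action_index < N_HEX_ACTIONS:
        raise ValueError("action_index must be in 0..13")
    return N_NONHEX_ACTIONS + hex_id * N_HEX_ACTIONS + action_index


def decode_action(action):
    """Return (hex_id, action_index); hex_id is None for RETREAT and WAIT."""
    if not 0 <= action < N_NONHEX_ACTIONS + N_HEXES * N_HEX_ACTIONS:
        raise ValueError("action must be in 0..2311")
    if action < N_NONHEX_ACTIONS:
        return None, action
    return divmod(action - N_NONHEX_ACTIONS, N_HEX_ACTIONS)
```

In practice, `bf.get_hex(...).action(...)` shown above is the more convenient route; the sketch is only meant to make the numbering explicit.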

Action masking

The env object exposes the action_mask() method which is not part of the Gym API spec, but is useful for certain Reinforcement Learning scenarios where invalid actions are masked in order to improve learning performance.

The method returns an np.array with 2312 bool values, indicating the validity of the corresponding action (True means the action is valid).
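A common use of such a mask is to restrict sampling to valid actions. The helper below is a minimal sketch that works on any sequence of booleans, including the array described above; `sample_valid_action` is a hypothetical name and not part of vcmi-gym.

```python
import random


def sample_valid_action(mask, rng=random):
    """Pick a uniformly random index whose mask entry is truthy."""
    valid = [i for i, ok in enumerate(mask) if ok]
    if not valid:
        raise ValueError("the mask contains no valid actions")
    return rng.choice(valid)


# Hypothetical usage with the environment (not executed here):
#   action = sample_valid_action(env.action_mask())
#   env.step(action)
```

Agents trained with policy-gradient methods usually apply the mask to the action logits instead, but the idea is the same: invalid actions are never selected.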

🍩 Rewards

Rewards are returned on each step based on the calculations below.

Base reward

The base reward on each step is:

$$R_{base} = p_1 * (p_2 + p_3*D_{net} + V_{net}) + \sigma*p_4*V_{diff}$$

where:

R_base is the base reward (optionally modified, see below)
D_net is the net damage since the last step (D_dealt - D_received)
V_net is the net value of units which died since the last step (V_killed - V_lost)
V_diff is the difference in the total army value of the two armies
σ is a term which evaluates to 1 at battle end, 0 otherwise
p₁ is a configurable parameter step_reward_mult. It provides control over the "weight" of per-step rewards. If sparse rewards are desired, this parameter can be set to 0 while p₄ is set to any non-zero value.
p₂ is a configurable parameter step_reward_fixed. A negative value can be set to punish agents who keep running away from the enemy troops to avoid damage.
p₃ is a configurable parameter reward_dmg_factor. It adjusts the role of damage in the reward calculations. If set to 0, the reward will only depend on the actual units killed, regardless of the damage dealt.
p₄ is a configurable parameter term_reward_mult. It provides control over the "weight" of the terminal reward.
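The base reward can be sketched as a small function. This is an illustration of the formula using the four configurable parameters by name, not the environment's actual implementation; the function name and argument layout are hypothetical, and the parameter values in the usage note are arbitrary.

```python
def base_reward(d_net, v_net, v_diff, terminal, *,
                step_reward_mult, step_reward_fixed,
                reward_dmg_factor, term_reward_mult):
    """Per-step reward plus a terminal bonus that only applies at battle end."""
    sigma = 1 if terminal else 0
    step_part = step_reward_mult * (
        step_reward_fixed + reward_dmg_factor * d_net + v_net
    )
    return step_part + sigma * term_reward_mult * v_diff
```

For instance, with step_reward_mult=1, step_reward_fixed=-1, reward_dmg_factor=5 and term_reward_mult=1, a non-terminal step with D_net=10 and V_net=100 yields 1 * (-1 + 5*10 + 100) = 149. Setting step_reward_mult=0 leaves only the terminal term, i.e. a sparse reward.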

The resulting value (base reward) is further modified by the following "global" reward modifiers:

Clipped reward

Reward clipping is sometimes advised for more stable learning. It is controlled by the reward_clip_tanh_army_frac parameter:

$$C = p_5 * V_{mean}$$

$$R_{clip} = C * \tanh(R_{base} / C)$$

where:

C is an intermediate variable used for clarity
R_clip is the clipped reward
R_base is the base reward (see above)
V_mean is the mean of the total starting values of the two armies
p₅ is a configurable parameter reward_clip_tanh_army_frac

A value of 0 disables clipping (i.e. R_clip = R_base).
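The clipping step can be sketched as follows. The function name is hypothetical. Treating 0 as "clipping disabled" is handled explicitly, since the formula would otherwise divide by zero.

```python
import math


def clip_reward(r_base, v_mean, reward_clip_tanh_army_frac):
    """Squash r_base into the open interval (-C, C), where C = frac * v_mean."""
    if reward_clip_tanh_army_frac == 0:
        return r_base  # clipping disabled
    c = reward_clip_tanh_army_frac * v_mean
    return c * math.tanh(r_base / c)
```

Because tanh is close to the identity near zero, small rewards pass through almost unchanged, while very large rewards saturate at roughly ±C.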

Scaled reward

Scaling rewards based on the initial starting army values can be achieved via the reward_army_value_ref parameter.

$$R_{scale} = R_{clip} * p_6 / V_{mean}$$

where:

R_scale is the scaled reward
R_clip is the clipped reward (see above)
V_mean is the mean of the total starting values of the two armies (see above)
p₆ is a configurable parameter reward_army_value_ref

The effect of scaled rewards can be explained via an example:

Consider these two VCMI battles:

(A) armies with total starting army value = 1K (early game army)
(B) armies with total starting army value = 100K (late game army)

Without scaling, the rewards in battle A would be 100 times smaller than the rewards in battle B.

With an army ref of 10K, the rewards in A and B will be multiplied by 10 and 0.1 respectively, effectively negating this discrepancy and ensuring the RL agent perceives early-game and late-game battles as equally significant.
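The worked example can be checked numerically. The sketch below follows the multipliers stated in the example (10 for battle A, 0.1 for battle B), i.e. it multiplies the reward by the ratio of the reference value to V_mean. The function name and the sample rewards of 50 and 5000 are hypothetical.

```python
def scale_reward(r_clip, v_mean, reward_army_value_ref):
    """Scale a reward by the ratio of the reference army value to V_mean."""
    return r_clip * reward_army_value_ref / v_mean


# Battle A: 1K armies, battle B: 100K armies, reference value of 10K.
reward_a = scale_reward(50, 1_000, 10_000)       # multiplied by 10
reward_b = scale_reward(5_000, 100_000, 10_000)  # multiplied by 0.1
```

Both scaled rewards come out equal, which is the intended effect: comparable outcomes in small and large battles produce comparable rewards.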

🖼️ Rendering

This gym environment supports only one type of rendering: the ANSI render.

The output should be printed in a terminal with ANSI color code support, Unicode support and a monospaced font (e.g. Ubuntu Mono):

import gymnasium as gym
import vcmi_gym

env = gym.make("VCMI-v3", mapname="gym/A1.vmap")
env.reset()
print(env.render())

Tip

If your output looks unaligned, try changing the font of your terminal.

Test Helper

If you want to test the env by playing manually, a convenient helper is provided:

from vcmi_gym import VcmiEnv_v3 as VcmiEnv, HexAction, TestHelper

env = VcmiEnv("gym/A1.vmap")
h = TestHelper(env)

h.move(5, 3)
h.defend()
h.wait()
h.amove(5, 4, HexAction.AMOVE_R)
