lilGym: Natural Language Visual Reasoning with Reinforcement Learning

arXiv | code & data | website | baselines

About

We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments. lilGym is based on 2,661 highly-compositional human-written natural language statements grounded in an interactive visual environment. We annotate all statements with executable Python programs representing their meaning to enable exact reward computation in every possible world state.

Each statement is paired with multiple start states and reward functions to form thousands of distinct Markov Decision Processes of varying difficulty.

We experiment with lilGym with different models and learning regimes. Our results and analysis show that while existing methods are able to achieve non-trivial performance, lilGym forms a challenging open problem.

Examples

TowerScratch (left), TowerFlipIt (right)

ScatterScratch (left), ScatterFlipIt (right)

Data

The data and details can be found in: lilgym/data/.

A description can be found in lilGym: Natural Language Visual Reasoning with Reinforcement Learning. The data is based on the Cornell Natural Language Visual Reasoning (NLVR) Corpus v1.0 (Suhr et al. 2017).

Codebase

Installation

Notes:

  • The codebase has been tested with Python 3.7/3.8, PyTorch 1.12.1+cu102, and CUDA 11.2.
  • Work on compatibility with newer versions is ongoing.
  1. Create a conda environment
conda create -n lilgym python=3.7
conda activate lilgym

Install PyTorch:

pip install torch==1.12.1+cu102 torchvision==0.13.1+cu102 --extra-index-url https://download.pytorch.org/whl/cu102

Note:

  • To use conda with Python 3.7 on Apple Silicon, you may check: link
  2. Clone the repo: git clone https://github.com/lil-lab/lilgym.git

  3. Install the dependencies

cd lilgym
pip install -r requirements.txt

Note: the environment is updated to be used with Gymnasium (formerly Gym).

To install the package from source:

cd lilgym
pip install .

Example

The environments follow the standard Gymnasium API.

The following is a short demo script:

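import gymnasium as gym

# Create the TowerScratch environment on the training split, without stop forcing.
env = gym.make("TowerScratch-v0", split="train", stop_forcing=False, disable_env_checker=True)

# Seed the environment, then reset it to get the first observation.
env.seed(1)
observation, info = env.reset()

# Take 100 random actions, resetting whenever an episode ends.
for _ in range(100):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()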

Note: disable_env_checker is a Gymnasium (the successor to Gym) argument of gym.make. Set it to False to run the environment checker.
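The reset/step loop above can also be wrapped in a small helper that runs a single episode and reports its return. The following is a minimal sketch, not part of lilGym: `StubEnv` and `run_episode` are hypothetical names, and the stub only mimics the Gymnasium `reset`/`step` signatures so that the snippet runs without lilGym installed.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signatures."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = action == "stop"  # the agent chose to end the episode
        truncated = not terminated and self.t >= self.horizon  # step limit hit
        reward = 1.0 if terminated else 0.0
        return self.t, reward, terminated, truncated, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (total_reward, number_of_steps)."""
    observation, info = env.reset()
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps


if __name__ == "__main__":
    print(run_episode(StubEnv(horizon=3), lambda obs: "move"))
```

With a lilGym environment, `policy` could be `lambda obs: env.action_space.sample()` to reproduce the random rollout of the demo script.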

Configurations

There are four configurations: TowerScratch, TowerFlipIt, ScatterScratch and ScatterFlipIt. Examples:

env = gym.make("TowerFlipIt-v0", split="train", stop_forcing=False)

env = gym.make("ScatterScratch-v0", split="dev", stop_forcing=False)

env = gym.make("ScatterFlipIt-v0", split="test", stop_forcing=False)

Data splits

There are three data splits for each configuration: train, dev, and test.
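To run the same experiment across every setting, the four configuration names and three splits can be enumerated programmatically. This is a small sketch: the helper names `env_ids` and `all_settings` are hypothetical, and the `-v0` suffix is taken from the examples above.

```python
from itertools import product

ENV_TYPES = ("Tower", "Scatter")
TASKS = ("Scratch", "FlipIt")
SPLITS = ("train", "dev", "test")


def env_ids(version="v0"):
    """Return the four environment IDs, e.g. 'TowerScratch-v0'."""
    return [f"{env_type}{task}-{version}" for env_type, task in product(ENV_TYPES, TASKS)]


def all_settings():
    """Return every (environment ID, split) pair: 4 configurations x 3 splits."""
    return [(env_id, split) for env_id in env_ids() for split in SPLITS]


if __name__ == "__main__":
    for env_id, split in all_settings():
        # With lilGym installed, each pair could be passed to
        # gym.make(env_id, split=split, stop_forcing=False).
        print(env_id, split)
```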

Stop forcing

stop_forcing specifies whether to use the algorithm with stop forcing at training time. Inference is always done without stop forcing.
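Because inference never uses stop forcing, it can help to build the `gym.make` keyword arguments in one place so the flag is only ever enabled for training. The sketch below is one possible way to do this. `env_kwargs` and its parameters are hypothetical names, not part of lilGym.

```python
VALID_SPLITS = ("train", "dev", "test")


def env_kwargs(split, training, use_stop_forcing=False):
    """Build keyword arguments for gym.make.

    stop_forcing is enabled only when training is True, so inference
    always runs without stop forcing.
    """
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return {"split": split, "stop_forcing": bool(training and use_stop_forcing)}


if __name__ == "__main__":
    # With lilGym installed, the result could be unpacked into gym.make:
    #   gym.make("TowerScratch-v0", **env_kwargs("train", training=True, use_stop_forcing=True))
    print(env_kwargs("train", training=True, use_stop_forcing=True))
    print(env_kwargs("dev", training=False, use_stop_forcing=True))
```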

Data reading

There are two ways to load data:

  1. Using the argument split as above

  2. Using the argument data. An example:

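import gymnasium as gym
from lilgym.data.utils import get_data

# Load the data for the Tower environment, Scratch task, training split.
data = get_data('tower', 'scratch', 'train')
# Pass the loaded data directly instead of a split name.
env = gym.make("TowerScratch-v0", data=data, stop_forcing=True)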
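When loading data explicitly, the arguments to `get_data` have to match the environment ID. The sketch below derives them from the ID. It assumes that the lowercase pattern of the one documented call, `get_data('tower', 'scratch', 'train')`, carries over to the other configurations (for example `'scatter'` and `'flipit'`), which this README does not confirm. `data_args` is a hypothetical helper name.

```python
import re

# Matches the four environment IDs listed under Configurations.
_ENV_ID = re.compile(r"^(Tower|Scatter)(Scratch|FlipIt)-v0$")


def data_args(env_id, split):
    """Map an environment ID and split to (env_type, task, split) in lowercase.

    Assumption: get_data expects lowercase names for every configuration,
    as in the documented get_data('tower', 'scratch', 'train') call.
    """
    match = _ENV_ID.match(env_id)
    if match is None:
        raise ValueError(f"unknown environment ID: {env_id!r}")
    env_type, task = match.groups()
    return env_type.lower(), task.lower(), split


if __name__ == "__main__":
    # With lilGym installed: data = get_data(*data_args("TowerScratch-v0", "train"))
    print(data_args("TowerScratch-v0", "train"))
```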

More details about the environment can be found in: lilgym/envs/README.md.

The baselines, including the training and inference code, will also be released soon.

License

MIT

Citation

@inproceedings{wu-etal-2023-lilgym,
    title = "lil{G}ym: Natural Language Visual Reasoning with Reinforcement Learning",
    author = "Wu, Anne  and
      Brantley, Kiante  and
      Kojima, Noriyuki  and
      Artzi, Yoav",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2023",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.512",
    pages = "9214--9234",
}

Acknowledgments

This research was supported by ARO W911NF21-1-0106, NSF under grant No. 1750499, a gift from Open Philanthropy, and NSF under grant No. 2127309 to the Computing Research Association for the CIFellows Project. Results presented in this paper were obtained using CloudBank, which is supported by the National Science Foundation under award No. 1925001. We thank Alane Suhr, Ge Gao, Justin Chiu, Woojeong Kim, Jack Morris, Jacob Sharf and the Cornell NLP Group for support, comments and helpful discussions.

Contact

Anne Wu ([email protected])
