Ensure safe_to_tensor moves tensors to the specified device. #831

ernestum · 2023-12-15T12:55:23Z

This PR fixes a bug in the safe_to_tensor utility: previously it did not move tensors to a new device according to the device kwarg which caused issues when there is more than one device available.
The bug went unnoticed for a long while since our circleCI runners do not have GPUs enabled.

Thanks to @tomtseng for drafting a fix for this in #828

codecov · 2023-12-15T13:02:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.76%. Comparing base (a8b079c) to head (5820cfe).
Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #831      +/-   ##
==========================================
+ Coverage   95.69%   95.76%   +0.07%     
==========================================
  Files         102      102              
  Lines        9645     9640       -5     
==========================================
+ Hits         9230     9232       +2     
+ Misses        415      408       -7

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ernestum · 2024-01-22T16:56:50Z

I think the coverage warning in this one is spurious. Can you merge this @AdamGleave ?

AdamGleave · 2024-01-28T22:57:27Z

I think the coverage warning in this one is spurious. Can you merge this @AdamGleave ?

Looks like the issue is that test_sqil_performance_continuous is being skipped. This is going to continue to cause a problem in all subsequent PRs requiring manual override -- we try and enforce 100% test coverage to make sure nothing is being unintentionally not run. I think you can just add a # pragma: no cover to fix this. If that doesn't work LMK and I can try to troubleshoot.

dominikonysz · 2024-05-06T15:42:35Z

Hello, what's the state of this PR?

ChenJiangxi · 2025-01-04T08:31:37Z

In the latest version I downloaded, this problem is still not solved.

./ci/build_and_activate_venv.sh ~/venv python3.9 fails because python3.9 is not installed.

The error message: Warning, treated as error: ********************************************************************** File "algorithms/dagger.rst", line 45, in default Failed example: import tempfile import numpy as np import gymnasium as gym from stable_baselines3.common.evaluation import evaluate_policy from imitation.algorithms import bc from imitation.algorithms.dagger import SimpleDAggerTrainer from imitation.policies.serialize import load_policy from imitation.util.util import make_vec_env rng = np.random.default_rng(0) env = make_vec_env( "seals:seals/CartPole-v0", rng=rng, ) expert = load_policy( "ppo-huggingface", organization="HumanCompatibleAI", env_name="seals-CartPole-v0", venv=env, ) bc_trainer = bc.BC( observation_space=env.observation_space, action_space=env.action_space, rng=rng, ) with tempfile.TemporaryDirectory(prefix="dagger_example_") as tmpdir: print(tmpdir) dagger_trainer = SimpleDAggerTrainer( venv=env, scratch_dir=tmpdir, expert_policy=expert, bc_trainer=bc_trainer, rng=rng, ) dagger_trainer.train(8_000) reward, _ = evaluate_policy(dagger_trainer.policy, env, 10) print("Reward:", reward) Exception raised: Traceback (most recent call last): File "/usr/lib/python3.8/doctest.py", line 1336, in __run exec(compile(example.source, filename, "single", File "<doctest default[0]>", line 38, in <module> dagger_trainer.train(8_000) File "/venv/lib/python3.8/site-packages/imitation/algorithms/dagger.py", line 669, in train trajectories = rollout.generate_trajectories( File "/venv/lib/python3.8/site-packages/imitation/data/rollout.py", line 447, in generate_trajectories obs, rews, dones, infos = venv.step(acts) File "/venv/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 206, in step return self.step_wait() File "/venv/lib/python3.8/site-packages/imitation/algorithms/dagger.py", line 285, in step_wait _save_dagger_demo(traj, traj_index, self.save_dir, self.rng) File "/venv/lib/python3.8/site-packages/imitation/algorithms/dagger.py", line 147, in _save_dagger_demo serialize.save(npz_path, [trajectory]) File "/venv/lib/python3.8/site-packages/imitation/data/serialize.py", line 23, in save huggingface_utils.trajectories_to_dataset(trajectories).save_to_disk(p) File "/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1470, in save_to_disk fs, _ = url_to_fs(dataset_path, **(storage_options or {})) File "/venv/lib/python3.8/site-packages/fsspec/core.py", line 383, in url_to_fs chain = _un_chain(url, kwargs) File "/venv/lib/python3.8/site-packages/fsspec/core.py", line 323, in _un_chain if "::" in path TypeError: argument of type 'PosixPath' is not iterable

This reverts commit b765aff.

tomtseng · 2025-01-06T20:31:32Z

I can take a look at merging this later this week, but there a bunch of new CI failures that I would need to resolve due to various versions changing in CI or packages over the past year that this PR has sat dormant.

In general no one is actively maintaining this repository now so if there are issues you may need to fork the repo and handle it yourself.

tomtseng · 2025-01-07T05:55:48Z

Made a bunch of misc changes to fix CI, it would've been better for me to make them a separate PR but I'm not sure I would've been able to get a code owner to review the PR. I'll just list the changes in this comment.

doctest had an error:

Exception raised:
    Traceback (most recent call last):
      File "/usr/lib/python3.8/doctest.py", line 1336, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest default[0]>", line 38, in <module>
        dagger_trainer.train(8_000)
      File "/venv/lib/python3.8/site-packages/imitation/algorithms/dagger.py", line 669, in train
        trajectories = rollout.generate_trajectories(
      File "/venv/lib/python3.8/site-packages/imitation/data/rollout.py", line 447, in generate_trajectories
        obs, rews, dones, infos = venv.step(acts)
      File "/venv/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 206, in step
        return self.step_wait()
      File "/venv/lib/python3.8/site-packages/imitation/algorithms/dagger.py", line 285, in step_wait
        _save_dagger_demo(traj, traj_index, self.save_dir, self.rng)
      File "/venv/lib/python3.8/site-packages/imitation/algorithms/dagger.py", line 147, in _save_dagger_demo
        serialize.save(npz_path, [trajectory])
      File "/venv/lib/python3.8/site-packages/imitation/data/serialize.py", line 23, in save
        huggingface_utils.trajectories_to_dataset(trajectories).save_to_disk(p)
      File "/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1470, in save_to_disk
        fs, _ = url_to_fs(dataset_path, **(storage_options or {}))
      File "/venv/lib/python3.8/site-packages/fsspec/core.py", line 383, in url_to_fs
        chain = _un_chain(url, kwargs)
      File "/venv/lib/python3.8/site-packages/fsspec/core.py", line 323, in _un_chain
        if "::" in path
    TypeError: argument of type 'PosixPath' is not iterable

fixed in 30f9048 by casting the PosixPath to a str.

unit-test-windows didn't have pytest installed, this was because the install dependencies CI step silently failed (silence fixed in 637bed5). The step failed because an indirect dependency pywinpty was failing to build with

    = note: ld: cannot find -lktmw32: No such file or directory\x90\x8d
  
  
  error: could not compile `winpty-rs` (build script) due to 1 previous error
  warning: build failed, waiting for other jobs to finish...
  💥 maturin failed
    Caused by: Failed to build a native library through cargo
    Caused by: Cargo build finished with "exit code: 101": `"cargo" "rustc" "--message-format" "json-render-diagnostics" "--manifest-path" "C:\\Users\\circleci\\AppData\\Local\\Temp\\pip-install-j4bmps1w\\pywinpty_d47bf837a6ea40b2a74b0e02c806c1bf\\Cargo.toml" "--release" "--lib"`
  Error: command ['maturin', 'pep517', 'build-wheel', '-i', 'C:\\Users\\circleci\\project\\venv\\Scripts\\python.exe', '--compatibility', 'off'] returned non-zero exit status 1
  [end of output]

I couldn't figure out how to properly fix this but I happened to notice that this didn't happen in newer Python versions so I just changed the windows CI tests to use python 3.9 instead of 3.8.

unit-test-macos didn't have commands virtualenv and python3.9 so I installed them.

mypy had some new errors:

/venv/lib/python3.8/site-packages/torch/_dynamo/mutation_guard.py:2: error: disable_error_code: Invalid error code(s): method-assign  [misc]
/venv/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py:2: error: disable_error_code: Invalid error code(s): method-assign  [misc]
/venv/lib/python3.8/site-packages/torch/_dynamo/debug_utils.py:2: error: disable_error_code: Invalid error code(s): method-assign  [misc]
src/imitation/policies/base.py:38: error: Incompatible types in assignment (expression has type "ndarray[Any, Any]", variable has type "DictObs")  [assignment]
src/imitation/algorithms/adversarial/common.py:219: error: Self? has no attribute "parameters"  [attr-defined]
src/imitation/scripts/ingredients/demonstrations.py:39: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
src/imitation/algorithms/adversarial/gail.py:158: error: Self? not callable  [misc]
src/imitation/algorithms/adversarial/airl.py:118: error: Self? not callable  [misc]
src/imitation/algorithms/adversarial/airl.py:131: error: Self? has no attribute "base"  [attr-defined]

I ignored the torch ones in f64b3fc and fixed the remaining ones in 5820cfe.

ernestum requested a review from tomtseng December 15, 2023 12:55

ernestum mentioned this pull request Dec 15, 2023

Make the default device for policies "cpu" #828

Closed

This was linked to issues Dec 15, 2023

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #820

Closed

Run time Error when run quickstart.py #825

Closed

tomtseng approved these changes Dec 15, 2023

View reviewed changes

ernestum force-pushed the fix_safe_to_tensor_bug branch from 563bccf to 7474c6c Compare December 18, 2023 16:56

ernestum mentioned this pull request Dec 28, 2023

Remove FloatReward #829

Merged

ernestum added 5 commits January 17, 2024 14:43

Ensure safe_to_tensor moves tensors to the specified device.

c40dc98

Add coverage for the warning case of test_safe_to_numpy.

f3e8189

Ignore coverage for clear_screen on windows.

4b7b263

Increase test deadline for sliced_info_dict_access.

9239ca7

Increase test deadline for sliced_access

7e618b0

ernestum force-pushed the fix_safe_to_tensor_bug branch from 02d1c17 to 7e618b0 Compare January 17, 2024 13:43

Add no-cover pragma to the flaky and disabled SQIL performance test.

2cfd7f1

Bpoole908 mentioned this pull request Apr 10, 2024

Device conflict when training BC #843

Open

Ivan-267 mentioned this pull request Jul 11, 2024

Set sb3_imitation.py to CPU to fix error with CUDA edbeeching/godot_rl_agents#196

Merged

tomtseng added 2 commits January 6, 2025 10:25

Empty commit to retrigger CI

692e3a8

ci: build_and_activate_venv needs virtualenv on macOS

cc61abc

tomtseng force-pushed the fix_safe_to_tensor_bug branch from 9945450 to cc61abc Compare January 6, 2025 18:37

tomtseng added 5 commits January 6, 2025 10:45

ci: macOS needs to install python3.9

58c28fc

./ci/build_and_activate_venv.sh ~/venv python3.9 fails because python3.9 is not installed.

ci: Windows needs pytest to be installed

b765aff

Revert "ci: Windows needs pytest to be installed"

3c1aa6a

This reverts commit b765aff.

ci: Windows isn't finding pytest, try refreshing its cache

ca3f001

tomtseng force-pushed the fix_safe_to_tensor_bug branch from be007c6 to ca3f001 Compare January 6, 2025 19:20

ci: Windows is throwing away the exit code of venv build script

637bed5

ci: Fix windows install error by switching to Python 3.9

49b3e70

tomtseng force-pushed the fix_safe_to_tensor_bug branch from b099761 to 49b3e70 Compare January 7, 2025 03:35

mypy: Ignore pytorch type errors

f64b3fc

tomtseng force-pushed the fix_safe_to_tensor_bug branch from d79080b to f64b3fc Compare January 7, 2025 04:16

Fix mypy errors

5820cfe

tomtseng merged commit e5ef188 into master Jan 7, 2025
15 checks passed

tomtseng deleted the fix_safe_to_tensor_bug branch January 7, 2025 05:56

This was referenced Jan 7, 2025

Fix PosixPath being passed to HuggingFace save_to_disk (which does not accept it) #855

Open

TypeError: argument of type 'PosixPath' is not iterable #852

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure safe_to_tensor moves tensors to the specified device. #831

Ensure safe_to_tensor moves tensors to the specified device. #831

ernestum commented Dec 15, 2023 •

edited

Loading

codecov bot commented Dec 15, 2023 •

edited

Loading

ernestum commented Jan 22, 2024

AdamGleave commented Jan 28, 2024

dominikonysz commented May 6, 2024

ChenJiangxi commented Jan 4, 2025

tomtseng commented Jan 6, 2025 •

edited

Loading

tomtseng commented Jan 7, 2025

Ensure safe_to_tensor moves tensors to the specified device. #831

Ensure safe_to_tensor moves tensors to the specified device. #831

Conversation

ernestum commented Dec 15, 2023 • edited Loading

codecov bot commented Dec 15, 2023 • edited Loading

Codecov Report

ernestum commented Jan 22, 2024

AdamGleave commented Jan 28, 2024

dominikonysz commented May 6, 2024

ChenJiangxi commented Jan 4, 2025

tomtseng commented Jan 6, 2025 • edited Loading

tomtseng commented Jan 7, 2025

ernestum commented Dec 15, 2023 •

edited

Loading

codecov bot commented Dec 15, 2023 •

edited

Loading

tomtseng commented Jan 6, 2025 •

edited

Loading