
Random policy example without TF1 dependency #8

Open
aqibsaeed opened this issue Sep 5, 2021 · 11 comments

@aqibsaeed

Hi @srivatsankrishnan,

Is there any example of a random policy for collecting data without a dependency on TensorFlow/Keras? Those framework versions are fairly outdated and there is very little support for them.

I am looking for a way to simply instantiate an environment, collect data with a random policy, and afterwards do RL in TF2. Could you please direct me to such an example? The examples I see under run_time and test_suites all require TF1/Keras.

Thanks in advance.

@aqibsaeed aqibsaeed changed the title Random policy example Random policy example without TF1 dependency Sep 5, 2021
@srivatsankrishnan
Member

What do you mean by a random policy to collect data? Also, what kind of data do you want to collect? For instance, the easiest way to get a random policy is to generate random trajectories or change the control API parameters (there are at least 3-4 different APIs to control parameters like position, velocity, pitch, roll, yaw, etc.). This doesn't require you to use TF/Keras.
https://github.com/harvard-edge/airlearning-rl/blob/master/test_suites/move.py

The examples you see in run_time and test_suites (except move.py) are intended to learn a meaningful policy using RL rather than a random policy. Also, if you want to use TF2, you need to change the imports and the agent definitions (and use the matching version of Stable Baselines). Most of the framework should ideally abstract that complexity for you.
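As a rough illustration of the control-API route, the sketch below issues random velocity commands through the stock airsim Python client. This is only an assumption-laden example: it presumes the standard airsim package and a running simulator, whereas move.py in this repo goes through the project's own wrappers, so the velocity ranges and duration here are purely illustrative.

import random
import airsim

# Sketch (assumption: stock `airsim` client installed, AirSim/Unreal already running).
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()

for _ in range(20):
    vx, vy = random.uniform(-2, 2), random.uniform(-2, 2)
    # Fly with a random horizontal velocity for one second (the "random policy").
    client.moveByVelocityAsync(vx, vy, 0, duration=1.0).join()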

@aqibsaeed
Author

Thanks @srivatsankrishnan

What do you mean by a random policy to collect data? Also, what kind of data do you want to collect?
I want to collect (state, action, reward) pairs via interaction with the environment, e.g., by taking random actions (that's what I meant by a random policy). I don't see state, action, and reward pairs in move.py.

@aqibsaeed
Author

@srivatsankrishnan any ideas on whether it is possible to collect rollouts with a random policy?

@srivatsankrishnan
Member

Definitely possible! You need to set it up something like this:

def setup(difficulty_level='default', env_name="AirSimEnv-v42"):
    # Create the Air Learning gym environment and re-initialize it with the
    # range dictionary for the requested difficulty level from settings.
    env = gym.make(env_name)
    env.init_again(eval("settings." + difficulty_level + "_range_dic"))
    return env

def foo_random(env, action):
    # Step the environment with the given action and return the transition.
    obs, rewards, dones, info = env.step(action)
    return obs, rewards, dones, info

You can write your own function to randomly generate actions and pass them to the foo_random method. Note, this is just an illustration of how you can do it; you will need to add the right imports and make sure it runs. The step function should return the observation, reward, dones (status), and info.
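For illustration, here is a minimal rollout sketch built on the setup()/foo_random() snippet above. It assumes the environment follows the standard Gym API, so env.reset() and env.action_space.sample() behave as usual; treat it as a starting point, not a tested script.

# Sketch: collect (state, action, reward) tuples with a random policy.
env = setup()
transitions = []              # list of (state, action, reward) tuples
obs = env.reset()

for _ in range(100):
    action = env.action_space.sample()            # the "random policy"
    next_obs, reward, done, info = env.step(action)
    transitions.append((obs, action, reward))
    obs = env.reset() if done else next_obs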

@aqibsaeed
Author

Thanks @srivatsankrishnan. I have the following script to test out the idea:

import os
import numpy as np
import time
import gym
import gym_airsim
os.sys.path.insert(0, os.path.abspath('./settings_folder'))
import settings

def setup(difficulty_level='default', env_name="AirSimEnv-v42"):
    # Create the Air Learning environment and apply the difficulty settings.
    env = gym.make(env_name)
    env.init_again(eval("settings." + difficulty_level + "_range_dic"))
    return env

def foo_random(env, action):
    # Step the environment once and print the resulting transition.
    obs, rewards, dones, info = env.step(action)
    print(obs, rewards, dones, info)


env = setup()
print(env.action_space.sample())

time.sleep(10)
print("============================== environment ==========================")


for i in range(1000):
    print(i)
    foo_random(env, [1.5 + np.random.uniform(), 2.5 + np.random.uniform()])

But I notice that after establishing a connection with AirSim/Unreal it keeps printing CONNECTED (and some other text) and never executes foo_random. Am I missing something here?

@srivatsankrishnan
Member

Hi,
Instead of calling foo_random, can you call env.step(action) directly inside the for loop? Also, please post the full stdout along with the "CONNECTED" prompt. Not sure if that is the issue here.
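As a rough sketch of that suggestion (assuming the same setup() and loop as in the script above), the loop body would step the environment directly:

# Sketch: call env.step directly in the loop instead of going through foo_random.
for i in range(1000):
    print(i)
    action = [1.5 + np.random.uniform(), 2.5 + np.random.uniform()]
    obs, rewards, dones, info = env.step(action)
    print(obs, rewards, dones, info)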

@aqibsaeed
Author

No, it does not really work.

[-3.7095962 -4.342524 ]
============================== environment ==========================
0
ENter Step0
------------------------- step failed ----------------  with 'MultirotorState' object has no attribute 'trip_stats'  error
SUCCESS: The process "UE4Editor.exe" with PID 8104 has been terminated.
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
connection not established yet
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
connection not established yet
Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Script opens up a new window but the drone stays stationary.


@aqibsaeed
Author

@srivatsankrishnan I would really appreciate any ideas on how to resolve this issue.

@qinglong0276

Have these issues been resolved now, @aqibsaeed?

@aqibsaeed
Author

nope!

@qinglong0276

@aqibsaeed I haven't even reached the steps in the picture you posted, and I've been forced to stop due to other issues! I don't know how you got to this point. Envious!
