
Random policy example without TF1 dependency #8

Open
aqibsaeed opened this issue Sep 5, 2021 · 11 comments

@aqibsaeed

Hi @srivatsankrishnan,

Is there any example of a random policy for collecting data without a dependency on TensorFlow/Keras? Those framework versions are fairly outdated and there is very little support for them.

I am looking for a way to simply instantiate an environment, collect data with a random policy, and afterwards do RL in TF2. Could you please direct me to such an example? The examples I see under run_time and test_suites all require TF1/Keras.

Thanks in advance.

@aqibsaeed aqibsaeed changed the title Random policy example Random policy example without TF1 dependency Sep 5, 2021
@srivatsankrishnan
Member

What do you mean by a random policy to collect data? Also, what kind of data do you want to collect? For instance, the easiest way to get a random policy is to generate random trajectories or change the control API parameters (there are at least 3-4 different APIs to control parameters like position, velocity, pitch, roll, yaw, etc.). This doesn't require you to use TF/Keras.
https://github.com/harvard-edge/airlearning-rl/blob/master/test_suites/move.py

The examples you see in run_time and test_suites (except move.py) are intended to learn a meaningful policy using RL rather than a random policy. Also, if you want to use TF2, you need to change the imports and the agent definitions (and use the matching version of Stable Baselines). Most of the framework should ideally abstract that complexity for you.
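As a rough illustration of the control-API route, the sketch below issues random velocity commands through the stock airsim Python client. This is only an assumption-laden example: it presumes the standard airsim package and a running simulator, whereas move.py in this repo goes through the project's own wrappers, so the velocity ranges and duration here are purely illustrative.

import random
import airsim

# Sketch (assumption: stock `airsim` client installed, AirSim/Unreal already running).
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()

for _ in range(20):
    vx, vy = random.uniform(-2, 2), random.uniform(-2, 2)
    # Fly with a random horizontal velocity for one second (the "random policy").
    client.moveByVelocityAsync(vx, vy, 0, duration=1.0).join()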

@aqibsaeed
Author

Thanks @srivatsankrishnan

What do you mean by a random policy to collect data? Also, what kind of data do you want to collect?
I want to collect (state, action, reward) pairs via interaction with the environment, e.g., by taking random actions (that's what I meant by a random policy). I don't see state, action, and reward pairs in move.py.

@aqibsaeed
Author

@srivatsankrishnan any ideas on whether it is possible to collect rollouts with a random policy?

@srivatsankrishnan
Member

Definitely possible! You need to set it up something like this:

def setup(difficulty_level='default', env_name="AirSimEnv-v42"):
    # Create the Air Learning gym environment and re-initialize it with the
    # range dictionary for the requested difficulty level from settings.
    env = gym.make(env_name)
    env.init_again(eval("settings." + difficulty_level + "_range_dic"))
    return env

def foo_random(env, action):
    # Step the environment with the given action and return the transition.
    obs, rewards, dones, info = env.step(action)
    return obs, rewards, dones, info

You can write your own function to randomly generate actions and pass them to the foo_random method. Note, this is just an illustration of how you can do it; you will need to add the right imports and make sure it runs. The step function should return the observation, reward, dones (status), and info.
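For illustration, here is a minimal rollout sketch built on the setup()/foo_random() snippet above. It assumes the environment follows the standard Gym API, so env.reset() and env.action_space.sample() behave as usual; treat it as a starting point, not a tested script.

# Sketch: collect (state, action, reward) tuples with a random policy.
env = setup()
transitions = []              # list of (state, action, reward) tuples
obs = env.reset()

for _ in range(100):
    action = env.action_space.sample()            # the "random policy"
    next_obs, reward, done, info = env.step(action)
    transitions.append((obs, action, reward))
    obs = env.reset() if done else next_obs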

@aqibsaeed
Author

Thanks @srivatsankrishnan. I have the following script to test out the idea:

import os
import numpy as np
import time
import gym
import gym_airsim
os.sys.path.insert(0, os.path.abspath('./settings_folder'))
import settings

def setup(difficulty_level='default', env_name="AirSimEnv-v42"):
    # Create the Air Learning environment and apply the difficulty settings.
    env = gym.make(env_name)
    env.init_again(eval("settings." + difficulty_level + "_range_dic"))
    return env

def foo_random(env, action):
    # Step the environment once and print the resulting transition.
    obs, rewards, dones, info = env.step(action)
    print(obs, rewards, dones, info)


env = setup()
print(env.action_space.sample())

time.sleep(10)
print("============================== environment ==========================")


for i in range(1000):
    print(i)
    foo_random(env, [1.5 + np.random.uniform(), 2.5 + np.random.uniform()])

But I notice that after establishing a connection with AirSim/Unreal it keeps printing CONNECTED (and some other text) and never executes foo_random. Am I missing something here?

@srivatsankrishnan
Member

Hi,
Instead of calling foo_random, can you call env.step(action) directly inside the for loop? Also, please post the full stdout along with the "CONNECTED" prompt. Not sure if that is the issue here.
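As a rough sketch of that suggestion (assuming the same setup() and loop as in the script above), the loop body would step the environment directly:

# Sketch: call env.step directly in the loop instead of going through foo_random.
for i in range(1000):
    print(i)
    action = [1.5 + np.random.uniform(), 2.5 + np.random.uniform()]
    obs, rewards, dones, info = env.step(action)
    print(obs, rewards, dones, info)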

@aqibsaeed
Author

No, it does not really work.

[-3.7095962 -4.342524 ]
============================== environment ==========================
0
ENter Step0
------------------------- step failed ----------------  with 'MultirotorState' object has no attribute 'trip_stats'  error
SUCCESS: The process "UE4Editor.exe" with PID 8104 has been terminated.
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
connection not established yet
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
connection not established yet
Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Script opens up a new window but the drone stays stationary.


@aqibsaeed
Author

@srivatsankrishnan I would really appreciate any ideas on how to resolve this issue.

@qinglong0276

Have these issues been resolved now, @aqibsaeed?

@aqibsaeed
Author

nope!

@qinglong0276

@aqibsaeed I haven't even reached the steps in the picture you posted, and I've been forced to stop due to other issues! I don't know how you got to this point. Envious!
