Need help in defining and solving a POMDP: Varying Mountainhike #43
Hey Sai, I can help you with this. Let's start with a few questions I have. Let me know whenever you get a chance!

Edit: Also, I did some debugging, and it seems like the POMCPOW tree is reaching the terminal state during planning. Moreover, the simulation stops before reaching max_steps because the agent reaches its goal, which is a terminal state in your implementation.
# import Pkg
# Pkg.add("POMDPs")
# Pkg.add("POMCPOW")
# Pkg.add("POMDPModels")
# Pkg.add("POMDPTools")
# Pkg.add("Distributions")
using POMDPs
using POMCPOW
using POMDPModels
using POMDPTools
using LinearAlgebra
using Distributions
using Random
mutable struct MordorHikePOMDP <: POMDP{Vector{Float64}, Int64, Float64}
    # Constants
    translate_step::Float64
    translate_std::Float64
    obs_std::Float64
    discount::Float64
    # Map bounds and positions
    map_lower_bound::Vector{Float64}
    map_upper_bound::Vector{Float64}
    fixed_start_pos::Vector{Float64}
    goal_position::Vector{Float64}
    # Mountain parameters
    slope::Vector{Float64}
    mvn_1::MvNormal
    mvn_2::MvNormal
    mvn_3::MvNormal
    # Constructor
    function MordorHikePOMDP()
        new(
            0.1,           # translate_step
            0.05,          # translate_std
            0.1,           # obs_std
            0.99,          # discount
            [-1.0, -1.0],  # map_lower_bound
            [1.0, 1.0],    # map_upper_bound
            [-0.8, -0.8],  # fixed_start_pos
            [0.8, 0.8],    # goal_position
            [0.2, 0.2],    # slope
            MvNormal([0.0, 0.0], [0.005 0.0; 0.0 1.0]),  # mvn_1
            MvNormal([0.0, -0.8], [1.0 0.0; 0.0 0.01]),  # mvn_2
            MvNormal([0.0, 0.8], [1.0 0.0; 0.0 0.01])    # mvn_3
        )
    end
end
# State: [x, y, θ]
# Action: 1=north, 2=south, 3=east, 4=west
# Observation: [altitude]
function POMDPs.actions(pomdp::MordorHikePOMDP)
    return [1, 2, 3, 4]
end
function POMDPs.initialstate(pomdp::MordorHikePOMDP)
    # Fixed start position with a random rotation (medium difficulty)
    return ImplicitDistribution(
        rng -> [pomdp.fixed_start_pos[1], pomdp.fixed_start_pos[2], rand(rng, [0.0, π/2, π, 3π/2])]
    )
end
function POMDPs.reward(pomdp::MordorHikePOMDP, s::Vector{Float64}, a::Int64, sp::Vector{Float64})
    if isterminal(pomdp, s)
        return 0.0
    else
        return calculate_altitude(pomdp, sp[1:2])
    end
end
POMDPs.discount(pomdp::MordorHikePOMDP) = pomdp.discount
function POMDPs.isterminal(pomdp::MordorHikePOMDP, s::Vector{Float64})
    # Terminal when within 2 * translate_step (= 0.2) of the goal
    return norm(s[1:2] - pomdp.goal_position) <= 2 * pomdp.translate_step
end
function POMDPs.transition(pomdp::MordorHikePOMDP, s::Vector{Float64}, a::Int64)
    return ImplicitDistribution(function (rng)
        theta = s[3]
        # Create movement vectors based on theta
        forward_vector = [cos(theta), sin(theta)]
        lateral_vector = [-sin(theta), cos(theta)]
        # Map actions to movement
        movement = if a == 1   # North
            forward_vector
        elseif a == 2          # South
            -forward_vector
        elseif a == 3          # East
            lateral_vector
        else                   # West
            -lateral_vector
        end
        next_pos = s[1:2] + movement * pomdp.translate_step
        # Add noise to the position
        next_pos += rand(rng, Normal(0, pomdp.translate_std), 2)
        # Clip to the map bounds
        next_pos = clamp.(next_pos, pomdp.map_lower_bound, pomdp.map_upper_bound)
        # Keep the same theta
        return [next_pos[1], next_pos[2], theta]
    end)
end
function POMDPs.observation(pomdp::MordorHikePOMDP, a::Int64, sp::Vector{Float64})
    altitude = calculate_altitude(pomdp, sp[1:2])
    return Normal(altitude, pomdp.obs_std)
end
function POMDPs.observation(pomdp::MordorHikePOMDP, s::Vector{Float64}, a::Int64, sp::Vector{Float64})
    return observation(pomdp, a, sp)
end
function calculate_altitude(pomdp::MordorHikePOMDP, pos::Vector{Float64})
    # Take only the x, y coordinates in case pos also carries θ
    pos_2d = pos[1:2]
    mountains = [
        pdf(pomdp.mvn_1, pos_2d),
        pdf(pomdp.mvn_2, pos_2d),
        pdf(pomdp.mvn_3, pos_2d)
    ]
    altitude = maximum(mountains)
    return -exp(-altitude) + dot(pos_2d, pomdp.slope) - 0.02
end
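# Illustrative check (not part of the original code): per the problem description,
# the reward is always negative and least negative on top of the Gaussian ridges.
# m = MordorHikePOMDP()
# calculate_altitude(m, [-0.8, -0.8])  # start, off the ridges: more negative
# calculate_altitude(m, [0.0, 0.0])    # on the central ridge: less negative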
pomdp = MordorHikePOMDP()
# Use the POMCPOW solver (as in your first code block). Besides criterion,
# POMCPOWSolver also takes keyword arguments worth tuning here, e.g.
# tree_queries, max_depth, k_observation/alpha_observation (observation
# widening), and estimate_value (the rollout estimator).
solver = POMCPOWSolver(criterion=MaxUCB(20.0))
planner = solve(solver, pomdp)
# Simulate
hr = HistoryRecorder(max_steps=200)
hist = simulate(hr, pomdp, planner)
# Use eachstep with an explicit spec so the tuple unpacks in the intended order
for (s, b, a, r, sp, o) in eachstep(hist, "(s, b, a, r, sp, o)")
    @show s, a, r, sp
    println(norm(s[1:2] - pomdp.goal_position))
end
rhist = simulate(hr, pomdp, RandomPolicy(pomdp))
println("""
Cumulative Discounted Reward (for 1 simulation)
Random: $(discounted_reward(rhist))
POMCPOW: $(discounted_reward(hist))
""")
When I ran the above code and checked the trajectory (hist), the final state did not satisfy the isterminal condition of being near [0.8, 0.8]! Do you get a different result? If I try
OK, sorry for too many updates. I reran the solver on the updated code. It solves the task sometimes, and sometimes it does not. I guess a non-random rollout policy would help here? Can you help me define it? I don't understand the API that well.
Yeah, hopefully a better rollout policy will increase the number of times it solves the task. This would be the general layout for the rollout policy code.
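Here is a minimal sketch under a few assumptions: GoalSeekingPolicy is an illustrative name, and it relies on FORollout from BasicPOMCP (POMCPOW reuses BasicPOMCP's estimator machinery, so its estimate_value keyword should accept it). FORollout runs fully observable rollouts, calling action(policy, s) with the true state, so the heuristic can use the hidden rotation θ to pick the action whose movement vector best aligns with the direction to the goal.

using BasicPOMCP  # provides FORollout

# Illustrative heuristic rollout policy: greedily head toward the goal.
struct GoalSeekingPolicy <: Policy
    pomdp::MordorHikePOMDP
end

function POMDPs.action(p::GoalSeekingPolicy, s::Vector{Float64})
    theta = s[3]
    forward = [cos(theta), sin(theta)]
    lateral = [-sin(theta), cos(theta)]
    moves = [forward, -forward, lateral, -lateral]  # actions 1..4, as in transition
    to_goal = p.pomdp.goal_position - s[1:2]
    # Choose the action whose movement vector points most directly at the goal
    return argmax(i -> dot(moves[i], to_goal), 1:4)
end

# Plug it into the solver as the leaf-value estimator
solver = POMCPOWSolver(criterion=MaxUCB(20.0),
                       estimate_value=FORollout(GoalSeekingPolicy(pomdp)))
planner = solve(solver, pomdp)

Note that FORollout peeks at the true state, so it only shapes the value estimates inside the tree search; the executed policy still acts on the belief.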
Let me know how well it performs after using a better rollout.
I am working on a POMDP called VaryingMountainHike (which I have redubbed MordorHike 😉). It was introduced here. It is basically a 2D environment with mountains defined by 3 Gaussians. The agent has to get from the bottom left to the top right. The rewards are always negative, and the reward is least negative on top of the mountains.
I am totally new to Julia and POMDP planners, so bear with me if I have made some bugs in the implementation.
I need help figuring out reasonable hyperparameters for the solver on this problem. When I used the default number of tree queries, the rollout terminated before max steps without reaching the terminal state.