Eventually, this repository will contain simpler interfaces for expressing certain classes of POMDPs. The goal is for POMDPs.jl to act as a low-level interface (like MathProgBase) and for the interfaces defined here to act as concise, convenient high-level interfaces (like JuMP or Convex).
Another package that should be referenced when designing this is PLite.jl.
Contributions of new interfaces for defining specific classes of problems are welcome!
For now, there are just a few sketches of interfaces outlined below:
The first sketch, the @discretePOMDP macro, can represent any problem with discrete actions, observations, and states using the POMDPs.jl explicit interface. It would be just a thin wrapper over the POMDPs.jl interface and would look very similar to a pure POMDPs.jl implementation. Its advantages over using POMDPs.jl directly are that it is slightly more compact and that you don't have to understand object-oriented programming.
The Tiger problem would look like this:
pomdp = @discretePOMDP begin
    @states [:tiger_l, :tiger_r]
    @actions [:open_l, :open_r, :listen]
    @observations [:tiger_l, :tiger_r]
    @transition function (s, a)
        if a == :listen
            return [s]=>[1.0] # listening does not move the tiger
        else
            return [:tiger_l, :tiger_r]=>[0.5, 0.5] # reset
        end
    end
    @reward Dict((:tiger_l, :open_l) => -100.,
                 (:tiger_r, :open_r) => -100.,
                 (:tiger_l, :open_r) => 10.,
                 (:tiger_r, :open_l) => 10.
    )
    @default_reward -1.0
    @observation function (a, sp)
        if a == :listen
            if sp == :tiger_l
                return [:tiger_l, :tiger_r]=>[0.85, 0.15]
            else
                return [:tiger_r, :tiger_l]=>[0.85, 0.15]
            end
        else
            return [:tiger_l, :tiger_r]=>[0.5, 0.5]
        end
    end
    @initial [:tiger_l, :tiger_r]=>[0.5, 0.5]
    @discount 0.95
end
Note, this could also be done without any macros as a constructor with keyword arguments. Perhaps that would be easier to understand?
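As a rough illustration of the keyword-argument alternative, here is a minimal sketch in plain Julia. The `DiscretePOMDPSpec` type and the `reward` helper are hypothetical names invented for this example; they are not part of POMDPs.jl, and a real version would still need to be wired into the POMDPs.jl interface.

```julia
# Hypothetical container type; not part of POMDPs.jl.
Base.@kwdef struct DiscretePOMDPSpec
    states::Vector{Symbol}
    actions::Vector{Symbol}
    observations::Vector{Symbol}
    transition::Function      # (s, a) -> values => probabilities
    observation::Function     # (a, sp) -> values => probabilities
    reward::Dict{Tuple{Symbol,Symbol},Float64}
    default_reward::Float64 = 0.0
    initial::Pair{Vector{Symbol},Vector{Float64}}
    discount::Float64 = 1.0
end

# Look up the reward for (s, a), falling back to the default (hypothetical helper).
reward(m::DiscretePOMDPSpec, s, a) = get(m.reward, (s, a), m.default_reward)

tiger = DiscretePOMDPSpec(
    states = [:tiger_l, :tiger_r],
    actions = [:open_l, :open_r, :listen],
    observations = [:tiger_l, :tiger_r],
    # Parentheses are needed because => binds more loosely than ?:
    transition = (s, a) -> a == :listen ? ([s] => [1.0]) :
                                          ([:tiger_l, :tiger_r] => [0.5, 0.5]),
    observation = function (a, sp)
        a == :listen || return [:tiger_l, :tiger_r] => [0.5, 0.5]
        other = sp == :tiger_l ? :tiger_r : :tiger_l
        return [sp, other] => [0.85, 0.15]
    end,
    reward = Dict((:tiger_l, :open_l) => -100.0,
                  (:tiger_r, :open_r) => -100.0,
                  (:tiger_l, :open_r) => 10.0,
                  (:tiger_r, :open_l) => 10.0),
    default_reward = -1.0,
    initial = [:tiger_l, :tiger_r] => [0.5, 0.5],
    discount = 0.95,
)
```

The definition is about as long as the macro version, which suggests the macros would mainly buy a slightly lighter syntax rather than new expressive power.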
Another common case is a problem whose dynamics are given by a function that samples the next state, observation, and reward. The crying baby problem would look something like this:
pomdp = @generativePOMDP begin
    @initial rng -> rand(rng) > 0.5 # true means hungry
    @dynamics function (s, a, rng)
        if a # feeding sates the baby
            sp = false
        elseif s # hungry
            sp = true
        else # not hungry
            sp = rand(rng) < 0.1
        end
        if sp # hungry
            o = rand(rng) < 0.8
        else # not hungry
            o = rand(rng) < 0.1
        end
        r = (s ? -10.0 : 0.0) + (a ? -5.0 : 0.0)
        return sp, o, r
    end
    @discount 0.95
end
Again, you could do this without macros, and just use keyword arguments.
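A keyword-argument version of the generative interface might look like the sketch below. `GenerativePOMDPSpec` and `rollout` are hypothetical names used only for illustration. The sketch also assumes that feeding makes the baby not hungry and that the first observation is "quiet"; neither is fixed by the interface itself.

```julia
using Random

# Hypothetical container for a generative model; not part of POMDPs.jl.
Base.@kwdef struct GenerativePOMDPSpec
    initial::Function      # rng -> s
    dynamics::Function     # (s, a, rng) -> (sp, o, r)
    discount::Float64 = 1.0
end

crying_baby = GenerativePOMDPSpec(
    initial = rng -> rand(rng) > 0.5,   # true means hungry
    dynamics = function (s, a, rng)
        # Assumption: feeding (a == true) always sates the baby.
        sp = a ? false : (s || rand(rng) < 0.1)
        o = rand(rng) < (sp ? 0.8 : 0.1)   # true means crying
        r = (s ? -10.0 : 0.0) + (a ? -5.0 : 0.0)
        return sp, o, r
    end,
    discount = 0.95,
)

# Discounted return of a policy that maps the last observation to an action.
function rollout(m::GenerativePOMDPSpec, policy, rng::AbstractRNG; steps = 20)
    s = m.initial(rng)
    o = false                  # assumed initial observation: baby is quiet
    total, weight = 0.0, 1.0
    for _ in 1:steps
        a = policy(o)
        s, o, r = m.dynamics(s, a, rng)
        total += weight * r
        weight *= m.discount
    end
    return total
end

feed_when_crying(o) = o
rollout(crying_baby, feed_when_crying, MersenneTwister(1))
```

Because the model is only a pair of functions plus a discount, anything that can sample from `dynamics` can simulate it, which is the main appeal of the generative form.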
It might also be clearer what is going on if the variables were declared with names, as in the example below.
This would be tougher to compile, though, and it's not clear what the easiest way to express distributions or rewards would be.
Ideas welcome!
mdp = @MDP begin
    xmax = 10
    ymax = 10
    @states begin
        x in 1:xmax
        y in 1:ymax
    end
    @actions begin
        dir in [:up, :down, :left, :right]
    end
    @reward rdict = Dict(
        #XXX no idea how to define this in terms of x and y
    )
    default_reward = 0.0
    @transition #XXX what is the most concise way to define the transition distribution??
    terminal = keys(rdict) # presumably the states that carry a reward
    discount = 0.95
    initial #XXX initial state distribution still to be specified
end
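One possible direction for the open questions in this sketch, written in plain Julia without any macros: represent each state as a named tuple built from the declared ranges, key the reward dictionary by those tuples, and return transitions in the same values => probabilities form used in the Tiger example. The reward cells and their values below are placeholders, and the deterministic, clamped movement is an assumption made only to keep the example short.

```julia
xmax, ymax = 10, 10

# States as named tuples built from the declared ranges.
states = vec([(x = x, y = y) for x in 1:xmax, y in 1:ymax])
actions = [:up, :down, :left, :right]

# Reward keyed by state; the positions and values are placeholders.
rdict = Dict((x = 4, y = 3) => -10.0, (x = 9, y = 3) => 10.0)
default_reward = 0.0
reward(s) = get(rdict, s, default_reward)

# Assumption: every state with an explicit reward is terminal.
terminal = Set(keys(rdict))

const STEP = Dict(:up => (0, 1), :down => (0, -1),
                  :left => (-1, 0), :right => (1, 0))

# Deterministic move, clamped to the grid, in values => probabilities form.
function transition(s, a)
    dx, dy = STEP[a]
    sp = (x = clamp(s.x + dx, 1, xmax), y = clamp(s.y + dy, 1, ymax))
    return [sp] => [1.0]
end
```

Named tuples hash and compare by value, so they work directly as dictionary keys, and fields such as `s.x` keep the variable names visible in the reward and transition definitions.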