In this file: https://github.com/ermongroup/MetaIRL/blob/master/inverse_rl/models/info_airl_state_train.py#L141

From this we learn that the info loss is the negative log-likelihood of q_\psi(m | expert trajectory):
# Get "m" distribution by feeding expert trajectory
context_dist_info_vars = self.context_encoder.dist_info_sym(expert_traj_var)
# Sample a "m" from the distribution
context_mean_var = context_dist_info_vars["mean"]
context_log_std_var = context_dist_info_vars["log_std"]
eps = tf.random.normal(shape=tf.shape(context_mean_var))
reparam_latent = eps * tf.exp(context_log_std_var) + context_mean_var
# Compute the log probability of the sampled "m" in its own distribution
log_q_m_tau = tf.reshape(self.context_encoder.distribution.log_likelihood_sym(reparam_latent, context_dist_info_vars) ...
info_loss = - tf.reduce_mean(log_q_m_tau ...
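For concreteness, here is a minimal, self-contained sketch (not from the repo; shapes and names are purely illustrative) of the quantity this snippet computes: the log-likelihood of a reparameterized sample under the same diagonal Gaussian it was drawn from, averaged into a negative-log-likelihood loss.

```python
import math
import tensorflow as tf

# Illustrative outputs of q_psi(. | expert trajectory): a diagonal Gaussian
batch_size, latent_dim = 4, 3
mean = tf.zeros([batch_size, latent_dim])
log_std = tf.fill([batch_size, latent_dim], -1.0)

# Reparameterized sample m ~ q_psi(. | expert trajectory)
eps = tf.random.normal(shape=tf.shape(mean))
m = eps * tf.exp(log_std) + mean

# Log-likelihood of that same sample under the same diagonal Gaussian
log_q_m = tf.reduce_sum(
    -0.5 * tf.square((m - mean) / tf.exp(log_std))
    - log_std
    - 0.5 * math.log(2.0 * math.pi),
    axis=-1,
)

info_loss = -tf.reduce_mean(log_q_m)
```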
However, this is different from the equation shown in the original paper. According to the paper, as I understand it, the following steps should be performed (see the sketch after this list):

1. Use the expert trajectory to get an "m distribution": q_\psi(\cdot | expert trajectory).
2. Use the "m distribution" to sample a context vector "m".
3. Feed "m" to the policy and roll out an agent trajectory "tau_agent".
4. Feed the agent trajectory to the posterior to get an estimated "m'": q_\psi(m' | agent trajectory).
5. Compute the log probability of the estimated "m'" under the original "m distribution" estimated from the expert trajectory.
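To make the comparison concrete, here is a hedged sketch of the computation I would expect from the steps above. This is only my reading of the paper, not code from the repo; `context_encoder`, `policy_rollout`, and `expert_traj` are hypothetical stand-ins for whatever the actual implementation uses.

```python
import math
import tensorflow as tf

def diag_gaussian_log_prob(x, mean, log_std):
    """Log-density of x under a diagonal Gaussian N(mean, exp(log_std)^2)."""
    return tf.reduce_sum(
        -0.5 * tf.square((x - mean) / tf.exp(log_std))
        - log_std - 0.5 * math.log(2.0 * math.pi),
        axis=-1,
    )

def paper_style_info_loss(expert_traj, context_encoder, policy_rollout):
    # 1. q_psi(. | expert trajectory)
    mean_exp, log_std_exp = context_encoder(expert_traj)

    # 2. Sample a context vector m via the reparameterization trick
    eps = tf.random.normal(shape=tf.shape(mean_exp))
    m = eps * tf.exp(log_std_exp) + mean_exp

    # 3. Roll out the policy conditioned on m to get an agent trajectory
    agent_traj = policy_rollout(m)

    # 4. Re-encode the agent trajectory: q_psi(. | agent trajectory)
    mean_agent, _ = context_encoder(agent_traj)
    m_prime = mean_agent  # e.g. take the posterior mean as the estimate m'

    # 5. Log-probability of m' under the expert-side distribution from step 1
    return -tf.reduce_mean(diag_gaussian_log_prob(m_prime, mean_exp, log_std_exp))
```

Either way one reads step 5, the point is that steps 3 and 4 (rolling out the policy and re-encoding the agent trajectory) sit between sampling "m" and evaluating the log-likelihood.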
However, in the code above, you simply compute the log probability of a sampled "m" under its own distribution q_\psi(\cdot | expert trajectory), estimated from the expert trajectory. That is, steps 3 and 4 are skipped.
I want to know if this is intended and if my understanding is correct.
Thank you!