A question on the info loss #9

Open
pengzhenghao opened this issue Feb 26, 2023 · 0 comments

In this file: https://github.com/ermongroup/MetaIRL/blob/master/inverse_rl/models/info_airl_state_train.py#L141

We can see that the info loss is the negative log-likelihood of a sampled "m" under q_\psi( \cdot | expert trajectory):


# Get "m" distribution by feeding expert trajectory
context_dist_info_vars = self.context_encoder.dist_info_sym(expert_traj_var)

# Sample a "m" from the distribution
context_mean_var = context_dist_info_vars["mean"]
context_log_std_var = context_dist_info_vars["log_std"]
eps = tf.random.normal(shape=tf.shape(context_mean_var))
reparam_latent = eps * tf.exp(context_log_std_var) + context_mean_var

# Compute the log probability of the sampled "m" in its own distribution
log_q_m_tau = tf.reshape(self.context_encoder.distribution.log_likelihood_sym(reparam_latent, context_dist_info_vars) ...

info_loss = - tf.reduce_mean(log_q_m_tau ...
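
If I am reading this snippet correctly, the quantity being minimized is (writing tau_E for the expert trajectory; this notation is mine, not the repo's):

$$
\mathcal{L}_{\text{info}}^{\text{code}} \;=\; -\,\mathbb{E}_{m \sim q_\psi(\cdot \mid \tau_E)}\big[\log q_\psi(m \mid \tau_E)\big]
$$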

However, this differs from the equation shown in the original paper. Based on the paragraph below (screenshot from the paper), my understanding is that you do the following (I also try to write it out in symbols after the screenshot):

  1. Use the expert trajectory to get an "m distribution": q_\psi( \cdot | expert trajectory)
  2. Sample a context vector "m" from this "m distribution"
  3. Feed "m" to the policy and roll out an agent trajectory "tau_agent"
  4. Feed the agent trajectory to the posterior to get an estimated "m'": q_\psi(m' | agent trajectory)
  5. Compute the log probability of the estimated "m'" under the original "m distribution" estimated from the expert trajectory.

[Image: screenshot of the info objective equation from the paper]
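
To spell out steps 1-5 in symbols (this is only my own transcription of the procedure above; tau_E, tau_agent, and pi_theta are my notation):

$$
\mathcal{L}_{\text{info}}^{\text{paper}} \;=\; -\,\mathbb{E}\big[\log q_\psi(m' \mid \tau_E)\big],
\qquad m \sim q_\psi(\cdot \mid \tau_E),\;\; \tau_{\text{agent}} \sim \pi_\theta(\cdot \mid m),\;\; m' \sim q_\psi(\cdot \mid \tau_{\text{agent}})
$$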

However, in the code above you are simply computing the log probability of a sampled "m" under its own distribution q_\psi( \cdot | expert trajectory), i.e. the distribution estimated from the expert trajectory. In other words, steps 3 and 4 are skipped.

I want to know if this is intended and if my understanding is correct.

Thank you!
