A question on the info loss #9

Open
pengzhenghao opened this issue Feb 26, 2023 · 0 comments

In this file: https://github.com/ermongroup/MetaIRL/blob/master/inverse_rl/models/info_airl_state_train.py#L141

We can see that the info loss is the negative log-likelihood of a sampled "m" under q_\psi( \cdot | expert trajectory):


# Get "m" distribution by feeding expert trajectory
context_dist_info_vars = self.context_encoder.dist_info_sym(expert_traj_var)

# Sample a "m" from the distribution
context_mean_var = context_dist_info_vars["mean"]
context_log_std_var = context_dist_info_vars["log_std"]
eps = tf.random.normal(shape=tf.shape(context_mean_var))
reparam_latent = eps * tf.exp(context_log_std_var) + context_mean_var

# Compute the log probability of the sampled "m" in its own distribution
log_q_m_tau = tf.reshape(self.context_encoder.distribution.log_likelihood_sym(reparam_latent, context_dist_info_vars) ...

info_loss = - tf.reduce_mean(log_q_m_tau ...
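
If I am reading this snippet correctly, the quantity being minimized is (writing tau_E for the expert trajectory; this notation is mine, not the repo's):

$$
\mathcal{L}_{\text{info}}^{\text{code}} \;=\; -\,\mathbb{E}_{m \sim q_\psi(\cdot \mid \tau_E)}\big[\log q_\psi(m \mid \tau_E)\big]
$$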

However, this differs from the equation shown in the original paper. Based on the paragraph below (screenshot from the paper), my understanding is that you do the following (I also try to write it out in symbols after the screenshot):

  1. Use the expert trajectory to get an "m distribution": q_\psi( \cdot | expert trajectory)
  2. Sample a context vector "m" from this "m distribution"
  3. Feed "m" to the policy and roll out an agent trajectory "tau_agent"
  4. Feed the agent trajectory to the posterior to get an estimated "m'": q_\psi(m' | agent trajectory)
  5. Compute the log probability of the estimated "m'" under the original "m distribution" estimated from the expert trajectory.

[Image: screenshot of the info objective equation from the paper]
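
To spell out steps 1-5 in symbols (this is only my own transcription of the procedure above; tau_E, tau_agent, and pi_theta are my notation):

$$
\mathcal{L}_{\text{info}}^{\text{paper}} \;=\; -\,\mathbb{E}\big[\log q_\psi(m' \mid \tau_E)\big],
\qquad m \sim q_\psi(\cdot \mid \tau_E),\;\; \tau_{\text{agent}} \sim \pi_\theta(\cdot \mid m),\;\; m' \sim q_\psi(\cdot \mid \tau_{\text{agent}})
$$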

However, in the code above you are simply computing the log probability of a sampled "m" under its own distribution q_\psi( \cdot | expert trajectory), i.e. the distribution estimated from the expert trajectory. In other words, steps 3 and 4 are skipped.

I want to know if this is intended and if my understanding is correct.

Thank you!
