I'm sorry if I'm misunderstanding the paper or the code, but as I currently understand it, the conditional probability in Equation 15 is the probability of the sampled sequence, not of the target sequence. In other words, shouldn't the indices of the words in the sampled sequence, rather than the indices of the words in the target sequence, be used to 'gather' the probabilities?
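To make the 'gather' concrete, here is a minimal numpy sketch of what I mean (the names, shapes and toy values are mine, not the repo's):

```python
import numpy as np

def sequence_log_prob(step_probs, token_ids):
    """Sum of log p(token_ids[t] | ...) over timesteps.
    step_probs: (seq_len, vocab_size) per-step output distributions.
    token_ids:  (seq_len,) indices used to 'gather' one probability per step."""
    picked = step_probs[np.arange(len(token_ids)), token_ids]
    return float(np.sum(np.log(picked + 1e-12)))

probs = np.full((3, 5), 0.2)         # toy distributions: 3 steps, vocab of 5
sampled_ids = np.array([4, 1, 3])    # word indices of the sampled sequence ys
target_ids  = np.array([0, 2, 2])    # word indices of the ground-truth sequence

log_p_ys = sequence_log_prob(probs, sampled_ids)  # what I think Eq. 15 needs
log_p_y  = sequence_log_prob(probs, target_ids)   # what the gather seems to use now
```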
A follow-up question, and again I apologize for any misunderstanding on my part.
As far as I can see, the inputs to Loss_ml and Loss_rl are supposed to be different in the original paper?
The Loss_ml part is basically the same as traditional teacher forcing: it uses the ground truth as the input for the next timestep, which is also what the current implementation does.
However, in the reinforcement learning part, the paper is trying to address the exposure bias that 'comes from the fact that the network has knowledge of the ground truth sequence up to the next token' (bottom of page 4), and the baseline y^ is obtained essentially by greedy search (top of page 5). So I feel that in the RL part the ground truth should not be fed in during training, i.e. the decoder input should come from the previous prediction rather than from the batch.
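For reference, my reading of the two objectives in the paper (with x the source document, y* the ground truth, y^ the greedy baseline and ys the sampled sequence) is roughly:

Loss_ml = - sum_t log p( y*(t) | y*(1), ..., y*(t-1), x )
Loss_rl = ( r(y^) - r(ys) ) * sum_t log p( ys(t) | ys(1), ..., ys(t-1), x )

so the ML term conditions on ground-truth prefixes (teacher forcing), while the RL term conditions on the model's own sampled prefixes. Please correct me if I've copied these wrong.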
Based on the above (the RL input coming from the previous timestep), I'm thinking that the procedures for generating y^ and ys should also be separated (right now they both depend on the ground-truth input):
At timestep t, y^(t) is obtained by maximizing p( y^(t) | y^(t-1), ..., y^(1) ), while ys(t) is sampled from p( ys(t) | ys(t-1), ..., ys(1) ). As you can see, these two distributions are different, so I'm thinking we are supposed to have two generative procedures here: one that always takes y^(t-1) as the next-timestep input and generates y^(t), and another that always takes ys(t-1) as the input and generates ys(t). In this way we ultimately obtain two sequences, y^ and ys, and their corresponding ROUGE values. A rough sketch of what I mean is below.
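Here is a minimal, framework-free sketch of the two decoding procedures I have in mind (pure numpy; decoder_step is a toy placeholder I made up, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, BOS, EOS = 50, 20, 1, 2

def decoder_step(prev_token, state):
    """Toy stand-in for one decoder step; a real model would condition on the
    encoder output, the previous token and its recurrent state."""
    logits = rng.normal(size=VOCAB) + 0.01 * (prev_token + state)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), state + 1

def decode(sample):
    """One full decode: greedy (sample=False) for the baseline y^,
    multinomial sampling (sample=True) for ys. Each sequence is fed back
    its *own* previous prediction, never the ground truth."""
    tokens, log_prob, state, prev = [], 0.0, 0, BOS
    for _ in range(MAX_LEN):
        probs, state = decoder_step(prev, state)
        tok = int(rng.choice(VOCAB, p=probs)) if sample else int(np.argmax(probs))
        tokens.append(tok)
        log_prob += np.log(probs[tok] + 1e-12)  # 'gather' the prob of the chosen token
        if tok == EOS:
            break
        prev = tok
    return tokens, log_prob

y_hat, _ = decode(sample=False)       # baseline sequence y^ (greedy)
y_s, log_p_ys = decode(sample=True)   # sampled sequence ys
# loss_rl = (rouge(y_hat) - rouge(y_s)) * log_p_ys  # Eq. 15 as I read it; rouge() is a placeholder
```

The key point is just that neither loop ever looks at the ground-truth tokens during decoding.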
Please correct me if you see it differently. I'm still working on understanding RL for summarization. Thanks!!
(The code I'm referring to above is RLSeq2Seq/src/model.py, lines 372 to 381 at commit 515a4cb.)