Logs27: Mutual Information

Prepare a small dataset to verify this.

Then run with large data. Train the seq2seq and backward seq2seq models intensively.

Steps

Training data set
- p_i: "Let's have curry for lunch."
- q_i: "Maybe Coco ichi?"
- p_i+1: "Sounds good."
Train seq2seq
- X: concat(p_i, q_i)
- Y: p_i+1
Train seq2seq_backward
- X: p_i+1
- Y: q_i
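The pairs above can be built from a dialogue with a small helper. This is a minimal sketch under two assumptions that are not in these notes: a dialogue is a plain list of turn strings, and triples (p_i, q_i, p_i+1) are taken with a sliding window of stride 1. `concat` is approximated by joining with a space, and `build_pairs` is a hypothetical name.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (X, Y)


def build_pairs(dialogue: List[str]) -> Tuple[List[Pair], List[Pair]]:
    """Build training pairs from consecutive turns (hypothetical helper).

    seq2seq:          X = concat(p_i, q_i), Y = p_i+1
    seq2seq_backward: X = p_i+1,           Y = q_i
    """
    forward: List[Pair] = []
    backward: List[Pair] = []
    # Slide a window of three turns over the dialogue (stride 1 is an assumption).
    for i in range(len(dialogue) - 2):
        p_i, q_i, p_next = dialogue[i], dialogue[i + 1], dialogue[i + 2]
        forward.append((p_i + " " + q_i, p_next))
        backward.append((p_next, q_i))
    return forward, backward


fwd, bwd = build_pairs(["Let's have curry for lunch.", "Maybe Coco ichi?", "Sounds good."])
print(fwd)  # [("Let's have curry for lunch. Maybe Coco ichi?", 'Sounds good.')]
print(bwd)  # [('Sounds good.', 'Maybe Coco ichi?')]
```

In the real pipeline the concatenation would happen on token ids rather than strings; strings just keep the sketch readable.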
RL Training
1. Beam Search
  - X: concat(p_i, q_i) [batch_size, decoder_length]
    - Note: q_i must also be accessible from the iterator, because the reward calculation needs it.
  - beam_replies: [batch_size, decoder_length, beam_width]
  - logits: [batch_size, decoder_length, vocab_size]
2. Calculate reward
  - Given: p_i, q_i, a[beam_index] (from beam_search)
  - 1/N_a * logP_seq2seq(a|p_i, q_i)
    - shape: [batch_size, beam_size]
    - NOTE: Don't use logP_rl here.
    - model_seq2seq.get_logits(p_i + q_i)
      - For a single example: 1 float value.
      - For batch data:
        
        Get logits [batch_size, decoder_length, vocab_size] for [batch_size, decoder_length]
        
        Then calculate it for each beam candidate, looping over all of them.
  - 1/N_qi * logP_backward(qi|a)
    - shape: [batch_size, beam_size]
    - model_backward.get_logits(a) for i in range(beam_width)
      - For a single example: 1 float value.
      - For batch data:
        
        Get logits [batch_size, decoder_length]
        
        Repeat this beam_width times.
3. Get log_prob: [batch_size, decoder_length, beam_width]
  - We already have this.
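The reward in step 2 can be sketched in plain Python to pin down the arithmetic: log-softmax each row of logits, pick the log probability of the target token, and average over the sequence length (the 1/N factor). This is a minimal sketch, not the project's implementation. It assumes the logits for each beam candidate come from teacher-forcing that candidate through the model, and that token id lists have padding/EOS already removed. All function names are hypothetical; a framework version would do the same with tensor ops.

```python
import math
from typing import List, Sequence

Logits = Sequence[Sequence[float]]  # [length, vocab_size]


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocab row."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def normalized_log_prob(logits: Logits, target_ids: Sequence[int]) -> float:
    """1/N * sum_t log P(y_t | ...) for one sequence of N target tokens."""
    if not target_ids:
        raise ValueError("target sequence is empty")
    total = sum(log_softmax(logits[t])[tok] for t, tok in enumerate(target_ids))
    return total / len(target_ids)


def mi_reward(fwd_logits: Logits, a_ids: Sequence[int],
              bwd_logits: Logits, q_ids: Sequence[int]) -> float:
    """1/N_a * logP_seq2seq(a|p_i, q_i) + 1/N_qi * logP_backward(q_i|a)."""
    return normalized_log_prob(fwd_logits, a_ids) + normalized_log_prob(bwd_logits, q_ids)


def reward_matrix(fwd_logits, beam_ids, bwd_logits, q_ids) -> List[List[float]]:
    """Rewards of shape [batch_size][beam_width].

    fwd_logits[b][k]: seq2seq logits for beam candidate k of example b.
    beam_ids[b][k]:   token ids of that candidate.
    bwd_logits[b][k]: seq2seq_backward logits for q_i given that candidate.
    q_ids[b]:         token ids of q_i.
    """
    return [
        [mi_reward(fwd_logits[b][k], beam_ids[b][k], bwd_logits[b][k], q_ids[b])
         for k in range(len(beam_ids[b]))]
        for b in range(len(beam_ids))
    ]
```

With all-zero logits over a vocabulary of 4, every token has probability 1/4, so each term is -log 4 and the reward is -2 log 4 regardless of length. That makes a handy sanity check before plugging in real model outputs.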

OLD

Steps

(done) Make beam search coexist with infer mode.
Return infer_logits during beam search.
Get logits for predicted_id
Have beam_logits.
Refactoring
- extract attention method.
- Unify the model class?
Confirm that beam_logits has the same size and the same values as logits.
For one beam search result, get the token indices.
Fetch the log probs at those indices.
Feed the reward back? Or make it work for multiple results.
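Getting the indices for one beam search result amounts to slicing the beam_replies layout from the RL steps, [batch_size, decoder_length, beam_width], along the beam axis. A minimal sketch, assuming the decoded ids are plain nested lists and each candidate ends at a hypothetical `eos_id` token (the EOS convention is an assumption, not taken from these notes):

```python
from typing import List, Sequence


def beam_candidates(beam_replies: Sequence[Sequence[Sequence[int]]],
                    batch_index: int, eos_id: int) -> List[List[int]]:
    """Return the token ids of every beam for one example (hypothetical helper).

    beam_replies has shape [batch_size, decoder_length, beam_width].
    Each candidate is truncated at the first eos_id (EOS itself is dropped).
    """
    steps = beam_replies[batch_index]  # [decoder_length, beam_width]
    beam_width = len(steps[0])
    candidates: List[List[int]] = []
    for k in range(beam_width):
        ids: List[int] = []
        for t in range(len(steps)):
            token = steps[t][k]
            if token == eos_id:
                break
            ids.append(token)
        candidates.append(ids)
    return candidates


replies = [[[5, 7], [6, 2], [2, 8]]]  # batch_size=1, decoder_length=3, beam_width=2
print(beam_candidates(replies, 0, eos_id=2))  # [[5, 6], [7]]
```

The resulting id lists are what the log probs are then fetched for, one candidate at a time.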

- Wait... will we eventually have to use conversations.db? We need p_seq2seq(a | p_i, q_i).
- Fully understand MI.
  - Read the original paper.
  - Read the original original paper.
  - "we did not train a joint model (log p(T|S)−λ log p(T)), but instead trained maximum likelihood models, and used the MMI criterion only during testing."
  - P_MI is trained by calculating MI between source and target.
  - P_RL is trained by RL agents (so that they can get dialogue history)?
- Let's check the existing implementation.
- Understand where p_i and q_i come from in training:
  - p_i: let's eat curry
  - q_i: How about kokoichi
  - p_i+1: sounds good
- Always start with a small model.
- Have backward seq2seq training in place.
- Find the old implementation of mutual information.

MI steps

Build the MI model. It comes into play when decoding the N-best results and scoring them by mutual information.
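Applying MMI only at decoding time, as in the quote in the notes above, means reranking an N-best list by log p(T|S) − λ log p(T). A minimal sketch, assuming the two log probabilities are already available as callables (for example, from seq2seq and from a language model). The function name and the default λ = 0.5 are hypothetical placeholders, not tuned values.

```python
from typing import Callable, List, Sequence, Tuple


def mmi_rerank(candidates: Sequence[str],
               log_p_t_given_s: Callable[[str], float],
               log_p_t: Callable[[str], float],
               lam: float = 0.5) -> List[Tuple[str, float]]:
    """Rerank N-best candidates T by log p(T|S) - lam * log p(T), best first."""
    scored = [(t, log_p_t_given_s(t) - lam * log_p_t(t)) for t in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy scores: the generic reply is likelier under seq2seq but also under the LM.
fwd = {"i don't know": -1.0, "coco ichi is good": -2.0}
lm = {"i don't know": -0.5, "coco ichi is good": -4.0}
print(mmi_rerank(list(fwd), fwd.__getitem__, lm.__getitem__))
# [('coco ichi is good', 0.0), ("i don't know", -0.75)]
```

The subtracted log p(T) term penalizes replies that are likely regardless of the source, which is what pushes generic answers down. The same loop would work with a backward-model score in place of the anti-LM term.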
