[MoE] notebook for numerical verification #1026

Merged
merged 1 commit into main from lizhiyu/fix_moe on Dec 27, 2024

Conversation

@ZhiyuLi-goog (Collaborator) commented on Nov 12, 2024:

Description

In addition to the typo, it looks like we are missing a normalization step in MaxText: re-scaling top_k_weights so that they sum to 1. I was able to match layer outputs after adding the normalization. This normalization won't affect training or inference, since it is a constant factor per token and doesn't change the relative softmax probabilities, but we can still add it for better alignment.
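For illustration, here is a minimal sketch of the gating step with this normalization applied. This is not the actual MaxText code; the function name, argument names, and shapes are assumptions.

```python
import jax
import jax.numpy as jnp

# Hypothetical gating helper, assuming gate_logits has shape [num_tokens, num_experts].
def top_k_gating(gate_logits, k=2):
    softmax_probs = jax.nn.softmax(gate_logits, axis=-1)
    top_k_weights, top_k_indices = jax.lax.top_k(softmax_probs, k)
    # Re-scale so the top-k weights sum to 1 per token. The divisor is a
    # per-token constant, so the relative weighting of the experts is unchanged.
    top_k_weights = top_k_weights / top_k_weights.sum(-1, keepdims=True)
    return top_k_weights, top_k_indices
```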

Update:
The typo fix and the top_k_weights normalization were covered in PRs #1100 and #1064.
This PR only adds the numerical verification notebook.

Tests

Tested locally.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have added necessary comments to my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@gobbleturk (Collaborator) commented:

Thanks for the fix, Zhiyu. I'll leave it for @RissyRan to review.

@RissyRan (Collaborator) commented:

Thanks, Zhiyu, for the fix! I am fine with adding this normalization if it makes it convenient to compare weights. Could you add top_k_weights /= top_k_weights.sum(-1, keepdims=True) before the line weights = self.reshape_and_update_weights(top_k_weights, top_k_indices)? top_k_weights is not used anywhere before that.
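As a quick standalone check of what that one line does (illustrative numbers only, not MaxText code):

```python
import jax.numpy as jnp

top_k_weights = jnp.array([[0.5, 0.3], [0.4, 0.1]])    # per-token top-k gate probabilities
top_k_weights /= top_k_weights.sum(-1, keepdims=True)  # the suggested normalization
print(top_k_weights.sum(-1))                           # -> [1. 1.]
```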

@RissyRan (Collaborator) commented:

I think we also need to add this normalization somewhere [here], in the dropping path:

softmax_probs *= combined_expert_mask
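A rough sketch of what that could look like (illustrative only; the names and shapes follow the quoted line, not the actual MaxText implementation):

```python
import jax.numpy as jnp

def mask_and_renormalize(softmax_probs, combined_expert_mask, eps=1e-9):
    # Zero out assignments that were dropped for exceeding expert capacity.
    softmax_probs = softmax_probs * combined_expert_mask
    # Re-normalize the remaining weights per token, guarding against all-dropped tokens.
    denom = softmax_probs.sum(-1, keepdims=True)
    return softmax_probs / jnp.maximum(denom, eps)
```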

Have you checked the benchmark scores with and without this normalization, using the correct checkpoint, on 8x22b or 8x7b?

@ZhiyuLi-goog (Collaborator, Author) commented:

> Thanks, Zhiyu, for the fix! I am fine with adding this normalization if it makes it convenient to compare weights. Could you add top_k_weights /= top_k_weights.sum(-1, keepdims=True) before the line weights = self.reshape_and_update_weights(top_k_weights, top_k_indices)? top_k_weights is not used anywhere before that.

Done.

@ZhiyuLi-goog (Collaborator, Author) commented on Nov 15, 2024:

> I think we also need to add this normalization somewhere [here], in the dropping path:
>
> softmax_probs *= combined_expert_mask
>
> Have you checked the benchmark scores with and without this normalization, using the correct checkpoint, on 8x22b or 8x7b?

I have checked the benchmark scores; the results are the same with and without this normalization.

Additionally, I added a numerical verification notebook.
It currently matches logits to within a tolerance of ~0.1 for both megablox=True and megablox=False.
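Roughly, this is the kind of check the notebook performs (an illustrative sketch; the notebook itself loads the real MaxText and reference-model outputs):

```python
import numpy as np

def compare_logits(maxtext_logits, reference_logits, atol=0.1):
    # Report the largest elementwise difference and assert it is within tolerance.
    max_diff = np.max(np.abs(maxtext_logits - reference_logits))
    print(f"max abs diff: {max_diff:.4f}")
    assert np.allclose(maxtext_logits, reference_logits, atol=atol), "logits mismatch"
```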

@RissyRan (Collaborator) left a comment:

Thank you!

@ZhiyuLi-goog (Collaborator, Author) commented:

@gobbleturk could you review it at your convenience?

@ZhiyuLi-goog (Collaborator, Author) commented:

> @gobbleturk could you review it at your convenience?

Hi @gobbleturk, I need a code owner's review as the final step of this PR, thank you!

@ZhiyuLi-goog changed the title from "[MoE] fix typo and add normalization for top_k_weights" to "[MoE] notebook for numerical verification" on Dec 27, 2024
@copybara-service bot merged commit c106fe1 into main on Dec 27, 2024
15 of 17 checks passed
@copybara-service bot deleted the lizhiyu/fix_moe branch on December 27, 2024 at 23:21