[Feature] Add mistral pretrain #204
Conversation
Thank you very much for your contribution! Please take a look at a few comments.
completion was named for generic pretraining datasets; renaming it to pretrain might be more intuitive, but I don't think it should be named after skypile.
The skypile dataset in the code is just a recently open-sourced pretraining dataset I happened to pick from Hugging Face.
What name would you suggest instead?
The other parts will be revised as requested.
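For reference, the map_fn under discussion is tiny. Below is a minimal sketch of a generic pretrain-style map_fn, assuming xtuner's single-turn conversation format (the exact field names are assumptions, not taken verbatim from this PR):

# Copyright (c) OpenMMLab. All rights reserved.
# Sketch (assumed field names): wrap a raw pretraining sample's 'text'
# field into a single-turn conversation with an empty input, so the
# entire text becomes the training target.
def pretrain_map_fn(example):
    return {
        'conversation': [{
            'input': '',
            'output': example['text']
        }]
    }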
Zhihao Lin ***@***.***> wrote on Mon, Nov 6, 2023, at 13:19:
… ***@***.*** commented on this pull request.
Thank you very much for your contribution! Please take a look at a few comments.
------------------------------
In xtuner/dataset/map_fns/dataset_map_fns/completion_map_fn.py
<#204 (comment)>:
> @@ -0,0 +1,14 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+def completion_map_fn(example):
Would renaming completion_map_fn to skypile_map_fn be more appropriate?
------------------------------
On xtuner/configs/mistral/mistral_7b_qlora_completion.py
<#204 (comment)>:
File name:
1. Would changing completion to skypile or skypile_pretrain be more appropriate?
2. Following the other configs, add an epoch-count indicator at the end of the file name, i.e. e3.
------------------------------
In xtuner/configs/mistral/mistral_7b_qlora_completion.py
<#204 (comment)>:
> +# learning policy
+# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md # noqa: E501
+param_scheduler = dict(
+ type=CosineAnnealingLR,
+ eta_min=lr * 0.1,
+ by_epoch=True,
+ T_max=max_epochs,
+ convert_to_iter_based=True)
+
+# train, val, test setting
+train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1)
+
+#######################################################################
+# PART 5 Runtime #
+#######################################################################
+# Log the dialogue periodically during the training process, optional
Delete this comment line.
Thanks! Also, if convenient, there are two more places that could be polished along the way.
@LZHgrla Updated. Added a new PROMPT_TEMPLATE type, pretrain, whose value is None, and added a null check where the template is applied.
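A sketch of how that null check might look in the config (PROMPT_TEMPLATE.pretrain being None follows the description above; the remaining names mirror common xtuner configs and are assumptions here):

from xtuner.dataset.map_fns import template_map_fn_factory
from xtuner.utils import PROMPT_TEMPLATE

# PROMPT_TEMPLATE.pretrain is None for raw pretraining data, so only
# build a template map_fn when a real template is configured.
prompt_template = PROMPT_TEMPLATE.pretrain

template_map_fn = None
if prompt_template is not None:
    template_map_fn = dict(
        type=template_map_fn_factory, template=prompt_template)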
@LZHgrla That must have been missed during the edits; it has now been confirmed.
@DumoeDss Also, please fix
* [Feature] Add mistral pretrain
* [feat] rename pretrain_map_fn
* [feat] add custom hook
* [feat] change mistral config name
* Update chat.py
* Update xtuner/utils/templates.py (Co-authored-by: Zhihao Lin <[email protected]>)
* Update xtuner/configs/mistral/mistral_7b_qlora_skypile_pretrain_e1.py (three review suggestions, Co-authored-by: Zhihao Lin <[email protected]>)
* fix pre-commit

Co-authored-by: Zhihao Lin <[email protected]>
Co-authored-by: LZHgrla <[email protected]>
Add a mistral pretraining script and a completion_map_fn for pretraining datasets.
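To illustrate how the renamed pieces fit together, here is a hedged sketch of the dataset block such a config might contain (the dataset path, tokenizer choice, and max_length are illustrative assumptions, not prescribed by this PR):

from datasets import load_dataset
from transformers import AutoTokenizer
from xtuner.dataset import process_hf_dataset
from xtuner.dataset.map_fns import pretrain_map_fn

# Illustrative values; not taken from the PR itself.
data_path = 'Skywork/SkyPile-150B'

tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path='mistralai/Mistral-7B-v0.1',
    trust_remote_code=True,
    padding_side='right')

train_dataset = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path=data_path),
    tokenizer=tokenizer,
    max_length=2048,
    dataset_map_fn=pretrain_map_fn,
    template_map_fn=None,  # no prompt template for raw pretraining text
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=True)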