[Feature] Add mistral pretrain #204

Merged on Nov 8, 2023 (10 commits)

DumoeDss (Contributor) commented Nov 3, 2023

Adds a Mistral pretraining script and a completion_map_fn for the pretraining dataset.
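For illustration, a pretraining map function of this kind usually wraps each raw-text record as a single conversation turn with an empty input, so the whole document ends up on the output side. The sketch below is a minimal guess at that shape, not the PR's actual code: it assumes each record stores its document under a text key and that the dataset pipeline expects a conversation list of input/output dicts.

```python
def completion_map_fn(example: dict) -> dict:
    """Map one raw-text record to a single-turn conversation (sketch).

    Assumes the record keeps its document under the 'text' key.
    The input is left empty so the entire (stripped) document is
    placed on the output side of the conversation.
    """
    return {
        "conversation": [
            {"input": "", "output": example["text"].strip()},
        ]
    }


if __name__ == "__main__":
    row = {"text": "  Mistral 7B is a decoder-only language model.\n"}
    print(completion_map_fn(row))
```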

LZHgrla (Collaborator) left a comment

Thank you very much for your contribution! I left a few comments; please take a look when you have a moment.

Two review threads on xtuner/configs/mistral/mistral_7b_qlora_completion.py (outdated, resolved)

DumoeDss (Contributor, Author) commented Nov 6, 2023 via email

LZHgrla (Collaborator) commented Nov 6, 2023

@DumoeDss

Thanks!
I suggest renaming the map_fn to pretrain_map_fn, and the config to mistral_7b_qlora_skypile_pretrain_e3.py.

Also, if it's convenient, there are two more things you could improve in this PR:

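  1. Add EvaluateChatHook to the config so that generation quality can be checked during training. Make sure to set max_new_tokens=100 so that generation does not run too long and slow training down.
  2. Add an empty template to PROMPT_TEMPLATE, for example: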
        none=dict(
            SYSTEM='{system}\n',
            INSTRUCTION='{input}'),
    and make it the default --prompt-template for xtuner chat, so the pretraining results are easier to verify:
    default=PROMPT_TEMPLATE.default,
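To see why an empty template helps when checking a pretrained model, here is a small sketch that renders the none template from the comment with plain str.format. The render_prompt helper is hypothetical and only approximates how a chat tool might combine SYSTEM and INSTRUCTION; xtuner chat's real logic may differ. With no system message, the prompt is exactly the user's text, so the model's reply is a raw continuation rather than an answer wrapped in chat markup.

```python
# The empty template proposed in the review comment.
NONE_TEMPLATE = dict(
    SYSTEM="{system}\n",
    INSTRUCTION="{input}",
)


def render_prompt(template: dict, text: str, system: str = "") -> str:
    """Render a prompt from a template dict (hypothetical helper).

    The SYSTEM part is emitted only when a system message is given;
    the user text is substituted into INSTRUCTION unchanged.
    """
    prompt = ""
    if system:
        prompt += template["SYSTEM"].format(system=system)
    prompt += template["INSTRUCTION"].format(input=text)
    return prompt


if __name__ == "__main__":
    # No system message: the model sees exactly the raw text.
    print(repr(render_prompt(NONE_TEMPLATE, "The capital of France is")))
    # With a system message, it is prepended on its own line.
    print(repr(render_prompt(NONE_TEMPLATE, "Once upon a time", system="Story")))
```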

DumoeDss (Contributor, Author) commented Nov 7, 2023

@LZHgrla I've made the changes. I added a PROMPT_TEMPLATE entry named pretrain whose value is None, and added a None check wherever the template is applied.
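A registry entry whose value is None has one subtle pitfall: dict.get returns None both for that entry and for a misspelled name, so a typo would silently fall back to "no template". The sketch below uses a hypothetical stand-in for the PROMPT_TEMPLATE registry (the demo_chat entry and lookup_template are invented for illustration) and checks membership explicitly, so None always means "pretrain, no template" and an unknown name raises an error.

```python
from typing import Optional

# Hypothetical stand-in for the PROMPT_TEMPLATE registry. The "pretrain"
# entry is deliberately None, meaning "apply no template at all".
PROMPT_TEMPLATE = {
    "pretrain": None,
    "demo_chat": {"SYSTEM": "<sys>{system}\n", "INSTRUCTION": "<user>{input}\n<bot>"},
}


def lookup_template(name: str) -> Optional[dict]:
    """Return the template registered under name; None means no template.

    Membership is tested explicitly because dict.get() would also
    return None for an unknown (e.g. misspelled) name.
    """
    if name not in PROMPT_TEMPLATE:
        raise KeyError(f"unknown prompt template: {name!r}")
    return PROMPT_TEMPLATE[name]


if __name__ == "__main__":
    print(lookup_template("pretrain"))
    print(lookup_template("demo_chat"))
```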

LZHgrla (Collaborator) left a comment

@DumoeDss
The change to xtuner/tools/chat.py seems to introduce a bug: if prompt_template is None, the input text is never appended to inputs.
I tried fixing xtuner/tools/chat.py; could you check whether it causes any errors?
0a1a504
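The bug described in the comment arises when the user text is appended only inside the branch that formats a template. Below is a minimal sketch of the corrected control flow, assuming a chat loop that accumulates a single prompt string; build_inputs and its arguments are hypothetical names, and this is not the code in commit 0a1a504.

```python
from typing import Optional


def build_inputs(inputs: str, text: str, template: Optional[dict],
                 system: str = "") -> str:
    """Append one user turn to the accumulated chat context (sketch)."""
    if template is None:
        # No template (pretrain mode): the raw text must still be appended.
        # Returning inputs unchanged here is what the reported bug amounts to.
        return inputs + text
    prompt = ""
    if system:
        prompt += template["SYSTEM"].format(system=system)
    prompt += template["INSTRUCTION"].format(input=text)
    return inputs + prompt


if __name__ == "__main__":
    chat_tpl = {"SYSTEM": "<sys>{system}\n", "INSTRUCTION": "<user>{input}\n<bot>"}
    print(repr(build_inputs("", "The quick brown fox", None)))
    print(repr(build_inputs("", "Hi", chat_tpl, system="Be brief.")))
```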

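Also, we can simply remove prompt_template from the pretrain config. I've already proposed this change in the review suggestions; please accept them and then verify that everything still works.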

Everything else LGTM. Thanks again for your contribution!

Review thread on xtuner/utils/templates.py (outdated, resolved)

DumoeDss (Contributor, Author) commented Nov 8, 2023

@LZHgrla Looks like I missed that when making the changes. I've checked your fix and confirmed it works.

LZHgrla (Collaborator) commented Nov 8, 2023

@DumoeDss
You need to click "Add suggestion to batch" on each suggested change to apply the review updates.

Also, please fix the pre-commit CI errors.

LZHgrla merged commit 8ce2569 into InternLM:main on Nov 8, 2023
1 check passed
llkn-2 pushed a commit to llkn-2/xtuner that referenced this pull request Jul 31, 2024
* [Feature] Add mistral pretrain

* [feat] rename pretrain_map_fn

* [feat] add custom hook

* [feat] change mistral config name

* Update chat.py

* Update xtuner/utils/templates.py

Co-authored-by: Zhihao Lin <[email protected]>

* Update xtuner/configs/mistral/mistral_7b_qlora_skypile_pretrain_e1.py

Co-authored-by: Zhihao Lin <[email protected]>

* Update xtuner/configs/mistral/mistral_7b_qlora_skypile_pretrain_e1.py

Co-authored-by: Zhihao Lin <[email protected]>

* Update xtuner/configs/mistral/mistral_7b_qlora_skypile_pretrain_e1.py

Co-authored-by: Zhihao Lin <[email protected]>

* fix pre-commit

---------

Co-authored-by: Zhihao Lin <[email protected]>
Co-authored-by: LZHgrla <[email protected]>