[Refactor] Refactor the preprocess of dataset #158

Merged: 45 commits, Oct 12, 2023

Conversation

LZHgrla
Collaborator

@LZHgrla LZHgrla commented Oct 8, 2023

dataset_map_fn: maps each sample of the original dataset to the following unified format:

[{
    "system": "You are an assistant.",
    "input": "Tell me a story.",
    "output": "One day a student went to school."
}]

template_map_fn: [defined by PROMPT_TEMPLATE] maps data in the above format to trainable data by applying the prompt template:

[{
    "input": "<|System|>: You are an assistant.\n<|User|>: Tell me a story.<eoh>\n<|Bot|>:",
    "output": "One day a student went to school."
}]
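
For reviewers, here is a minimal sketch of how the two stages could compose. It is not the code in this PR: the `SYSTEM` / `INSTRUCTION` keys of `PROMPT_TEMPLATE`, the raw field names `question` / `answer`, and both function signatures are assumptions made for illustration. Only the two data formats come from the description above.

```python
# Hypothetical template layout. The format strings reproduce the example
# in the description; the dict keys are assumed, not taken from the PR.
PROMPT_TEMPLATE = {
    "SYSTEM": "<|System|>: {system}\n",
    "INSTRUCTION": "<|User|>: {input}<eoh>\n<|Bot|>:",
}


def dataset_map_fn(example):
    """Map one raw record to the unified system/input/output format.

    The raw field names ("question", "answer") are hypothetical; each
    dataset would supply its own mapping here.
    """
    return [{
        "system": "You are an assistant.",
        "input": example["question"],
        "output": example["answer"],
    }]


def template_map_fn(turns, template=PROMPT_TEMPLATE):
    """Apply the prompt template to unified-format turns."""
    trainable = []
    for turn in turns:
        prompt = ""
        # Prepend the system prompt only when the turn carries one.
        if turn.get("system"):
            prompt += template["SYSTEM"].format(system=turn["system"])
        prompt += template["INSTRUCTION"].format(input=turn["input"])
        trainable.append({"input": prompt, "output": turn["output"]})
    return trainable


raw = {
    "question": "Tell me a story.",
    "answer": "One day a student went to school.",
}
trainable = template_map_fn(dataset_map_fn(raw))
print(trainable[0]["input"])
```

Keeping the dataset-specific mapping separate from the template step means that, under these assumptions, a new dataset only needs its own `dataset_map_fn`, and a new model only needs its own template entry.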

@LZHgrla LZHgrla marked this pull request as draft October 8, 2023 12:12
@pppppM pppppM requested a review from HIT-cwh October 9, 2023 04:23
@LZHgrla LZHgrla force-pushed the lzh/refactor_dataset branch from c1e18b9 to 8641b07 Compare October 10, 2023 05:19
@LZHgrla LZHgrla force-pushed the lzh/refactor_dataset branch from b8e3877 to 8e5b38e Compare October 10, 2023 09:16
@LZHgrla LZHgrla marked this pull request as ready for review October 10, 2023 10:04
@LZHgrla LZHgrla force-pushed the lzh/refactor_dataset branch 2 times, most recently from 34dc327 to 37a0827 Compare October 11, 2023 07:17
@LZHgrla LZHgrla changed the title from "[Feature] Refactor the preprocess of dataset" to "[Refactor] Refactor the preprocess of dataset" Oct 11, 2023
@LZHgrla LZHgrla changed the base branch from main to lzh/refactor October 12, 2023 08:10
@LZHgrla LZHgrla merged commit f7a5c9c into InternLM:lzh/refactor Oct 12, 2023
1 check passed