After the data passes through DataLoader, the training set is only one batch_size in size #88

Open
Shajiu opened this issue Mar 10, 2023 · 1 comment

Comments

Shajiu commented Mar 10, 2023

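When the dataset is first loaded, it has the size of the full training set. After the data passes through DataLoader, though, the training set is only one batch_size in size, and the actual training then uses only this portion of the data. Specifically, this is computed at line 274 of data_loader.py: printing len(datasets[0]) first and then len(train_loader) gives two different sizes. The model is then trained entirely on train_loader. What is going on here? Were the metrics in the paper all computed this way? That seems far too absurd~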

Owner

ljynlp commented Mar 11, 2023

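len(datasets[0]) shows the number of samples in the training set. train_loader then groups the dataset into batches: during training, each loop iteration takes one batch, and one epoch is complete once all batches have been consumed. len(train_loader) therefore shows the number of batches, which is why it is smaller than len(datasets[0]). train_loader loads the training set and splits it into batches, so it is exactly what is meant to be used for training.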
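
To make the distinction concrete, here is a minimal pure-Python sketch of the batching arithmetic. It does not use this repository's data_loader.py or PyTorch itself. `num_batches` and `iter_batches` are hypothetical helpers written for illustration, and the dataset size (1003) and batch size (8) are made-up values. The formula mirrors how `len()` of a PyTorch `DataLoader` over a map-style dataset is computed: the sample count divided by the batch size, rounded up, or rounded down when `drop_last=True`.

```python
import math


def num_batches(dataset_size, batch_size, drop_last=False):
    """Batches per epoch, mirroring len(DataLoader) for a map-style dataset."""
    if drop_last:
        # The final, incomplete batch is discarded.
        return dataset_size // batch_size
    # The final batch may be smaller than batch_size but is still counted.
    return math.ceil(dataset_size / batch_size)


def iter_batches(samples, batch_size):
    """Yield consecutive slices of `samples`, one batch at a time."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Hypothetical values, chosen only for illustration.
samples = list(range(1003))   # stands in for datasets[0]
batch_size = 8

batches = list(iter_batches(samples, batch_size))   # stands in for one epoch
samples_seen = sum(len(batch) for batch in batches)

print("len(dataset):       ", len(samples))
print("number of batches:  ", len(batches))
print("samples in an epoch:", samples_seen)
```

The batch count is much smaller than the sample count, yet iterating over all batches still visits every sample once per epoch. No training data is lost.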
