Question about training time #1

Open
gms5144 opened this issue Oct 24, 2023 · 4 comments

@gms5144

gms5144 commented Oct 24, 2023

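I left it running in the background overnight and it still had not finished. Also, with batch_size set that large, tqdm is of no use: it shows no progress.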

@yaokui2018
Owner

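I ran it on an NVIDIA RTX 3080 (10G); judging from the log in the repository files, it took a little over 4 hours.
You can reduce batch_size to suit your GPU. tqdm shows no progress because batch_size is too large: it exceeds the number of samples, so the data is never split into batches.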
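
To illustrate the point about tqdm, here is a minimal sketch of how the number of batches per epoch depends on batch_size. The helper names `num_batches` and `iter_batches` are hypothetical and are not taken from this repository; the actual data loader may differ.

```python
import math


def num_batches(num_samples: int, batch_size: int) -> int:
    """Return how many batches one epoch is split into (last partial batch kept)."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    return math.ceil(num_samples / batch_size)


def iter_batches(data, batch_size):
    """Yield consecutive slices of `data`, each with at most `batch_size` items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    print(num_batches(1000, 4096))  # batch_size larger than the dataset
    print(num_batches(1000, 64))    # a smaller batch_size
```

When batch_size is larger than the dataset, the epoch is a single batch, so a progress bar that counts batches has only one step and appears stuck until the epoch ends. A smaller batch_size gives the bar more steps to report.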

@gms5144
Author

gms5144 commented Oct 24, 2023

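Thank you for your reply. I had previously tried reducing batch_size to see how training would turn out,
[image]
and it raised this error. After I deleted the original spam0 and normal0 data, training completed.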

@yaokui2018
Owner

yaokui2018 commented Oct 24, 2023

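> Thank you for your reply. I had previously tried reducing batch_size to see how training would turn out, [image] and it raised this error. After I deleted the original spam0 and normal0 data, training completed.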

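This error means the training data contained only one class. Could batch_size have been set too small, so that every sample in a batch was positive or every sample was negative? In principle, this error has nothing to do with the spam0 and normal0 data; it more likely went away because you re-ran the training and the data was split again.

The code shuffles the data when loading it, so under normal conditions a batch made up of a single class should be unlikely.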
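
One way to test this explanation is to shuffle the labels, split them into batches, and list the batches that contain only one class. This is a rough sketch under assumptions: `single_class_batches` is a hypothetical helper, the labels are assumed to be 0/1, and the exact error from the screenshot is not reproduced here.

```python
import random


def single_class_batches(labels, batch_size, seed=0):
    """Shuffle `labels`, split them into batches, and return the indices of
    batches whose samples all belong to one class."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)  # seeded so the check is repeatable
    bad = []
    for index, start in enumerate(range(0, len(shuffled), batch_size)):
        batch = shuffled[start:start + batch_size]
        if len(set(batch)) < 2:
            bad.append(index)
    return bad


if __name__ == "__main__":
    # Hypothetical, heavily imbalanced labels: 1 = spam, 0 = normal.
    labels = [1] * 5 + [0] * 95
    for size in (2, 8, 32):
        count = len(single_class_batches(labels, size))
        print(f"batch_size={size}: {count} single-class batches")
```

If small batch sizes produce many single-class batches on the real labels, raising batch_size or using a stratified split are the usual options. Whether that matches the error seen here would need to be confirmed against the traceback.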

@gms5144
Copy link
Author

gms5144 commented Oct 24, 2023 via email


2 participants