    # Original max_length selection method; the logic is flawed
    # for i in len_dict:
    #     rate = i[1] / all_sent
    #     cover_rate += rate
    #     if cover_rate >= limit_ratio:
    #         max_length = i[0]
    #         break
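Analysis: len_dict is a list of sentence-length frequency counts, [(15,3700),(12,2800),(8,500)...(20,30)], where each element is (sentence length, frequency). The list is ordered by frequency, not by sentence length. Under the logic above, once 3700+2800+500 exceeds 95% of the total frequency, max_len is set to 8. That is wrong: the sentences of length 15 and 12 that were counted toward the coverage are longer than 8, so a max_length of 8 does not actually cover 95% of the sentences.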
It should be changed to:
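    # Sort by sentence length (ascending) before accumulating coverage,
    # so the loop stops at the shortest length that reaches limit_ratio
    temp = sorted(len_dict, key=lambda x: x[0], reverse=False)
    for i in temp:
        rate = i[1] / all_sent
        cover_rate += rate
        if cover_rate >= limit_ratio:
            max_length = i[0]
            break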
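Proposed change: sort len_dict by sentence length in ascending order before accumulating coverage, so that the sentences filtered out by max_length are the longest ones.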
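To make the difference concrete, here is a minimal sketch comparing the two orderings. The function names `max_length_unsorted` and `max_length_sorted`, and the small `sample` list, are hypothetical and chosen only for illustration; they are not taken from the project. The sketch assumes a non-empty list of (sentence length, frequency) pairs, computes `all_sent` as the sum of the frequencies, and compares a running count against the threshold instead of summing per-item rates, which avoids floating-point accumulation error.

```python
from typing import List, Tuple

LenCounts = List[Tuple[int, int]]  # (sentence length, frequency)


def max_length_unsorted(len_dict: LenCounts, limit_ratio: float) -> int:
    """Original logic: accumulate coverage in the list's existing order."""
    all_sent = sum(freq for _, freq in len_dict)
    covered = 0
    for length, freq in len_dict:
        covered += freq
        if covered / all_sent >= limit_ratio:
            return length
    # Threshold never reached (e.g. limit_ratio > 1): fall back to the longest length.
    return max(length for length, _ in len_dict)


def max_length_sorted(len_dict: LenCounts, limit_ratio: float) -> int:
    """Proposed logic: accumulate coverage in ascending order of sentence length."""
    all_sent = sum(freq for _, freq in len_dict)
    covered = 0
    for length, freq in sorted(len_dict, key=lambda x: x[0]):
        covered += freq
        if covered / all_sent >= limit_ratio:
            return length
    return max(length for length, _ in len_dict)


# Hypothetical data, ordered by frequency (descending) like the list in the report.
sample = [(15, 50), (12, 30), (8, 16), (20, 4)]

print(max_length_unsorted(sample, 0.95))  # 8: only 16 of 100 sentences have length <= 8
print(max_length_sorted(sample, 0.95))    # 15: 96 of 100 sentences have length <= 15
```

With the frequency-ordered input, the unsorted version stops at whichever length happens to push the running total past the threshold, while the sorted version returns a length that really is an upper bound for at least `limit_ratio` of the sentences.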