This is part of the JSON generated by recognizing a PDF. The `text` field is missing. The models used are as follows:
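These are the files used for recognition:
- 中医药单用_联合抗生素治疗社区获得性肺炎临床实践指南_李得民.pdf (clinical practice guideline on traditional Chinese medicine, alone or combined with antibiotics, for community-acquired pneumonia; Li Demin)
- 桂枝茯苓胶囊临床应用指南(2021年)_《中成药治疗优势病种临床应用指南》标准化项目组.pdf (clinical application guideline for Guizhi Fuling capsules, 2021; standardization project group of the "Clinical Application Guidelines for Chinese Patent Medicines in Treating Dominant Diseases")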
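Titles and text are detected independently and are not saved together. For example, if the layout model has 10 classes, you can think of text as an 11th class. In the post-processing stage, the text boxes are compared with the layout boxes, and the text inside each layout box that contains text is extracted. Try setting `merge2markdown` to `True` in the configuration file and see whether that helps. If the document layout is fairly complex, you can also try MinerU.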
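The post-processing step described above (comparing text boxes with layout boxes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `Box`, `TextLine`, `LayoutRegion`, and `assign_text_to_layout`, the `(x0, y0, x1, y1)` coordinate convention, and the 0.5 overlap threshold are all hypothetical. Each OCR text line is assigned to the layout region that covers the largest fraction of it. A layout region that no text line overlaps sufficiently ends up with empty text, which is one plausible way a layout box could end up without a `text` value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Box:
    # Hypothetical axis-aligned box: (x0, y0) top-left, (x1, y1) bottom-right.
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they do not touch)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0.0, w) * max(0.0, h)


@dataclass
class TextLine:
    # One line detected by the (separate) text/OCR model.
    box: Box
    text: str


@dataclass
class LayoutRegion:
    # One region detected by the layout model, e.g. "title" or "plain_text".
    box: Box
    category: str
    lines: List[TextLine] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Empty string when no text line was matched to this region.
        return " ".join(line.text for line in self.lines)


def assign_text_to_layout(
    regions: List[LayoutRegion], lines: List[TextLine], min_overlap: float = 0.5
) -> List[TextLine]:
    """Attach each text line to the region covering the largest share of it.

    Overlap is measured as intersection area / text-line area. Lines whose
    best overlap is below `min_overlap` are returned as unassigned.
    """
    unassigned: List[TextLine] = []
    for line in lines:
        area = line.box.area()
        best: Optional[LayoutRegion] = None
        best_ratio = 0.0
        if area > 0:
            for region in regions:
                ratio = intersection_area(line.box, region.box) / area
                if ratio > best_ratio:
                    best, best_ratio = region, ratio
        if best is not None and best_ratio >= min_overlap:
            best.lines.append(line)
        else:
            unassigned.append(line)
    # Restore reading order inside each region: top to bottom, then left to right.
    for region in regions:
        region.lines.sort(key=lambda l: (l.box.y0, l.box.x0))
    return unassigned


if __name__ == "__main__":
    title = LayoutRegion(Box(0, 0, 100, 20), "title")
    body = LayoutRegion(Box(0, 30, 100, 100), "plain_text")
    ocr_lines = [
        TextLine(Box(5, 55, 90, 70), "Second line"),
        TextLine(Box(5, 2, 90, 18), "Guideline"),
        TextLine(Box(5, 35, 90, 50), "First line"),
    ]
    assign_text_to_layout([title, body], ocr_lines)
    print(title.category, repr(title.text))
    print(body.category, repr(body.text))
```

Measuring overlap relative to the text-line area rather than intersection-over-union lets a short line inside a large paragraph box still count as fully contained. Joining lines with a space suits English; Chinese text would normally be joined without one.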
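Titles are not recognized accurately.
Is there a way to recognize titles that wrap onto multiple lines, or titles that run together with the body text?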