Layout detection is slow, and errors are logged but do not affect the program #1560

Closed
rouxiang opened this issue Jan 17, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@rouxiang

Description of the bug | 错误描述

[2025-01-17 00:54:37,502] [ ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,505] [ ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,506] [ ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,508] [ ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,511] [ ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,515] [ ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,518] [ ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
2025-01-17 00:54:39.078 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 3509, cid_chars_radio: 0.0
2025-01-17 00:54:39.082 | WARNING | magic_pdf.filter.pdf_classify_by_type:classify:334 - pdf is not classified by area and text_len, by_image_area: False, by_text: True, by_avg_words: True, by_img_num: True, by_text_layout: True, by_img_narrow_strips: True, by_invalid_chars: True
2025-01-17 00:54:54.595 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.52
2025-01-17 00:55:21.352 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 26.75
2025-01-17 00:55:21.352 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 0, page total time: 41.28-----
2025-01-17 00:55:37.557 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.74
2025-01-17 00:55:49.726 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 12.16
2025-01-17 00:55:49.726 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 1, page total time: 26.91-----
2025-01-17 00:56:05.174 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.19
2025-01-17 00:56:08.088 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 2.9
2025-01-17 00:56:08.088 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 2, page total time: 17.1-----
2025-01-17 00:56:22.839 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 13.72
2025-01-17 00:56:42.865 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 20.02
2025-01-17 00:56:42.865 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 3, page total time: 33.75-----
2025-01-17 00:56:57.983 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.15
2025-01-17 00:57:18.312 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 20.32
2025-01-17 00:57:18.313 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 4, page total time: 34.48-----
2025-01-17 00:57:34.336 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.86
2025-01-17 00:57:35.842 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 1.5
2025-01-17 00:57:35.842 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 5, page total time: 16.37-----
2025-01-17 00:57:51.104 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.14
2025-01-17 00:58:12.490 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 21.38
2025-01-17 00:58:12.490 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 6, page total time: 35.53-----
2025-01-17 00:58:12.816 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:234 - gc time: 0.33
2025-01-17 00:58:12.816 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:238 - doc analyze time: 213.73, speed: 0.04 pages/second
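To see how the per-page time splits between layout detection and OCR, the timing lines in the log above can be tallied with a small script. This is only a sketch: the regular expression assumes the exact `layout detection time: ...` and `ocr time: ...` wording shown in this log, and `summarize_timings` is a hypothetical helper, not part of magic-pdf.

```python
import re
from statistics import mean

# Matches the per-page timing lines, e.g. "... - layout detection time: 14.52".
# The wording is taken from the log in this issue; other versions may differ.
TIMING_RE = re.compile(r"(layout detection|ocr) time: ([0-9]+(?:\.[0-9]+)?)")


def summarize_timings(log_text):
    """Return {stage: (count, total_seconds, mean_seconds)} for each stage found."""
    stages = {}
    for stage, seconds in TIMING_RE.findall(log_text):
        stages.setdefault(stage, []).append(float(seconds))
    return {s: (len(v), sum(v), mean(v)) for s, v in stages.items()}


# Two pages copied from the log above, used as sample input.
sample = """\
2025-01-17 00:54:54.595 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.52
2025-01-17 00:55:21.352 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 26.75
2025-01-17 00:55:37.557 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.74
2025-01-17 00:55:49.726 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 12.16
"""

for stage, (count, total, avg) in summarize_timings(sample).items():
    print(f"{stage}: {count} pages, total {total:.2f}s, mean {avg:.2f}s")
```

In this log the layout detection time stays close to 14 s on every page while the OCR time varies widely, so the layout step looks like a roughly fixed per-page cost.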

How to reproduce the bug | 如何复现

Although this is running on CPU, the layout detection speed still does not seem normal. Also, do the error log entries affect inference speed?
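The error text itself ("predict processes one line at a time (remove '\n')") suggests that the string handed to language detection contained newline characters. A minimal sketch of removing them before detection is shown below. `to_single_line` and `detect_language_safely` are hypothetical helpers, and the `detect` argument is a stand-in for whatever detector is actually called; this is not the project's real call site, and the thread does not establish whether the error has any effect on speed.

```python
def to_single_line(text):
    """Collapse every whitespace run (including '\\n' and '\\r\\n') into one space."""
    return " ".join(text.split())


def detect_language_safely(text, detect):
    """Call `detect` on a newline-free copy of `text`.

    `detect` is any callable that takes a single string. It stands in for
    the real language detector (an assumption made for this sketch).
    """
    return detect(to_single_line(text))


if __name__ == "__main__":
    # A dummy detector that only reports whether it received a newline.
    def dummy(s):
        return "has newline" if "\n" in s else "single line"

    print(detect_language_safely("first line\nsecond line", dummy))
```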

Operating system | 操作系统

Linux

Python version | Python 版本

3.10

Software version | 软件版本 (magic-pdf --version)

1.0.x

Device mode | 设备模式

cpu

@rouxiang rouxiang added the bug Something isn't working label Jan 17, 2025
@myhloli
Collaborator

myhloli commented Jan 17, 2025

On Linux ARM, you need to upgrade to 1.0.1 to get the best performance.
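To check whether a machine falls under this advice, the platform can be inspected with the Python standard library. This is a rough sketch: `is_linux_arm` is a hypothetical helper, and the machine-name prefixes it matches (`aarch64`, `arm`) are an assumption about common values, not an exhaustive list. The installed version can then be compared using `magic-pdf --version`, as in the issue template.

```python
import platform


def is_linux_arm(system=None, machine=None):
    """Return True when running on Linux with an ARM CPU.

    Both arguments default to the values reported by the `platform` module;
    they can be passed explicitly to check a different machine's values.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    # "aarch64" is typical for 64-bit ARM Linux; "arm*" covers 32-bit variants.
    return system == "linux" and machine.startswith(("aarch64", "arm"))


if __name__ == "__main__":
    print("Linux ARM:", is_linux_arm())
```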

@myhloli myhloli closed this as completed Jan 17, 2025
@rouxiang
Author

On Linux ARM, you need to upgrade to 1.0.1 to get the best performance.

OK, thanks.
