Layout detection is slow; errors are logged but the program is not affected #1560

rouxiang · 2025-01-17T01:08:23Z

Description of the bug

[2025-01-17 00:54:37,502] [ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,505] [ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,506] [ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,508] [ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,511] [ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,515] [ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
[2025-01-17 00:54:37,518] [ERROR] infer.py:218 - fast-langdetect:Error during language detection: predict processes one line at a time (remove '\n')
2025-01-17 00:54:39.078 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 3509, cid_chars_radio: 0.0
2025-01-17 00:54:39.082 | WARNING | magic_pdf.filter.pdf_classify_by_type:classify:334 - pdf is not classified by area and text_len, by_image_area: False, by_text: True, by_avg_words: True, by_img_num: True, by_text_layout: True, by_img_narrow_strips: True, by_invalid_chars: True
2025-01-17 00:54:54.595 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.52
2025-01-17 00:55:21.352 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 26.75
2025-01-17 00:55:21.352 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 0, page total time: 41.28-----
2025-01-17 00:55:37.557 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.74
2025-01-17 00:55:49.726 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 12.16
2025-01-17 00:55:49.726 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 1, page total time: 26.91-----
2025-01-17 00:56:05.174 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.19
2025-01-17 00:56:08.088 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 2.9
2025-01-17 00:56:08.088 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 2, page total time: 17.1-----
2025-01-17 00:56:22.839 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 13.72
2025-01-17 00:56:42.865 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 20.02
2025-01-17 00:56:42.865 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 3, page total time: 33.75-----
2025-01-17 00:56:57.983 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.15
2025-01-17 00:57:18.312 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 20.32
2025-01-17 00:57:18.313 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 4, page total time: 34.48-----
2025-01-17 00:57:34.336 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.86
2025-01-17 00:57:35.842 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 1.5
2025-01-17 00:57:35.842 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 5, page total time: 16.37-----
2025-01-17 00:57:51.104 | INFO | magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.14
2025-01-17 00:58:12.490 | INFO | magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 21.38
2025-01-17 00:58:12.490 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:223 - -----page_id : 6, page total time: 35.53-----
2025-01-17 00:58:12.816 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:234 - gc time: 0.33
2025-01-17 00:58:12.816 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:238 - doc analyze time: 213.73, speed: 0.04 pages/second
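The per-page timings in the log can be summarised to see where the 213.73 s goes. The sketch below parses only the layout-detection and OCR lines copied from the log above; the regular expression and the `stage_times` name are illustrative assumptions, not part of magic-pdf.

```python
import re
from statistics import mean

# Layout/OCR timing lines copied from the magic_pdf log above (pages 0-6, CPU run).
LOG = """
magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.52
magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 26.75
magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.74
magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 12.16
magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.19
magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 2.9
magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 13.72
magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 20.02
magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.15
magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 20.32
magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.86
magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 1.5
magic_pdf.model.pdf_extract_kit:call:202 - layout detection time: 14.14
magic_pdf.model.pdf_extract_kit:call:249 - ocr time: 21.38
"""

# Matches e.g. "layout detection time: 14.52" or "ocr time: 2.9".
PATTERN = re.compile(r"(layout detection|ocr) time: ([\d.]+)")


def stage_times(log: str) -> dict[str, list[float]]:
    """Group per-page timings (in seconds) by pipeline stage."""
    times: dict[str, list[float]] = {"layout detection": [], "ocr": []}
    for stage, seconds in PATTERN.findall(log):
        times[stage].append(float(seconds))
    return times


if __name__ == "__main__":
    for stage, values in stage_times(LOG).items():
        print(f"{stage}: total {sum(values):.2f} s, "
              f"mean {mean(values):.2f} s/page, "
              f"range {min(values):.2f}-{max(values):.2f} s")
```

On these seven pages, layout detection stays close to 14.3 s on every page (100.32 s in total), while OCR ranges from 1.5 s to 26.75 s (105.03 s in total), so the two stages each account for roughly half of the run.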

How to reproduce the bug

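Although this is running on CPU, the layout detection speed still seems abnormal. Also, do the ERROR log entries have any effect on inference speed?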
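The ERROR lines are emitted from fast-langdetect's infer.py, and the message itself says that the underlying predict call handles only one line of text at a time. The sketch below is not magic-pdf's or fast-langdetect's code; it only illustrates, assuming the caller can pre-process the text, how collapsing line breaks before detection avoids that particular error. `to_single_line`, `detect_language` and the `detector` callable are hypothetical names.

```python
import re
from collections.abc import Callable


def to_single_line(text: str) -> str:
    """Collapse line breaks and runs of whitespace into single spaces."""
    # The detector in the log rejects input containing '\n', so join lines first.
    return re.sub(r"\s+", " ", text).strip()


def detect_language(text: str, detector: Callable[[str], str]) -> str:
    """Run a single-line language detector on possibly multi-line text.

    `detector` is a hypothetical stand-in for the real detection call and is
    assumed to accept exactly one line of text.
    """
    return detector(to_single_line(text))


if __name__ == "__main__":
    def fake_detector(line: str) -> str:
        # Mimics the constraint reported in the log: one line at a time.
        if "\n" in line:
            raise ValueError("predict processes one line at a time (remove '\\n')")
        return "en"

    sample = "Layout detection\nis slow\r\non CPU"
    print(to_single_line(sample))  # Layout detection is slow on CPU
    print(detect_language(sample, fake_detector))  # en
```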

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

1.0.x

Device mode

cpu

myhloli · 2025-01-17T01:29:37Z

When running on Linux ARM, you need to upgrade to 1.0.1 to get the best performance.
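The advice above amounts to a version check on ARM machines. A minimal sketch follows; the set of ARM architecture strings, the PyPI distribution name `magic-pdf`, and the helper names are assumptions, and the simple numeric parse ignores pre-release suffixes (the `packaging` library's `Version` class handles those properly).

```python
import platform
import re
from importlib.metadata import PackageNotFoundError, version

# Architecture strings that platform.machine() typically reports on ARM (assumed list).
ARM_MACHINES = {"aarch64", "arm64", "armv7l", "armv8l"}
MIN_ARM_VERSION = (1, 0, 1)  # minimum version recommended for Linux ARM in this thread


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading numeric part, e.g. '1.0.1' -> (1, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_upgrade(machine: str, installed: str) -> bool:
    """True if this looks like an ARM machine running a version older than 1.0.1."""
    if machine.lower() not in ARM_MACHINES:
        return False
    return parse_version(installed) < MIN_ARM_VERSION


if __name__ == "__main__":
    try:
        installed = version("magic-pdf")  # assumed distribution name
    except PackageNotFoundError:
        print("magic-pdf is not installed")
    else:
        machine = platform.machine()
        if needs_upgrade(machine, installed):
            print(f"magic-pdf {installed} on {machine}: consider upgrading to 1.0.1 or later")
        else:
            print(f"magic-pdf {installed} on {machine}: no ARM upgrade needed")
```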

rouxiang · 2025-01-20T00:33:23Z

When running on Linux ARM, you need to upgrade to 1.0.1 to get the best performance.

OK, thanks.

rouxiang added the bug (Something isn't working) label Jan 17, 2025

myhloli closed this as completed Jan 17, 2025
