Running the new version raises a bug: IndexError: index 10 is out of bounds for axis 0 with size 10 #627

Closed
Maple0709 opened this issue Sep 18, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@Maple0709

Description of the bug

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/data/MinerU/app.py", line 61, in file_extract
    pipe.pipe_analyze(pdf_bytes, pdf_type)
  File "/data/MinerU/magic_pdf/pipe/UNIPipe.py", line 69, in pipe_analyze
    self.model_list = doc_analyze(pdf_bytes, self.ocr_custom_model, ocr=True,isimage=False,
  File "/data/MinerU/magic_pdf/model/doc_analyze_by_custom_model.py", line 136, in doc_analyze
    result = custom_model(img)
  File "/data/MinerU/magic_pdf/model/pdf_extract_kit.py", line 351, in __call__
    ocr_res = self.ocr_model.ocr(new_image, mfd_res=adjusted_mfdetrec_res)[0]
  File "/data/MinerU/magic_pdf/model/pek_sub_modules/self_modify.py", line 290, in ocr
    dt_boxes, rec_res, _ = self.__call__(img, cls, mfd_res=mfd_res)
  File "/data/MinerU/magic_pdf/model/pek_sub_modules/self_modify.py", line 371, in __call__
    rec_res, elapse = self.text_recognizer(img_crop_list)
  File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/tools/infer/predict_rec.py", line 630, in __call__
    rec_res[indices[beg_img_no + rno]] = rec_result[rno]
IndexError: index 10 is out of bounds for axis 0 with size 10

How to reproduce the bug

In the new version, running the application with multiple threads raises IndexError: index 10 is out of bounds for axis 0 with size 10.

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.8.x

Device mode

cuda

Maple0709 added the bug label Sep 18, 2024
@myhloli
Collaborator

myhloli commented Sep 18, 2024

Does the PDF file that raises the error also trigger this problem when run in a single thread?

@myhloli
Collaborator

myhloli commented Sep 18, 2024

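Recent versions changed some code in magic_pdf/model/pdf_extract_kit.py and magic_pdf/model/pek_sub_modules/self_modify.py. Looking at your traceback, the line numbers do not match the latest version, so please try updating to the latest version and testing again.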

@georgewangchn

georgewangchn commented Sep 23, 2024

I ran into this problem too: 8 concurrent requests raise the error, while 4 concurrent requests do not.
paddlepaddle-gpu 2.6.2
paddleocr 2.8.1

myhloli closed this as completed Jan 5, 2025
@myhloli
Collaborator

myhloli commented Jan 5, 2025

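This is caused by paddleocr not supporting multithreading. Wherever possible, use multiple processes rather than multiple threads to handle concurrency.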

3 participants