pymudocdataset object has no attribute classify #1573

Closed
Fanxhion opened this issue Jan 18, 2025 · 3 comments
Labels
bug Something isn't working

Comments


Description of the bug

Testing PDF parsing fails with the following error:
pymudocdataset object has no attribute classify.

How to reproduce the bug

pymudocdataset object has no attribute classify.

Operating system

Linux

Python version

3.11

Software version (magic-pdf --version)

1.0.x

Device mode

cuda

Fanxhion added the bug label Jan 18, 2025

Fanxhion commented Jan 18, 2025

It looks like the dataset object created in this code has no classify() attribute:
reader1 = FileBasedDataReader("")
pdf_bytes = reader1.read(pdf_file_name)  # read the pdf content
ds = PymuDocDataset(pdf_bytes)

## inference
if ds.classify() == SupportedPdfParseMethod.OCR:
    infer_result = ds.apply(doc_analyze, ocr=True)

    ## pipeline
    pipe_result = infer_result.pipe_ocr_mode(image_writer)

else:
    infer_result = ds.apply(doc_analyze, ocr=False)

    ## pipeline
    pipe_result = infer_result.pipe_txt_mode(image_writer)

Fanxhion commented

I followed the approach used in the demo.

myhloli (Collaborator) commented Jan 18, 2025

Check your version with the magic-pdf -v command; version 1.0.1 should not raise this error.

@myhloli myhloli closed this as completed Jan 20, 2025
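
For readers who hit the same error, the version check suggested in the thread can also be done from Python. The following is a minimal sketch, not part of the original discussion: the helper names `installed_version` and `class_has_method` are hypothetical, and it assumes the distribution is named `magic-pdf` and that `PymuDocDataset` lives in `magic_pdf.data.dataset`, as the demo's imports suggest. It reports the installed version and whether the class exposes `classify` in the environment that actually runs the script.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def class_has_method(module_name, class_name, method_name):
    """Return True/False if the class can be found, or None if it cannot."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    cls = getattr(module, class_name, None)
    if cls is None:
        return None
    return callable(getattr(cls, method_name, None))


if __name__ == "__main__":
    # Assumption: distribution name "magic-pdf" and module path
    # "magic_pdf.data.dataset"; adjust if your installation differs.
    print("magic-pdf version:", installed_version("magic-pdf"))
    print(
        "PymuDocDataset.classify present:",
        class_has_method("magic_pdf.data.dataset", "PymuDocDataset", "classify"),
    )
```

If the reported version is older than the one the demo targets, or the second line prints False, the interpreter is probably picking up a different installation than the one the magic-pdf command reports.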