pymudocdataset object has no attribute classify #1573

Fanxhion · 2025-01-18T07:44:40Z

Description of the bug | 错误描述

While testing PDF parsing, I got the following error:
pymudocdataset object has no attribute classify.

How to reproduce the bug | 如何复现

pymudocdataset object has no attribute classify.

Operating system | 操作系统

Linux

Python version | Python 版本

3.11

Software version | 软件版本 (magic-pdf --version)

1.0.x

Device mode | 设备模式

cuda

Fanxhion · 2025-01-18T07:57:39Z

It looks like the dataset object created in this code has no classify() attribute:
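# Imports assumed from the magic-pdf 1.0 demo script (not shown in the original comment)
from magic_pdf.config.enums import SupportedPdfParseMethod
from magic_pdf.data.data_reader_writer import FileBasedDataReader
from magic_pdf.data.dataset import PymuDocDataset
from magic_pdf.model.doc_analyze_by_custom_model import doc_analyze

# pdf_file_name and image_writer must be defined beforehand (not shown in this comment)
reader1 = FileBasedDataReader("")
pdf_bytes = reader1.read(pdf_file_name)  # read the pdf content
ds = PymuDocDataset(pdf_bytes)

## inference
if ds.classify() == SupportedPdfParseMethod.OCR:
    infer_result = ds.apply(doc_analyze, ocr=True)

    ## pipeline
    pipe_result = infer_result.pipe_ocr_mode(image_writer)

else:
    infer_result = ds.apply(doc_analyze, ocr=False)

    ## pipeline
    pipe_result = infer_result.pipe_txt_mode(image_writer)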

Fanxhion · 2025-01-18T08:04:09Z

I followed the approach used in the demo.

myhloli · 2025-01-18T10:50:20Z

Check your version with the magic-pdf -v command; 1.0.1 should not raise this error.
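
For anyone hitting the same error, the version check suggested above can also be done from inside the Python environment that runs the script, which helps when the magic-pdf CLI on PATH belongs to a different environment than the interpreter. The following is a minimal sketch, not an official diagnostic: the helper names `installed_version` and `has_method` are hypothetical, and it assumes the distribution is installed under the name `magic-pdf` and that `PymuDocDataset` is importable from `magic_pdf.data.dataset`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_method(cls: type, name: str) -> bool:
    """Return True if the class exposes a callable attribute with this name."""
    return callable(getattr(cls, name, None))


if __name__ == "__main__":
    # "magic-pdf" is the assumed distribution name (as used with pip install).
    print("magic-pdf version:", installed_version("magic-pdf"))
    try:
        # Assumed import path for the dataset class in magic-pdf 1.0.x.
        from magic_pdf.data.dataset import PymuDocDataset
    except ImportError as exc:
        print("could not import PymuDocDataset:", exc)
    else:
        print("has classify():", has_method(PymuDocDataset, "classify"))
```

If the reported version is older than expected, or `has classify()` prints False, the interpreter is probably picking up an outdated installation, and upgrading magic-pdf in that environment is the likely fix.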

Fanxhion added the "bug" label on Jan 18, 2025

myhloli closed this as completed Jan 20, 2025
