
Can the PaddleX OCR pipeline use multiple GPUs for inference at the same time? How do I set that up? #2754

Open
1756112901 opened this issue Jan 2, 2025 · 1 comment

1756112901 commented Jan 2, 2025

Can the PaddleX OCR pipeline use multiple GPUs for inference at the same time? How do I set that up?

cuicheng01 (Collaborator) commented Jan 6, 2025

This is not supported at the moment, but you can add some extra code on top to achieve it; official support will be considered later. For example:

import os
import multiprocessing

def process_image(file_path, input_dir, output_dir, pipeline):
    # Run prediction
    output = pipeline.predict(file_path)

    # Build the output path, mirroring the input directory structure
    relative_path = os.path.relpath(file_path, input_dir)
    json_output_path = os.path.join(output_dir, os.path.splitext(relative_path)[0] + '.json')

    # Create the output folder
    os.makedirs(os.path.dirname(json_output_path), exist_ok=True)

    # Save the results
    try:
        for res in output:
            # res.print()  # print the structured prediction output
            res.save_to_json(json_output_path)
            res.save_to_img(output_dir)  # save the visualization alongside the JSON
        print(f"Process {file_path} Successful!")
    except Exception as e:
        print(f"Process {file_path} Failed with error: {e}")

def worker(gpu_id, image_files, input_dir, output_dir):
    # Each process creates its own pipeline, pinned to a single GPU
    from paddlex import create_pipeline
    pipeline = create_pipeline(pipeline="OCR", device=f"gpu:{gpu_id}")
    for file_path in image_files:
        process_image(file_path, input_dir, output_dir, pipeline)

def process_images(input_dir, output_dir):
    # Collect all image file paths
    image_files = []
    for root, _, files in os.walk(input_dir):
        for file in files:
            if file.lower().endswith(('.png', '.jpg', '.jpeg')):
                image_files.append(os.path.join(root, file))

    # Number of GPUs and number of worker processes per GPU
    num_gpus = 4
    processes_per_gpu = 10

    # Split the images into at most num_gpus * processes_per_gpu chunks.
    # Ceiling division keeps chunk_size >= 1 and guarantees no chunk is
    # left over once every worker slot has been filled.
    num_workers = num_gpus * processes_per_gpu
    chunk_size = max(1, -(-len(image_files) // num_workers))
    chunks = [image_files[i:i + chunk_size] for i in range(0, len(image_files), chunk_size)]

    # Spawn the worker processes
    processes = []
    for gpu_id in range(num_gpus):
        for _ in range(processes_per_gpu):
            if chunks:
                chunk = chunks.pop(0)
                p = multiprocessing.Process(target=worker, args=(gpu_id, chunk, input_dir, output_dir))
                processes.append(p)
                p.start()

    # Wait for all processes to finish
    for p in processes:
        p.join()

# Input image directory and output JSON directory
input_directory = "./images"
output_directory = "./output"

if __name__ == '__main__':
    # "spawn" gives each worker a fresh interpreter, avoiding the
    # CUDA-initialization problems that the default "fork" can cause
    multiprocessing.set_start_method("spawn", force=True)
    process_images(input_directory, output_directory)

Here num_gpus and processes_per_gpu need to be changed according to your actual setup, and the code as a whole will also need some adaptation to your situation.
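
If you would rather not hardcode the GPU count, here is a minimal sketch that detects it at runtime via paddle.device.cuda.device_count(); the detect_parallelism helper and its default of 2 processes per GPU are illustrative placeholders, not part of the snippet above:

import paddle

def detect_parallelism(processes_per_gpu=2):
    # Query how many CUDA devices PaddlePaddle can see
    num_gpus = paddle.device.cuda.device_count()
    if num_gpus == 0:
        raise RuntimeError("No CUDA device visible to PaddlePaddle")
    return num_gpus, processes_per_gpu

num_gpus, processes_per_gpu = detect_parallelism()
print(f"Using {num_gpus} GPU(s) with {processes_per_gpu} process(es) each")

How many processes each GPU can sustain still depends on its memory and the model size, so treat processes_per_gpu as something to tune rather than derive.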
