
Text Rotate part of OCR Text Recognition Evaluation #12

Open · cpx111 opened this issue Dec 29, 2024 · 8 comments

@cpx111

cpx111 commented Dec 29, 2024

Hi! I'm wondering why PaddleOCR performs so well in the "Text Rotate" part of the OCR Text Recognition Evaluation, since our model performs better on "normal" text but fails on "rotated" text. Did you apply some operation, such as rotation, to convert these vertically oriented texts into a single-line form? I'd appreciate it if you could share the evaluation code for PaddleOCR!
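(To illustrate the kind of preprocessing being asked about here, a minimal, hypothetical sketch that rotates tall vertical crops into a single horizontal line before recognition. The function name and heuristic threshold are assumptions, not part of either project, and the maintainers confirm below that no such rotation is applied.)

from PIL import Image

def maybe_rotate_to_horizontal(crop: Image.Image) -> Image.Image:
    # Hypothetical preprocessing: rotate a tall, narrow crop 90 degrees
    # counter-clockwise so vertical text reads as one horizontal line.
    w, h = crop.size
    if h > 2 * w:  # assumed heuristic for "vertical" text
        crop = crop.rotate(90, expand=True)
    return crop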

@ouyanglinke
Collaborator

Hi, we have not performed any rotation processing on the documents or cropped images.

We have just uploaded the model inference code of PaddleOCR for text recognition. See here.
For evaluation, please refer to the Text Recognition section in this repo. We have provided a config template for Text Recognition. You can run the evaluation with:

python pdf_validation.py --config ./configs/ocr.yaml

@cpx111
Author

cpx111 commented Jan 2, 2025

Actually, I found that your evaluation involves PaddleOCR's det+rec+cls stages rather than only the rec stage. Anyway, I reproduced the result following your evaluation code, replacing only the recognition model, and found our result to be a bit better than PaddleOCR's. Sincere thanks for your attention!

@ouyanglinke
Collaborator

ouyanglinke commented Jan 3, 2025

Yes. Our text recognition is evaluated at the paragraph level, so det+rec+cls is necessary for some OCR-based models (e.g., PaddleOCR).

If your model is open-sourced, we would also welcome adding its evaluation results to our leaderboard.
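(For readers unfamiliar with the pipeline: at the paragraph level a crop may contain several text lines, so detection splits it into line regions, the angle classifier corrects flipped lines, and recognition transcribes each one. A minimal sketch using PaddleOCR's standard Python API; the image path is a placeholder.)

from paddleocr import PaddleOCR

# det + cls + rec in one call: use_angle_cls loads the angle classifier,
# and cls=True applies it before recognition.
ocr = PaddleOCR(use_angle_cls=True, lang='ch')
result = ocr.ocr('paragraph_crop.jpg', cls=True)  # placeholder path

# Each entry is [polygon, (text, confidence)]; join the line texts to get
# the paragraph-level prediction.
paragraph = ''.join(line[1][0] for line in (result[0] or []))
print(paragraph)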

@cpx111
Author

cpx111 commented Jan 4, 2025

Sincere thanks for your reply! Our model OpenOCR uses SVTRv2 as the recognition algorithm, which is also supported in PaddleOCR and performs better than the default PP-OCRv4 rec model. Our evaluation results are as follows. Overall, thanks again for your attention!

(Columns: Language — EN / ZH / Mixed; Text background — White / Single / Multi; Text Rotate — Normal / Rotate90 / Rotate270 / Horizontal.)

| Model Type | Model | EN | ZH | Mixed | White | Single | Multi | Normal | Rotate90 | Rotate270 | Horizontal |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Expert Vision Models | OpenOCR-repsvtr(mobile) | 0.071 | 0.054 | 0.103 | 0.060 | 0.037 | 0.0777 | 0.059 | 0.101 | 0.296 | 0.020 |
| | OpenOCR-svtrv2(server) | 0.069 | 0.051 | 0.094 | 0.057 | 0.036 | 0.0653 | 0.056 | 0.017 | 0.295 | 0.027 |
| | PaddleOCR-reproduct | 0.073 | 0.056 | 0.123 | 0.062 | 0.045 | 0.0830 | 0.062 | 0.017 | 0.294 | 0.019 |
| | PaddleOCR | 0.071 | 0.055 | 0.118 | 0.060 | 0.038 | 0.0848 | 0.060 | 0.015 | 0.285 | 0.021 |
| | Tesseract OCR | 0.179 | 0.553 | 0.553 | 0.453 | 0.463 | 0.394 | 0.448 | 0.369 | 0.979 | 0.982 |
| | Surya | 0.057 | 0.123 | 0.164 | 0.093 | 0.186 | 0.235 | 0.104 | 0.634 | 0.767 | 0.255 |
| | GOT-OCR | 0.041 | 0.112 | 0.135 | 0.092 | 0.052 | 0.155 | 0.091 | 0.562 | 0.966 | 0.097 |
| | Mathpix | 0.033 | 0.240 | 0.261 | 0.185 | 0.121 | 0.166 | 0.180 | 0.038 | 0.185 | 0.638 |
| Vision Language Models | Qwen2-VL-72B | 0.072 | 0.274 | 0.286 | 0.234 | 0.155 | 0.148 | 0.223 | 0.273 | 0.721 | 0.067 |
| | InternVL2-Llama3-76B | 0.074 | 0.155 | 0.242 | 0.113 | 0.352 | 0.269 | 0.132 | 0.610 | 0.907 | 0.595 |
| | GPT4o | 0.020 | 0.224 | 0.125 | 0.167 | 0.140 | 0.220 | 0.168 | 0.115 | 0.718 | 0.132 |
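(A note on reading this table: lower is better. Assuming the scores are normalized edit distances between prediction and ground truth, which is a common metric for this kind of evaluation and an assumption here, a minimal sketch of the computation:)

def normalized_edit_distance(pred: str, gt: str) -> float:
    # Levenshtein distance via dynamic programming, divided by the longer
    # string's length; 0.0 means an exact match.
    m, n = len(pred), len(gt)
    if max(m, n) == 0:
        return 0.0
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (pred[i - 1] != gt[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, n)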

@ouyanglinke
Collaborator

Thank you for your contribution. We will double-check the results and update the leaderboard soon.

@ouyanglinke
Collaborator

ouyanglinke commented Jan 7, 2025

For double-checking, we also evaluated OpenOCR. In most evaluation dimensions, the scores show only slight fluctuations. However, there is a significant difference in the evaluation results when the text rotation angle is 270 degrees. The comparison is as follows:

(Columns as in the table above.)

| Model Type | Model | EN | ZH | Mixed | White | Single | Multi | Normal | Rotate90 | Rotate270 | Horizontal |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Expert Vision Models | OpenOCR-repsvtr(mobile)-reproduct | 0.070 | 0.068 | 0.106 | 0.069 | 0.058 | 0.081 | 0.069 | 0.038 | 0.891 | 0.025 |
| | OpenOCR-repsvtr(mobile) | 0.071 | 0.054 | 0.103 | 0.060 | 0.037 | 0.0777 | 0.059 | 0.101 | 0.296 | 0.020 |
| | OpenOCR-svtrv2(server) | 0.069 | 0.051 | 0.094 | 0.057 | 0.036 | 0.0653 | 0.056 | 0.017 | 0.295 | 0.027 |

This is our model inference code for OpenOCR, aligned with our model inference code for PaddleOCR; like that code, it adds a 50-pixel white border to each image.

import json
import os

import cv2
import numpy as np
from PIL import Image, ImageOps

from openocr import OpenOCR

def parse_ocr_line(ocr_line, tmp_img_name):
    # Each engine output line looks like "<img_name>\t[...json...]"; strip the
    # image-name prefix and parse the JSON payload into [points, [text, score]].
    ocr_data = json.loads(ocr_line.strip().replace(f'{tmp_img_name}\t[', '[').replace(']\n', ']'))
    parsed_ocr = []
    for item in ocr_data:
        transcription = item["transcription"]
        points = [[float(coord) for coord in point] for point in item["points"]]
        score = float(item["score"])
        parsed_ocr.append([points, [transcription, score]])
    return parsed_ocr

def model_infer(engine, img, lan, img_name):
    img_add_border = add_white_border(img)
    # PIL images are RGB while OpenCV writes BGR, so convert before saving.
    img_ndarray = cv2.cvtColor(np.array(img_add_border), cv2.COLOR_RGB2BGR)
    try:
        tmp_img_path = 'tmp_openocr.jpg'
        cv2.imwrite(tmp_img_path, img_ndarray)
        result, elapse = engine(tmp_img_path)
        ocr_results = [parse_ocr_line(line, tmp_img_path) for line in result]
    except Exception as e:
        print(f"Error performing OCR on {img_name}: {e}")
        return ""

    # Concatenate the recognized text of every detected line.
    text = ''
    for res in ocr_results:
        if not res:
            continue
        for line in res:
            text += line[1][0]
    return text

def add_white_border(img: Image.Image):
    # Pad the crop with a 50-pixel white margin on all sides, matching the
    # preprocessing used for PaddleOCR.
    border_width = 50
    border_color = (255, 255, 255)
    img_with_border = ImageOps.expand(img, border=border_width, fill=border_color)
    return img_with_border


def poly2bbox(poly):
    # poly is a flat list [x0, y0, x1, y1, x2, y2, x3, y3] of the four corners;
    # take left/top from the first corner and right/bottom from the opposite ones.
    L = poly[0]
    U = poly[1]
    R = poly[2]
    D = poly[5]
    L, R = min(L, R), max(L, R)
    U, D = min(U, D), max(U, D)
    return [L, U, R, D]


def main():
    engine = OpenOCR()
    with open('./OmniDocBench/OmniDocBench.json', 'r') as f:
        samples = json.load(f)
    for sample in samples:
        img_name = os.path.basename(sample['page_info']['image_path'])
        img_path = os.path.join('./OmniDocBench/images', img_name)
        # Check existence before trying to open the image.
        if not os.path.exists(img_path):
            print('No exist: ', img_name)
            continue
        img = Image.open(img_path)
        for i, anno in enumerate(sample['layout_dets']):
            if not anno.get('text'):
                continue
            lan = anno['attribute'].get('text_language', 'mixed')
            bbox = poly2bbox(anno['poly'])
            image = img.crop(bbox).convert('RGB')  # crop the text block
            outputs = model_infer(engine, image, lan, img_name)

            anno['pred'] = outputs
        with open('./OmniDocBench/result/OmniDocBench_openocr_text_ocr.jsonl', 'a', encoding='utf-8') as f:
            json.dump(sample, f, ensure_ascii=False)
            f.write('\n')

def save_json():
    # Merge the per-page JSONL records into a single pretty-printed JSON file.
    with open('./OmniDocBench/result/OmniDocBench_openocr_text_ocr.jsonl', 'r') as f:
        lines = f.readlines()
    samples = [json.loads(line) for line in lines]
    with open('./OmniDocBench/result/OmniDocBench_openocr_text_ocr.json', 'w', encoding='utf-8') as f:
        json.dump(samples, f, indent=4, ensure_ascii=False)

if __name__ == '__main__':
    main()
    save_json()
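(One quick way to probe the Rotate270 discrepancy above would be to feed the same crop to the engine at each rotation and compare transcriptions. A minimal sketch reusing model_infer from the script; the crop path is a placeholder.)

def probe_rotations(engine, crop_path: str):
    # Run the same crop at 0/90/180/270 degrees and print each result,
    # to see which orientations the recognizer actually handles.
    crop = Image.open(crop_path).convert('RGB')
    for angle in (0, 90, 180, 270):
        rotated = crop.rotate(angle, expand=True)
        text = model_infer(engine, rotated, lan='mixed',
                           img_name=f'{crop_path}@{angle}')
        print(angle, ':', text)

probe_rotations(OpenOCR(), 'rotate270_crop.jpg')  # placeholder path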

Please let us know if there are any issues in the inference code.
If you have no questions about the results, we will update the OpenOCR model results on the evaluation leaderboard soon.

@cpx111
Author

cpx111 commented Jan 7, 2025

Thanks for your double-checking! I'm sorry my earlier comment was not clear enough. The official implementation of OpenOCR's det+cls stages differs from PaddleOCR's, which could bias the comparison. Thus, for a fair comparison of recognition only, we just replaced the rec part of PaddleOCR (since repsvtr and svtrv2 are both supported by PaddleOCR), and our inference code is as follows. The model weights of repsvtr (openatom_rec_repsvtr_ch_infer) or svtrv2 (openatom_rec_svtrv2_ch_infer) can be downloaded from here.

import json
import os

import numpy as np
from PIL import Image, ImageOps
from paddleocr import PaddleOCR
from tqdm import tqdm
def test_paddle(model_ch, model_en, model_all, img: Image.Image, lan: str):
    img_add_border = add_white_border(img)
    img_ndarray = np.array(img_add_border)

    # Pick the model whose language setting matches the annotated language.
    if lan == 'text_simplified_chinese':
        ocr = model_ch
    elif lan == 'text_english':
        ocr = model_en
    else:
        ocr = model_all

    # det + cls + rec: cls=True applies the angle classifier before recognition.
    result = ocr.ocr(img_ndarray, cls=True)

    # Concatenate the recognized text of every detected line.
    text = ''
    for res in result:
        if not res:
            continue
        for line in res:
            text += line[1][0]
    return text

def add_white_border(img: Image.Image):
    # Pad the crop with a 50-pixel white margin on all sides.
    border_width = 50
    border_color = (255, 255, 255)  # white
    img_with_border = ImageOps.expand(img, border=border_width, fill=border_color)


def poly2bbox(poly):
    # poly is a flat list [x0, y0, x1, y1, x2, y2, x3, y3] of the four corners.
    L = poly[0]
    U = poly[1]
    R = poly[2]
    D = poly[5]
    L, R = min(L, R), max(L, R)
    U, D = min(U, D), max(U, D)
    return [L, U, R, D]


def main():
    # All three instances share the swapped-in OpenOCR recognition weights;
    # only the detection language setting differs.
    rec_model_dir = "/data/duyongkun/CPX/OpenOCR/openatom_rec_repsvtr_ch_infer"
    rec_char_dict_path = '/data/duyongkun/CPX/OpenOCR/tools/utils/ppocr_keys_v1.txt'
    model_ch = PaddleOCR(use_angle_cls=True, lang='ch', rec_model_dir=rec_model_dir, rec_char_dict_path=rec_char_dict_path)
    model_en = PaddleOCR(use_angle_cls=True, lang='en', rec_model_dir=rec_model_dir, rec_char_dict_path=rec_char_dict_path)
    model_all = PaddleOCR(use_angle_cls=True, rec_model_dir=rec_model_dir, rec_char_dict_path=rec_char_dict_path)

    with open('/data/duyongkun/CPX/OmniDocBench/OmniDocBench.json', 'r') as f:
        samples = json.load(f)
    for sample in tqdm(samples):
        img_name = os.path.basename(sample['page_info']['image_path'])
        img_path = os.path.join('/data/duyongkun/CPX/OmniDocBench/images', img_name)
        # Check existence before trying to open the image.
        if not os.path.exists(img_path):
            print('No exist: ', img_name)
            continue
        img = Image.open(img_path)
        for i, anno in enumerate(sample['layout_dets']):
            if not anno.get('text'):
                continue
            lan = anno['attribute'].get('text_language', 'mixed')
            bbox = poly2bbox(anno['poly'])
            image = img.crop(bbox).convert('RGB')  # crop the text block
            outputs = test_paddle(model_ch, model_en, model_all, image, lan)  # text content of the text block
            anno['pred'] = outputs
        with open('/data/duyongkun/CPX/OmniDocBench/full_result_valid/OmniDocBench_demo_text_ocr_repsvtr.jsonl', 'a', encoding='utf-8') as f:
            json.dump(sample, f, ensure_ascii=False)
            f.write('\n')

def save_json():
    # Text OCR quality check: gpt-4o/internvl jsonl -> json
    with open('/data/duyongkun/CPX/OmniDocBench/full_result_valid/OmniDocBench_demo_text_ocr_repsvtr.jsonl', 'r') as f:
        lines = f.readlines()
    samples = [json.loads(line) for line in lines]
    with open('/data/duyongkun/CPX/OmniDocBench/full_result_valid/OmniDocBench_demo_text_ocr_repsvtr.json', 'w', encoding='utf-8') as f:
        json.dump(samples, f, indent=4, ensure_ascii=False)

if __name__ == '__main__':
    main()
    save_json()

I'd appreciate it if you could reproduce the result based on the code above! We have double-checked and found the result nearly identical to what we provided before.

@ouyanglinke
Collaborator

This is a great strategy and a valuable ablation study result. However, on the leaderboard of the OCR evaluation module, we prefer to report end-to-end OCR results for each model. This allows other users to see the direct result of invoking each model according to its official instructions, making it more reproducible. All other OCR model evaluation results are also end-to-end outputs, without a unified text detection model.

But still, thank you very much for contributing your results and pointing out the potential misunderstandings that our current module name, "Text Recognition", might cause. We will rename the module to "Text OCR-end2end" to avoid confusion.

Moreover, we will consider adding a pure Text Recognition evaluation module in the future.

Have you considered submitting a PR to PaddleOCR, or releasing this end-to-end OCR pipeline (PaddleOCR's det + PaddleOCR's cls + OpenOCR's rec) in the OpenOCR GitHub repo? If so, please provide the official model API and we will add this end-to-end model's results to the leaderboard.
