
Text Rotate part of OCR Text Recognition Evaluation #12

Open · cpx111 opened this issue Dec 29, 2024 · 8 comments

@cpx111

cpx111 commented Dec 29, 2024

Hi! I'm wondering why PaddleOCR performs so well in the "Text Rotate" part of the OCR Text Recognition Evaluation, since our model performs better on "normal" text but fails on "rotated" text. Did you apply some operation, such as rotation, to convert these vertically oriented texts into a single-line form? I'd appreciate it if you could share the evaluation code for PaddleOCR!
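(To illustrate the kind of preprocessing being asked about here, a minimal, hypothetical sketch that rotates tall vertical crops into a single horizontal line before recognition. The function name and heuristic threshold are assumptions, not part of either project, and the maintainers confirm below that no such rotation is applied.)

from PIL import Image

def maybe_rotate_to_horizontal(crop: Image.Image) -> Image.Image:
    # Hypothetical preprocessing: rotate a tall, narrow crop 90 degrees
    # counter-clockwise so vertical text reads as one horizontal line.
    w, h = crop.size
    if h > 2 * w:  # assumed heuristic for "vertical" text
        crop = crop.rotate(90, expand=True)
    return crop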

@ouyanglinke
Collaborator

Hi, we have not performed any rotation processing on the documents or cropped images.

We have just uploaded the model inference code of PaddleOCR for text recognition. See here.
For evaluation, please refer to the Text Recognition section in this repo. We have provided a config template for Text Recognition. You can run the evaluation with:

python pdf_validation.py --config ./configs/ocr.yaml

@cpx111
Author

cpx111 commented Jan 2, 2025

Actually, I found that your evaluation involves PaddleOCR's det+rec+cls stages rather than only the rec stage. Anyway, I reproduced the result following your evaluation code, replacing only the recognition model, and found our result to be a bit better than PaddleOCR's. Sincere thanks for your attention!

@ouyanglinke
Collaborator

ouyanglinke commented Jan 3, 2025

Yes. Our text recognition is evaluated at the paragraph level, so det+rec+cls is necessary for some OCR-based models (e.g., PaddleOCR).

If your model is open-sourced, we would also welcome adding its evaluation results to our leaderboard.
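(For readers unfamiliar with the pipeline: at the paragraph level a crop may contain several text lines, so detection splits it into line regions, the angle classifier corrects flipped lines, and recognition transcribes each one. A minimal sketch using PaddleOCR's standard Python API; the image path is a placeholder.)

from paddleocr import PaddleOCR

# det + cls + rec in one call: use_angle_cls loads the angle classifier,
# and cls=True applies it before recognition.
ocr = PaddleOCR(use_angle_cls=True, lang='ch')
result = ocr.ocr('paragraph_crop.jpg', cls=True)  # placeholder path

# Each entry is [polygon, (text, confidence)]; join the line texts to get
# the paragraph-level prediction.
paragraph = ''.join(line[1][0] for line in (result[0] or []))
print(paragraph)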

@cpx111
Author

cpx111 commented Jan 4, 2025

Sincere thanks for your reply! Our model OpenOCR uses SVTRv2 as the recognition algorithm, which is also supported in PaddleOCR and performs better than the default PP-OCRv4 rec model. Our evaluation results are as follows. Overall, thanks again for your attention!

(Columns: Language — EN / ZH / Mixed; Text background — White / Single / Multi; Text Rotate — Normal / Rotate90 / Rotate270 / Horizontal.)

| Model Type | Model | EN | ZH | Mixed | White | Single | Multi | Normal | Rotate90 | Rotate270 | Horizontal |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Expert Vision Models | OpenOCR-repsvtr(mobile) | 0.071 | 0.054 | 0.103 | 0.060 | 0.037 | 0.0777 | 0.059 | 0.101 | 0.296 | 0.020 |
| | OpenOCR-svtrv2(server) | 0.069 | 0.051 | 0.094 | 0.057 | 0.036 | 0.0653 | 0.056 | 0.017 | 0.295 | 0.027 |
| | PaddleOCR-reproduct | 0.073 | 0.056 | 0.123 | 0.062 | 0.045 | 0.0830 | 0.062 | 0.017 | 0.294 | 0.019 |
| | PaddleOCR | 0.071 | 0.055 | 0.118 | 0.060 | 0.038 | 0.0848 | 0.060 | 0.015 | 0.285 | 0.021 |
| | Tesseract OCR | 0.179 | 0.553 | 0.553 | 0.453 | 0.463 | 0.394 | 0.448 | 0.369 | 0.979 | 0.982 |
| | Surya | 0.057 | 0.123 | 0.164 | 0.093 | 0.186 | 0.235 | 0.104 | 0.634 | 0.767 | 0.255 |
| | GOT-OCR | 0.041 | 0.112 | 0.135 | 0.092 | 0.052 | 0.155 | 0.091 | 0.562 | 0.966 | 0.097 |
| | Mathpix | 0.033 | 0.240 | 0.261 | 0.185 | 0.121 | 0.166 | 0.180 | 0.038 | 0.185 | 0.638 |
| Vision Language Models | Qwen2-VL-72B | 0.072 | 0.274 | 0.286 | 0.234 | 0.155 | 0.148 | 0.223 | 0.273 | 0.721 | 0.067 |
| | InternVL2-Llama3-76B | 0.074 | 0.155 | 0.242 | 0.113 | 0.352 | 0.269 | 0.132 | 0.610 | 0.907 | 0.595 |
| | GPT4o | 0.020 | 0.224 | 0.125 | 0.167 | 0.140 | 0.220 | 0.168 | 0.115 | 0.718 | 0.132 |
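(A note on reading this table: lower is better. Assuming the scores are normalized edit distances between prediction and ground truth, which is a common metric for this kind of evaluation and an assumption here, a minimal sketch of the computation:)

def normalized_edit_distance(pred: str, gt: str) -> float:
    # Levenshtein distance via dynamic programming, divided by the longer
    # string's length; 0.0 means an exact match.
    m, n = len(pred), len(gt)
    if max(m, n) == 0:
        return 0.0
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (pred[i - 1] != gt[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, n)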

@ouyanglinke
Collaborator

Thank you for your contribution. We will double-check the results and update the leaderboard soon.

@ouyanglinke
Collaborator

ouyanglinke commented Jan 7, 2025

For double-checking, we also evaluated OpenOCR. In most evaluation dimensions, the scores show only slight fluctuations. However, there is a significant difference in the evaluation results when the text rotation angle is 270 degrees. The comparison is as follows:

(Columns as in the table above.)

| Model Type | Model | EN | ZH | Mixed | White | Single | Multi | Normal | Rotate90 | Rotate270 | Horizontal |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Expert Vision Models | OpenOCR-repsvtr(mobile)-reproduct | 0.070 | 0.068 | 0.106 | 0.069 | 0.058 | 0.081 | 0.069 | 0.038 | 0.891 | 0.025 |
| | OpenOCR-repsvtr(mobile) | 0.071 | 0.054 | 0.103 | 0.060 | 0.037 | 0.0777 | 0.059 | 0.101 | 0.296 | 0.020 |
| | OpenOCR-svtrv2(server) | 0.069 | 0.051 | 0.094 | 0.057 | 0.036 | 0.0653 | 0.056 | 0.017 | 0.295 | 0.027 |

This is our model inference code for OpenOCR, aligned with our model inference code for PaddleOCR; like that code, it adds a 50-pixel white border to each image.

import json
import os

import cv2
import numpy as np
from PIL import Image, ImageOps

from openocr import OpenOCR

def parse_ocr_line(ocr_line, tmp_img_name):
    # Each engine output line looks like "<img_name>\t[...json...]"; strip the
    # image-name prefix and parse the JSON payload into [points, [text, score]].
    ocr_data = json.loads(ocr_line.strip().replace(f'{tmp_img_name}\t[', '[').replace(']\n', ']'))
    parsed_ocr = []
    for item in ocr_data:
        transcription = item["transcription"]
        points = [[float(coord) for coord in point] for point in item["points"]]
        score = float(item["score"])
        parsed_ocr.append([points, [transcription, score]])
    return parsed_ocr

def model_infer(engine, img, lan, img_name):
    img_add_border = add_white_border(img)
    # PIL images are RGB while OpenCV writes BGR, so convert before saving.
    img_ndarray = cv2.cvtColor(np.array(img_add_border), cv2.COLOR_RGB2BGR)
    try:
        tmp_img_path = 'tmp_openocr.jpg'
        cv2.imwrite(tmp_img_path, img_ndarray)
        result, elapse = engine(tmp_img_path)
        ocr_results = [parse_ocr_line(line, tmp_img_path) for line in result]
    except Exception as e:
        print(f"Error performing OCR on {img_name}: {e}")
        return ""

    # Concatenate the recognized text of every detected line.
    text = ''
    for res in ocr_results:
        if not res:
            continue
        for line in res:
            text += line[1][0]
    return text

def add_white_border(img: Image.Image):
    # Pad the crop with a 50-pixel white margin on all sides, matching the
    # preprocessing used for PaddleOCR.
    border_width = 50
    border_color = (255, 255, 255)
    img_with_border = ImageOps.expand(img, border=border_width, fill=border_color)
    return img_with_border


def poly2bbox(poly):
    # poly is a flat list [x0, y0, x1, y1, x2, y2, x3, y3] of the four corners;
    # take left/top from the first corner and right/bottom from the opposite ones.
    L = poly[0]
    U = poly[1]
    R = poly[2]
    D = poly[5]
    L, R = min(L, R), max(L, R)
    U, D = min(U, D), max(U, D)
    return [L, U, R, D]


def main():
    engine = OpenOCR()
    with open('./OmniDocBench/OmniDocBench.json', 'r') as f:
        samples = json.load(f)
    for sample in samples:
        img_name = os.path.basename(sample['page_info']['image_path'])
        img_path = os.path.join('./OmniDocBench/images', img_name)
        # Check existence before trying to open the image.
        if not os.path.exists(img_path):
            print('No exist: ', img_name)
            continue
        img = Image.open(img_path)
        for i, anno in enumerate(sample['layout_dets']):
            if not anno.get('text'):
                continue
            lan = anno['attribute'].get('text_language', 'mixed')
            bbox = poly2bbox(anno['poly'])
            image = img.crop(bbox).convert('RGB')  # crop the text block
            outputs = model_infer(engine, image, lan, img_name)

            anno['pred'] = outputs
        with open('./OmniDocBench/result/OmniDocBench_openocr_text_ocr.jsonl', 'a', encoding='utf-8') as f:
            json.dump(sample, f, ensure_ascii=False)
            f.write('\n')

def save_json():
    # Merge the per-page JSONL records into a single pretty-printed JSON file.
    with open('./OmniDocBench/result/OmniDocBench_openocr_text_ocr.jsonl', 'r') as f:
        lines = f.readlines()
    samples = [json.loads(line) for line in lines]
    with open('./OmniDocBench/result/OmniDocBench_openocr_text_ocr.json', 'w', encoding='utf-8') as f:
        json.dump(samples, f, indent=4, ensure_ascii=False)

if __name__ == '__main__':
    main()
    save_json()
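(One quick way to probe the Rotate270 discrepancy above would be to feed the same crop to the engine at each rotation and compare transcriptions. A minimal sketch reusing model_infer from the script; the crop path is a placeholder.)

def probe_rotations(engine, crop_path: str):
    # Run the same crop at 0/90/180/270 degrees and print each result,
    # to see which orientations the recognizer actually handles.
    crop = Image.open(crop_path).convert('RGB')
    for angle in (0, 90, 180, 270):
        rotated = crop.rotate(angle, expand=True)
        text = model_infer(engine, rotated, lan='mixed',
                           img_name=f'{crop_path}@{angle}')
        print(angle, ':', text)

probe_rotations(OpenOCR(), 'rotate270_crop.jpg')  # placeholder path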

Please let us know if there are any issues in the inference code.
If you have no questions about the results, we will update the OpenOCR model results on the evaluation leaderboard soon.

@cpx111
Author

cpx111 commented Jan 7, 2025

Thanks for your double-checking! I'm sorry my earlier comment was not clear enough. The official implementation of OpenOCR's det+cls stages differs from PaddleOCR's, which could bias the comparison. Thus, for a fair comparison of recognition only, we just replaced the rec part of PaddleOCR (since repsvtr and svtrv2 are both supported by PaddleOCR), and our inference code is as follows. The model weights of repsvtr (openatom_rec_repsvtr_ch_infer) or svtrv2 (openatom_rec_svtrv2_ch_infer) can be downloaded from here.

import json
import os

import numpy as np
from PIL import Image, ImageOps
from paddleocr import PaddleOCR
from tqdm import tqdm
def test_paddle(model_ch, model_en, model_all, img: Image.Image, lan: str):
    img_add_border = add_white_border(img)
    img_ndarray = np.array(img_add_border)

    # Pick the model whose language setting matches the annotated language.
    if lan == 'text_simplified_chinese':
        ocr = model_ch
    elif lan == 'text_english':
        ocr = model_en
    else:
        ocr = model_all

    # det + cls + rec: cls=True applies the angle classifier before recognition.
    result = ocr.ocr(img_ndarray, cls=True)

    # Concatenate the recognized text of every detected line.
    text = ''
    for res in result:
        if not res:
            continue
        for line in res:
            text += line[1][0]
    return text

def add_white_border(img: Image.Image):
    # Pad the crop with a 50-pixel white margin on all sides.
    border_width = 50
    border_color = (255, 255, 255)  # white
    img_with_border = ImageOps.expand(img, border=border_width, fill=border_color)


def poly2bbox(poly):
    # poly is a flat list [x0, y0, x1, y1, x2, y2, x3, y3] of the four corners.
    L = poly[0]
    U = poly[1]
    R = poly[2]
    D = poly[5]
    L, R = min(L, R), max(L, R)
    U, D = min(U, D), max(U, D)
    return [L, U, R, D]


def main():
    # All three instances share the swapped-in OpenOCR recognition weights;
    # only the detection language setting differs.
    rec_model_dir = "/data/duyongkun/CPX/OpenOCR/openatom_rec_repsvtr_ch_infer"
    rec_char_dict_path = '/data/duyongkun/CPX/OpenOCR/tools/utils/ppocr_keys_v1.txt'
    model_ch = PaddleOCR(use_angle_cls=True, lang='ch', rec_model_dir=rec_model_dir, rec_char_dict_path=rec_char_dict_path)
    model_en = PaddleOCR(use_angle_cls=True, lang='en', rec_model_dir=rec_model_dir, rec_char_dict_path=rec_char_dict_path)
    model_all = PaddleOCR(use_angle_cls=True, rec_model_dir=rec_model_dir, rec_char_dict_path=rec_char_dict_path)

    with open('/data/duyongkun/CPX/OmniDocBench/OmniDocBench.json', 'r') as f:
        samples = json.load(f)
    for sample in tqdm(samples):
        img_name = os.path.basename(sample['page_info']['image_path'])
        img_path = os.path.join('/data/duyongkun/CPX/OmniDocBench/images', img_name)
        # Check existence before trying to open the image.
        if not os.path.exists(img_path):
            print('No exist: ', img_name)
            continue
        img = Image.open(img_path)
        for i, anno in enumerate(sample['layout_dets']):
            if not anno.get('text'):
                continue
            lan = anno['attribute'].get('text_language', 'mixed')
            bbox = poly2bbox(anno['poly'])
            image = img.crop(bbox).convert('RGB')  # crop the text block
            outputs = test_paddle(model_ch, model_en, model_all, image, lan)  # text content of the text block
            anno['pred'] = outputs
        with open('/data/duyongkun/CPX/OmniDocBench/full_result_valid/OmniDocBench_demo_text_ocr_repsvtr.jsonl', 'a', encoding='utf-8') as f:
            json.dump(sample, f, ensure_ascii=False)
            f.write('\n')

def save_json():
    # Text OCR quality check: gpt-4o/internvl jsonl -> json
    with open('/data/duyongkun/CPX/OmniDocBench/full_result_valid/OmniDocBench_demo_text_ocr_repsvtr.jsonl', 'r') as f:
        lines = f.readlines()
    samples = [json.loads(line) for line in lines]
    with open('/data/duyongkun/CPX/OmniDocBench/full_result_valid/OmniDocBench_demo_text_ocr_repsvtr.json', 'w', encoding='utf-8') as f:
        json.dump(samples, f, indent=4, ensure_ascii=False)

if __name__ == '__main__':
    main()
    save_json()

I'd appreciate it if you could reproduce the result based on the code above! We have double-checked and found the result nearly identical to what we provided before.

@ouyanglinke
Collaborator

This is a great strategy and a valuable ablation study result. However, on the leaderboard of the OCR evaluation module, we prefer to report end-to-end OCR results for each model. This allows other users to see the direct result of invoking each model according to its official instructions, making it more reproducible. All other OCR model evaluation results are also end-to-end outputs, without a unified text detection model.

But still, thank you very much for contributing your results and pointing out the potential misunderstandings that our current module name, "Text Recognition", might cause. We will rename the module to "Text OCR-end2end" to avoid confusion.

Moreover, we will consider adding a pure Text Recognition evaluation module in the future.

Have you considered submitting a PR to PaddleOCR, or releasing this end-to-end OCR pipeline (PaddleOCR's det + PaddleOCR's cls + OpenOCR's rec) in the OpenOCR GitHub repo? If so, please provide the official model API and we will add this end-to-end model's results to the leaderboard.
