Text Rotate part of OCR Text Recognition Evaluation #12
Hi! I'm wondering why PaddleOCR performs so well on the "Text Rotate" part of the OCR Text Recognition evaluation, since our model performs better on "normal" text but fails on "rotate" text. Did you apply any preprocessing, such as rotating vertical text into single-line horizontal form? I'd appreciate it if you could share the evaluation code for PaddleOCR!
Hi, we have not performed any rotation processing on the documents or cropped images. We have just uploaded the model inference code of PaddleOCR for text recognition; see here:

```bash
python pdf_validation.py --config ./configs/ocr.yaml
```
Actually, I found that your evaluation covers PaddleOCR's det+rec+cls stages, rather than only the rec stage. Anyway, I reproduced the result following your evaluation code, replacing only the recognition model, and found our result slightly better than PaddleOCR's. Thanks sincerely for your attention!
Yes. Our text recognition is evaluated at the paragraph level, so det+rec+cls is necessary for some OCR-based models (e.g., PaddleOCR). If your model is open source, we would also welcome adding its evaluation results to our leaderboard.
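For context, this is roughly what the end-to-end det+cls+rec invocation looks like in PaddleOCR's 2.x Python API (a minimal sketch; the image path is a placeholder):

```python
from paddleocr import PaddleOCR

# Full pipeline: text detection (det), angle classification (cls),
# and recognition (rec). cls corrects upside-down (180-degree) lines,
# which is one reason rotated text can still be read correctly.
ocr = PaddleOCR(use_angle_cls=True, lang='ch')
result = ocr.ocr('page.jpg', cls=True)  # 'page.jpg' is a placeholder path
for block in result:
    for points, (text, score) in block:
        print(text, score)
```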
Thanks sincerely for your reply! Our model OpenOCR uses SVTRv2 as the recognition algorithm, which is also supported in PaddleOCR and performs better than the default PP-OCRv4 rec model. Our evaluation result is as follows: [results table not preserved in this transcript]. Overall, thanks for your attention again!
Thank you for your contribution. We will double-check the results and update the leaderboard soon.
For double-checking, we also evaluated OpenOCR. In most evaluation dimensions, the scores show only slight fluctuations. However, there is a significant difference in the evaluation results when the text rotation angle is 270 degrees. The comparison of results is as follows: [comparison table not preserved in this transcript]
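(As an aside, and not part of either script in this thread: if the rotation angle of a crop were known from the annotations, it could be normalized before recognition with a PIL transform like the hypothetical helper below. Whether such a step runs anywhere is exactly the kind of preprocessing difference that could explain a gap at 270 degrees.)

```python
from PIL import Image

def undo_rotation(img: Image.Image, angle: int) -> Image.Image:
    """Hypothetical helper: undo a known counter-clockwise rotation of `angle` degrees."""
    # PIL's Image.rotate() rotates counter-clockwise, so rotating by -angle
    # restores the original orientation; expand=True keeps the full canvas.
    if angle % 360 == 0:
        return img
    return img.rotate(-angle, expand=True)
```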
This is our model inference code for OpenOCR, aligned with the model inference code for PaddleOCR; a 50-pixel white border is added to each cropped image:

```python
import json
import os

import cv2
import numpy
from openocr import OpenOCR
from PIL import Image, ImageOps


def parse_ocr_line(ocr_line, tmp_img_name):
    # Each result line is "<image_name>\t<json list>"; strip the prefix and parse.
    ocr_data = json.loads(ocr_line.strip().replace(f'{tmp_img_name}\t[', '[').replace(']\n', ']'))
    parsed_ocr = []
    for item in ocr_data:
        transcription = item["transcription"]
        points = [[float(coord) for coord in point] for point in item["points"]]
        score = float(item["score"])
        parsed_ocr.append([points, [transcription, score]])
    return parsed_ocr


def model_infer(engine, img, lan, img_name):
    # `lan` is unused here; kept for signature parity with the PaddleOCR script.
    img_add_border = add_white_border(img)
    img_ndarray = numpy.array(img_add_border)
    try:
        # OpenOCR's engine takes a file path, so write the crop to a temp file.
        tmp_img_path = 'tmp_openocr.jpg'
        cv2.imwrite(tmp_img_path, img_ndarray)
        result, elapse = engine(tmp_img_path)
        ocr_results = [parse_ocr_line(line, tmp_img_path) for line in result]
    except Exception as e:
        print(f"Error performing OCR on {img_name}: {e}")
        return ""
    text = ''
    for res in ocr_results:
        if not res:
            continue
        for line in res:
            text += line[1][0]
    return text


def add_white_border(img: Image.Image):
    border_width = 50
    border_color = (255, 255, 255)  # white
    return ImageOps.expand(img, border=border_width, fill=border_color)


def poly2bbox(poly):
    # poly is a flattened quadrilateral [x1, y1, x2, y2, x3, y3, x4, y4];
    # take the top-left and bottom-right corners to form an axis-aligned bbox.
    L = poly[0]
    U = poly[1]
    R = poly[2]
    D = poly[5]
    L, R = min(L, R), max(L, R)
    U, D = min(U, D), max(U, D)
    return [L, U, R, D]


def main():
    engine = OpenOCR()
    with open('./OmniDocBench/OmniDocBench.json', 'r') as f:
        samples = json.load(f)
    for sample in samples:
        img_name = os.path.basename(sample['page_info']['image_path'])
        img_path = os.path.join('./OmniDocBench/images', img_name)
        if not os.path.exists(img_path):
            print('No exist: ', img_name)
            continue
        img = Image.open(img_path)
        for anno in sample['layout_dets']:
            if not anno.get('text'):
                continue
            lan = anno['attribute'].get('text_language', 'mixed')
            bbox = poly2bbox(anno['poly'])
            image = img.crop(bbox).convert('RGB')  # crop the text block
            anno['pred'] = model_infer(engine, image, lan, img_name)
        with open('./OmniDocBench/result/OmniDocBench_openocr_text_ocr.jsonl', 'a', encoding='utf-8') as f:
            json.dump(sample, f, ensure_ascii=False)
            f.write('\n')


def save_json():
    # Convert the accumulated JSONL into a single JSON file.
    with open('./OmniDocBench/result/OmniDocBench_openocr_text_ocr.jsonl', 'r') as f:
        samples = [json.loads(line) for line in f]
    with open('./OmniDocBench/result/OmniDocBench_openocr_text_ocr.json', 'w', encoding='utf-8') as f:
        json.dump(samples, f, indent=4, ensure_ascii=False)


if __name__ == '__main__':
    main()
    save_json()
```

Please let us know if there are any issues in the inference code.
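For reference, `parse_ocr_line` above assumes each OpenOCR result line consists of the temporary image name, a tab, and then a JSON list; a made-up illustration of that assumption:

```python
# Hypothetical input line (the payload values are invented for illustration):
line = 'tmp_openocr.jpg\t[{"transcription": "Hello", "points": [[0, 0], [10, 0], [10, 5], [0, 5]], "score": 0.98}]'
# parse_ocr_line(line, 'tmp_openocr.jpg') strips the "tmp_openocr.jpg\t" prefix,
# parses the JSON, and returns:
# [[[[0.0, 0.0], [10.0, 0.0], [10.0, 5.0], [0.0, 5.0]], ["Hello", 0.98]]]
```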
Thanks for your double-checking! I'm sorry my earlier comment was not clear enough. The official implementation of OpenOCR's det+cls parts differs from PaddleOCR's, which could bias the comparison. Thus, for a fair comparison of recognition only, we just replace the rec part of PaddleOCR (since repsvtr and svtrv2 are both supported by PaddleOCR); our inference code is as follows. The model weights for repsvtr (openatom_rec_repsvtr_ch_infer) or svtrv2 (openatom_rec_svtrv2_ch_infer) can be downloaded from here.
```python
import json
import os

import numpy
from paddleocr import PaddleOCR
from PIL import Image, ImageOps
from tqdm import tqdm


def test_paddle(model_ch, model_en, model_all, img: Image.Image, lan: str):
    img_add_border = add_white_border(img)
    img_ndarray = numpy.array(img_add_border)
    # Pick the recognizer by the annotated text language.
    if lan == 'text_simplified_chinese':
        ocr = model_ch
    elif lan == 'text_english':
        ocr = model_en
    else:
        ocr = model_all
    result = ocr.ocr(img_ndarray, cls=True)
    text = ''
    for res in result:
        if not res:
            continue
        for line in res:
            text += line[1][0]
    return text


def add_white_border(img: Image.Image):
    border_width = 50
    border_color = (255, 255, 255)  # white
    return ImageOps.expand(img, border=border_width, fill=border_color)


def poly2bbox(poly):
    # poly is a flattened quadrilateral [x1, y1, x2, y2, x3, y3, x4, y4].
    L = poly[0]
    U = poly[1]
    R = poly[2]
    D = poly[5]
    L, R = min(L, R), max(L, R)
    U, D = min(U, D), max(U, D)
    return [L, U, R, D]


def main():
    # PaddleOCR's default det + cls, with the rec model replaced by repsvtr.
    rec_model_dir = "/data/duyongkun/CPX/OpenOCR/openatom_rec_repsvtr_ch_infer"
    rec_char_dict_path = '/data/duyongkun/CPX/OpenOCR/tools/utils/ppocr_keys_v1.txt'
    model_ch = PaddleOCR(use_angle_cls=True, lang='ch', rec_model_dir=rec_model_dir, rec_char_dict_path=rec_char_dict_path)
    model_en = PaddleOCR(use_angle_cls=True, lang='en', rec_model_dir=rec_model_dir, rec_char_dict_path=rec_char_dict_path)
    model_all = PaddleOCR(use_angle_cls=True, rec_model_dir=rec_model_dir, rec_char_dict_path=rec_char_dict_path)
    with open('/data/duyongkun/CPX/OmniDocBench/OmniDocBench.json', 'r') as f:
        samples = json.load(f)
    for sample in tqdm(samples):
        img_name = os.path.basename(sample['page_info']['image_path'])
        img_path = os.path.join('/data/duyongkun/CPX/OmniDocBench/images', img_name)
        if not os.path.exists(img_path):
            print('No exist: ', img_name)
            continue
        img = Image.open(img_path)
        for anno in sample['layout_dets']:
            if not anno.get('text'):
                continue
            lan = anno['attribute'].get('text_language', 'mixed')
            bbox = poly2bbox(anno['poly'])
            image = img.crop(bbox).convert('RGB')  # crop the text block
            anno['pred'] = test_paddle(model_ch, model_en, model_all, image, lan)  # recognized text of the block
        with open('/data/duyongkun/CPX/OmniDocBench/full_result_valid/OmniDocBench_demo_text_ocr_repsvtr.jsonl', 'a', encoding='utf-8') as f:
            json.dump(sample, f, ensure_ascii=False)
            f.write('\n')


def save_json():
    # Text OCR post-check: convert the JSONL into a single JSON file.
    with open('/data/duyongkun/CPX/OmniDocBench/full_result_valid/OmniDocBench_demo_text_ocr_repsvtr.jsonl', 'r') as f:
        samples = [json.loads(line) for line in f]
    with open('/data/duyongkun/CPX/OmniDocBench/full_result_valid/OmniDocBench_demo_text_ocr_repsvtr.json', 'w', encoding='utf-8') as f:
        json.dump(samples, f, indent=4, ensure_ascii=False)


if __name__ == '__main__':
    main()
    save_json()
```

I'd appreciate it if you could reproduce the result based on the code above! We have double-checked and found the result nearly the same as we provided before.
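As a worked example of `poly2bbox` (shared by both scripts), with a made-up quadrilateral listed clockwise from the top-left corner:

```python
# Hypothetical polygon: [x1, y1, x2, y2, x3, y3, x4, y4], clockwise from top-left.
poly = [10, 20, 110, 20, 110, 70, 10, 70]
# L = poly[0] = 10, U = poly[1] = 20, R = poly[2] = 110, D = poly[5] = 70
print(poly2bbox(poly))  # [10, 20, 110, 70]
```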
This is a great strategy and a valuable ablation study result. However, in the leaderboard of the OCR evaluation module, we prefer to report the end-to-end OCR results of each OCR model. This lets other users see the direct result of invoking each model according to its official instructions, making it more reproducible. All other OCR model evaluation results are also end-to-end outputs, without using a unified text detection model. Still, thank you very much for contributing your results and for pointing out the potential misunderstanding that our current module name, "Text Recognition", might cause. We will rename the module to "Text OCR-end2end" to avoid confusion. Moreover, we will consider adding a pure text recognition evaluation module in the future. Have you considered submitting a PR to Paddle, or releasing this end-to-end pipeline (PaddleOCR's det + PaddleOCR's cls + OpenOCR's rec) in the OpenOCR GitHub repo? If so, please provide the official model API and we will add this end-to-end model's result to the leaderboard.