magic_pdf-0.10.3-released
myhloli released this 29 Nov 08:05
What's Changed
- fix(Hybrid OCR): enable hybrid OCR for empty spans that contain a certain number of placeholders but no actual text by @myhloli in #1132
- refactor(para): improve language detection and block splitting by @myhloli in #1134
- feat(pdf_parse): filter out skewed text lines by @myhloli in #1135
- refactor(ocr): improve text processing and span handling by @myhloli in #1136
- refactor(pdf_check): improve character detection using PyMuPDF by @myhloli in #1137
- feat(pdf_parse): add line start flag detection and optimize line stop flag logic by @myhloli in #1138
- fix(ocr_mkcontent): handle empty paragraphs on pages by @myhloli in #1139
- refactor(pdf_parse): adjust character-axis alignment algorithm by @myhloli in #1140
- refactor(ocr): fix PaddleOCR failing to initialize in a multi-threaded environment by @myhloli in #1141
Full Changelog: magic_pdf-0.10.2-released...magic_pdf-0.10.3-released