Suggestions for Improving User Experience in This Great Tool #439

codeplay1997 · 2024-12-21T17:34:47Z

First of all, thank you for creating marker - it's an excellent and very useful tool for PDF text extraction. The quality of the OCR and the overall functionality are impressive.

While using this great tool, I noticed a couple of areas where the user experience could potentially be enhanced:

Language Code Documentation Inconsistency:
- In the README.md under "Convert a single file" > Options, the example shows:
```
--languages TEXT : Optionally specify which languages to use for OCR processing.
Example: --languages "eng,fra,deu" for English, French, and German.
```
- However, Surya OCR actually uses "en" (not "eng"), resulting in:
```
KeyError: 'eng'
```
- The "here" link in the documentation shows "en" as the correct code, which adds to the confusion.

Process Flow Optimization Opportunity:
- Current behavior with large documents (e.g., 700+ pages):
  1. User runs command with incorrect language code
  2. System performs full bbox detection process (taking several minutes)
  3. Error about invalid language code is reported only after this process
- This makes users wait several minutes before learning about a simple parameter error

Suggested Improvements:

Documentation:
- Either update README to use "en" instead of "eng"
- Or modify Surya's language mapping to accept both codes
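
For the second option, a minimal sketch of what accepting both codes could look like. The alias table and both function names are hypothetical and not taken from Surya's code; the sketch only assumes the OCR layer expects two-letter codes such as "en".

```python
# Hypothetical alias table (an assumption, not Surya's actual mapping):
# three-letter ISO 639-2 codes from the README example -> two-letter codes.
LANGUAGE_ALIASES = {
    "eng": "en",
    "fra": "fr",
    "deu": "de",
}


def normalize_language_code(code: str) -> str:
    """Return the two-letter form of a code, leaving unknown codes unchanged."""
    cleaned = code.strip().lower()
    return LANGUAGE_ALIASES.get(cleaned, cleaned)


def normalize_languages(option: str) -> list[str]:
    """Split a comma-separated --languages value and normalize each entry."""
    return [
        normalize_language_code(part)
        for part in option.split(",")
        if part.strip()  # skip empty entries such as a trailing comma
    ]
```

With this in place, the README example `--languages "eng,fra,deu"` would resolve to `en`, `fr`, and `de` before reaching the OCR step.
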

Process Flow:
- Add early validation for command arguments before processing starts
- Validate:
  - Language codes
  - File paths
  - Other parameter syntax
- Expected behavior:
```
$ marker_single input.pdf --languages "eng"
Error: Invalid language code "eng". Available codes are: "en" (English), "de" (German), etc.
```
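
A rough sketch of the early validation I have in mind, run before any model loading or bbox detection. `SUPPORTED_LANGUAGES`, `validate_languages`, and `validate_input_path` are hypothetical names for illustration; the real list of codes would have to come from Surya rather than this hard-coded subset.

```python
import os

# Hypothetical subset for illustration only; the real set of codes
# would come from the OCR library, not from this table.
SUPPORTED_LANGUAGES = {"en": "English", "de": "German", "fr": "French"}


def validate_languages(codes: list[str]) -> list[str]:
    """Return one error message per unsupported language code."""
    available = ", ".join(
        f'"{code}" ({name})' for code, name in sorted(SUPPORTED_LANGUAGES.items())
    )
    return [
        f'Invalid language code "{code}". Available codes are: {available}'
        for code in codes
        if code not in SUPPORTED_LANGUAGES
    ]


def validate_input_path(path: str) -> list[str]:
    """Return an error message if the input file does not exist."""
    if os.path.isfile(path):
        return []
    return [f'Input file not found: "{path}"']
```

The CLI entry point could call both functions first, print any messages, and exit with a non-zero status, so a typo in `--languages` fails in milliseconds instead of after several minutes of detection.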

Benefits:

- Even better user experience for this already great tool
- Saves processing time and computational resources
- Clearer documentation for new users

Would you consider implementing these improvements to further enhance this valuable tool?

Thank you again for maintaining this excellent project!

codeplay1997 changed the title ~~Documentation Inconsistency: Language Code for English ("eng" vs "en")~~ Suggestions for Improving User Experience in This Great Tool Dec 21, 2024
