Tess4j - Error opening tessdata file by non-ASCII path #190

AliaksandrKi · 2020-07-22T16:32:25Z

OS: Windows 10
IDE: IntelliJ
tess4j: 4.5.1

I have two folders on my disk that contain identical 'eng.traineddata' files:

c:/data/eng.traineddata
c:/дата/eng.traineddata

Tesseract fails when running the following code:

Tesseract instance = new Tesseract();
// instance.setDatapath("c:/data");    // works without issues
instance.setDatapath("c:/дата");    // see Error message below
instance.setLanguage("eng");

String result = instance.doOCR(new File("c:/numbers.jpg"));

Error message:

Error opening data file c:/дата/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!


nguyenq · 2020-07-22T19:05:13Z

The error is pretty clear: you can't have non-ASCII characters in the tessdata path. 'д' is not an ASCII character.
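
Since the native error message does not say which part of the path is the problem, a small guard before calling setDatapath can make the cause obvious. The following is only a sketch: the class name DatapathCheck and its methods are hypothetical helpers, not part of Tess4J, and the sketch assumes that any character above 0x7F is what triggers the failure described here.

```java
// Hypothetical helper, not part of Tess4J: reports non-ASCII characters in a datapath.
final class DatapathCheck {

    /** Returns the characters of {@code path} that fall outside 7-bit ASCII, in order. */
    static String nonAsciiChars(String path) {
        StringBuilder found = new StringBuilder();
        path.codePoints()
            .filter(cp -> cp > 0x7F)          // assumption: anything above 0x7F is the trouble
            .forEach(found::appendCodePoint);
        return found.toString();
    }

    /** Returns {@code path} unchanged, or throws if it contains non-ASCII characters. */
    static String requireAscii(String path) {
        String offending = nonAsciiChars(path);
        if (!offending.isEmpty()) {
            throw new IllegalArgumentException(
                "Non-ASCII characters in tessdata path: " + offending);
        }
        return path;
    }
}
```

With this in place, the call from the report could be written as instance.setDatapath(DatapathCheck.requireAscii("c:/data")), which fails early with a readable message when the folder name is c:/дата.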

Snipx · 2020-07-23T22:53:35Z

@nguyenq thanks for the feedback! Could you provide a bit more context here? For example, is the root cause on the Tesseract side or on the wrapper side, are there any workarounds available, and are there any plans to support non-ASCII paths?

nguyenq · 2020-07-23T23:40:43Z

It could be JNA or it could be inside Tesseract native code. On Linux, Tesseract and its tessdata directory are placed in standard system directories, so I doubt Tesseract code would ever need to deal with non-ASCII characters in those paths.

On Windows, you may want to try a relative path that contains no non-ASCII characters to see if that works.

Maybe related to Issue #75.
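
The relative-path idea above can be sketched as follows. It only helps when the non-ASCII segment is at or above the working directory (for example, the application is started from c:/дата and the data sits in a subfolder). The class RelativeDatapath and its methods are hypothetical names for illustration, and whether Tess4J then loads the data successfully is untested here.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper, not part of Tess4J.
final class RelativeDatapath {

    /** True if every character of {@code text} is encodable as US-ASCII. */
    static boolean isAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    /**
     * Expresses {@code tessdataDir} relative to {@code workingDir} and returns it
     * only if the resulting relative path is pure ASCII.
     */
    static Optional<String> asciiRelative(Path workingDir, Path tessdataDir) {
        Path base = workingDir.toAbsolutePath().normalize();
        Path target = tessdataDir.toAbsolutePath().normalize();
        Path relative;
        try {
            relative = base.relativize(target);
        } catch (IllegalArgumentException e) {
            // e.g. the two paths are on different drives on Windows
            return Optional.empty();
        }
        String text = relative.toString();
        if (text.isEmpty()) {
            text = "."; // tessdataDir is the working directory itself
        }
        return isAscii(text) ? Optional.of(text) : Optional.empty();
    }
}
```

If the Optional is present, its value is a candidate argument for instance.setDatapath(...); if it is empty, the relative form still contains non-ASCII characters and this approach does not apply.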

Mararsh · 2020-10-08T06:59:23Z

The failure may happen when non-ASCII characters exist in the source file name, the data file names, or the target file name.
Meanwhile, the same file names work when the tesseract command is run through ProcessBuilder.

You are right that the cause may be on the Java side, in how it handles file names through the local API.
A JDK bug:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8205991
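
The exact command used with ProcessBuilder was not posted, so the following is only a sketch of what such a call might look like. It assumes a tesseract executable is on the PATH and uses the command-line options --tessdata-dir and -l with stdout as the output base; the class TesseractCli and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs the tesseract command-line tool instead of the JNA wrapper.
final class TesseractCli {

    /** Builds: tesseract <image> stdout --tessdata-dir <dir> -l <language> */
    static List<String> buildCommand(String image, String tessdataDir, String language) {
        List<String> command = new ArrayList<>();
        command.add("tesseract");      // assumption: the executable is on the PATH
        command.add(image);
        command.add("stdout");         // write recognized text to standard output
        command.add("--tessdata-dir");
        command.add(tessdataDir);
        command.add("-l");
        command.add(language);
        return command;
    }

    /** Runs the command and returns what it printed to standard output. */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Pass tesseract's diagnostics through to this process's stderr.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        String text = new String(process.getInputStream().readAllBytes(),
                                 StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }
}
```

For the files in this report, the call would be TesseractCli.run(TesseractCli.buildCommand("c:/numbers.jpg", "c:/дата", "eng")). This bypasses the in-process native call entirely, at the cost of starting a new process for each image.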

nguyenq changed the title ~~Tessj4 - Error opening tessdata file by non-ASCII path~~ Tess4j - Error opening tessdata file by non-ASCII path Jul 26, 2021

nguyenq mentioned this issue Jan 25, 2024

Unable to set non-English datapath in Tess4J #252

Closed
