Custom Font #7

Open
Bengeljo opened this issue Feb 23, 2024 · 1 comment

Comments

@Bengeljo

I always get an error when I try to use a custom font. The font is installed, Windows can find it, and looking it up works fine. When I run split_training_text.py I get the following error:
Fontconfig error: Cannot load default config file: No such file: (null)
Fontconfig error: Cannot load default config file: No such file: (null)
Could not find font named 'Quadrant'.
Pango suggested font 'Cascadia Code'.
Please correct --font arg.

I want to train the model on Quadrat-Serial-Regular.ttf, but it just won't recognize it. I tried to look it up but couldn't find an answer. Modifying the font flag doesn't help: it wants a font name, but it can't find that name even though the font is there, and to be honest I don't know where it searches for fonts.

The folder is located on the SSD (E:) and the operating system is on C:, but Tesseract and Python are both on the PATH on C:, so they should have access to it. Please help.

@Antonio-Serrat

Hi @Bengeljo, I had the same issue and finally got the script to use my font, although I am on Linux. I still don't know what the underlying problem is, but this will probably give you an idea about the error.

I rewrote the Python script as shown below, and this works for me.

I create a fonts.conf locally, only for the script, but before that you need to install the font properly.

After installing it, you can check with the command fc-list | grep "fontname", which should show the font and the directory it is placed in.

Then put that path in the <dir> element of the custom fonts.conf.
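
The fc-list check can also be done from Python. This is only a rough sketch, not part of the script I ran: the helper names `font_dirs_from_fc_list` and `installed_font_dirs` are made up for illustration, and it assumes fc-list prints one entry per line in the form `<file path>: <family>:style=<style>`.

```python
import os
import subprocess


def font_dirs_from_fc_list(fc_list_output, font_name):
    """Return the sorted directories of fonts whose fc-list entry mentions font_name."""
    dirs = set()
    for entry in fc_list_output.splitlines():
        # Assumed entry format: "<file path>: <family>:style=<style>"
        path, sep, description = entry.partition(': ')
        if sep and font_name.lower() in description.lower():
            dirs.add(os.path.dirname(path))
    return sorted(dirs)


def installed_font_dirs(font_name):
    """Run fc-list and return the directories that contain font_name."""
    result = subprocess.run(['fc-list'], capture_output=True, text=True, check=True)
    return font_dirs_from_fc_list(result.stdout, font_name)
```

Calling `installed_font_dirs('YOURFONT')` should return the directory to put in <dir>. An empty list suggests Fontconfig does not see the font at all.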

import os
import random
import pathlib
import subprocess
import tempfile

# Create the fontconfig file into temp dir
with tempfile.TemporaryDirectory() as tempdir:
    fontconfig_dir = os.path.join(tempdir, 'fontconfig')
    os.makedirs(fontconfig_dir)

    fontconfig_content = """<?xml version="1.0"?>
    <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
    <fontconfig>
      <dir>HERE/YOUR/FONT/PATH</dir>
      <cachedir>YOUR/CACHE/DIR</cachedir>
      <config>
        <match target="scan">
          <test name="family">
            <string>YOURFONT</string>
          </test>
          <edit name="family" mode="assign">
            <string>YOURFONT</string>
          </edit>
        </match>
      </config>
    </fontconfig>
    """

    fontconfig_file_path = os.path.join(fontconfig_dir, 'fonts.conf')
    with open(fontconfig_file_path, 'w') as f:
        f.write(fontconfig_content)

    # Set the Fontconfig environment variables for this script and its child processes only
    os.environ['FONTCONFIG_PATH'] = fontconfig_dir
    os.environ['FONTCONFIG_FILE'] = fontconfig_file_path

    # Update Fontconfig cache
    subprocess.run(['fc-cache', '-fv'], check=True)

    training_text_file = 'YOUR/LANG/TRAINING/DATA'

    # Read the training text, one stripped line per entry
    with open(training_text_file, 'r') as input_file:
        lines = [line.strip() for line in input_file]

    output_directory = 'WHERE_YOU/WANT_TO/OUTPUT_DATA'

    # Create the output directory (and any missing parents) if needed
    os.makedirs(output_directory, exist_ok=True)

    random.shuffle(lines)
    count = 20000
    lines = lines[:count]

    # Ground-truth files are named after the training text file
    training_text_file_name = pathlib.Path(training_text_file).stem

    for line_count, line in enumerate(lines):
        # Write one ground-truth (.gt.txt) file per line
        line_training_text = os.path.join(output_directory, f'{training_text_file_name}_{line_count}.gt.txt')
        with open(line_training_text, 'w') as output_file:
            output_file.write(line)

        # Replace LANG with the same prefix as the .gt.txt files so the names pair up
        file_base_name = f'LANG_{line_count}'

        # Render the line to an image with the custom font
        subprocess.run([
            'text2image',
            '--font=YOURFONT',
            f'--text={line_training_text}',
            f'--outputbase={output_directory}/{file_base_name}',
            '--max_pages=1',
            '--strip_unrenderable_words',
            '--leading=32',
            '--xsize=3600',
            '--ysize=480',
            '--char_spacing=1.0',
            '--exposure=0',
            '--unicharset_file=langdata/eng.unicharset',
        ], check=True)
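
If text2image still reports that it cannot find the font, it may help to ask it which font names it can see, since the value for --font has to be the name Pango reports and that is not necessarily the file name. Below is a minimal sketch, assuming your text2image build supports the --list_available_fonts and --fonts_dir flags. The helper names `list_fonts_command` and `list_available_fonts` are made up for illustration.

```python
import subprocess


def list_fonts_command(fonts_dir):
    """Build the text2image command that lists the fonts found in fonts_dir."""
    return ['text2image', '--list_available_fonts', f'--fonts_dir={fonts_dir}']


def list_available_fonts(fonts_dir):
    """Run text2image and return everything it printed (stdout and stderr)."""
    result = subprocess.run(
        list_fonts_command(fonts_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout + result.stderr
```

Running `print(list_available_fonts('HERE/YOUR/FONT/PATH'))` should print the names as text2image sees them, and one of those names is what goes into --font.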

I hope this helps!
