Customize the hyphenation bounds? #20

Closed
baskerville opened this issue Dec 3, 2018 · 8 comments

@baskerville
Contributor

The hyphenation bounds are currently hardcoded. Would it be possible to load the bounds from a file instead? For example, a file named en-us.bnd.txt with content 2 3 would define the bounds for the EnglishUS language.

@tapeinosyne
Owner

Although default values are indeed hard-coded, the hyphenation boundaries are a public field of the dictionary type and can be changed freely. I will add a note in the documentation to this effect.

@baskerville
Contributor Author

baskerville commented Jan 4, 2019

It doesn't seem to be working:

    let word = "hyphenation";
    let mut en_us = Standard::from_path(Language::EnglishUS,
                                        "en-us.standard.bincode").unwrap();
    println!("{:?}", en_us.opportunities(word));
    en_us.minima.0 = 3;
    println!("{:?}", en_us.opportunities(word));

The output I'm getting is:

[2, 6, 7]
[2, 6, 7]

@baskerville
Contributor Author

baskerville commented Jan 5, 2019

I can get close to the expected result ([6, 7]) with:

en_us.opportunities_within(word, en_us.boundaries(word).unwrap())

which returns [6].

@baskerville
Contributor Author

In fact it seems to occur only when a word has a known hyphenation:

    fn opportunities(&'h self, lowercase_word : &str) -> Vec<Self::Opportunity> {
        match self.boundaries(lowercase_word) {
            None => vec![],
            Some(mins) => {
                match self.exact(lowercase_word) {
                    None => self.opportunities_within(lowercase_word, mins),
                    Some(known) => known
                }
            }
        }
    }

You can't just return known here; you have to take mins into account.
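
A minimal sketch of the kind of filtering meant here, not the crate's actual fix. The helper names `bounds_for` and `clamp_known` are hypothetical, and the sketch assumes that opportunities are byte indices, that `minima` counts characters from each end of the word, and that both bounds are inclusive.

```rust
/// Hypothetical helper: converts character minima into inclusive byte bounds.
/// Returns `None` when the word is too short to be broken at all.
fn bounds_for(word: &str, minima: (usize, usize)) -> Option<(usize, usize)> {
    let (l_min, r_min) = minima;
    let n = word.chars().count();
    if n < l_min + r_min {
        return None;
    }
    // Byte offset of the k-th character, or the end of the word if k == n.
    let byte_at = |k: usize| {
        word.char_indices()
            .nth(k)
            .map(|(i, _)| i)
            .unwrap_or(word.len())
    };
    Some((byte_at(l_min), byte_at(n - r_min)))
}

/// Hypothetical helper: keeps only the known (exception) opportunities
/// that fall within the bounds (assumed inclusive on both ends).
fn clamp_known(known: &[usize], bounds: (usize, usize)) -> Vec<usize> {
    let (l, r) = bounds;
    known.iter().copied().filter(|&i| l <= i && i <= r).collect()
}
```

With the word from the earlier comment, `clamp_known(&[2, 6, 7], bounds_for("hyphenation", (3, 3)).unwrap())` yields `[6, 7]`, while minima of `(2, 3)` leave `[2, 6, 7]` untouched.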

@baskerville
Contributor Author

In the above example, opportunities_within returns [6] instead of [6, 7] because it doesn't rely on self.exact.

@tapeinosyne
Owner

I have a vague recollection of deliberately skipping boundary checks for exceptions, but you are right: the current behavior is counter-intuitive at best and plainly broken at worst. I'll see if I can release an updated version over the weekend.

@baskerville
Contributor Author

> the current behavior is counter-intuitive at best and plainly broken at worst

I concur.

@tapeinosyne
Owner

Fixed in b5f526f.

Exceptions now respect hyphenation boundaries. The old behavior can be replicated by manually calling the internal method exception_within with (0, word.len()) as custom boundaries.
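
To illustrate why `(0, word.len())` reproduces the old behavior, here is a small self-contained sketch. The `within` helper is hypothetical and only stands in for a boundary filter (assumed inclusive); it is not the crate's `exception_within`, whose signature is not shown in this thread.

```rust
/// Hypothetical stand-in for a boundary filter over known opportunities.
/// Bounds are byte indices, assumed inclusive on both ends.
fn within(known: &[usize], (l, r): (usize, usize)) -> Vec<usize> {
    known.iter().copied().filter(|&i| l <= i && i <= r).collect()
}

/// Bounds spanning the whole word exclude nothing, so every known
/// opportunity is returned, as before the fix.
fn unbounded(known: &[usize], word: &str) -> Vec<usize> {
    within(known, (0, word.len()))
}
```

Under these assumptions, `unbounded(&[2, 6, 7], "hyphenation")` returns all three opportunities regardless of the dictionary's minima.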
