Patch for distance.c: minor off-by-one error #32

Open
GoogleCodeExporter opened this issue Feb 27, 2016 · 0 comments

This is really nitpicky, but when populating the vocab array, distance.c only
begins skipping characters after index max_w (having read 51 characters), when
it should have stopped after index max_w - 1. Consequently, the string
terminator for a long string is written into the space reserved for the
subsequent string, and is overwritten when the next string is read in, causing
the two to be mashed together.
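
To illustrate the bound being described, here is a minimal sketch, not the attached patch. The names `read_word_bounded` and `fill_vocab` are hypothetical, and the input is an in-memory string rather than the binary file distance.c reads (the vector data between words is ignored). The assumption is that each vocab entry owns exactly max_w bytes, so at most max_w - 1 characters can be stored if the terminator is to stay inside the entry's own slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from distance.c): copies one space-terminated
   word from *src into slot, which holds max_w bytes, and advances *src past
   the word and its delimiter. At most max_w - 1 characters are stored, so
   the terminator never lands in the next entry's slot. Extra characters are
   consumed and dropped. Returns the number of characters stored. */
size_t read_word_bounded(const char **src, char *slot, size_t max_w) {
    size_t a = 0;
    assert(max_w > 0);
    while (**src != '\0' && **src != ' ') {
        char ch = *(*src)++;
        if (ch == '\n') continue;           /* newlines are not part of a word */
        if (a < max_w - 1) slot[a++] = ch;  /* keep only what fits */
    }
    if (**src == ' ') (*src)++;             /* step over the delimiter */
    slot[a] = '\0';                         /* always inside this slot */
    return a;
}

/* Hypothetical driver: fills a flat vocab array of words * max_w bytes. */
void fill_vocab(const char *src, char *vocab, size_t words, size_t max_w) {
    for (size_t b = 0; b < words; b++)
        read_word_bounded(&src, &vocab[b * max_w], max_w);
}
```

With max_w set to 8 and the input "abcdefghijkl xyz", the first entry is cut to "abcdefg" and the second entry stays "xyz", rather than the two running together.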

For example, when searching for Cash_Flow given the current (as of 2015-06-15)
GoogleNews-vectors-negative300.bin, two results overflow the printf field,
which is padded for strings up to length 50. Indeed, these two strings do not
appear in the vocabulary, but are constructed when two vocabulary entries (a
long one followed by a normal one) are mashed together as described above.
After applying the attached patch the printf formatting looks fine, as only
the first 50 characters of the long entries are printed.
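
The column overflow follows from how printf treats a field width: it pads short strings but never truncates long ones. The sketch below assumes the result row is printed with a `%50s`-style conversion, as the "padded for strings up to length 50" description suggests. `format_row` and its `truncate` flag are hypothetical and only contrast a width with a width plus precision; the patch itself fixes the stored strings rather than the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats one result row into out and returns the
   length snprintf reports. With truncate == 0 the word gets a minimum
   width of 50, so longer words push the row wider. With truncate != 0 a
   precision of 50 is added, so at most 50 characters are printed. */
int format_row(char *out, size_t out_size, const char *word, double dist,
               int truncate) {
    assert(out != NULL && word != NULL);
    if (truncate)
        return snprintf(out, out_size, "%50.50s\t\t%f\n", word, dist);
    return snprintf(out, out_size, "%50s\t\t%f\n", word, dist);
}
```

A short word and a truncated long word both produce rows of the same length, while an untruncated 60-character word makes the row 10 characters longer.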

Original issue reported on code.google.com by [email protected] on 17 Jun 2015 at 12:24

Attachments:
