Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make_secret but for strings or other data #133

Open
DannyDaemonic opened this issue Mar 24, 2023 · 0 comments
Open

make_secret but for strings or other data #133

DannyDaemonic opened this issue Mar 24, 2023 · 0 comments

Comments

@DannyDaemonic
Copy link

DannyDaemonic commented Mar 24, 2023

I noticed wyhash doesn't provide a way to convert a string or array of bytes into secret data. In most of the example code I've seen outside of this depot people use strings to seed the secret data poorly. There's no good examples.

At first I thought about XOR'ing the data with the _wyp default secret data. I just looped over the secret array until I ran out of data. Then I worried common letters in a string might cancel out each other. (For example, if both data[0] and data[32] were the same letter.) So I thought to mix the data in using _wymix:

    secret[0] = _wyp[0]; secret[1] = _wyp[1]; secret[2] = _wyp[2]; secret[3] = _wyp[3]; 
    uint64_t offset = 0;
    while (len > 8) {
        secret[offset] = _wymix(*((uint64_t *)p), secret[offset]);
        p += 8;
        len -= 8;
        offset = (offset + 1) % 4;
    }

But then sometimes we have 1 to 7 bytes remaining.

    if (len > 0) {
        uint64_t last = 0;
        memcpy(&last, p, len);
        last = _wyr8((const uint8_t *)&last); // Correct endianness?
        secret[offset] = _wymix(last, secret[offset]);
    }

But then I realized strings can be short and in those cases, this function may not perform as well as make_secret.

So I thought about making a special case for when data length is 8 bytes or less, but then it was getting to branchy for my taste. I thought, what if I don't use _wyp to initialize the secret data? What if I let make_secret initialize the secret data using the last 1 to 8 bytes?

This is what I've got currently:

static void make_secret_from_data(const void *data, int64_t len, uint64_t *secret) {
    const uint8_t *p = (const uint8_t *)data;

    // Initialize secret with the last 1 to 8 bytes
    uint64_t remainder_len = ((len - 1) % 8u) + 1;
    uint64_t remainder_start = len - remainder_len;
    len -= remainder_len;
    uint64_t remainder_data = 0;
    memcpy(&remainder_data, p + remainder_start, remainder_len);
    remainder_data = _wyr8((const uint8_t *)&remainder_data); // Correct endianness?
    make_secret(remainder_data, secret);

    // Mix in the rest of the data
    uint64_t offset = 0;
    while (len >= 8) {
        secret[offset] = _wymix(*((uint64_t *)p), secret[offset]);
        offset = (offset + 1) % 4;
        p += 8;
        len -= 8;
    }
}

I had to do the weird ((len - 1) % 8u) + 1 thing so that when a len of 0 is passed in, it calls memcpy with 0 instead of 8.

Alternatively I thought about just using the last 8 bytes of data to seed make_secret regardless of alignment:

static void make_secret_from_data(const void *data, int64_t len, uint64_t *secret) {
    const uint8_t *p = (const uint8_t *)data;

    // Initialize secret with the last 1 to 8 bytes
    uint64_t last_len = len < 8 ? len : 8;
    uint64_t last_start = len - last_len;
    uint64_t last = 0;
    memcpy(&last, p + last_start, last_len);
    last = _wyr8((const uint8_t *)&last); // Correct endianness?
    make_secret(last, secret);

    // Mix in the rest of the data
    uint64_t offset = 0;
    while (len > 8) {
        secret[offset] = _wymix(*((uint64_t *)p), secret[offset]);
        offset = (offset + 1) % 4;
        p += 8;
        len -= 8;
    }
}

The memcpy isn't always aligned in this case and the ternary operator adds an extra if branch, but those probably aren't a big deal overall. (I also adjusted the last loop from len >=8 to len > 8 since we can skip the last 8 now.)

Do either of those seem appropriate? Is there a better approach to building secret data from string/data?

I think it would be useful to have an official way to do this. Would you be interested in a pull request adding one of these make_secret_from_data functions? I would compress the code down to the style currently used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant