API and speed are... problematic. #8
Comments
Hello! Thanks for opening an issue. I'll preface my response by pointing out that this crate is sort of dead (it hasn't been updated in a year), and there are some critical issues that need resolving before it delivers its basic intended functionality, namely #1, #2 and #3. Anyway, on to your points:
Can you tell me why you want to benchmark crates? Are you using morton encoding for performance-critical applications? I think there's a huge context gap here because I've never actually used morton encoding in production. I'll comment on that further in a later part of your question.
Yes, I agree. This is tracked in #6 and #7. You can take a look at the structs in the docs for a reference on which primitives have partner BitCollections provided. There are also a couple examples of custom BitCollections, since the provided ones are sort of half-way between reference and utility functions.
It seems a little bit weird to me to try writing some random code without first reading the 3-line example on the readme... In any case, though, I agree with your point.
Well, no 😅 I designed this for use with heterogeneous types. Maybe you can explain how morton encoding is normally used? I think I am missing something, but heterogeneous is definitely my goal from the start.
Good point. I hope that you at least saw this example before trying to figure out the Vob APIs. I've opened #9 to track this.
(Commenting on the whole problem 2 now) something is a bit off about the conversation here, but it's not clear to me what the most crucial detail is. I think it comes down to my not really knowing how morton encoding is used in production. Anyway:
I would definitely be interested in using morton-encoding as a dependency to provide the higher level APIs, but AFAICT your basic unit of API comprises flat functions with type parameters in the name, and the only way I can imagine using that in a generic way is with some macro kung-fu. (Is that true? It looks like the morton_encode function is generic, actually, in which case perhaps the specialised functions could be moved into a module for legibility.)
And finally, the answer to your question: I'll repeat that I've never used morton encoding in production. I made this crate after reading this dynamodb blog post, which has a couple implications...:
BTW, both BitCollection and Vob (used heavily in zdex) are (a) probably not performance-optimised or production-ready and (b) seem to be optimised for size, which isn't really needed here
Sorry about the delay! Time to answer. First things first: Now that I re-read the original post without the sleep deprivation, I realise it was probably more confrontational than it should have been. Seriously, thank you for responding so positively. Now, on to the contents themselves: I won't go through every point in order, but I hope to cover them all by the end.
So, to-do list for later:
So, I got around to benchmarking it. I whipped up a quick implementation of my fractal dimension counting algorithm and tested it on some colour image substitutes, using Results: For 16 megapixels (ie And that's not even the worst part. The worst part would be...
But can the Sir, am I correct in understanding that you're trying to make this crate so the result can be used as a key in a database? If so, wouldn't it need to be, y'know, comparable? Please find the benchmarking code attached.
const TIME_ENTIRE_ALGORITHM: bool = false;
macro_rules! time {
    ($x: expr) => {{
        // eprintln!("Measuring expression: {}", stringify!($x));
        let begin = std::time::Instant::now();
        let result = $x;
        let tim = begin.elapsed();
        // Print the same measurement that is returned, rather than sampling the clock twice.
        println!("Time elapsed: {:#?}\n", tim);
        (result, tim)
    }};
}
fn morton_encode_u8_5d_zdex(input: [u8; 5]) -> u64 {
    use zdex::*;
    let usize_bits = 8 * core::mem::size_of::<usize>();
    let transmute_input = |x: &u8| -> FromU8 { (*x).into() };
    input // Take the original input,
        .iter() // element by element...
        .map(transmute_input) // Transform each to the custom input types zdex needs...
        .z_index() // Compute the result...
        .unwrap() // Panic if there's an error... (Can there be one? Who knows!)
        .iter_storage() // Take the result usize by usize...
        .fold(0 as u64, |acc, a| (acc << usize_bits) | a as u64)
    // ...and finally, unify the iterator of usizes into a single u64.
    // Can you just FEEL the ergonomics?
}
fn morton_encode_u8_5d_naive(input: [u8; 5]) -> u64 {
    let mut coordinate_mask = 1u8;
    let mut key_mask = 1u64;
    let mut result = 0u64;
    for _ in 0..8 {
        for &coordinate in input.iter().rev() {
            if coordinate & coordinate_mask != 0 {
                result |= key_mask;
            }
            key_mask <<= 1;
        }
        coordinate_mask <<= 1;
    }
    result
}
fn main() {
    println!("First, two small tests to ensure that the functions work correctly (Which, siiiiiiiigh...)");
    let argh_2 = [1, 2, 3, 4, 5u8];
    let argh = [128u8; 5usize];
    println!("Input:[\n{}]", argh.iter().map(|x| format!("{:08b}\n", x)).collect::<String>());
    println!("{:040b} {:040b}\nAs if all the rest wasn't enough, zdex puts the most significant bits at the end...", morton_encode_u8_5d_naive(argh), morton_encode_u8_5d_zdex(argh));
    //println!("{:040b}", morton_encode_u8_5d_zdex(argh).max(morton_encode_u8_5d_zdex(argh_2)));
    // Create the 4 images we usually use as touch-stones...
    let limit_1d: usize = 1 << 12;
    let limit_2d: usize = limit_1d * limit_1d;
    let mut test_images: [Vec<[u8; 5]>; 4] = [
        Vec::with_capacity(limit_2d),
        Vec::with_capacity(limit_2d),
        Vec::with_capacity(limit_2d),
        Vec::with_capacity(limit_2d),
    ];
    for x in 0..limit_1d {
        for y in 0..limit_1d {
            let x = ((x * 256) / limit_1d) as u8;
            let y = ((y * 256) / limit_1d) as u8;
            let pixel_fd_2 = [x, y, x, y, 128];
            let pixel_fd_3 = [x, y, x, y, rand::random::<u8>()];
            let pixel_fd_4 = [x, y, x, rand::random::<u8>(), rand::random::<u8>()];
            let pixel_fd_5 = [x, y, rand::random::<u8>(), rand::random::<u8>(), rand::random::<u8>()];
            test_images[0].push(pixel_fd_2);
            test_images[1].push(pixel_fd_3);
            test_images[2].push(pixel_fd_4);
            test_images[3].push(pixel_fd_5);
            // The fractal dimensions of the test images will be respectively equal to 2, 3, 4 and 5.
        }
    }
    // Time to get working!
    for i in 0..2 {
        let encod_fn = if i == 0 { morton_encode_u8_5d_naive } else { morton_encode_u8_5d_zdex };
        println!("\n\n\nNow testing: {}!", if i == 0 { "morton_encoding" } else { "zdex" });
        for test_image in &test_images {
            let _ = time! {{
                let mut partition_count = [0usize; 8];
                partition_count[0] += 1; // The first key always occupies one box at every level.
                let mut img_copy = test_image
                    .iter()
                    .map(|&x| encod_fn(x))
                    .collect::<Vec<_>>();
                if TIME_ENTIRE_ALGORITHM {
                    img_copy.sort();
                    img_copy
                        .windows(2)
                        .map(|x| x[0] ^ x[1])
                        .map(|x| x.leading_zeros() as u8)
                        .map(|x| x - 24) // The first 24 bits are always going to be zero.
                        .map(|x| x / 5) // We take it 5 bits at a time, because we always divide every axis in two simultaneously.
                        .filter(|&x| x < 8) // If they're the same bit for bit, there's no reason to count them twice...
                        .for_each(|x|
                            partition_count[x as usize] += 1
                        );
                    for i in 0..7 {
                        partition_count[i + 1] += partition_count[i];
                        // We want cumulative counts.
                    }
                    println!("{:?}", partition_count);
                }
            }};
        }
    }
}
I read the article. With that, the afore-mentioned to-do list is complete. So, with all that said: I now realise why exactly you wanted z-order keys of heterogeneous data. However, I don't really think there's a general solution to the issue. For instance, let's take a look at the simple example you gave, about the latitude and the longitude. Thus, for an ideal z-order key of latitude and longitude, you would need to...
As you can see, even this was actually very complicated, and most importantly it does not generalise to other
Hey, Sorry for the delay in response here. Happy new year! As mentioned above, I am not really investing in this crate at the moment. In any case:
I understand you've read the AWS blog post now, but for any external readers: morton encoding does indeed group bits of equal importance, but if you right-pad inputs of unequal size, you can still get a useful encoding out of it. Right-padding is not the only technique critical to useful z-order indexing, though, as the aforementioned blog post describes in more detail.
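For external readers, the right-padding idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not how zdex does it: `interleave_padded` is a hypothetical helper that pads a `u8` on the right with zero bits until it is as wide as a `u16`, then interleaves the two most-significant-bit first, so that the top bits of both inputs end up side by side in the key.

```rust
/// Hypothetical helper (not part of zdex or morton-encoding):
/// interleaves a u8 and a u16 into a u32 after right-padding the u8.
fn interleave_padded(a: u8, b: u16) -> u32 {
    // Right-pad the narrower input with zero bits so both are 16 bits wide.
    let a_padded = (a as u16) << 8;
    let mut key = 0u32;
    // Walk from the most significant bit down, emitting one bit of each input.
    for bit in (0..16).rev() {
        key = (key << 1) | ((a_padded >> bit) & 1) as u32;
        key = (key << 1) | ((b >> bit) & 1) as u32;
    }
    key
}

fn main() {
    // The top bit of each input lands in the top two bits of the key.
    let key = interleave_padded(0b1000_0000, 0b1000_0000_0000_0000);
    println!("{:032b}", key);
}
```

Without the padding step, the top bit of the `u8` would be paired with bit 7 of the `u16`, and the resulting key would weight the two inputs very differently.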
I don't really understand why this is needed, but I am probably not quite invested enough. The approach taken in zdex is to use trait bounds to let input types determine their own bit representation, i.e. zdex computes the morton encoding of BitCollections and of types that have BitCollection representations. Providing a default BitCollection representation for primitive types is just a convenience.
Cool! Thanks for explaining.
On a completely unrelated side-note, this doesn't seem like a big problem in today's world, as your desired work is trivially parallelisable
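To make the parallelisation remark concrete, here is a minimal sketch using only the standard library's scoped threads. The names `encode` and `encode_parallel` are hypothetical, and `encode` is merely a stand-in that repeats the naive bit-by-bit interleave from the benchmark; any per-pixel encoder could be dropped in instead.

```rust
use std::thread;

/// Stand-in encoder: the same naive bit-by-bit interleave as in the benchmark.
fn encode(input: [u8; 5]) -> u64 {
    let mut result = 0u64;
    let mut key_bit = 0;
    for bit in 0..8 {
        for &coordinate in input.iter().rev() {
            if coordinate & (1 << bit) != 0 {
                result |= 1u64 << key_bit;
            }
            key_bit += 1;
        }
    }
    result
}

/// Hypothetical helper: splits the pixels into contiguous chunks and
/// encodes each chunk on its own scoped thread.
fn encode_parallel(pixels: &[[u8; 5]], threads: usize) -> Vec<u64> {
    let threads = threads.max(1);
    let mut keys = vec![0u64; pixels.len()];
    // Round up so that every pixel falls into some chunk.
    let chunk = ((pixels.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for (src, dst) in pixels.chunks(chunk).zip(keys.chunks_mut(chunk)) {
            s.spawn(move || {
                for (pixel, key) in src.iter().zip(dst.iter_mut()) {
                    *key = encode(*pixel);
                }
            });
        }
    });
    keys
}

fn main() {
    let pixels: Vec<[u8; 5]> = (0..1000u32)
        .map(|i| [i as u8, (i >> 2) as u8, (i >> 4) as u8, 7, 128])
        .collect();
    let keys = encode_parallel(&pixels, 4);
    println!("encoded {} pixels; first key = {:#x}", keys.len(), keys[0]);
}
```

Each pixel is encoded independently of every other, which is what makes the split safe; the subsequent sort is a separate matter.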
Hmm, I'm not sure what's wrong. Is it possible that the code fails when iter_storage has more than one element? My example code does
Yes, but I think this just comes back to #9
For the avoidance of doubt: at the moment, no, there cannot, but I made the API return a Result to avoid breaking changes in future.
Yes, it's described in the DynamoDB blog post that z-order indexing requires thinking about your schema's bit-representation and making sure that your encoding works for your use case. So I agree that there's no general purpose solution to the z-indexing database key problem, but disagree that zdex can't be used for it in the general sense.
Probably it doesn't matter
The use-case is for database query constraints.
I think this comes down to a question: is it worth providing default transforms for floating point types? I think yes, as long as the limitations are documented. I am not really sure what the next steps are. I didn't check your code, but I suspect that I can't yet make zdex utilise morton-encoding, is that right?
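As an illustration of what such a default transform might look like, here is a minimal sketch of a commonly used order-preserving mapping from `f32` to `u32`. It is an assumption for discussion purposes, not something zdex or morton-encoding provides; `f32_to_ordered_bits` is a hypothetical name. The limitations it would need documented are visible in the code: NaN is not handled meaningfully, and -0.0 and 0.0 map to different keys.

```rust
/// Hypothetical helper: maps an f32 to a u32 whose unsigned ordering
/// matches the float's numeric ordering (NaN is deliberately not handled).
fn f32_to_ordered_bits(x: f32) -> u32 {
    let bits = x.to_bits();
    if bits & 0x8000_0000 != 0 {
        // Negative: flip every bit, so larger magnitudes sort lower.
        !bits
    } else {
        // Non-negative: set the sign bit, so it sorts above all negatives.
        bits | 0x8000_0000
    }
}

fn main() {
    let latitudes = [-90.0f32, -0.5, 0.0, 37.98, 90.0];
    for lat in latitudes.iter() {
        println!("{:>7} -> {:#010x}", lat, f32_to_ordered_bits(*lat));
    }
}
```

The resulting integers can then be interleaved like any other unsigned input, at the cost of the key reflecting the float's bit layout rather than an evenly spaced grid.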
First things first: Happy new year! I'll answer your post in more detail tomorrow. For now, however, I need to ask you something much more important: For your use-case, a Hilbert index would be in all ways completely superior to a Z-index, wouldn't it? I ask this because over at the
It doesn't seem likely that understanding the Hilbert encoding would be a worthwhile investment given that I need #2 and #3, which may take deep investigation/understanding. @DoubleHyphen I recommend that we close this issue if there isn't a library readily available to act as a drop-in replacement for BitCollection with all of the requirements outlined above (and the snippet in that Hilbert link is far from close enough, frankly).
Yes, I agree with the closure of this issue. Any productive results that could have arisen out of it have already arisen. Slinks away as inconspicuously as possible
tl;dr Please, help me make both our crates better.
Hello! I'm John, creator and maintainer of rival crate "morton-encoding". With the arrival of min-const-generics pretty soon, I wanted to refactor, and so I was doing some snooping around to see if there was anyone else who had implemented the same things. Thus, I chanced upon your crate, and decided to benchmark it. So I ran into...
Problem no. 1: The API is really, really hard to use with primitive types.
OK, time to get results from the crate. First try, [x1, x2].z_index()... No dice.
Maybe [x1, x2].iter().z_index()? Nope, that's not it either.
At this point, I gave up and did it with a tuple just as it shows in the documentation. But seriously, a tuple?! You want the user to input a certain quantity of homogeneous data, and your first idea is to use a generally heterogeneous type?
And don't even get me started on trying to transmute the result to a primitive type. Was result as Key too easy or something? Or was result.unwrap().iter_storage().fold(0 as Key, |acc, a| (acc<<usize_bits) | a as Key) the most ergonomic way you could manage? (Oh, and did I need to put a .rev() there or not? Who knows!)
Okay, fine. We've managed to make a function that takes an input ([u32; 2]) and gives a result (u64). That's something. Until, of course, I ran into...
Problem no. 2: The speed is low. Comically low. Abysmally, inexplicably low.
I ran cargo rustc --release --lib -- --emit asm and took a look at the emitted assembly.
Then, I called a crane to pick my jaw off the floor.
Then I wired it shut and looked again.
The one, single, paltry [u32; 2] -> u64 function ends up emitting three thousand lines of assembly. Half of those are the meat of the function, and several of those are function calls to the other half. I didn't have the courage to benchmark it.
For reference, the naïve implementation, that uses masks to check each bit and copy it to the result? Six hundred instructions. morton-encoding does it in less than 50.
I'd like it if similar crates could somehow be consolidated. For that reason, I have to ask: Sir, what problem are you trying to solve? What needs do you have that morton-encoding cannot cover? Because, from what I can see, it's far superior, and it's, like... right there. You can use it, either directly as a dependency or indirectly as inspiration.
Help me make both our crates better, please?
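For readers wondering how a [u32; 2] -> u64 encoder can be so short, here is a minimal sketch comparing the naive mask-per-bit approach with the well-known "magic bits" bit-spreading trick. Both functions are hypothetical illustrations written for this comparison; the second is not claimed to be what morton-encoding actually does internally. The bit order follows the benchmark's convention, where the first coordinate supplies the more significant bit of each pair.

```rust
/// Naive version: test each bit with a mask and copy it into the key.
fn morton_2d_naive(input: [u32; 2]) -> u64 {
    let mut result = 0u64;
    for bit in 0..32 {
        // The last coordinate supplies the lower bit of each pair.
        for (i, &coordinate) in input.iter().rev().enumerate() {
            if coordinate & (1 << bit) != 0 {
                result |= 1u64 << (2 * bit + i);
            }
        }
    }
    result
}

/// Spreads the 32 bits of `x` so that one zero bit separates each of them.
fn spread(x: u32) -> u64 {
    let mut x = x as u64;
    x = (x | (x << 16)) & 0x0000_FFFF_0000_FFFF;
    x = (x | (x << 8)) & 0x00FF_00FF_00FF_00FF;
    x = (x | (x << 4)) & 0x0F0F_0F0F_0F0F_0F0F;
    x = (x | (x << 2)) & 0x3333_3333_3333_3333;
    x = (x | (x << 1)) & 0x5555_5555_5555_5555;
    x
}

/// "Magic bits" version: spread both inputs, then merge them with a shift.
fn morton_2d_magic(input: [u32; 2]) -> u64 {
    (spread(input[0]) << 1) | spread(input[1])
}

fn main() {
    let input = [3u32, 5u32];
    println!("naive: {:#b}", morton_2d_naive(input));
    println!("magic: {:#b}", morton_2d_magic(input));
}
```

The second version replaces the 64 conditional branches with ten shift-or-mask steps, which is the kind of difference that shows up directly in the emitted assembly.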