use bytebuffer abstraction instead of byte[] #20
Comments
I'm not sure, but using ByteBuffer may cause a serious performance decrease because of the copy operation from native memory to the heap.
new TailLOUDSTrie(
    firstTrie,
    new LOUDSBvTree(new ByteBufferSuccinctBitVector(firstTrie.nodeSize() * 2)),
    new ConcatTailArrayBuilder(firstTrie.size() * 4));
There might be some performance decrease, but in some cases (like mine, where things are really huge) eliminating extra GC could prove beneficial in the big picture. In any case, this assumption has to be measured. When I get to this, should I bother with the specialized bit vectors as well (such as BytesRank0OnlySuccinctBitVector)?
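One rough way to start measuring that assumption is a naive timing loop like the sketch below. It is only an illustration: the class name ReadCostSketch and the buffer size are made up for this example, it does not touch trie4j at all, and hand-rolled timings on the JVM are easily skewed by JIT warm-up, so a proper benchmark harness would be more trustworthy.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: compares sequential reads from a heap byte[] and a direct ByteBuffer.
final class ReadCostSketch {

    // Sums all bytes of a heap array as unsigned values.
    static long sumArray(byte[] data) {
        long sum = 0;
        for (byte b : data) {
            sum += b & 0xFF;
        }
        return sum;
    }

    // Sums all bytes of a buffer using absolute gets (does not move the position).
    static long sumBuffer(ByteBuffer data) {
        long sum = 0;
        for (int i = 0; i < data.capacity(); i++) {
            sum += data.get(i) & 0xFF;
        }
        return sum;
    }

    public static void main(String[] args) {
        int size = 1 << 24; // 16 MiB, an arbitrary size chosen for illustration
        byte[] heap = new byte[size];
        ByteBuffer direct = ByteBuffer.allocateDirect(size);
        for (int i = 0; i < size; i++) {
            byte value = (byte) (i * 31);
            heap[i] = value;
            direct.put(i, value);
        }

        // Warm-up rounds so the JIT has a chance to compile both loops.
        long sink = 0;
        for (int round = 0; round < 5; round++) {
            sink += sumArray(heap) + sumBuffer(direct);
        }

        long t0 = System.nanoTime();
        long heapSum = sumArray(heap);
        long t1 = System.nanoTime();
        long directSum = sumBuffer(direct);
        long t2 = System.nanoTime();

        System.out.printf("byte[]:       %d us%n", (t1 - t0) / 1_000);
        System.out.printf("direct buffer: %d us%n", (t2 - t1) / 1_000);
        System.out.println("sums match: " + (heapSum == directSum) + " (sink " + sink + ")");
    }
}
```

This only covers sequential reads; a trie lookup pattern is closer to random access, and the GC benefit mentioned above would only show up under a realistic heap load.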
I see. FYI, according to the URL below, using the Unsafe and DirectBuffer classes seems to provide faster access to natively allocated memory, like this:
// needs: import java.lang.reflect.Field; import java.nio.ByteBuffer;
Field field = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
field.setAccessible(true);
unsafe = (sun.misc.Unsafe) field.get(null);
sun.nio.ch.DirectBuffer buffer = (sun.nio.ch.DirectBuffer) ByteBuffer.allocateDirect(size);
// read an int at the absolute native address (base address + byte offset)
unsafe.getInt(buffer.address() + offset);
I don't think you need to consider the bit vector variations, because these classes provide only slight performance improvements (but if you need them, you can implement them :).
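For comparison, the same kind of read can be done with the public ByteBuffer API only. This is a minimal sketch, not something from trie4j: the class DirectBufferReadSketch and its method names are hypothetical, and it makes no claim about being as fast as the Unsafe route. It sets the native byte order because Unsafe reads use the platform byte order, while a ByteBuffer defaults to big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: off-heap int access through the public ByteBuffer API.
final class DirectBufferReadSketch {

    // Allocates off-heap memory and switches it to the platform byte order.
    static ByteBuffer allocateNative(int sizeInBytes) {
        return ByteBuffer.allocateDirect(sizeInBytes).order(ByteOrder.nativeOrder());
    }

    // Absolute read: does not change the buffer position and is bounds-checked.
    static int readInt(ByteBuffer buffer, int byteOffset) {
        return buffer.getInt(byteOffset);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = allocateNative(16);
        buffer.putInt(4, 0xCAFEBABE); // absolute write at byte offset 4
        System.out.println(Integer.toHexString(readInt(buffer, 4)));
        System.out.println("direct: " + buffer.isDirect());
    }
}
```

The absolute get methods are bounds-checked, which is one plausible source of the overhead discussed here. The internal sun.* classes avoid those checks but are not part of the supported API and may need extra JVM flags on modular JDKs.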
Thanks :)
So there is a small slowdown in exchange for a small improvement in memory consumption. Looking further with a memory analyzer, it appears that TailLOUDSTrie.labels actually takes up the majority of the space. I'll make a BBTailLOUDSTrie with similar optimizations and recheck.
That's so nice!!! Ahh.. Yes, indeed, the tail array is a memory eater. I want to improve that, but I currently have no idea how.
Thanks! Yes, I plan on sending a pull request, still some more things to check though. Here is another attempt, this time adding a BBTailLOUDSTrie:
We again see a slight degradation in speed, but this time a more noticeable improvement in memory consumption! I am sure we could propagate these changes to more places and reduce memory usage even further. Do you have any papers you based your implementation on that I could read? I would love to improve my understanding of this project.
That's great! I read a Japanese book, http://www.amazon.co.jp/gp/product/4774149934/ (Technologies to support Japanese input method editors).
I think the base implementation is similar to the original papers, because the authors of the documents above refer to them, but I added a few small optimizations:
and so on.
Whatever happened to this? It would indeed be nice to have an implementation that can use off-heap memory.
@knutwannheden But we still need to implement the ByteBuffer version of TailLOUDSTrie according to #20 (comment)
@knutwannheden
Hey,
I would love it if trie4j could use ByteBuffer instead of byte[]. This could be useful for keeping large static tries off the heap, by using DirectByteBuffer, MappedByteBuffer, etc.
I only looked at the code briefly, and found that it could be quite simple to modify BytesSuccinctBitVector.java to use ByteBuffer. I wanted to know whether there are more places I should be looking at.
WDYT?
Thanks!
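To make the idea concrete, here is a minimal sketch of a bit vector whose storage is a ByteBuffer rather than a byte[]. It is an assumption-laden illustration, not trie4j code: the class ByteBufferBitVectorSketch is hypothetical, the most-significant-bit-first layout is chosen only for this example, and rank1 is a naive linear scan without the index that a real succinct bit vector would keep.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not trie4j's class: a plain bit vector stored in a ByteBuffer.
final class ByteBufferBitVectorSketch {
    private final ByteBuffer bytes;
    private final int sizeInBits;

    // direct = true places the bits off-heap; false uses a heap-backed buffer.
    ByteBufferBitVectorSketch(int sizeInBits, boolean direct) {
        int byteCount = (sizeInBits + 7) / 8;
        this.bytes = direct ? ByteBuffer.allocateDirect(byteCount) : ByteBuffer.allocate(byteCount);
        this.sizeInBits = sizeInBits;
    }

    // Sets the bit at pos (bit 0 is the most significant bit of byte 0).
    void set(int pos) {
        checkIndex(pos);
        int index = pos >>> 3;
        bytes.put(index, (byte) (bytes.get(index) | (0x80 >>> (pos & 7))));
    }

    boolean get(int pos) {
        checkIndex(pos);
        return (bytes.get(pos >>> 3) & (0x80 >>> (pos & 7))) != 0;
    }

    // Naive rank1: number of set bits in positions [0, pos], computed by scanning.
    int rank1(int pos) {
        checkIndex(pos);
        int fullBytes = pos >>> 3;
        int count = 0;
        for (int i = 0; i < fullBytes; i++) {
            count += Integer.bitCount(bytes.get(i) & 0xFF);
        }
        int bitsInLastByte = (pos & 7) + 1;
        int mask = (0xFF << (8 - bitsInLastByte)) & 0xFF; // keeps the top bitsInLastByte bits
        return count + Integer.bitCount(bytes.get(fullBytes) & mask);
    }

    private void checkIndex(int pos) {
        if (pos < 0 || pos >= sizeInBits) {
            throw new IndexOutOfBoundsException("bit index " + pos);
        }
    }

    public static void main(String[] args) {
        ByteBufferBitVectorSketch bits = new ByteBufferBitVectorSketch(64, true);
        bits.set(0);
        bits.set(9);
        System.out.println(bits.get(9) + " " + bits.rank1(9));
    }
}
```

Because only absolute get and put calls are used, the same class would work with a MappedByteBuffer obtained from FileChannel.map, which is the case that matters for large static tries.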