-
-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible improvements for new ram middle #1466
Comments
Regarding "Reading node location back from node_locations_t": Some SIMD-based LEB128 decoders claim to provide faster speeds than the current byte-by-byte approach used in protozero. It might be fast enough to avoid the additional caching of decoded blocks. Lots of research on this topic by https://lemire.me/blog. Besides, the current approach already turned out to be about 10% faster than the old one in my tests. My impression was that fetching node locations isn't one of the major bottleneck in the overall process anymore. This might be different for other test cases, though. |
When I worked on protozero I looked at all thoses schemes to improve varint decoding speed but decided against them. I don't want to maintain something like that. This certainly applies much more here. And yes, it is unclear how much any of these possible improvements would actually help and we'd need to look at the resulting code and benchmarks to see whether they are worth it. |
Store ids and locations as they come in without going through a temporary block first. Slightly simpler code and we don't need the freeze() function any more. See also osm2pgsql-dev#1466.
Store ids and locations as they come in without going through a temporary block first. Slightly simpler code and we don't need the freeze() function any more. See also osm2pgsql-dev#1466.
The |
The first item above (that To fix this we need to either implement the resizing of |
The new ram middle (#1461) is better than the previous implementations but there are several places where it could be made even better. Here are some ideas that we can work on as time permits:
Note that many of these are interrelated and related to other upcoming changes in the code. So in many cases it makes sense to defer decisions on what best to do until later when we have a better idea of the environment this code runs in. (In fact this is one reason I am writing this issue instead of improving the code directly, the other is limited time...)
node_locations_t::m_data
andmiddle_ram_t::m_way_nodes_data
, respectively). As more data arrives this will be resized again and again. Copying the memory around when that happens isn't that expensive, so it isn't too bad. But while the resizing takes place, we temporarily basically need twice the memory. So when you get near your memory limit, this can break although enough memory would be available if we break the data into smaller chunks.node_locations_t
store writes new locations into a temporary cache (m_block
) which gets written out to the real data storage (m_data
) when full. This could be done directly instead. We don't need thefreeze()
function any more then.node_locations_t
store writes out all ids of one block and then all locations of that block. It might be better to write out all (id, location) pairs in order.node_locations_t::block_size = 32
should be evaluated and tuned.node_locations_t
is expensive, because we have to read through a lot of varints to find the data we need. We should probably use some kind of cache so that we can reuse decoded blocks. I have not implemented this, because we might want to run the reading code in multiple threads later. If that's the case we might want to have the cache outside thenode_locations_t
class so threads don't trample on each other. Or maybe we need amutex
.osmium::memory::Buffer
which takes a lot of memory. This was the simplest implementation I could think of and two-stage processing isn't used widely, so it is a reasonable compromise for the time being. But we can do much better here, details TBD.The text was updated successfully, but these errors were encountered: