-
Interesting ideas! The slowest individual operations are generally the complex geometry ones in boost::geometry.

My gut feeling is that the best way to optimise for frequent updates is to capitalise on the fact that most data doesn't change frequently. Coastlines change barely at all, rivers and railways infrequently; roads change a bit more often, and POIs more often still. Because vector tiles are organised into layers, it might be possible to update particular layers and/or features.

A naive, but I suspect effective, implementation might just have a "volatile" flag in the layer JSON which indicates whether that layer should be reprocessed during a partial update (see the sketch below). Happily, the objects that change most often - roads and POIs - are probably pretty quick to update, as they're either linestrings, points or simple polygons. (Compare to landuse and coastlines, which are often expensive polygons.)

A smarter implementation would look at the revision date of individual OSM objects, and flag tiles up as needing rewriting as part of that. But that might lead into a massive rabbit-hole.

But this is only a 15-second brain dump!
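To make the "volatile" flag idea concrete, here's a minimal C++ sketch of how a partial update might pick which layers to reprocess. Everything here (the `LayerConfig` struct, the `isVolatile` field, the layer names) is made up for illustration and isn't Tilemaker's actual config handling.

```cpp
// Hypothetical sketch: choose which layers to redo during a partial update,
// assuming each layer's JSON config carried a made-up "volatile" flag.
#include <iostream>
#include <string>
#include <vector>

struct LayerConfig {
    std::string name;
    bool isVolatile;  // hypothetical: true for layers that change often
};

std::vector<std::string> layersToReprocess(const std::vector<LayerConfig>& layers,
                                           bool partialUpdate) {
    std::vector<std::string> out;
    for (const auto& layer : layers) {
        // Full runs process everything; partial updates only redo volatile layers.
        if (!partialUpdate || layer.isVolatile) out.push_back(layer.name);
    }
    return out;
}

int main() {
    std::vector<LayerConfig> layers = {
        {"coastline", false}, {"waterway", false},
        {"transportation", true}, {"poi", true}};
    for (const auto& name : layersToReprocess(layers, /*partialUpdate=*/true))
        std::cout << name << "\n";  // prints: transportation, poi
}
```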
-
Oh, that reframes things and refines my understanding of the problem space. Very clever. I now think a transparent function cache could still be useful for the case where a user is iterating on their Lua profile, but I agree it may not be as useful for the case where the only input that's changing is the PBF file, not the Lua script. Can I pester you with some followup questions?
When does that expensive geometry processing actually get called? I guess right now it's eagerly called -- it might be interesting to make it lazy and see if that buys some improvements.
I think right now simplification happens repeatedly on the base z14 geometry: at z6 we simplify the z14 geometry, at z7 we simplify the z14 geometry again, and so on. Would it make sense if tiles were written out in the opposite zoom order, and simplification re-used the output of the next highest zoom's simplification (sketched below)? E.g.:

- z14 writes the raw geometry
- z13 simplifies the z14 geometry and writes it
- z12 simplifies the z13 geometry and writes it

I haven't looked very closely at the output phase of Tilemaker yet. My hope is that it's cheaper to simplify an already-simplified geometry, but I don't know if (a) that's true or (b) it would introduce weird artifacts / oversimplify.
Related, if a z14 geometry is valid, and we simplify it, is the simplified geometry guaranteed to be valid?
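A rough, self-contained sketch of the two ideas above using boost::geometry (which Tilemaker already depends on): each zoom simplifies the previous zoom's output instead of the base z14 geometry, and checks validity after each pass. The tolerances, the loop structure, and the use of a linestring are illustrative assumptions, not Tilemaker's actual output code.

```cpp
// Cascaded simplification sketch: z13 simplifies z14's output, z12 simplifies
// z13's output, and so on, with a validity check after each step.
#include <boost/geometry.hpp>
#include <cmath>
#include <iostream>
#include <utility>

namespace bg = boost::geometry;
using Point = bg::model::d2::point_xy<double>;
using Linestring = bg::model::linestring<Point>;

int main() {
    Linestring z14;  // stand-in for a full-detail z14 geometry
    for (double x = 0.0; x <= 10.0; x += 0.01)
        bg::append(z14, Point(x, std::sin(x)));

    Linestring previous = z14;
    for (int zoom = 13; zoom >= 6; --zoom) {
        // Tolerance roughly doubles per zoom level out (purely illustrative).
        double tolerance = 0.001 * (1 << (14 - zoom));

        Linestring current;
        bg::simplify(previous, current, tolerance);  // Douglas-Peucker

        // For polygons, simplification can produce self-intersections, so the
        // validity question is real; for a linestring the check is cheap.
        std::cout << "z" << zoom << ": " << current.size() << " points, valid="
                  << std::boolalpha << bg::is_valid(current) << "\n";

        previous = std::move(current);  // reuse this output at the next zoom down
    }
}
```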
Is the purpose of an opt-in volatile flag a performance optimization? If it were possible to cheaply compute the set of features that had changed, and only process geometries for those, it might be more convenient to just process all the changed features rather than putting the burden on the user to know how to set the config knobs correctly.
Oh, interesting, I see. For versioning purposes you wouldn't even need the full fidelity of the 8-byte timestamp -- just a byte tracking whether it changed in the last 256 days would be enough to discriminate which things need to be reprocessed, so long as you were reprocessing on a cadence faster than 256 days. I suspect the majority of items would not have changed in the last 256 days, so you could skip storing those and treat absence as "not changed recently". You'd have to store the revision dates in the mbtiles somehow, I guess?
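As an illustration of that one-byte scheme (all names here are hypothetical, not Tilemaker or OSM library code): store days-since-last-edit saturated at 255, skip storing anything old, and treat absence as "not changed recently".

```cpp
// Sketch: one byte of staleness per object, with absence meaning "old".
#include <cstdint>
#include <ctime>
#include <unordered_map>

using ObjectID = uint64_t;

// Days since the object's last edit, saturating at 255.
uint8_t daysSinceChange(std::time_t lastEdit, std::time_t now) {
    double days = std::difftime(now, lastEdit) / 86400.0;
    if (days < 0.0) days = 0.0;
    return days >= 255.0 ? 255 : static_cast<uint8_t>(days);
}

struct ChangeTracker {
    // Only recently-changed objects are stored; everything else is implicit.
    std::unordered_map<ObjectID, uint8_t> recentlyChanged;

    void record(ObjectID id, std::time_t lastEdit, std::time_t now) {
        uint8_t age = daysSinceChange(lastEdit, now);
        if (age < 255) recentlyChanged[id] = age;  // skip storing stale objects
    }

    // Needs reprocessing only if it changed within the last windowDays days;
    // absence from the map means it can be skipped.
    bool needsReprocessing(ObjectID id, uint8_t windowDays) const {
        auto it = recentlyChanged.find(id);
        return it != recentlyChanged.end() && it->second <= windowDays;
    }
};
```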
But there'd be rabbits at the end!
-
Potentially answering my own question: yes, it's a performance optimization. The cost isn't so much in tracking what has changed, but in materializing those changes into the tileset. For example, small changes to very large relations cause a tile write-amplification issue. If someone makes a small adjustment to the borders of their waterfront town in Florida, it's cheap for us to flag the entire USA relation as invalidated. But that would then mean that hundreds of thousands of tiles are invalidated, and, upon rematerializing them, we'd likely find that only a handful have actually changed. This probably implies that you'd want to track staleness (and its age) at the per-layer, per-tile level -- something like the sketch below.
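For illustration only, per-layer, per-tile staleness tracking could be as simple as a set keyed by (zoom, x, y, layer); the types here are hypothetical, and the question of how dirty tiles get discovered is left out.

```cpp
// Sketch: mark only the tiles a change actually touches as dirty, per layer,
// rather than invalidating every tile a huge relation covers.
#include <cstdint>
#include <set>
#include <string>
#include <tuple>

struct TileLayerKey {
    uint8_t zoom;
    uint32_t x, y;
    std::string layer;
    bool operator<(const TileLayerKey& o) const {
        return std::tie(zoom, x, y, layer) < std::tie(o.zoom, o.x, o.y, o.layer);
    }
};

struct DirtyTracker {
    std::set<TileLayerKey> dirty;

    // Call this for each tile/layer the changed portion of a geometry covers.
    void markDirty(uint8_t zoom, uint32_t x, uint32_t y, const std::string& layer) {
        dirty.insert({zoom, x, y, layer});
    }

    // Only tiles in the dirty set need rematerializing on the next run.
    bool needsRewrite(uint8_t zoom, uint32_t x, uint32_t y, const std::string& layer) const {
        return dirty.count({zoom, x, y, layer}) > 0;
    }
};
```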
-
Oh, I guess the ability to do partial updates also gets you 90% of the way to being able to do a layer-by-layer creation of a tileset. If you lacked RAM, that'd be useful, as your peak memory usage would just be the node store (all of it) plus the data for whichever single layer you're currently processing. The node store's a big burden for the entire planet, but I have a few ideas that ought to help there.
-
I'm considering adding an optional feature to tilemaker: a cache of expensive function calls. This would not affect the initial runtime of Tilemaker on a PBF, but instead would decrease the runtime of subsequent runs. This would be of interest to people who want to maintain ongoing updates of their vector tiles.
By default, the cache would be disabled.
A user would opt in via a CLI flag:

```
tilemaker --function-cache /tmp/some.db
```
Tilemaker would then create a SQLite database to hold the cache.
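As a sketch only -- not the schema actually being proposed -- the cache could be a single table keyed by function name plus a hash of the input geometry. Roughly, opening the database and doing a lookup via the sqlite3 C API might look like this (table and column names are assumptions, and the write path and error handling are omitted):

```cpp
// Sketch of a persistent function cache in SQLite; names are illustrative.
#include <sqlite3.h>
#include <cstdint>
#include <string>

static const char* kSchema =
    "CREATE TABLE IF NOT EXISTS function_cache ("
    "  func      TEXT NOT NULL,"      // e.g. "is_valid"
    "  input_key BLOB NOT NULL,"      // hash of the input geometry
    "  result    BLOB NOT NULL,"      // cached return value
    "  PRIMARY KEY (func, input_key)"
    ");";

bool openCache(const std::string& path, sqlite3** db) {
    if (sqlite3_open(path.c_str(), db) != SQLITE_OK) return false;
    return sqlite3_exec(*db, kSchema, nullptr, nullptr, nullptr) == SQLITE_OK;
}

// Look up a cached boolean result (e.g. an is_valid verdict) by geometry hash.
// Returns true and fills `result` on a cache hit.
bool lookup(sqlite3* db, const std::string& func, uint64_t geomHash, bool& result) {
    sqlite3_stmt* stmt = nullptr;
    const char* sql =
        "SELECT result FROM function_cache WHERE func = ? AND input_key = ?;";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK) return false;
    sqlite3_bind_text(stmt, 1, func.c_str(), -1, SQLITE_TRANSIENT);
    sqlite3_bind_blob(stmt, 2, &geomHash, sizeof(geomHash), SQLITE_TRANSIENT);
    bool hit = false;
    if (sqlite3_step(stmt) == SQLITE_ROW && sqlite3_column_bytes(stmt, 0) > 0) {
        result = static_cast<const uint8_t*>(sqlite3_column_blob(stmt, 0))[0] != 0;
        hit = true;
    }
    sqlite3_finalize(stmt);
    return hit;
}
```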
The first thing I have my eye on is `geom::is_valid`. It's an attractive thing to memoize.

Adding persistent storage feels a bit against the spirit of Tilemaker, though. I rationalize it to myself by saying it'd be optional, and it'd have minimal knobs (just absent or present).
I started down the path of doing a proof of concept, and ran into some fiddly threading issues with SQLite. I have some ideas on how to work around them, but figured I'd double-check that you'd be interested in such a feature before spending the time to resolve the issues.
On Great Britain, I think the best-case improvement from caching `is_valid` would be perhaps a 10% decrease in PBF reading time. There might be other places such a cache could be gainfully employed -- I think `is_valid` is now also called on output. It may also be reasonable to cache `make_valid` calls, although I haven't looked deeply into that yet to understand the storage implications.