Reduce maxzooms being guessed a little: (#2)
* Reduce maxzooms being guessed a little:

* Use 1.5 standard deviations, not 2, as the minimum distinguishable
* Give overlapping polygons and linestrings more distinct indices

* Add another drop rate guessing option, from the same metrics -zg uses

* Guard against using -rp without -zg
e-n-f authored Sep 8, 2022
1 parent a447dfc commit 073700a
Showing 66 changed files with 24,907 additions and 24,028 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,10 @@
## 2.6.0

* Add another drop rate guessing option, from the same metrics -zg uses
* Reduce maxzooms being guessed a little:
* Use 1.5 standard deviations, not 2, as the minimum distinguishable
* Give overlapping polygons and linestrings more distinct indices

## 2.5.0

* Add an option to add extra detail at maxzoom that does not factor into guessing
2 changes: 2 additions & 0 deletions README.md
@@ -447,6 +447,8 @@ the same layer, enclose them in an `all` expression so they will all be evaluate
If you use `-rg`, it will guess a drop rate that will keep at most 50,000 features in the densest tile.
You can also specify a marker-width with `-rg`*width* to allow fewer features in the densest tile to
compensate for the larger marker, or `-rf`*number* to allow at most *number* features in the densest tile.
If you use `-rp` with `-zg` or `--smallest-maximum-zoom-guess` it will choose a drop rate from the same
distance-between-features metrics as are used to choose the maxzoom.
* `-B` _zoom_ or `--base-zoom=`_zoom_: Base zoom, the level at and above which all points are included in the tiles (default maxzoom).
If you use `-Bg`, it will guess a zoom level that will keep at most 50,000 features in the densest tile.
You can also specify a marker-width with `-Bg`*width* to allow fewer features in the densest tile to
27 changes: 26 additions & 1 deletion main.cpp
@@ -2074,7 +2074,7 @@ int read_input(std::vector<source> &sources, char *fname, int maxzoom, int minzo
// are typically lognormally distributed. Two standard deviations
// below the mean should be enough to distinguish most features.
double avg = exp(mean);
- double nearby = exp(mean - 2 * stddev);
+ double nearby = exp(mean - 1.5 * stddev);

// Convert approximately from tile units to feet.
// See empirical data above for source
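
The effect of moving from 2 to 1.5 standard deviations can be sketched in isolation. The helper below is hypothetical and not part of tippecanoe: `summarize_distances`, `DistanceSummary`, and the parameter `k` are names invented for illustration. It assumes, as the comment in the hunk says, that distances between features are roughly lognormal, so the mean and standard deviation are taken over the logarithms of the distances.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary type, not a tippecanoe structure.
struct DistanceSummary {
    double avg;     // exp(mean of log distances), i.e. the geometric mean
    double nearby;  // exp(mean - k * stddev), the "minimum distinguishable" distance
};

// Summarize a sample of distances in log space.
// k is the number of standard deviations below the mean (2 before this commit, 1.5 after).
DistanceSummary summarize_distances(const std::vector<double> &distances, double k) {
    double sum = 0, sum_squares = 0;
    std::size_t n = 0;
    for (double d : distances) {
        if (d <= 0) {
            continue;  // log is undefined for non-positive distances; skip them
        }
        double l = std::log(d);
        sum += l;
        sum_squares += l * l;
        n++;
    }
    if (n == 0) {
        return {0, 0};  // no usable distances
    }
    double mean = sum / n;
    double variance = sum_squares / n - mean * mean;
    if (variance < 0) {
        variance = 0;  // guard against floating-point rounding
    }
    double stddev = std::sqrt(variance);
    return {std::exp(mean), std::exp(mean - k * stddev)};
}
```

For the same sample, `k = 1.5` always yields a `nearby` distance at least as large as `k = 2`, which is consistent with the commit's stated goal of guessing somewhat lower maxzooms.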
@@ -2117,6 +2117,24 @@ int read_input(std::vector<source> &sources, char *fname, int maxzoom, int minzo
if (changed) {
printf("Choosing a maxzoom of -z%d to keep most features distinct with cluster distance %d\n", maxzoom, cluster_distance);
}

if (droprate == -3) {
// This mysterious formula is the result of eyeballing the appropriate drop rate
// for several point tilesets using -zg and then fitting a curve to the pattern
// that emerged. It appears that if the standard deviation of the distances between
// features is small, the drop rate should be large because the features are evenly
// spaced, and if the standard deviation is large, the drop rate can be small because
// the features are in clumps.
droprate = exp(-0.7681 * log(stddev) + 1.582);

if (droprate < 0) {
droprate = 0;
}

if (!quiet) {
fprintf(stderr, "Choosing a drop rate of %f\n", droprate);
}
}
}

if (dist_count != 0) {
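
The fitted curve in the hunk above is a power law in disguise, since exp(a * log(s) + b) equals exp(b) * s^a. A minimal standalone sketch follows; `guess_droprate` is a hypothetical name, and it assumes `stddev` is the standard deviation of the log distances between features, which is how the surrounding code appears to use it.

```cpp
#include <cmath>

// Hypothetical standalone version of the curve from the hunk above.
// Assumes stddev > 0 and that it was measured in log space.
double guess_droprate(double stddev) {
    // Equivalent to exp(1.582) * pow(stddev, -0.7681):
    // evenly spaced features (small stddev) get a large drop rate,
    // clumped features (large stddev) get a small one.
    return std::exp(-0.7681 * std::log(stddev) + 1.582);
}
```

Because `exp` never returns a negative value, the `droprate < 0` clamp in the hunk looks like a defensive check rather than a reachable branch for positive `stddev`.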
@@ -3044,6 +3062,8 @@ int main(int argc, char **argv) {
case 'r':
if (strcmp(optarg, "g") == 0) {
droprate = -2;
} else if (strcmp(optarg, "p") == 0) {
droprate = -3;
} else if (optarg[0] == 'g' || optarg[0] == 'f') {
droprate = -2;
if (optarg[0] == 'g') {
@@ -3254,6 +3274,11 @@ int main(int argc, char **argv) {
full_detail = 12;
}

if (droprate == -3 && !guess_maxzoom) {
fprintf(stderr, "Can't use -rp without either -zg or --smallest-maximum-zoom-guess\n");
exit(EXIT_FAILURE);
}

if (maxzoom > MAX_ZOOM) {
maxzoom = MAX_ZOOM;
fprintf(stderr, "Highest supported zoom is -z%d\n", maxzoom);
2 changes: 2 additions & 0 deletions man/tippecanoe.1
@@ -559,6 +559,8 @@ the same layer, enclose them in an \fB\fCall\fR expression so they will all be e
If you use \fB\fC\-rg\fR, it will guess a drop rate that will keep at most 50,000 features in the densest tile.
You can also specify a marker\-width with \fB\fC\-rg\fR\fIwidth\fP to allow fewer features in the densest tile to
compensate for the larger marker, or \fB\fC\-rf\fR\fInumber\fP to allow at most \fInumber\fP features in the densest tile.
If you use \fB\fC\-rp\fR with \fB\fC\-zg\fR or \fB\fC\-\-smallest\-maximum\-zoom\-guess\fR it will choose a drop rate from the same
distance\-between\-features metrics as are used to choose the maxzoom.
.IP \(bu 2
\fB\fC\-B\fR \fIzoom\fP or \fB\fC\-\-base\-zoom=\fR\fIzoom\fP: Base zoom, the level at and above which all points are included in the tiles (default maxzoom).
If you use \fB\fC\-Bg\fR, it will guess a zoom level that will keep at most 50,000 features in the densest tile.
33 changes: 28 additions & 5 deletions serial.cpp
@@ -557,12 +557,35 @@ int serialize_feature(struct serialization_state *sst, serial_feature &sf) {
sf.seq = 0;
}

- long long bbox_index;
+ unsigned long long bbox_index;
long long midx, midy;

if (sf.t == VT_POINT) {
// keep old behavior, which loses one bit of precision at the bottom
midx = (sf.bbox[0] / 2 + sf.bbox[2] / 2) & ((1LL << 32) - 1);
midy = (sf.bbox[1] / 2 + sf.bbox[3] / 2) & ((1LL << 32) - 1);
} else {
// To reduce the chances of giving multiple polygons or linestrings
// the same index, use an arbitrary but predictable point from the
// geometry as the index point rather than the bounding box center
// as was previously used. The index point chosen comes from a hash
// of the overall geometry, so features with the same geometry will
// still have the same index. Specifically this avoids guessing
// too high a maxzoom for a data source that has a large number of
// LineStrings that map essentially the same route but with slight
// jitter between them, even though the geometries themselves are
// not very detailed.
size_t ix = 0;
for (size_t i = 0; i < sf.geometry.size(); i++) {
ix += sf.geometry[i].x + sf.geometry[i].y;
}
ix = ix % sf.geometry.size();

// If off the edge of the plane, mask to bring it back into the addressable area
midx = sf.geometry[ix].x & ((1LL << 32) - 1);
midy = sf.geometry[ix].y & ((1LL << 32) - 1);
}

- // Calculate the center even if off the edge of the plane,
- // and then mask to bring it back into the addressable area
- long long midx = (sf.bbox[0] / 2 + sf.bbox[2] / 2) & ((1LL << 32) - 1);
- long long midy = (sf.bbox[1] / 2 + sf.bbox[3] / 2) & ((1LL << 32) - 1);
bbox_index = encode_index(midx, midy);

if (additional[A_DROP_DENSEST_AS_NEEDED] || additional[A_COALESCE_DENSEST_AS_NEEDED] || additional[A_CLUSTER_DENSEST_AS_NEEDED] || additional[A_CALCULATE_FEATURE_DENSITY] || additional[A_DROP_SMALLEST_AS_NEEDED] || additional[A_COALESCE_SMALLEST_AS_NEEDED] || additional[A_INCREASE_GAMMA_AS_NEEDED] || sst->uses_gamma || cluster_distance != 0) {
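
The index-point selection for polygons and linestrings can be sketched on its own. `Point` and `pick_index_point` below are hypothetical stand-ins, not tippecanoe's own types or functions; the sketch only mirrors the sum-and-modulo step shown in the hunk, and it adds an empty-geometry guard that the hunk itself does not show.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a geometry vertex; only x and y matter here.
struct Point {
    long long x;
    long long y;
};

// Choose an arbitrary but repeatable vertex by hashing the whole geometry:
// sum all coordinates, then reduce modulo the vertex count.
// The result is only meaningful for a non-empty geometry.
std::size_t pick_index_point(const std::vector<Point> &geometry) {
    if (geometry.empty()) {
        return 0;  // assumption: callers never index into an empty geometry
    }
    std::size_t ix = 0;
    for (const Point &p : geometry) {
        // Unsigned arithmetic wraps around, so overflow is well defined.
        ix += static_cast<std::size_t>(p.x + p.y);
    }
    return ix % geometry.size();
}
```

Identical geometries always pick the same vertex, while a route with slight jitter usually picks a different one, so near-duplicate linestrings are less likely to share an index than they were when the bounding-box center was used.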