
Convert GPX/KML to geojson #61

Merged: 26 commits merged into master from togeojson on May 4, 2016

Conversation

@GretaCB (Contributor) commented Mar 7, 2016

Per #19

  • Convert each layer of the KML/GPX into its own geojson file (max 15 layers; a minimal layer-enumeration sketch follows this list)
  • Add an index file for each new file/layer > 10MB
  • Validate layer names
  • Include a metadata file for the original KML/GPX (run through mapnik-omnivore)
  • Drop GPX track_points. Found that these are automatically inserted by gdal (most of the time). @BergWerkGIS did we decide whether to allow track_points produced by an actual GPS device?
  • Tests (anything we're not testing that we should?)
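For illustration, a minimal sketch of the per-layer enumeration with node-gdal; the input path, variable names, and the 15-layer cap check are illustrative, not the PR's actual implementation:

var gdal = require('gdal');

// Sketch only: list the layers gdal sees in a GPX/KML file, skipping the
// track_points layer that gdal auto-generates, capped at 15 layers.
var ds = gdal.open('input.gpx');                    // placeholder input path
var layerCount = Math.min(ds.layers.count(), 15);   // max 15 layers, per the list above

for (var i = 0; i < layerCount; i++) {
  var layer = ds.layers.get(i);
  if (layer.name === 'track_points') continue;      // dropped, see bullet above
  console.log(layer.name + ': ' + layer.features.count() + ' features');
  // ...convert this layer into its own geojson file here...
}

ds.close();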

@GretaCB (Contributor, Author) commented Mar 14, 2016

Per https://github.com/mapbox/unpacker/issues/907, this PR just needs to run mapnik-omnivore and add the output as a JSON file to the bundle (for both the KML/GPX preprocessor and the geojson-index preprocessor). Happy to jump on this tomorrow.
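For context, a minimal sketch of that step, assuming mapnik-omnivore's digest() API; the writeMetadata name, the infile/outdir parameters, and the metadata.json filename are illustrative:

var fs = require('fs');
var path = require('path');
var mapnikOmnivore = require('mapnik-omnivore');

// Sketch only: run mapnik-omnivore on the original KML/GPX and write the
// resulting metadata into the bundle as a JSON file.
function writeMetadata(infile, outdir, callback) {
  mapnikOmnivore.digest(infile, function(err, metadata) {
    if (err) return callback(err);
    fs.writeFile(path.join(outdir, 'metadata.json'), JSON.stringify(metadata), callback);
  });
}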

@GretaCB (Contributor, Author) commented Mar 18, 2016

The togeojson KML tests fail 50% of the time locally with the following error:

not ok 9 .end() called twice
  ---
    operator: fail
  ...

/Users/mapbox/dev/preprocessorcerer/node_modules/queue-async/build/queue.js:41
        if (!tasks[i]) throw new Error; // detect multiple callbacks

Going to split each fixture into its own tape test as a first step. I wonder if a previous test in the for loop is erroring and an assert.end() is lingering.
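A minimal sketch of that split, assuming tape and a hypothetical convert() helper (fixture names are illustrative):

var test = require('tape');
var path = require('path');

// One tape test per fixture, so a failure in one fixture cannot leave a
// lingering assert.end() that bleeds into the next.
// convert() is a hypothetical stand-in for the preprocessor under test.
['fixture-a.kml', 'fixture-b.kml'].forEach(function(fixture) {
  test('togeojson converts ' + fixture, function(assert) {
    convert(path.join(__dirname, 'fixtures', fixture), function(err) {
      assert.ifError(err, 'no error');
      assert.end();   // called exactly once per test
    });
  });
});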

@GretaCB (Contributor, Author) commented Mar 18, 2016

Looks like the original error that was causing the .end() callback error is:

max tree depth:8
split ratio:0.55
processing '/var/folders/2p/m7jcvfp90wngshgr9j0cd9tc0000gn/T/f32881494358414c/empty-features-only.geojson' as GeoJSON
Mapnik LOG> 2016-03-18 14:39:26: Error loading mapped memory file: '/var/folders/2p/m7jcvfp90wngshgr9j0cd9tc0000gn/T/f32881494358414c/empty-features-only.geojson' (Invalid argument)
Error : cannot memory map /var/folders/2p/m7jcvfp90wngshgr9j0cd9tc0000gn/T/f32881494358414c/empty-features-only.geojson
Error: failed to process /var/folders/2p/m7jcvfp90wngshgr9j0cd9tc0000gn/T/f32881494358414c/empty-features-only.geojson
done!

I console.logged the above error from within the indexing logic.

I suspect mapnik doesn't like some of the geojson files we're feeding it for indexing, or there's some kind of race condition when indexing each layer.

cc @artemp @springmeyer

@GretaCB (Contributor, Author) commented Mar 18, 2016

Per chat with @springmeyer, it looks like the indexing error points to something else going on in the converted geojson layers.

I took a look at the converted geojson layers for the failing tests, and it looks as if they are producing incomplete objects (invalid JSON).

Note: not all resulting geojson layers are invalid.

Going to keep 👀

@wilhelmberg (Contributor) commented

One step further with debugging the failing test.

I now have proof that something is still holding on to the tmpfile when it should be deleted:

[screenshot: four open references to the tmpfile dataset]

Unfortunately it's a pain to track down: running the test from the console, the exception occurs every time.
Slowly stepping through the test in a debug session works.
Jumping between different breakpoints in a debug session fails.
I've already tried different things (without success):

  • delay before delete
  • nulling src in wmtiff
  • ...

@GretaCB (Contributor, Author) commented Mar 21, 2016

I think we have here a good ole fashioned control flow problem (when I remove createIndex locally, the tests no longer fail). It looks like createIndex can trigger before the forEach loops are finished, because createIndex is called outside of the forEach loops and does not rely on any callback or other signal to determine when the layers are actually ready to be indexed.

I like your idea from yesterday that we should index only after all layers are converted. We can also refactor the main function a bit to better manage flow and modularize it.

Found a great example of how we can try untangling some of this:

var fs = require('fs');
var path = require('path');

// Read every file in a directory; call next(results) once all reads finish.
function read_directory(dir, next) {
  fs.readdir(dir, function (err, files) {
    var count = files.length,
        results = {};
    files.forEach(function (filename) {
      fs.readFile(path.join(dir, filename), function (err, data) {
        results[filename] = data;
        count--;
        if (count <= 0) {
          next(results);
        }
      });
    });
  });
}

// Read several directories in parallel; call next(data) once every one is done.
function read_directories(paths, next) {
  var count = paths.length,
      data = {};
  paths.forEach(function (dir) {
    read_directory(dir, function (results) {
      data[dir] = results;
      count--;
      if (count <= 0) {
        next(data);
      }
    });
  });
}

read_directories(['articles', 'authors', 'skin'], function (data) {
  // Do something with the collected file contents
});

wilhelmberg added a commit to mapbox/node-wmtiff that referenced this pull request Mar 22, 2016
@wilhelmberg (Contributor) commented

Nothing wrong with the control flow; gdal just has to be disposed of manually to release the file reference sooner:

The 4 references to the dataset in the screenshot above came from

  • wmtiff
  • tif-reproject.preprocessor.js
  • tif-toBytes.preprocessor.js
  • preprocessor.criteria

as each of them opens the dataset (maybe room for improvement?) and ds is still holding on to the file (despite ds.close()).
Maybe because everything is done in callbacks and node hasn't yet let go of the objects.
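A minimal sketch of the disposal pattern with node-gdal (tmpfile is an illustrative path variable):

var fs = require('fs');
var gdal = require('gdal');

// Sketch: explicitly release the native handle before deleting the file,
// instead of waiting for node's garbage collector to do it.
var ds = gdal.open(tmpfile);   // tmpfile: illustrative path
// ...read whatever is needed from ds...
ds.close();                    // dispose the native dataset
ds = null;                     // drop the JS reference as well
fs.unlinkSync(tmpfile);        // delete no longer fails with EBUSY on Windows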


Working on the next failing test now:

# [parts mbtiles] correct number of parts for mbtiles without grids table
ok 40 no error
ok 41 four parts

fs.js:772
  return binding.unlink(pathModule._makeLong(path));
                 ^
Error: EBUSY, resource busy or locked 'C:\Users\ADMINI~1\AppData\Local\Temp\2\79c7b0a1613d996b'
    at Object.fs.unlinkSync (fs.js:772:18)
    at c:\mb\preprocessorcerer\test\mbtiles-byTiles.test.js:17:10
    at Statement.<anonymous> (c:\mb\preprocessorcerer\parts\mbtiles-byTiles.js:25:9)
npm ERR! Test failed.  See above for more details.

@wilhelmberg (Contributor) commented

Same with the failing mbtiles test: resources are not properly disposed. In this case the connection to the database is not closed.
Fixed: abf63bc
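A minimal sketch of the idea, assuming the test talks to the mbtiles file via node-sqlite3 (tmpfile and the query are illustrative):

var fs = require('fs');
var sqlite3 = require('sqlite3');

// Sketch: close the database connection before unlinking the file; with the
// connection still open, Windows reports EBUSY on the unlink.
var db = new sqlite3.Database(tmpfile);   // tmpfile: illustrative path
db.get('SELECT count(1) AS count FROM tiles', function(err, row) {
  db.close(function(closeErr) {
    if (!closeErr) fs.unlinkSync(tmpfile);
  });
});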


Now all tests run through again, but 8 are failing.
Continuing the test quest ... 😏

@wilhelmberg (Contributor) commented

> I think we have here a good ole fashioned control flow problem

Yes, you are right. This is a control flow problem.

I didn't realize that you were talking about the GPX test while I was talking about the tiff test.

Here createIndex is called async, but at the same time the control flow continues to here.

@wilhelmberg (Contributor) commented

That's also why .end() is called twice: once from createIndex() and once from the exported function itself.

not ok 4 expected error message
  ---
    operator: equal
    expected: 'GPX does not contain any valid features.'
    actual:   undefined
  ...
not ok 5 .end() called twice
  ---
    operator: fail
  ...
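A minimal sketch of one way to straighten the flow with queue-async (already in the dependency tree per the stack trace above); processLayers, convertLayer, and createIndex usage here are illustrative:

var queue = require('queue-async');

// Sketch: convert every layer first, then index, and let the queue invoke the
// final callback exactly once, so assert.end() cannot fire twice.
function processLayers(layers, callback) {
  var q = queue();
  layers.forEach(function(layer) {
    q.defer(convertLayer, layer);            // convertLayer: hypothetical async task
  });
  q.awaitAll(function(err, convertedFiles) {
    if (err) return callback(err);
    createIndex(convertedFiles, callback);   // index only after all conversions finish
  });
}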

@coveralls commented

Coverage decreased (-1.5%) to 95.238% when pulling c9ca030 on togeojson into 455fc99 on master.

@wilhelmberg (Contributor) commented

npm test works for me now - no failed tests.
But I have to use mapbox/node-wmtiff@6c80973.

@GretaCB (Contributor, Author) commented Mar 23, 2016

Node v0.10.40 worked fine on OSX as well. I took a look around preprocessorcerer and did a little research, but didn't find much that explains the immortal tmpfile. I did find this:

> Datasets get closed automatically when node garbage collects them, but there is no guarantee that node will garbage collect before exiting.

This led me down a Google search of node GC, but I'm not sure that's the proper route at this point. Will keep pondering.

@wilhelmberg (Contributor) commented

> I did find this

Great find. Thanks.
This was exactly my hunch:
"Making sure the dataset is closed and flushed before the process exits fixes the problem for me. Datasets get closed automatically when node garbage collects them, but there is no guarantee that node will garbage collect before exiting."

but there is no guarantee that node will garbage collect before exiting, or in our case, before we try to delete the dataset.

> but didn't find much that explains the immortal tmpfile

The explanation above was the reason for not being able to delete the tmpfile, as node had not yet done its garbage collection.

Sorry if I caused any confusion, but this should be fixed now and the file should not be immortal anymore.

The ds.close() and ds = null calls I've put into the code at various places should take care of disposing the objects (references to the file).

> This led me down a Google search of node GC, but I'm not sure that's the proper route at this point. Will keep pondering.

Not necessary to spend more time on this.
I've literally spent weeks on node GC when investigating strange node-mapnik crashes some time ago.

There are GC parameters you can pass to the node executable to fiddle with the internals of its GC,
e.g. --expose-gc, or others to change memory usage before GC kicks in, or to prevent GC altogether.
But none of them really helped and we decided to stick with the defaults.


The takeaway here is: always dispose of resources properly as soon as you are done with them.

Be it a file (fs.close()), a database connection (db.close()) or another native wrapper (like gdal).

@springmeyer (Contributor) commented

> The takeaway here is: always dispose of resources properly as soon as you are done with them.

Awesome work. Also agree, @BergWerkGIS. This is a pretty classic case of Windows behavior being more strict, which helps highlight core issues in code that should be written more robustly. 💯

> The ds.close() and ds = null calls I've put into the code at various places should take care of disposing the objects (references to the file).

That is neat/great that setting to null triggers the actual close (I presume because it triggers the garbage collection sooner). I'm left however thinking that it is probably worth bringing this up again with Brian and Brandon. I'm sure the current close() behavior of not releasing until garbage collection makes sense given the design and object inheritance in node-gdal. But nevertheless it seems like odd behavior. Perhaps they would reconsider having ds.close() actually close the underlying gdal file handle directly or perhaps adding a ds.closeNow() method to force it. I've not looked at the code enough to know if this is feasible, but I think it would be great @BergWerkGIS if you would raise this for them in a ticket to flesh this out further.

@wilhelmberg (Contributor) commented

@springmeyer Already looked at the code, and I think this is purely on node's side (and our implementation); there is nothing else gdal can do about it (although I'm not familiar with node's GC internals).

ds.close() calls dispose() to null the internal resources:

    if (this_dataset) {
        LOG("Disposing Dataset [%p]", this_dataset);
        ptr_manager.dispose(uid);
        LOG("Disposed Dataset [%p]", this_dataset);
        this_dataset = NULL;
    }

Pseudo code of our implementation looks like this:

function one()     { var ds = gdal.open(file); return two(ds); }
function two(ds)   { return three(ds); }
function three(ds) { return four(ds); }
function four(ds)  { fs.unlink(file); }  // delete the file backing ds

And with this implementation it absolutely makes sense to me that node hasn't fully released ds yet, as in method four() we are essentially still inside method one().
Or am I missing something about node/js callbacks?

@coveralls commented

Coverage decreased (-1.7%) to 95.111% when pulling f781e3c on togeojson into 455fc99 on master.

@GretaCB (Contributor, Author) commented Mar 24, 2016

UPDATE on funky test failure

Per chat with Jake, node-mapnik no longer supports builds for node 0.12

@coveralls commented

Coverage decreased (-1.7%) to 95.111% when pulling cd5474e on togeojson into 455fc99 on master.

@coveralls commented

Coverage decreased (-1.7%) to 95.111% when pulling a61209a on togeojson into 455fc99 on master.

@coveralls commented

Coverage decreased (-1.7%) to 95.111% when pulling d6af91f on togeojson into 455fc99 on master.

@wilhelmberg (Contributor) commented

Cannot reproduce the unlink problem locally; Travis also didn't have a problem on the last run:
https://travis-ci.org/mapbox/preprocessorcerer/builds/118480437


The highlighted files are the ones left over after npm test.
Will take a look at where they are coming from.

[screenshot: leftover files after npm test]

@coveralls commented

Coverage decreased (-1.7%) to 95.111% when pulling 2a20b12 on togeojson into 455fc99 on master.

@coveralls commented

Coverage decreased (-1.7%) to 95.133% when pulling 338d5a2 on togeojson into 455fc99 on master.

@GretaCB (Contributor, Author) commented Apr 11, 2016

Hmmm, looking into why tests are still failing here, so we can get this ready to rock.

@coveralls commented

Coverage decreased (-1.6%) to 95.143% when pulling 5aad818 on togeojson into 455fc99 on master.

@coveralls commented

Coverage decreased (-1.6%) to 95.143% when pulling 4cc9d80 on togeojson into 059ebf9 on master.

@@ -34,6 +34,7 @@ module.exports = function(infile, outdir, callback) {
    .on('exit', function() {
      // If error printed to --validate-features log
      if (data.indexOf('Error') != -1) {
        console.log(data);
Contributor:

debug logging?

Contributor Author:

👍 thank you

Contributor:

@GretaCB what's the plan of attack?
This has changed here (https://github.com/mapbox/preprocessorcerer/compare/stderr-fix#diff-39694ce4a586caad3e6ad813c295b454R32) too, as the latest mapnik-index now returns an error code if --validate-features fails.
Do we need to wait to update to the latest node-mapnik?
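For reference, a minimal sketch of relying on the exit code rather than scraping the log; mapnikIndexPath, infile, and callback are illustrative names, and the exact mapnik-index argument order may differ:

var spawn = require('child_process').spawn;

// Sketch: let mapnik-index's exit code under --validate-features decide,
// instead of searching the captured output for the string 'Error'.
var child = spawn(mapnikIndexPath, ['--validate-features', infile]);
child.on('exit', function(code) {
  if (code !== 0) return callback(new Error('mapnik-index failed to process ' + infile));
  callback();
});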

Contributor Author:

Plan of attack re: mapbox/mapnik-swoop#14
cc @mapsam

Contributor:

Working through a node-mapnik dev build so we can test this right now.

@coveralls commented

Coverage decreased (-1.7%) to 95.133% when pulling 18c3a26 on togeojson into 059ebf9 on master.


@GretaCB merged commit 4f8d380 into master May 4, 2016
@GretaCB deleted the togeojson branch May 4, 2016 21:42