
LD_LIBRARY_PATH override stomps EMR settings #6

Open
CloudNiner opened this issue May 31, 2019 · 1 comment

Comments


CloudNiner commented May 31, 2019

Overriding LD_LIBRARY_PATH leads to the following exception in any job where Spark attempts to compress data:

ERROR GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
    at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2306)

This was verified by running the following:

[hadoop@ip-172-31-7-166 ~]$ spark-shell --master local --driver-java-options -Djava.library.path=/usr/local/lib
...
scala> System.getProperty("java.library.path")
res0: String = /usr/local/lib

versus

[hadoop@ip-172-31-7-166 ~]$ spark-shell --master local
...
scala> System.getProperty("java.library.path")
res0: String = /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
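One quick way to check whether an override has dropped the directories EMR configures is to split `java.library.path` on the platform path separator and look for the expected entries. The sketch below is meant for pasting into `spark-shell`. It is not part of the original report. The object name `LibraryPathCheck` is hypothetical. The expected directories are copied from the default output above, and other EMR releases may list different ones.

```scala
import java.io.File

object LibraryPathCheck {
  // Native directories seen in the default spark-shell session above
  // (assumption: other EMR releases may use different directories).
  val expectedDirs: Seq[String] = Seq(
    "/usr/lib/hadoop/lib/native",
    "/usr/lib/hadoop-lzo/lib/native"
  )

  // Split a java.library.path-style string into its non-empty entries.
  def entries(path: String): Seq[String] =
    path.split(File.pathSeparator).toSeq.filter(_.nonEmpty)

  // Return the expected directories that do not appear in `path`.
  def missing(path: String, expected: Seq[String] = expectedDirs): Seq[String] = {
    val present = entries(path).toSet
    expected.filterNot(present.contains)
  }
}
```

In `spark-shell`, `LibraryPathCheck.missing(System.getProperty("java.library.path"))` should return an empty sequence when the EMR native directories (including the one providing `gplcompression`) are still visible. With the `-Djava.library.path=/usr/local/lib` override it would list both.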

We override this setting so that executors can find the GDAL bindings installed by the bootstrap script. I'd expect the Spark cluster to be configured so that all of the native libraries EMR installs, as well as GDAL, are available on the LD_LIBRARY_PATH.
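The expectation above amounts to prepending the GDAL directory to whatever path is already configured, rather than replacing it. The following is a minimal sketch of that merge. `/usr/local/lib` as the GDAL location comes from the test command above. The helper name `prependPath` is hypothetical and does not come from EMR or the bootstrap script.

```scala
import java.util.regex.Pattern

object PathMerge {
  // Hypothetical helper: prepend `extra` directories to an existing
  // separator-delimited path instead of replacing it. Existing entries are
  // kept, empty entries are dropped, and duplicates are removed (first wins).
  def prependPath(extra: Seq[String], existing: Option[String], sep: String = ":"): String = {
    // String.split takes a regex, so quote the separator literally.
    val current = existing.toList.flatMap(_.split(Pattern.quote(sep)).toSeq).filter(_.nonEmpty)
    (extra ++ current).distinct.mkString(sep)
  }
}
```

For example, `PathMerge.prependPath(Seq("/usr/local/lib"), sys.env.get("LD_LIBRARY_PATH"))` keeps every directory that was already set and places the GDAL directory first. An unset or empty variable yields just `/usr/local/lib`.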

Member

pomadchin commented May 31, 2019

Things to keep in mind:

  1. Overriding LD_LIBRARY_PATH is enough; it was well tested with GDAL ingests in a VLM performance repo.

  2. This will probably work (needs to be checked):

EmrConfig("spark-env").withProperties(
  "LD_LIBRARY_PATH" -> "/usr/local/lib:$LD_LIBRARY_PATH"
)
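One detail worth checking: EMR's configuration API documents shell environment variables for `spark-env` under a nested `export` sub-classification, not as direct `spark-env` properties. Whatever `EmrConfig` helper is used would therefore need to render something equivalent to the JSON built below. This is a self-contained sketch with no EMR SDK or sbt plugin dependency. The names `SparkEnvConfig` and `sparkEnvExport` are hypothetical, and whether `$LD_LIBRARY_PATH` expands as intended on the cluster still needs to be verified.

```scala
object SparkEnvConfig {
  // Minimal JSON string quoting, sufficient for the path-like values used here.
  private def q(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Hypothetical helper: render an EMR configurations JSON that sets shell
  // environment variables through the spark-env -> export nesting.
  def sparkEnvExport(vars: Seq[(String, String)]): String = {
    val props = vars.map { case (k, v) => s"${q(k)}: ${q(v)}" }.mkString(", ")
    s"""[{"Classification": "spark-env", "Properties": {}, "Configurations": [{"Classification": "export", "Properties": {$props}}]}]"""
  }
}
```

`SparkEnvConfig.sparkEnvExport(Seq("LD_LIBRARY_PATH" -> "/usr/local/lib:$LD_LIBRARY_PATH"))` keeps `$LD_LIBRARY_PATH` as a literal string. The intent is for the shell that sources `spark-env.sh` to expand it.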

CloudNiner added a commit to geotrellis/geotrellis-osm-diff-demo that referenced this issue Jun 3, 2019
This is a multi-stage improvement:
1. Switch to a full join so that we can compare
   all combinations of OSM and Bing buildings
2. Match with a series of tiered strategies. First
   filter any geoms from the match list that intersect
   with no other geoms. Then perform the original
   centroid check. Then perform an intersection area
   overlap check.
3. Allow one to many matching for Bing -> OSM. Bing
   areas are generally larger and more poorly defined
   since they're satellite derived footprints. This
   change generally allows us to retain fidelity in
   the output when Bing sees one large area that is
   actually a tight cluster of buildings such as a
   block of row homes.
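The tiered matching and one-to-many Bing -> OSM matching described in items 2 and 3 can be sketched roughly as follows. This is a hedged simplification, not the commit's code, and it makes three assumptions:

- Footprints are axis-aligned boxes rather than polygons. A real implementation would use full geometries, e.g. JTS `intersects`, `intersection` and `getCentroid`.
- The "centroid check" means either footprint's centroid falls inside the other.
- `MinOverlap = 0.5` is a made-up threshold.

```scala
object TieredMatch {
  // Axis-aligned box standing in for a footprint polygon (simplification).
  final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) {
    def area: Double = (maxX - minX) * (maxY - minY)
    def centroid: (Double, Double) = ((minX + maxX) / 2, (minY + maxY) / 2)
    def contains(p: (Double, Double)): Boolean =
      p._1 >= minX && p._1 <= maxX && p._2 >= minY && p._2 <= maxY
    // Overlapping region, or None when the boxes share no area.
    def intersection(o: Box): Option[Box] = {
      val b = Box(minX max o.minX, minY max o.minY, maxX min o.maxX, maxY min o.maxY)
      if (b.minX < b.maxX && b.minY < b.maxY) Some(b) else None
    }
  }

  // Hypothetical threshold: share of the OSM footprint that must be covered.
  val MinOverlap = 0.5

  // All OSM footprints matched by one Bing footprint (one-to-many).
  def matches(bing: Box, osm: Seq[Box]): Seq[Box] =
    osm.filter { o =>
      bing.intersection(o) match {
        // Tier 1: no shared area, so the candidate is dropped.
        case None => false
        // Tier 2: centroid check; Tier 3: fall back to overlap area.
        case Some(overlap) =>
          bing.contains(o.centroid) || o.contains(bing.centroid) ||
            overlap.area / o.area >= MinOverlap
      }
    }
}
```

Because `matches` returns every qualifying OSM footprint, one large Bing area covering a block of row homes can match several OSM buildings.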

The schema of the output tiles is changed. The properties
for each output geometry are:
Old:
- "hasOsm": Boolean
New:
- "source": String, one of "osm"|"bing"|"both"
- "name": String, value of osm tag "name", if available.
  Defaults to empty string.
- "building_type": String, value of osm tag "building", if available.
  Defaults to empty string.
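The new property schema can be illustrated with a small sketch that derives the three properties for one output geometry. The inputs `inOsm`, `inBing` and the `osmTags` map are hypothetical stand-ins for however the job tracks a geometry's origin and OSM tags. Only the property names, allowed values and empty-string defaults come from the list above.

```scala
object DiffProperties {
  // Build the output properties for one geometry following the new schema.
  // `inOsm`, `inBing` and `osmTags` are hypothetical inputs.
  def forGeometry(inOsm: Boolean, inBing: Boolean, osmTags: Map[String, String]): Map[String, String] = {
    val source = (inOsm, inBing) match {
      case (true, true)   => "both"
      case (true, false)  => "osm"
      case (false, true)  => "bing"
      case (false, false) =>
        throw new IllegalArgumentException("geometry must come from OSM, Bing, or both")
    }
    Map(
      "source" -> source,
      // OSM tag "name", defaulting to the empty string.
      "name" -> osmTags.getOrElse("name", ""),
      // OSM tag "building", defaulting to the empty string.
      "building_type" -> osmTags.getOrElse("building", "")
    )
  }
}
```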

Adds support to the CLI for dumping the input RDD layers
in addition to the output diff layer via the `--source`
argument.

Fixes a bug where the GeoMesa GeoJSON writer fails to
properly encode string properties that contain
double quotes. See:
- https://geomesa.atlassian.net/browse/GEOMESA-2631
- https://geomesa.atlassian.net/browse/GEOMESA-2630
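The double-quote bug comes down to emitting property values without JSON string escaping. The escaper below illustrates correct escaping per the JSON specification (RFC 8259): backslash, double quote and control characters must be escaped. It is only an illustration. It is neither GeoMesa's code nor the workaround applied in this commit, and the name `JsonEscape.quote` is hypothetical.

```scala
object JsonEscape {
  // Quote and escape a string as a JSON string literal (RFC 8259).
  def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case '\b' => sb.append("\\b")
      case '\f' => sb.append("\\f")
      // Remaining control characters use the 4-digit hex escape form.
      case c if c < ' ' => sb.append('\\').append('u').append(f"${c.toInt}%04x")
      case c => sb.append(c)
    }
    sb.append('"').toString
  }
}
```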

Fixes a bug where invalid geoms could persist to the diff
algorithm and throw errors during the intersection check
phase.

Fixes an EMR bootstrap bug where our library path override,
meant to allow EMR to find our GDAL installation, overwrites
the EMR defaults, leaving the cluster executors unable to
find other native libraries. See:
geotrellis/geotrellis-spark-job.g8#6