
'Namespace' object has no attribute 'use_containing' in dedupe_geojson.py #12

Open
thisisaaronland opened this issue Jul 23, 2018 · 3 comments


@thisisaaronland

Is the EMR dedupe geojson job missing options? I am seeing the following errors when trying to run lieu (f55fe8bf232525679baac4c3db1387cb37d16e14) in EMR:

No handlers could be found for logger "mrjob.launch"
Traceback (most recent call last):
  File "dedupe_geojson.py", line 204, in <module>
    DedupeGeoJSONJob.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 454, in execute
    self.run_spark(self.options.step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 647, in run_spark
    spark_method(input_path, output_path)
  File "dedupe_geojson.py", line 137, in spark
    use_containing = self.options.use_containing
AttributeError: 'Namespace' object has no attribute 'use_containing'

It's not clear to me if/how/where these args are defined or inherited, in part because the use_city = self.options.use_city statement on line 136 does not appear to fail, even though it doesn't seem to be defined anywhere either...

grep -n -r -e 'use_containing' /usr/local/openvenues/lieu/scripts/jobs
.//dedupe_geojson.py:137:        use_containing = self.options.use_containing
.//dedupe_geojson.py:152:            dupes_with_classes_and_sims = NameAddressDeduperSpark.dupe_sims(address_ids, geo_model=geo_model, geo_model_proportion=geo_model_proportion, index_type=index_type, name_dupe_threshold=name_dupe_threshold, name_review_threshold=name_review_threshold, with_address=with_address, with_unit=with_unit, with_latlon=use_latlon, with_city_or_equivalent=use_city, with_small_containing_boundaries=use_containing, with_postal_code=use_postal_code, fuzzy_street_name=fuzzy_street_name, with_phone_number=with_phone_number, name_and_address_keys=name_and_address_keys, name_only_keys=name_only_keys, address_only_keys=address_only_keys)
.//dedupe_geojson.py:154:            dupes_with_classes_and_sims = AddressDeduperSpark.dupe_sims(address_ids, with_unit=with_unit, with_latlon=use_latlon, with_city_or_equivalent=use_city, with_small_containing_boundaries=use_containing, with_postal_code=use_postal_code, fuzzy_street_name=fuzzy_street_name)
grep -n -r -e 'use_containing' /usr/local/openvenues/lieu/scripts/jobs | wc -l
       0
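
For context on where these attributes come from: in mrjob 0.6+ every option read off self.options has to be registered in configure_args() first, otherwise the argparse Namespace never gets that attribute, which matches the error above. Below is a minimal sketch of what registering the missing flags might look like; the flag names, dests, and defaults here are assumptions for illustration, not lieu's actual definitions.

from mrjob.job import MRJob

class DedupeGeoJSONJobSketch(MRJob):

    def configure_args(self):
        super(DedupeGeoJSONJobSketch, self).configure_args()
        # Any option read via self.options must be registered here first;
        # unregistered names raise AttributeError on the argparse Namespace.
        self.add_passthru_arg('--use-containing', dest='use_containing',
                              action='store_true', default=False)
        self.add_passthru_arg('--name-only', dest='name_only',
                              action='store_true', default=False)

    def spark(self, input_path, output_path):
        # With the flags registered above these lookups succeed, whether or
        # not the flags were actually passed on the command line.
        use_containing = self.options.use_containing
        name_only = self.options.name_only

if __name__ == '__main__':
    DedupeGeoJSONJobSketch.run()

(Older mrjob releases in the 0.5.x line used configure_options() and add_passthrough_option() for the same purpose; if the job was written against one API and run with a mrjob version that no longer supports it, unregistered options would surface as exactly this kind of AttributeError.)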
@thisisaaronland
Author

Similarly:

No handlers could be found for logger "mrjob.launch"
Traceback (most recent call last):
  File "dedupe_geojson.py", line 206, in <module>
    DedupeGeoJSONJob.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 454, in execute
    self.run_spark(self.options.step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 647, in run_spark
    spark_method(input_path, output_path)
  File "dedupe_geojson.py", line 149, in spark
    name_only = self.options.name_only
AttributeError: 'Namespace' object has no attribute 'name_only'

@thisisaaronland
Author

No handlers could be found for logger "mrjob.launch"
Traceback (most recent call last):
  File "dedupe_geojson.py", line 213, in <module>
    DedupeGeoJSONJob.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 454, in execute
    self.run_spark(self.options.step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 647, in run_spark
    spark_method(input_path, output_path)
  File "dedupe_geojson.py", line 184, in spark
    with_unit=self.options.with_unit)
TypeError: explain_name_address_dupe() got an unexpected keyword argument 'name_dupe_threshold'
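
This last one is a different failure mode: the call site passes name_dupe_threshold but the installed explain_name_address_dupe() doesn't accept it, so the job script and the installed library appear to be out of sync on that signature. One quick way to confirm which keyword arguments the installed copy actually takes is sketched below; the import path is only a guess, so point it at wherever dedupe_geojson.py really imports the function from.

import inspect

# Hypothetical import path -- not verified against the lieu source tree.
from lieu.dedupe import explain_name_address_dupe

# Python 2.7: prints the accepted positional args, *args/**kwargs names and defaults.
print(inspect.getargspec(explain_name_address_dupe))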

@thisisaaronland
Author

FWIW, the following changes fix all of the errors above, although I can't be sure I'm not glossing over some important details...

master...sfomuseum:debug
