Provide methods for normalizing the names of fields provided by ChangesetSource #113

jpolchlo · 2019-09-26T16:32:33Z

Pursuant to the conversation here, there are two different conventions for naming the fields provided by a changeset source. The streaming source (accessed by spark.read.format(Source.Changesets)) uses snake case for its field names (e.g., created_at), while changeset ORC files tend to use camel case (createdAt). If one intends to use .as[Changeset] to convert to a Dataset, the fields must follow the latter convention.

We should provide the means to convert from one case structure to the other.
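As a rough illustration of what such a conversion could look like, here is a minimal sketch of two pure string helpers, one per direction. The object name FieldNames and both method names are hypothetical and not part of osmesa. The sketch assumes simple field names such as created_at / createdAt and does not try to handle runs of capitals (acronyms).

```scala
object FieldNames {
  // snake_case -> camelCase, e.g. "created_at" -> "createdAt".
  // Each underscore followed by a lowercase letter or digit is dropped
  // and the following character is upper-cased.
  def snakeToCamel(name: String): String =
    "_([a-z0-9])".r.replaceAllIn(name, m => m.group(1).toUpperCase)

  // camelCase -> snake_case, e.g. "createdAt" -> "created_at".
  // An underscore is inserted between a lowercase letter or digit and the
  // capital that follows it, and that capital is lower-cased.
  // Assumption: no consecutive capitals (acronyms) in the field names.
  def camelToSnake(name: String): String =
    "([a-z0-9])([A-Z])".r.replaceAllIn(
      name,
      m => s"${m.group(1)}_${m.group(2).toLowerCase}"
    )
}

// Possible use with a Spark DataFrame `df` (requires Spark on the classpath,
// so it is left as a comment here):
//   val renamed = df.columns.foldLeft(df) { (acc, c) =>
//     acc.withColumnRenamed(c, FieldNames.snakeToCamel(c))
//   }
```

Keeping the helpers independent of Spark means either direction can be applied to whichever source needs normalizing before calling .as[Changeset].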


jpolchlo · 2019-09-26T16:38:44Z

Make sure to address https://github.com/azavea/osmesa/blob/baca909e376116350fbb0cf60e32889a9194f0b3/src/analytics/src/main/scala/osmesa/analytics/oneoffs/MergeChangesets.scala#L99 after providing this change.

mojodna · 2019-09-26T16:53:39Z

Clarifying: streaming sources use camel case, ORC files typically use snake case (per https://github.com/mojodna/osm2orc).

jpolchlo mentioned this issue Sep 26, 2019

Staging deployment and Railway statistics azavea/osmesa#157

Merged
