Create GTFS difference report #17

Open
jwhitlock opened this issue Feb 2, 2014 · 2 comments

Comments

@jwhitlock
Member

Create a method for showing the common elements and differences between two GTFS feeds.

Difficulty - Hard
Criteria - A text or HTML report would be a great starting place

One strategy: create a hashing function for a 'row' and use it to find the elements that are identical in two feeds and the elements unique to each. For the unique elements, do a simple match on GTFS IDs to identify records that changed from one feed to the other. Generate a Markdown report.
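A minimal sketch of the first half of that strategy, assuming each feed table is available as CSV text. The names `row_hash` and `partition_rows` and the sample stops are made up for illustration, and SHA-256 is just one possible choice of hash:

```python
import csv
import hashlib
import io


def row_hash(row):
    """Hash a CSV row (a dict) independently of column order."""
    canonical = "\x1f".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def partition_rows(old_csv, new_csv):
    """Split the rows of two CSV texts into common, old-only, and new-only.

    Exact duplicate rows within one feed collapse to a single entry.
    """
    old = {row_hash(r): r for r in csv.DictReader(io.StringIO(old_csv))}
    new = {row_hash(r): r for r in csv.DictReader(io.StringIO(new_csv))}
    common = [old[h] for h in old.keys() & new.keys()]
    only_old = [old[h] for h in old.keys() - new.keys()]
    only_new = [new[h] for h in new.keys() - old.keys()]
    return common, only_old, only_new


# Hypothetical sample data, not taken from a real feed.
OLD_STOPS = (
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Main St,36.15,-95.99\n"
    "S2,Elm St,36.16,-95.98\n"
)
NEW_STOPS = (
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Main St,36.15,-95.99\n"
    "S2,Elm Street,36.16,-95.98\n"
    "S3,Oak Ave,36.17,-95.97\n"
)

common, only_old, only_new = partition_rows(OLD_STOPS, NEW_STOPS)
```

Here `S1` lands in `common`, while the renamed `S2` shows up in both `only_old` and `only_new`, which is why a second pass that matches on IDs is needed.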

@bbrewington

Did you guys figure this out? I'm working on the same problem with Atlanta's GTFS file, to help the MARTA Army project.

@jwhitlock
Member Author

No, never got to building this feature, and I probably won't get to it. I don't think @jdungan is the right person either.

My idea was to add a "signature" column. The basic idea was:

  1. Construct a JSON representation of the item, with a known key order, whitespace settings, etc. Omit fields, such as the line pattern, that are derived from other columns.
  2. Take the md5sum of that representation, and store it in a new "signature" column in the database.
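The two steps above could look roughly like this. It is a sketch, not code from this project: the function name `signature`, the `derived_fields` parameter, and the `cached_geometry` field are hypothetical, and values are kept as strings so that number formatting cannot change the hash.

```python
import hashlib
import json


def signature(item, derived_fields=()):
    """Return the MD5 hex digest of a canonical JSON form of `item`.

    `derived_fields` (a hypothetical parameter) names keys to leave out
    because they are computed from other columns.
    """
    payload = {k: v for k, v in item.items() if k not in derived_fields}
    # Fixed key order and separators so equal data always serializes equally.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Same data, different key order: the signatures match.
stop_a = {"stop_id": "S1", "name": "Main St", "lat": "36.15", "lon": "-95.99"}
stop_b = {"lon": "-95.99", "lat": "36.15", "name": "Main St", "stop_id": "S1"}
same = signature(stop_a) == signature(stop_b)

# A derived field is ignored when listed in `derived_fields`.
stop_c = dict(stop_a, cached_geometry="POINT(-95.99 36.15)")
ignored = signature(stop_c, derived_fields=("cached_geometry",)) == signature(stop_a)
```

MD5 is used here only as a change-detection fingerprint, as in the idea above; any stable hash would serve the same purpose.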

The idea is that, if two stops in two feeds have the same data, they will have the same signature value. This helps identify the data items that did not change.

Once you know what didn't change, you have what did change. The challenge is to match items that refer to the same thing, so you need a similarity algorithm. For example, if a stop has the same stop_id, that's a pretty good indication that it refers to the same stop, even if the latitude and longitude are different. Easy for people, potentially challenging to code.
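The simplest form of that matching, pairing leftover rows on their ID and listing the fields that differ, might be sketched as follows. `match_by_id` and the sample rows are hypothetical, and this covers only the easy case where the ID is stable between feeds; it does not attempt any fuzzier similarity.

```python
def match_by_id(only_old, only_new, id_field="stop_id"):
    """Pair rows that share an ID and report which fields differ.

    Returns (changed, added, deleted), where `changed` maps an ID to
    {field: (old_value, new_value)} for every differing field.
    """
    old_by_id = {row[id_field]: row for row in only_old}
    new_by_id = {row[id_field]: row for row in only_new}
    changed = {}
    for key in old_by_id.keys() & new_by_id.keys():
        before, after = old_by_id[key], new_by_id[key]
        fields = sorted(set(before) | set(after))
        changed[key] = {
            f: (before.get(f), after.get(f))
            for f in fields
            if before.get(f) != after.get(f)
        }
    added = sorted(new_by_id.keys() - old_by_id.keys())
    deleted = sorted(old_by_id.keys() - new_by_id.keys())
    return changed, added, deleted


# Hypothetical rows that were not identical between two feeds.
leftover_old = [
    {"stop_id": "S2", "stop_lat": "36.16", "stop_lon": "-95.98"},
    {"stop_id": "S9", "stop_lat": "36.20", "stop_lon": "-95.90"},
]
leftover_new = [
    {"stop_id": "S2", "stop_lat": "36.1601", "stop_lon": "-95.98"},
    {"stop_id": "S3", "stop_lat": "36.17", "stop_lon": "-95.97"},
]

changed, added, deleted = match_by_id(leftover_old, leftover_new)
```

With this input, `S2` is reported as changed in `stop_lat` only, `S3` as added, and `S9` as deleted. A stop whose ID was reassigned would appear as one deletion plus one addition, which is the hard case described above.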

Then there needs to be a simple UI to display things that are the same, things that changed, new items, and deleted items.
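Short of a real UI, those four categories could be written out as the Markdown report suggested in the original post. The sketch below assumes the categories have already been computed; `render_report` and its argument shapes are invented for this example.

```python
def render_report(table, unchanged, changed, added, deleted):
    """Render a Markdown summary for one GTFS table.

    `unchanged`, `added`, and `deleted` are lists of IDs; `changed` maps
    an ID to {field: (old_value, new_value)}.
    """
    lines = [f"# Changes in {table}", ""]
    lines.append(f"- Unchanged: {len(unchanged)}")
    lines.append(f"- Changed: {len(changed)}")
    lines.append(f"- Added: {len(added)}")
    lines.append(f"- Deleted: {len(deleted)}")
    for title, ids in (("Added", added), ("Deleted", deleted)):
        lines += ["", f"## {title}", ""]
        lines += [f"- `{item_id}`" for item_id in ids] or ["None"]
    lines += ["", "## Changed", ""]
    if not changed:
        lines.append("None")
    for item_id in sorted(changed):
        for field, (before, after) in sorted(changed[item_id].items()):
            lines.append(f"- `{item_id}`: {field} `{before}` -> `{after}`")
    return "\n".join(lines) + "\n"


# Hypothetical input for a stops table.
report = render_report(
    "stops.txt",
    unchanged=["S1"],
    changed={"S2": {"stop_name": ("Elm St", "Elm Street")}},
    added=["S3"],
    deleted=[],
)
print(report)
```

Listing the changed fields with old and new values side by side keeps the report readable as plain text, and the same data structure could later feed an HTML view.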

Very useful stuff, but hard to do, and maybe too much to ask a volunteer to do.
