Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add recommended presence: reconciling confusion between best practices and spec #376

Closed
emmambd opened this issue May 9, 2023 · 10 comments · Fixed by #386
Closed

Add recommended presence: reconciling confusion between best practices and spec #376

emmambd opened this issue May 9, 2023 · 10 comments · Fixed by #386
Labels
Change: Best Practice Changes focusing on recommendations for optimal use of the specification. GTFS Schedule Issues and Pull Requests that focus on GTFS Schedule

Comments

@emmambd
Copy link
Collaborator

emmambd commented May 9, 2023

Problem

As originally mentioned in #375, MobilityData’s heard a need to merge best practices into the spec to increase their visibility to producers and reduce confusion between the two documents. We’ve been exploring what a good starting “release” or iteration would look like to begin this work. Based on feedback in #375 and some other discussions, it looks like the one key initial challenge to solve is reconciling places in the best practices and spec where their advice conflicts.

Some examples:

  • feed_info.txt is Optional in the spec, but recommended in all cases in best practices.
  • agency_id is Optional if there is only one transit agency in the spec, but always recommended in the best practices

Solution

Adding a Recommended and Conditionally Recommended presence in the spec that conforms to RFC conventions. This would make it possible to clearly state that a field or file is not required, but adding it is a best practice that should be considered. Making it immediately obvious that a file or field is recommended in the spec reduces confusion, while preserving backwards compatibility.

This change would not affect validity or the quality evaluation of the dataset according to the Canonical Validator, since the best practices are already warnings in the validator. Any recommended field or file would generate a warning in the validator.

Recommended Scope

MobilityData has done an initial review of all places in the spec and best practices where they offer contradictory advice on when to add a file or field.

All conflicting best practices that don’t require adopting a new field into the spec relate to

  • feed_info.txt file and fields
  • route_short_name in routes.txt
  • agency_id in routes.txt, agency.txt, and fare_attributes.txt
  • timepoint in stop_times.txt
  • block_id in trips.txt
  • Looping or inlining routes
    • shape_dist_traveled in stop_times.txt
    • shape_dist_traveled in shapes.txt
    • direction_id in trips.txt

After the associated PR is adopted, MobilityData will open a PR on https://github.com/MobilityData/GTFS_Schedule_Best-Practices so that all best practices that have been reconciled + adopted into the spec are removed from the best practices repo.

It would be helpful to get feedback on:

  • The scope itself: is there any best practice here that there isn’t consensus on and ought to be clarified further before it’s added to the spec?
  • If there are particular best practices in this list that are high value to trip planners and regulators to reconcile
  • Any recommendations for revising language related to the best practices on looping or inlining routes

Alternatives Considered

@emmambd emmambd added GTFS Schedule Issues and Pull Requests that focus on GTFS Schedule Change: Best Practice Changes focusing on recommendations for optimal use of the specification. labels May 9, 2023
@e-lo
Copy link

e-lo commented May 10, 2023

Regarding

...places in the spec and best practices where they offer contradictory advice on when to add a file or field.

I'm not sure what is meant by contradictory in this case? I wouldn't consider it contradictory that something MAY be in the dataset and SHOULD (according to best practices) be in there.

@e-lo
Copy link

e-lo commented May 10, 2023

I'm very supportive of migrating the best practices as it lessens the burden on finding what I ideally should be doing w.r.t. dataset creation.

I would want to make sure that as the migration happens the two places where they are are mutually exclusive - meaning the github repos with the best practices should get smaller and smaller.

@emmambd
Copy link
Collaborator Author

emmambd commented May 10, 2023

I'm not sure what is meant by contradictory in this case? I wouldn't consider it contradictory that something MAY be in the dataset and SHOULD (according to best practices) be in there.

Confusing is perhaps better language here — there's no contradiction in severity/quality level. However the current experience as you highlighted over at #375 is already counterintuitive, and it seems worth it to make sure the new and improved experience is 100% clear:

The user experience is essentially:

Ah, I don't need to build agency.txt so I'm going to leave it out of my software implementation
Client is upset that agency.txt is not there b/c its use is widespread...points them to the best practices documentation
Ohhh, there is a whole other set of suggestions that I should have been doing all along?

The new experience we want here is:

I can skim the spec and see everything that's Recommended with ease, so I know exactly what to add if I want to follow best practice.

As opposed to:

I see some fields are Optional and promptly ignore them, and don't notice that the Optional field's description told me I should use the field, or should use it in x, y, z conditional cases.

@emmambd
Copy link
Collaborator Author

emmambd commented May 10, 2023

I would want to make sure that as the migration happens the two places where they are are mutually exclusive - meaning the github repos with the best practices should get smaller and smaller.

Agreed! An intended step of this work is to remove the best practices from the best practices repo once this is adopted in transit repo (I will add it to the recommended scope information).

@bdferris-v2
Copy link
Collaborator

@emmambd would you be willing to open up the audit spreadsheet to comments?

@emmambd
Copy link
Collaborator Author

emmambd commented May 10, 2023

@bdferris-v2 Yes! Apologies that it wasn't already. This has been fixed.

@NomeQ
Copy link

NomeQ commented May 12, 2023

Overall everything in the audit spreadsheet looks good to me. My only comment here is that the "Conditionally recommended, optional otherwise" category of fields still introduces some sense of ambiguity.

I guess that ambiguity is that if it doesn't meet the conditionally recommended guidelines, would it be preferable to include or exclude the field. It's not necessarily the same for each field, although I will admit that in many cases it's unclear what the alternative designation would be.

E.g. route_short_name: Conditionally required, and conditionally recommended if there is a short passenger name for the service. It's optional otherwise, but I would go so far as to say it is not recommended if there isn't actually a common passenger-facing short name for the service. That would be against best practices. I get that part of keeping it optional otherwise is simply for compatibility and clarity on validator behavior, and I don't exactly have a recommendation for how to change this.

In general, it seems like the word "optional" tends to muddle things anywhere it appears. Perhaps because something that is optional but isn't recommended doesn't seem to have much reason to exist.

@antrim
Copy link
Contributor

antrim commented May 12, 2023

In row 15:
block_id: "Recommended if trip is identified in frequencies.txt. Optional otherwise."

If I am reading correctly, I don't think this makes sense. block_id should always be optional or recommended if the same vehicle serves multiple trips. This is more complicated with service described using frequencies.txt, discussed here: #227 (comment)

Also made this comment in the spreadsheet.

@emmambd
Copy link
Collaborator Author

emmambd commented May 18, 2023

@NomeQ

In general, it seems like the word "optional" tends to muddle things anywhere it appears. Perhaps because something that is optional but isn't recommended doesn't seem to have much reason to exist.

I can see how “Conditionally recommended” can overcomplicate things and confuse the meaning of Optional. I’ve edited the spreadsheet to remove the Conditionally recommended status and instead just include “should” language in the description for conditional cases. For example:

Field Presence Revised Description (new text in bold)
feed_contact_email Optional Email address for communication regarding the GTFS dataset and data publishing practices. feed_contact_email is a technical contact for GTFS-consuming applications. Provide customer service contact information through agency.txt. Should be used when feed_info.feed_contact_url is empty.

Now the Recommended presence is only used for cases like feed_info.txt or agency_id: file and fields that should be used universally but aren't declared required for the sake of backwards compatibility.

E.g. route_short_name: Conditionally required, and conditionally recommended if there is a short passenger name for the service. It's optional otherwise, but I would go so far as to say it is not recommended if there isn't actually a common passenger-facing short name for the service.

Slightly revised in spreadsheet for clarity.

I’ve also modified the Recommended definition slightly in the spreadsheet so it echoes the RFC definition and makes it clearer how it should be considered by producers.

@emmambd
Copy link
Collaborator Author

emmambd commented May 18, 2023

@antrim Commenting this here as well as the spreadsheet for visibility:

Thanks for sharing that historical context — it's a massive help. Agreed. I've updated to just say "May be provided for frequency-based trips." at the end of the current description, since the intent of the original best practice seems to be to just emphasize that block_id can be used for frequency-based trips, not that it should be.

Feel free to suggest alternate language.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Change: Best Practice Changes focusing on recommendations for optimal use of the specification. GTFS Schedule Issues and Pull Requests that focus on GTFS Schedule
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants