-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/schema selection #96
Conversation
Hi @Somtom, thanks for your contribution! In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. |
You did it @Somtom! Thank you for signing the Singer Contribution License Agreement. |
@cmerrick any hints on things I can do additionally? |
@kspeer825 I saw you created a PR recently. Can you maybe support me on getting this feature in? Seems that no one is responding |
@Somtom This is a really interesting use case and feature. To start, I'm hesitant to merge this into the main line repository here for a few reasons. First off, the case you mention isn't a required approach that targets should take, and I'd encourage the target authors to build up more flexibility regarding unsupported schemas and such. The target's job is to handle everything thrown at it by any old tap, so a specific use case like lack of SQL Server supporting a certain column type would be more in their wheelhouse, rather than the tap's. That said, developing standards around what the target can expect beyond "everything" can be beneficial. I'm very curious about the practice of filtering the schema itself, since in reality I realize that hardening a target such that it can accept an arbitrary schema is a tough problem (especially when you just want other data! 😃 ). Could you link to where in the Meltano SDK you adapted this from? If it were something that makes sense to become a standard, I would prefer to see it heavily tested with unit tests and part of singer-python somehow as a full fledged feature (similar to how the Transformer filters out non-selected fields for records). Additionally, the code for this tap is rather "evolved" and cluttered, so including this at this level seems detrimental overall. All things considered, if it's your specific use case I encourage you to continue running with this feature from a fork. I'll kick the idea around a few folks and see what comes out of it. Cheers! |
Hey @dmosorast thanks for the response :) . Let me link some of the resources.
One thing I could think of to ensure backward compatibility would be that we add a settings variable which activates the schema selection. It could be On the unit tests I actually agree. This is something I could definitely add |
@Somtom Finally able to give these a look. Thanks a bunch for the links, that's really helpful for me. There's a bit of confusion right now as to what is officially recognized as a best practice and what has been developed by the practitioners of Singer over the years. Those links describe Meltano's ideas of best practices and not necessarily part of the official spec. That said, this sort of discrepancy is something that we (read: a rather large segment of the Singer community as a whole) are trying to figure out how best to handle, so hopefully it will be better in the future. The core singer spec lives here, and doesn't necessarily specify anything above the messaging format (like the concept of field selection), so this falls more into a potential "Patterns and Best Practices" space. |
Description of change
Currently the selection within the catalog is not applied to the schema which is populated by the tap. This means the schema is always populated in a full version with all the fields for a stream.
Examples on the impact can be found in #47 .
This MR aims to solve this issue by applying filtering to the schema which gets populated.
Some of the code is based on the Meltano SDK and was adjusted a bit.
I did not find any contribution guidelines so please let me know if I oversaw or forgot something. I am willing to adjust it :) .
Example:
When only selecting
customers.id
andcustomers.name
Before:
After:
Manual QA steps
tap-stripe --config config.json --discover > catalog.json
tap-stripe --config config.json --catalog catalog.json
Risks
false
by default. That way we can ensure backward compatibilityRollback steps