Fix/schema selection #96

Closed
Somtom wants to merge 2 commits

Somtom commented Aug 19, 2021

Description of change

Currently, the field selection in the catalog is not applied to the schema that the tap emits. As a result, the SCHEMA message always contains the full schema, with every field of a stream, regardless of the selection.

Examples of the impact can be found in #47.

This PR aims to solve this issue by filtering the emitted schema according to the selection.
Some of the code is based on the Meltano SDK, with minor adjustments.
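
To illustrate the idea, the filtering can be sketched roughly as follows. This is a minimal sketch and not the code in this PR: the helper name `filter_schema` is hypothetical, it only filters top-level properties, and it assumes the usual Singer catalog metadata layout (a list of entries with a `breadcrumb` and a `metadata` dict carrying `selected` and `inclusion`).

```python
import copy


def filter_schema(schema, metadata_entries):
    """Return a copy of `schema` that keeps only the selected top-level properties.

    Hypothetical helper for illustration; nested properties are not filtered.
    """
    # Index the metadata by breadcrumb, e.g. ("properties", "name").
    by_breadcrumb = {
        tuple(entry["breadcrumb"]): entry.get("metadata", {})
        for entry in metadata_entries
    }

    def is_selected(field_name):
        meta = by_breadcrumb.get(("properties", field_name), {})
        inclusion = meta.get("inclusion")
        if inclusion == "automatic":
            return True  # automatic fields are always kept
        if inclusion == "unsupported":
            return False  # unsupported fields are never kept
        return bool(meta.get("selected", False))

    filtered = copy.deepcopy(schema)
    filtered["properties"] = {
        name: subschema
        for name, subschema in filtered.get("properties", {}).items()
        if is_selected(name)
    }
    return filtered


# Made-up example data, much smaller than the real customers schema.
schema = {
    "type": ["null", "object"],
    "properties": {
        "id": {"type": ["null", "string"]},
        "name": {"type": ["null", "string"]},
        "email": {"type": ["null", "string"]},
    },
}
metadata_entries = [
    {"breadcrumb": [], "metadata": {"selected": True}},
    {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
    {"breadcrumb": ["properties", "name"],
     "metadata": {"inclusion": "available", "selected": True}},
    {"breadcrumb": ["properties", "email"],
     "metadata": {"inclusion": "available", "selected": False}},
]
print(sorted(filter_schema(schema, metadata_entries)["properties"]))  # ['id', 'name']
```

The sketch works on a deep copy so the full schema stays available to the rest of the tap, for example for record transformation.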

I did not find any contribution guidelines, so please let me know if I overlooked or forgot something. I am happy to adjust it :).

Example:

When selecting only customers.id and customers.name:

Before:

{"type": "SCHEMA", "stream": "customers", "schema": {"properties": {"metadata": {"properties": {}, "type": ["null", "object"]}, "preferred_locales": {"items": {"type": ["null", "string"]}, "type": ["null", "array"]}, "invoice_settings": {"properties": {"custom_fields": {"items": {"type": ["null", "string"]}, "type": ["null", "array"]}, "default_payment_method": {"type": ["null", "string"]}, "footer": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "name": {"type": ["null", "string"]}, "tax_exempt": {"type": ["null", "string"]}, "next_invoice_sequence": {"type": ["null", "integer"]}, "balance": {"type": ["null", "integer"]}, "phone": {"type": ["null", "string"]}, "address": {"properties": {"city": {"type": ["null", "string"]}, "country": {"type": ["null", "string"]}, "line1": {"type": ["null", "string"]}, "line2": {"type": ["null", "string"]}, "postal_code": {"type": ["null", "string"]}, "state": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "shipping": {"properties": {"address": {"properties": {"line2": {"type": ["null", "string"]}, "state": {"type": ["null", "string"]}, "city": {"type": ["null", "string"]}, "postal_code": {"type": ["null", "string"]}, "country": {"type": ["null", "string"]}, "line1": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "name": {"type": ["null", "string"]}, "phone": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "sources": {"anyOf": [{"type": ["null", "array"], "items": {"type": ["null", "object"], "properties": {"metadata": {"type": ["null", "object"], "properties": {}}, "type": {"type": ["null", "string"]}, "address_zip": {"type": ["null", "string"]}, "livemode": {"type": ["null", "boolean"]}, "card": {"type": ["null", "object"], "properties": {"fingerprint": {"type": ["null", "string"]}, "last4": {"type": ["null", "string"]}, "dynamic_last4": {"type": ["null", "string"]}, "address_line1_check": {"type": ["null", "string"]}, "exp_month": {"type": ["null", "integer"]}, 
"tokenization_method": {"type": ["null", "string"]}, "name": {"type": ["null", "string"]}, "exp_year": {"type": ["null", "integer"]}, "three_d_secure": {"type": ["null", "string"]}, "funding": {"type": ["null", "string"]}, "brand": {"type": ["null", "string"]}, "cvc_check": {"type": ["null", "string"]}, "country": {"type": ["null", "string"]}, "address_zip_check": {"type": ["null", "string"]}, "type": {"type": ["null", "string"]}}}, "statement_descriptor": {"type": ["null", "string"]}, "id": {"type": ["null", "string"]}, "address_country": {"type": ["null", "string"]}, "funding": {"type": ["null", "string"]}, "dynamic_last4": {"type": ["null", "string"]}, "exp_year": {"type": ["null", "integer"]}, "last4": {"type": ["null", "string"]}, "exp_month": {"type": ["null", "integer"]}, "brand": {"type": ["null", "string"]}, "address_line2": {"type": ["null", "string"]}, "country": {"type": ["null", "string"]}, "object": {"type": ["null", "string"]}, "amount": {"type": ["null", "integer"]}, "cvc_check": {"type": ["null", "string"]}, "usage": {"type": ["null", "string"]}, "address_line1": {"type": ["null", "string"]}, "owner": {"type": ["null", "object"], "properties": {"verified_address": {"type": ["null", "string"]}, "email": {"type": ["null", "string"]}, "address": {"type": ["null", "object"], "properties": {"line2": {"type": ["null", "string"]}, "state": {"type": ["null", "string"]}, "city": {"type": ["null", "string"]}, "postal_code": {"type": ["null", "string"]}, "country": {"type": ["null", "string"]}, "line1": {"type": ["null", "string"]}}}, "verified_email": {"type": ["null", "string"]}, "name": {"type": ["null", "string"]}, "phone": {"type": ["null", "string"]}, "verified_name": {"type": ["null", "string"]}, "verified_phone": {"type": ["null", "string"]}}}, "tokenization_method": {"type": ["null", "string"]}, "client_secret": {"type": ["null", "string"]}, "fingerprint": {"type": ["null", "string"]}, "address_city": {"type": ["null", "string"]}, "currency": 
{"type": ["null", "string"]}, "address_line1_check": {"type": ["null", "string"]}, "receiver": {"type": ["null", "object"], "properties": {"refund_attributes_method": {"type": ["null", "string"]}, "amount_returned": {"type": ["null", "integer"]}, "amount_received": {"type": ["null", "integer"]}, "refund_attributes_status": {"type": ["null", "string"]}, "address": {"type": ["null", "string"]}, "amount_charged": {"type": ["null", "integer"]}}}, "flow": {"type": ["null", "string"]}, "name": {"type": ["null", "string"]}, "ach_credit_transfer": {"type": ["null", "object"], "properties": {"bank_name": {"type": ["null", "string"]}, "fingerprint": {"type": ["null", "string"]}, "routing_number": {"type": ["null", "string"]}, "swift_code": {"type": ["null", "string"]}, "refund_account_holder_type": {"type": ["null", "string"]}, "refund_account_holder_name": {"type": ["null", "string"]}, "refund_account_number": {"type": ["null", "string"]}, "refund_routing_number": {"type": ["null", "string"]}, "account_number": {"type": ["null", "string"]}}}, "customer": {"type": ["null", "string"]}, "address_zip_check": {"type": ["null", "string"]}, "status": {"type": ["null", "string"]}, "created": {"type": ["null", "string"], "format": "date-time"}, "address_state": {"type": ["null", "string"]}, "alipay": {"type": ["null", "object"], "properties": {}}, "bancontact": {"type": ["null", "object"], "properties": {}}, "eps": {"type": ["null", "object"], "properties": {}}, "ideal": {"type": ["null", "object"], "properties": {}}, "multibanco": {"type": ["null", "object"], "properties": {}}, "redirect": {"type": ["null", "object"], "properties": {"failure_reason": {"type": ["null", "string"]}, "return_url": {"type": ["null", "string"]}, "status": {"type": ["null", "string"]}, "url": {"type": ["null", "string"]}}}}}}, {"type": ["null", "object"], "properties": {"metadata": {"type": ["null", "object"], "properties": {}}, "type": {"type": ["null", "string"]}, "address_zip": {"type": ["null", 
"string"]}, "livemode": {"type": ["null", "boolean"]}, "card": {"type": ["null", "object"], "properties": {"fingerprint": {"type": ["null", "string"]}, "last4": {"type": ["null", "string"]}, "dynamic_last4": {"type": ["null", "string"]}, "address_line1_check": {"type": ["null", "string"]}, "exp_month": {"type": ["null", "integer"]}, "tokenization_method": {"type": ["null", "string"]}, "name": {"type": ["null", "string"]}, "exp_year": {"type": ["null", "integer"]}, "three_d_secure": {"type": ["null", "string"]}, "funding": {"type": ["null", "string"]}, "brand": {"type": ["null", "string"]}, "cvc_check": {"type": ["null", "string"]}, "country": {"type": ["null", "string"]}, "address_zip_check": {"type": ["null", "string"]}, "type": {"type": ["null", "string"]}}}, "statement_descriptor": {"type": ["null", "string"]}, "id": {"type": ["null", "string"]}, "address_country": {"type": ["null", "string"]}, "funding": {"type": ["null", "string"]}, "dynamic_last4": {"type": ["null", "string"]}, "exp_year": {"type": ["null", "integer"]}, "last4": {"type": ["null", "string"]}, "exp_month": {"type": ["null", "integer"]}, "brand": {"type": ["null", "string"]}, "address_line2": {"type": ["null", "string"]}, "country": {"type": ["null", "string"]}, "object": {"type": ["null", "string"]}, "amount": {"type": ["null", "integer"]}, "cvc_check": {"type": ["null", "string"]}, "usage": {"type": ["null", "string"]}, "address_line1": {"type": ["null", "string"]}, "owner": {"type": ["null", "object"], "properties": {"verified_address": {"type": ["null", "string"]}, "email": {"type": ["null", "string"]}, "address": {"type": ["null", "object"], "properties": {"line2": {"type": ["null", "string"]}, "state": {"type": ["null", "string"]}, "city": {"type": ["null", "string"]}, "postal_code": {"type": ["null", "string"]}, "country": {"type": ["null", "string"]}, "line1": {"type": ["null", "string"]}}}, "verified_email": {"type": ["null", "string"]}, "name": {"type": ["null", "string"]}, "phone": 
{"type": ["null", "string"]}, "verified_name": {"type": ["null", "string"]}, "verified_phone": {"type": ["null", "string"]}}}, "tokenization_method": {"type": ["null", "string"]}, "client_secret": {"type": ["null", "string"]}, "fingerprint": {"type": ["null", "string"]}, "address_city": {"type": ["null", "string"]}, "currency": {"type": ["null", "string"]}, "address_line1_check": {"type": ["null", "string"]}, "receiver": {"type": ["null", "object"], "properties": {"refund_attributes_method": {"type": ["null", "string"]}, "amount_returned": {"type": ["null", "integer"]}, "amount_received": {"type": ["null", "integer"]}, "refund_attributes_status": {"type": ["null", "string"]}, "address": {"type": ["null", "string"]}, "amount_charged": {"type": ["null", "integer"]}}}, "flow": {"type": ["null", "string"]}, "name": {"type": ["null", "string"]}, "ach_credit_transfer": {"type": ["null", "object"], "properties": {"bank_name": {"type": ["null", "string"]}, "fingerprint": {"type": ["null", "string"]}, "routing_number": {"type": ["null", "string"]}, "swift_code": {"type": ["null", "string"]}, "refund_account_holder_type": {"type": ["null", "string"]}, "refund_account_holder_name": {"type": ["null", "string"]}, "refund_account_number": {"type": ["null", "string"]}, "refund_routing_number": {"type": ["null", "string"]}, "account_number": {"type": ["null", "string"]}}}, "customer": {"type": ["null", "string"]}, "address_zip_check": {"type": ["null", "string"]}, "status": {"type": ["null", "string"]}, "created": {"type": ["null", "string"], "format": "date-time"}, "address_state": {"type": ["null", "string"]}, "alipay": {"type": ["null", "object"], "properties": {}}, "bancontact": {"type": ["null", "object"], "properties": {}}, "eps": {"type": ["null", "object"], "properties": {}}, "ideal": {"type": ["null", "object"], "properties": {}}, "multibanco": {"type": ["null", "object"], "properties": {}}, "redirect": {"type": ["null", "object"], "properties": {"failure_reason": 
{"type": ["null", "string"]}, "return_url": {"type": ["null", "string"]}, "status": {"type": ["null", "string"]}, "url": {"type": ["null", "string"]}}}}}]}, "delinquent": {"type": ["null", "boolean"]}, "description": {"type": ["null", "string"]}, "livemode": {"type": ["null", "boolean"]}, "default_source": {"type": ["null", "string"]}, "cards": {"items": {"properties": {"metadata": {"properties": {}, "type": ["null", "object"]}, "object": {"type": ["null", "string"]}, "id": {"type": ["null", "string"]}, "exp_month": {"type": ["null", "integer"]}, "dynamic_last4": {"type": ["null", "string"]}, "exp_year": {"type": ["null", "integer"]}, "last4": {"type": ["null", "string"]}, "funding": {"type": ["null", "string"]}, "brand": {"type": ["null", "string"]}, "country": {"type": ["null", "string"]}, "customer": {"type": ["null", "string"]}, "cvc_check": {"type": ["null", "string"]}, "address_line2": {"type": ["null", "string"]}, "address_line1": {"type": ["null", "string"]}, "fingerprint": {"type": ["null", "string"]}, "address_zip": {"type": ["null", "string"]}, "address_city": {"type": ["null", "string"]}, "address_country": {"type": ["null", "string"]}, "address_line1_check": {"type": ["null", "string"]}, "tokenization_method": {"type": ["null", "string"]}, "name": {"type": ["null", "string"]}, "address_state": {"type": ["null", "string"]}, "address_zip_check": {"type": ["null", "string"]}, "type": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "type": ["null", "array"]}, "email": {"type": ["null", "string"]}, "default_card": {"type": ["null", "string"]}, "subscriptions": {"items": {"type": ["null", "string"]}, "type": ["null", "array"]}, "discount": {"properties": {"end": {"format": "date-time", "type": ["null", "string"]}, "coupon": {"properties": {"metadata": {"properties": {}, "type": ["null", "object"]}, "valid": {"type": ["null", "boolean"]}, "livemode": {"type": ["null", "boolean"]}, "amount_off": {"type": ["null", "integer"]}, "redeem_by": 
{"format": "date-time", "type": ["null", "string"]}, "duration_in_months": {"type": ["null", "integer"]}, "percent_off_precise": {"type": ["null", "number"]}, "max_redemptions": {"type": ["null", "integer"]}, "currency": {"type": ["null", "string"]}, "name": {"type": ["null", "string"]}, "times_redeemed": {"type": ["null", "integer"]}, "id": {"type": ["null", "string"]}, "duration": {"type": ["null", "string"]}, "object": {"type": ["null", "string"]}, "percent_off": {"type": ["null", "integer"]}, "created": {"format": "date-time", "type": ["null", "string"]}}, "type": ["null", "object"]}, "customer": {"type": ["null", "string"]}, "start": {"format": "date-time", "type": ["null", "string"]}, "object": {"type": ["null", "string"]}, "subscription": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "account_balance": {"type": ["null", "integer"]}, "currency": {"type": ["null", "string"]}, "id": {"type": ["null", "string"]}, "invoice_prefix": {"type": ["null", "string"]}, "tax_info_verification": {"type": ["null", "string"]}, "object": {"type": ["null", "string"]}, "created": {"format": "date-time", "type": ["null", "string"]}, "tax_info": {"type": ["null", "string"]}, "updated": {"format": "date-time", "type": ["null", "string"]}}, "type": ["null", "object"]}, "key_properties": ["id"]}

After:

{
  "type": "SCHEMA",
  "stream": "customers",
  "schema": {
    "properties": {
      "name": {
        "type": [
          "null",
          "string"
        ]
      },
      "id": {
        "type": [
          "null",
          "string"
        ]
      },
      "created": {
        "format": "date-time",
        "type": [
          "null",
          "string"
        ]
      },
      "updated": {
        "format": "date-time",
        "type": [
          "null",
          "string"
        ]
      }
    },
    "type": [
      "null",
      "object"
    ]
  },
  "key_properties": [
    "id"
  ]
}

Manual QA steps

  • Create a catalog.json: tap-stripe --config config.json --discover > catalog.json
  • Select specific fields of certain streams as described in the README
  • Run the tap: tap-stripe --config config.json --catalog catalog.json
  • Check that the emitted schema only includes the selected fields
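
The last step can also be checked with a small script instead of by eye. The following is a sketch under assumptions: the function names are hypothetical, only top-level fields are compared, and the input is any iterable of Singer message lines (for example `sys.stdin` when the tap output is piped in).

```python
import json


def schema_fields_by_stream(lines):
    """Map each stream name to the top-level fields of its last SCHEMA message."""
    fields = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("type") == "SCHEMA":
            properties = message["schema"].get("properties", {})
            fields[message["stream"]] = set(properties)
    return fields


def unexpected_fields(lines, expected):
    """Return, per stream, the emitted fields that are not in `expected[stream]`.

    Streams without an entry in `expected` are not checked.
    """
    extras = {}
    for stream, emitted in schema_fields_by_stream(lines).items():
        extra = emitted - set(expected.get(stream, emitted))
        if extra:
            extras[stream] = extra
    return extras


# Made-up sample output in which the schema still contains an unselected field.
sample_output = [
    '{"type": "SCHEMA", "stream": "customers", "schema": {"properties": '
    '{"id": {}, "name": {}, "email": {}}}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "customers", "record": {"id": "cus_1", "name": "Ada"}}',
]
print(unexpected_fields(sample_output, {"customers": {"id", "name"}}))
# {'customers': {'email'}}
```

For the example above, the expected set for customers would be id, name, created and updated, matching the "After" output.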

Risks

  • Implementations that rely on the full schema being emitted could break, since the target no longer receives the same schema as before. For example, even if only specific fields were selected, the database might already have a table with all fields. We could add an environment variable for schema filtering that is false by default, which would ensure backward compatibility.
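
An opt-in switch of this kind could look roughly like the sketch below. Nothing here exists in the tap today: the variable name `TAP_STRIPE_FILTER_SCHEMA` and both function names are assumptions made for illustration only.

```python
import os

# Values treated as "enabled"; anything else, including unset, means disabled.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def schema_filtering_enabled(environ=None):
    """Return True only when the (hypothetical) opt-in variable is truthy."""
    environ = os.environ if environ is None else environ
    value = environ.get("TAP_STRIPE_FILTER_SCHEMA", "false")
    return value.strip().lower() in _TRUE_VALUES


def schema_to_emit(full_schema, filtered_schema, environ=None):
    """Pick the filtered schema only when filtering was explicitly enabled."""
    if schema_filtering_enabled(environ):
        return filtered_schema
    return full_schema
```

With the variable unset, `schema_to_emit` returns the full schema, so existing pipelines would keep their current behavior.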

Rollback steps

  • Reverting the code change should be sufficient

@cmerrick

Hi @Somtom, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

@cmerrick

You did it @Somtom!

Thank you for signing the Singer Contribution License Agreement.

Somtom commented Aug 25, 2021

@cmerrick any hints on what else I can do?

Somtom commented Sep 10, 2021

@kspeer825 I saw you created a PR recently. Could you help me get this feature in? It seems that no one is responding.

dmosorast commented Sep 10, 2021

@Somtom This is a really interesting use case and feature. To start, I'm hesitant to merge this into the main line repository here for a few reasons. First off, the case you mention isn't a required approach that targets should take, and I'd encourage the target authors to build up more flexibility regarding unsupported schemas and such. The target's job is to handle everything thrown at it by any old tap, so a specific use case like SQL Server not supporting a certain column type would be more in their wheelhouse than the tap's.

That said, developing standards around what the target can expect beyond "everything" can be beneficial. I'm very curious about the practice of filtering the schema itself, since in reality I realize that hardening a target such that it can accept an arbitrary schema is a tough problem (especially when you just want other data! 😃 ). Could you link to where in the Meltano SDK you adapted this from?

If it were something that makes sense to become a standard, I would prefer to see it heavily covered by unit tests and part of singer-python somehow as a full-fledged feature (similar to how the Transformer filters out non-selected fields for records). Additionally, the code for this tap is rather "evolved" and cluttered, so including this at this level seems detrimental overall.

All things considered, if it's your specific use case I encourage you to continue running with this feature from a fork. I'll kick the idea around with a few folks and see what comes out of it. Cheers!

Somtom commented Sep 13, 2021

Hey @dmosorast, thanks for the response :).

Let me link some of the resources.

One way to ensure backward compatibility would be to add a setting that activates the schema filtering. It could be false by default.

On the unit tests, I agree. This is something I could definitely add.

@dmosorast

@Somtom Finally able to give these a look. Thanks a bunch for the links, that's really helpful for me. There's a bit of confusion right now as to what is officially recognized as a best practice and what has been developed by the practitioners of Singer over the years. Those links describe Meltano's ideas of best practices and are not necessarily part of the official spec. That said, this sort of discrepancy is something that we (read: a rather large segment of the Singer community as a whole) are trying to figure out how best to handle, so hopefully it will be better in the future.

The core Singer spec lives here, and doesn't necessarily specify anything above the messaging format (like the concept of field selection), so this falls more into a potential "Patterns and Best Practices" space.

Somtom closed this Apr 24, 2024