
Feat: Add previously experimental Cloud "deploy" functionality (DRAFT - DO NOT MERGE) #419

Draft · wants to merge 2 commits into base: main

Conversation

@aaronsteers (Contributor) commented Oct 15, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new utility functions for generating unique identifiers.
    • Added new methods for listing connections, sources, and destinations.
    • Enhanced CloudConnection with new properties for better resource management.
  • Bug Fixes

    • Improved error handling for API interactions, including clearer error messages.
  • Refactor

    • Streamlined various classes and methods to enhance clarity and functionality, including the CloudWorkspace and CacheBase classes.
    • Simplified the Source class structure.
  • Documentation

    • Updated documentation for new and existing functions to clarify usage.
    • Adjusted the list of documented modules to reflect recent changes.

coderabbitai bot commented Oct 15, 2024

📝 Walkthrough

The pull request introduces significant enhancements to the API utility functions within the airbyte codebase. Key changes include improved type hinting, updated function signatures, and the addition of new functions for managing sources and destinations. Error handling has been refined, and several existing functions have undergone logic refactoring. Additionally, new utility functions for generating unique identifiers have been added, and the CacheBase class has been simplified by removing unnecessary attributes. The overall structure and functionality of the API interactions have been enhanced.

Changes

  • airbyte/_util/api_util.py: Enhanced API interaction functions with type hinting, updated signatures, new functions, and improved error handling.
  • airbyte/_util/text_util.py: Introduced utility functions for generating unique identifiers: generate_ulid and generate_random_suffix.
  • airbyte/caches/base.py: Simplified the CacheBase class by removing three private attributes related to deployment.
  • airbyte/cloud/connections.py: Enhanced the CloudConnection class with new properties and updated logic for existing properties and methods.
  • airbyte/cloud/connectors.py: Added the CloudConnector base class with abstract properties and methods for managing cloud connectors.
  • airbyte/cloud/experimental.py: Deleted the file containing experimental features for the Airbyte Cloud API.
  • airbyte/cloud/workspaces.py: Modified the CloudWorkspace class with refactored deployment methods and new listing methods.
  • airbyte/exceptions.py: Introduced the AirbyteDuplicateResourcesError class and fixed indentation in AirbyteMissingResourceError.
  • airbyte/sources/base.py: Simplified the Source class by changing connector_type to a simple assignment and removing private attributes.

Possibly related PRs

Suggested labels

enable-ai-review

Suggested reviewers

  • natikgadzhi
  • erohmensing
  • bindipankhudi
  • bnchrch

What do you think about these suggestions?


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 9b40875 and 4777d24.

📒 Files selected for processing (1)
  • docs/generate.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (1)
docs/generate.py (1)

21-21: Updating documented modules. Is this intentional?

I noticed we're swapping out "airbyte/cloud/experimental.py" for "airbyte/cli.py" in the public_modules list. This change will affect which modules get documented. Are we sure we want to remove the experimental cloud module from the docs? And are we ready to publicize the CLI module? Just want to make sure this aligns with our current documentation strategy. Wdyt?

To ensure we're not missing anything, could we run this quick check?

This will help us catch any lingering references and confirm the new module exists.

✅ Verification successful

Module changes verified successfully.
All references to airbyte/cloud/experimental.py have been removed, and airbyte/cli.py exists in the repository.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to the experimental cloud module
# and verify the existence of the new CLI module.

echo "Checking for remaining references to experimental cloud module:"
rg "airbyte/cloud/experimental"

echo "\nVerifying existence of CLI module:"
ls airbyte/cli.py

Length of output: 286



coderabbitai bot left a comment

Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (11)
airbyte/_util/text_util.py (3)

9-11: The generate_ulid() function looks good, but how about enhancing the docstring?

The function is well-implemented, but the docstring could be more informative. What do you think about expanding it to explain what a ULID is and its characteristics? Something like this, perhaps?

def generate_ulid() -> str:
    """Generate a new ULID (Universally Unique Lexicographically Sortable Identifier).

    A ULID is a 128-bit identifier that combines a timestamp with randomly generated bits,
    resulting in a sortable, unique identifier that's more compact than a UUID.

    Returns:
        str: A 26-character string representation of the ULID.
    """
    return str(ulid.ULID())

This would provide more context for users of the function. WDYT?


14-22: Great implementation of generate_random_suffix()! How about a tiny optimization?

The function looks solid, and I love the detailed docstring explaining its behavior and limitations. The use of ULID for a sortable suffix is clever!

Just a small suggestion: we could simplify the implementation slightly by using string slicing in a single line. What do you think about this?

def generate_random_suffix() -> str:
    """Generate a random suffix for use in temporary names.

    By default, this function generates a ULID and returns a 9-character string
    which will be monotonically sortable. It is not guaranteed to be unique but
    is sufficient for small-scale and medium-scale use cases.
    """
    return generate_ulid()[:6] + generate_ulid()[-3:]

This achieves the same result but in a slightly more concise way. WDYT?


1-22: The overall structure of the file looks great! Any plans for more text utilities?

I really like how clean and focused this file is. The two functions are well-organized and clearly related. Great job on keeping it simple and to the point!

Just curious, do you have any plans to add more text utility functions in the future? This file seems like a great place for them. Maybe something for text sanitization, truncation, or other common text operations you might need across the project? No pressure, just thinking ahead!

Keep up the awesome work! 😊

airbyte/exceptions.py (2)

489-494: LGTM! New exception class looks good.

The new AirbyteDuplicateResourcesError class is well-structured and follows the established pattern. It provides a clear way to handle duplicate resource scenarios.

Quick thought: Would it be helpful to add a default value for the guidance attribute, similar to other exception classes in this file? Something like "Please choose a unique name for the resource." wdyt?
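
For illustration, a minimal sketch of what that default might look like, assuming the dataclass style used elsewhere in airbyte/exceptions.py (the base class shown here is a stand-in, not the actual definition):

from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PyAirbyteError(Exception):
    """Stand-in for the real PyAirbyte base exception (assumed shape)."""

    guidance: str | None = None


@dataclass
class AirbyteDuplicateResourcesError(PyAirbyteError):
    """Raised when a resource with the requested name already exists."""

    resource_type: str | None = None
    resource_name: str | None = None
    guidance: str | None = "Please choose a unique name for the resource."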


497-497: Hmm, duplicate "Custom Warnings" sections?

I noticed we now have two "Custom Warnings" sections in the file. The new one here and another at the end of the file. Should we consider consolidating these for better organization? Maybe move AirbyteMultipleResourcesError to the existing section at the end? What do you think?

airbyte/sources/base.py (1)

54-54: Simplified connector_type attribute, but lost type hinting. Thoughts on keeping both?

The connector_type attribute has been simplified from a type-annotated literal to a simple string assignment. While this makes the code cleaner, we've lost the type hinting.

What do you think about keeping the type hint for better IDE support and static type checking? We could do something like:

connector_type: Literal["source"] = "source"

This way, we maintain the simplicity while preserving the type information. WDYT?

airbyte/cloud/connectors.py (1)

39-39: Consider updating docstring to generalize connector type—wdyt?

Since the connector_url method applies to both sources and destinations, would it make sense to update the docstring to "Get the URL of the connector"? Let me know your thoughts.

airbyte/cloud/connections.py (1)

209-219: Enhance the method docstring to reflect cascading deletions.

Since cascade_delete_source and cascade_delete_destination allow deleting the source and destination, should we update the method description to reflect this additional functionality? Wdyt?

airbyte/_util/api_util.py (3)

104-105: Suggestion: Clarify Error Message

The error message could be more precise. Maybe rephrase it to "You can provide either 'name' or 'name_filter', but not both." to enhance clarity. Wdyt?


143-143: Update Docstring in list_sources

The docstring currently says "Get a connection." Should we update it to "List sources." to reflect the function's purpose? Wdyt?


179-179: Update Docstring in list_destinations

The docstring states "Get a connection." Consider updating it to "List destinations." to accurately describe the function. Wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f3bc4c6 and 9b40875.

📒 Files selected for processing (9)
  • airbyte/_util/api_util.py (17 hunks)
  • airbyte/_util/text_util.py (1 hunks)
  • airbyte/caches/base.py (0 hunks)
  • airbyte/cloud/connections.py (5 hunks)
  • airbyte/cloud/connectors.py (1 hunks)
  • airbyte/cloud/experimental.py (0 hunks)
  • airbyte/cloud/workspaces.py (6 hunks)
  • airbyte/exceptions.py (1 hunks)
  • airbyte/sources/base.py (1 hunks)
💤 Files with no reviewable changes (2)
  • airbyte/caches/base.py
  • airbyte/cloud/experimental.py
🧰 Additional context used
🔇 Additional comments (16)
airbyte/_util/text_util.py (1)

1-7: LGTM! The file header and imports look good.

The copyright notice, module docstring, and imports are all appropriate for the functionality provided. Nice job using from __future__ import annotations for better type hinting support! 👍

airbyte/exceptions.py (1)

489-489: Thumbs up for the indentation fix!

Nice catch on aligning the resource_name_or_id attribute. It's these little details that keep our codebase tidy and consistent. 👍

airbyte/sources/base.py (2)

54-54: Overall, good cleanup! Any other simplifications in mind?

These changes look good overall! You've simplified the connector_type attribute and removed some unused private attributes. This kind of cleanup helps keep the codebase lean and maintainable.

Do you have any other simplifications or cleanups in mind for this class? It might be worth taking a look at other attributes or methods that could benefit from similar treatment.


54-54: Removed private attributes. Can we double-check their usage?

I noticed that three private attributes (_deployed_api_root, _deployed_workspace_id, and _deployed_source_id) have been removed. This is great for code cleanliness if they were unused!

Just to be on the safe side, could we run a quick search (e.g., with rg) through the codebase for _deployed_api_root, _deployed_workspace_id, and _deployed_source_id to ensure these weren't used anywhere else?

This would help us catch any lingering references. What do you think?

airbyte/cloud/connectors.py (1)

40-40: Verify URL construction for correctness—wdyt?

Just to double-check, the connector_url method constructs the URL by appending {self.connector_type}s, resulting in paths like .../sources/{connector_id} or .../destinations/{connector_id}. Does this match the expected URL structure in Airbyte Cloud? Let me know if this aligns with the API endpoints.

airbyte/cloud/connections.py (5)

9-9: Importing CloudDestination and CloudSource looks good.

This addition enables the usage of these classes later in the code.


105-105: Good handling of None for streams.

Using or [] ensures that we don't encounter a TypeError when self._connection_info.configurations.streams is None. Nice work!


113-113: Properly handling None for prefix.

Returning an empty string when self._connection_info.prefix is None prevents potential issues when using this value. Looks good!


174-176: Adding _latest_job_info to SyncResult instantiation makes sense.

This provides SyncResult with access to the latest job information, which can be useful for further processing. Good addition!


223-227: Confirming safe deletion when cascading is enabled.

When cascade_delete_source or cascade_delete_destination are True, the source or destination will be permanently deleted. Should we add safeguards or confirmations to prevent accidental deletions of these resources? Wdyt?

airbyte/_util/api_util.py (6)

17-17: Approved: Enhancement of Type Hints

The addition of TYPE_CHECKING and Any from the typing module improves type hinting and code clarity.


27-27: Approved: Importing PyAirbyteInputError

Including PyAirbyteInputError in the exceptions enhances error handling for input validation.


31-33: Approved: Conditional Import of Callable

Using if TYPE_CHECKING to conditionally import Callable optimizes runtime performance by avoiding unnecessary imports.


235-239: Approved: Improved Exception Handling in get_connection

The updated exception handling provides more informative error messages, enhancing debugging and user experience.


563-569: Approved: Refactoring Stream Configurations

Wrapping stream_configurations in models.StreamConfigurations ensures compatibility with the API requirements.


390-397: Approved: Enhanced Error Handling in get_source

The updated error handling in get_source correctly raises an AirbyteMissingResourceError when a source is not found, improving reliability.

Comment on lines +62 to +65
def connector_type(self) -> Literal["source", "destination"]:
    """Get the type of the connector."""
    return "source"


🛠️ Refactor suggestion

Consider using class variables for connector_type—wdyt?

Since connector_type returns a constant value in both CloudSource and CloudDestination, perhaps defining it as a class variable instead of a property could simplify the code. What do you think?

Here's how it might look:

In CloudSource:

 class CloudSource(CloudConnector):
+    connector_type: Literal["source", "destination"] = "source"

     @property
     def source_id(self) -> str:
         """Get the ID of the source.

         This is an alias for `connector_id`.
         """
         return self.connector_id

-    @property
-    def connector_type(self) -> Literal["source", "destination"]:
-        """Get the type of the connector."""
-        return "source"

And in CloudDestination:

 class CloudDestination(CloudConnector):
+    connector_type: Literal["source", "destination"] = "destination"

     @property
     def destination_id(self) -> str:
         """Get the ID of the destination.

         This is an alias for `connector_id`.
         """
         return self.connector_id

-    @property
-    def connector_type(self) -> Literal["source", "destination"]:
-        """Get the type of the connector."""
-        return "destination"

Also applies to: 79-81

Comment on lines +72 to +79
@property
def source(self) -> CloudSource:
    """Get the source object."""
    return CloudSource(
        workspace=self.workspace,
        connector_id=self.source_id,
    )


🛠️ Refactor suggestion

Consider caching the CloudSource instance.

Currently, each access to the source property creates a new CloudSource object. Should we cache this instance to avoid redundant object creation and improve performance? Wdyt?
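
If caching is worth it here, one lightweight option would be functools.cached_property, which builds the object on first access and stores it on the instance. A rough sketch, assuming the source ID never changes after construction (fragment only; the import path is assumed):

from functools import cached_property

from airbyte.cloud.connectors import CloudSource  # assumed import path


class CloudConnection:
    # ... existing __init__ and other members unchanged ...

    @cached_property
    def source(self) -> CloudSource:
        """Get the source object, creating it once and caching it on the instance."""
        return CloudSource(
            workspace=self.workspace,
            connector_id=self.source_id,
        )

The trade-off is that a cached object would not reflect any later change to source_id, which may be why the current per-access construction was chosen.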

@@ -79,21 +88,29 @@
        return cast(str, self._destination_id)

    @property
    def destination(self) -> CloudDestination:
        """Get the source object."""

⚠️ Potential issue

Update docstring to refer to 'destination' instead of 'source'.

The docstring says "Get the source object," but this property returns a CloudDestination. Should we update it to "Get the destination object"? Wdyt?

Comment on lines +88 to +93
existing = self.list_sources(name=name)
if existing:
    raise exc.AirbyteDuplicateResourcesError(
        resource_type="destination",
        resource_name=name,
    )

⚠️ Potential issue

Possible inconsistency in the error message for duplicate source names

In the deploy_source method, when raising AirbyteDuplicateResourcesError, the resource_type is set to "destination":

raise exc.AirbyteDuplicateResourcesError(
    resource_type="destination",
    resource_name=name,
)

Since we're dealing with sources here, should resource_type be "source" instead? Wdyt?
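
Presumably the intended call would be:

raise exc.AirbyteDuplicateResourcesError(
    resource_type="source",
    resource_name=name,
)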

Comment on lines +160 to +164
if not isinstance(source, str | CloudSource):
    raise exc.PyAirbyteInputError(
        message="Invalid source type.",
        input_value=type(source).__name__,
    )

⚠️ Potential issue

Use tuple for multiple types in isinstance check

In the permanently_delete_source method, the isinstance check uses the bitwise | operator:

if not isinstance(source, str | CloudSource):
    # ...

However, isinstance expects a type or a tuple of types. Should this be updated to use a tuple instead?

if not isinstance(source, (str, CloudSource)):
    # ...

Wdyt?
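
One note on this: since Python 3.10, isinstance accepts PEP 604 union types directly, so str | CloudSource is valid at runtime there; the tuple spelling is the backward-compatible form. A quick illustration with built-in types:

# Both checks are equivalent on Python 3.10+; only the tuple form works on
# older interpreters.
print(isinstance("abc", str | int))   # True (Python 3.10+)
print(isinstance("abc", (str, int)))  # True (all supported versions)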

Comment on lines +182 to +185
if not isinstance(destination, str | CloudDestination):
    raise exc.PyAirbyteInputError(
        message="Invalid destination type.",
        input_value=type(destination).__name__,

⚠️ Potential issue

Use tuple for multiple types in isinstance check

Similarly, in the permanently_delete_destination method, the isinstance check uses str | CloudDestination:

if not isinstance(destination, str | CloudDestination):
    # ...

Should we change this to a tuple for proper type checking?

if not isinstance(destination, (str, CloudDestination)):
    # ...

Wdyt?

Comment on lines +219 to +222
if not selected_streams:
    raise exc.PyAirbyteInputError(
        guidance="You must provide `selected_streams` when creating a connection."
    )

⚠️ Potential issue

Consider making selected_streams a required parameter

In the deploy_connection method, selected_streams has a default value of None, but the code raises an error if it's not provided:

if not selected_streams:
    raise exc.PyAirbyteInputError(
        guidance="You must provide `selected_streams` when creating a connection."
    )

Maybe it would be clearer to make selected_streams a required parameter without a default value to reflect that it's mandatory. Wdyt?
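
As a rough illustration of that change (parameter list abbreviated and names assumed; not the actual PyAirbyte signature):

def deploy_connection(
    self,
    source: CloudSource | str,
    destination: CloudDestination | str,
    selected_streams: list[str],  # required: no default, so the runtime check becomes unnecessary
) -> CloudConnection:
    ...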

Comment on lines +230 to +235
 if not selected_streams:
     raise exc.PyAirbyteInputError(
-        guidance="You must provide either a destination ID or a cache object."
+        guidance=(
+            "You must provide `selected_streams` when creating a connection "
+            "from an existing destination."
+        )

🛠️ Refactor suggestion

Redundant check for selected_streams

There seems to be a second check for selected_streams being provided:

if not selected_streams:
    raise exc.PyAirbyteInputError(
        guidance=(
            "You must provide `selected_streams` when creating a connection "
            "from an existing destination."
        )
    )

Since we already have this check earlier in the method, perhaps this block can be removed to avoid redundancy? Wdyt?

Comment on lines +475 to +497
 if status_ok(response.status_code) and response.destination_response:
     # TODO: This is a temporary workaround to resolve an issue where
     # the destination API response is of the wrong type.
     # https://github.com/airbytehq/pyairbyte/issues/320
     raw_response: dict[str, Any] = json.loads(response.raw_response.text)
     raw_configuration: dict[str, Any] = raw_response["configuration"]

     destination_type = raw_response.get("destinationType")
     if destination_type == "snowflake":
-        response.destination_response.configuration = models.DestinationSnowflake.from_dict(
-            raw_configuration,
+        response.destination_response.configuration = models.DestinationSnowflake(
+            **raw_configuration,
         )
     if destination_type == "bigquery":
-        response.destination_response.configuration = models.DestinationBigquery.from_dict(
-            raw_configuration,
+        response.destination_response.configuration = models.DestinationBigquery(
+            **raw_configuration,
         )
     if destination_type == "postgres":
-        response.destination_response.configuration = models.DestinationPostgres.from_dict(
-            raw_configuration,
+        response.destination_response.configuration = models.DestinationPostgres(
+            **raw_configuration,
         )
     if destination_type == "duckdb":
-        response.destination_response.configuration = models.DestinationDuckdb.from_dict(
-            raw_configuration,
+        response.destination_response.configuration = models.DestinationDuckdb(
+            **raw_configuration,

🛠️ Refactor suggestion

Question: Simplify Destination Type Handling

Currently, the code checks for each destination type individually. Would it be beneficial to use a mapping or factory pattern to reduce repetition and improve scalability? Wdyt?
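
One possible shape for that, sketched with the model classes from the snippet above (the mapping itself is illustrative, not the actual implementation):

# Map destination type strings to their model classes so adding a new
# destination only requires one new entry in the table.
DESTINATION_MODELS: dict[str, type] = {
    "snowflake": models.DestinationSnowflake,
    "bigquery": models.DestinationBigquery,
    "postgres": models.DestinationPostgres,
    "duckdb": models.DestinationDuckdb,
}

destination_type = raw_response.get("destinationType")
model_class = DESTINATION_MODELS.get(destination_type)
if model_class is not None:
    response.destination_response.configuration = model_class(**raw_configuration)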
