-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Add previously experimental Cloud "deploy" functionality (DRAFT - DO NOT MERGE) #419
base: main
Are you sure you want to change the base?
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThe pull request introduces significant enhancements to the API utility functions within the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
What do you think about these suggestions? 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🧰 Additional context used🔇 Additional comments (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (11)
airbyte/_util/text_util.py (3)
9-11
: Thegenerate_ulid()
function looks good, but how about enhancing the docstring?The function is well-implemented, but the docstring could be more informative. What do you think about expanding it to explain what a ULID is and its characteristics? Something like this, perhaps?
def generate_ulid() -> str: """Generate a new ULID (Universally Unique Lexicographically Sortable Identifier). A ULID is a 128-bit identifier that combines a timestamp with randomly generated bits, resulting in a sortable, unique identifier that's more compact than a UUID. Returns: str: A 26-character string representation of the ULID. """ return str(ulid.ULID())This would provide more context for users of the function. WDYT?
14-22
: Great implementation ofgenerate_random_suffix()
! How about a tiny optimization?The function looks solid, and I love the detailed docstring explaining its behavior and limitations. The use of ULID for a sortable suffix is clever!
Just a small suggestion: we could simplify the implementation slightly by using string slicing in a single line. What do you think about this?
def generate_random_suffix() -> str: """Generate a random suffix for use in temporary names. By default, this function generates a ULID and returns a 9-character string which will be monotonically sortable. It is not guaranteed to be unique but is sufficient for small-scale and medium-scale use cases. """ return generate_ulid()[:6] + generate_ulid()[-3:]This achieves the same result but in a slightly more concise way. WDYT?
1-22
: The overall structure of the file looks great! Any plans for more text utilities?I really like how clean and focused this file is. The two functions are well-organized and clearly related. Great job on keeping it simple and to the point!
Just curious, do you have any plans to add more text utility functions in the future? This file seems like a great place for them. Maybe something for text sanitization, truncation, or other common text operations you might need across the project? No pressure, just thinking ahead!
Keep up the awesome work! 😊
airbyte/exceptions.py (2)
489-494
: LGTM! New exception class looks good.The new
AirbyteDuplicateResourcesError
class is well-structured and follows the established pattern. It provides a clear way to handle duplicate resource scenarios.Quick thought: Would it be helpful to add a default value for the
guidance
attribute, similar to other exception classes in this file? Something like "Please choose a unique name for the resource." wdyt?
497-497
: Hmm, duplicate "Custom Warnings" sections?I noticed we now have two "Custom Warnings" sections in the file. The new one here and another at the end of the file. Should we consider consolidating these for better organization? Maybe move
AirbyteMultipleResourcesError
to the existing section at the end? What do you think?airbyte/sources/base.py (1)
54-54
: Simplifiedconnector_type
attribute, but lost type hinting. Thoughts on keeping both?The
connector_type
attribute has been simplified from a type-annotated literal to a simple string assignment. While this makes the code cleaner, we've lost the type hinting.What do you think about keeping the type hint for better IDE support and static type checking? We could do something like:
connector_type: Literal["source"] = "source"This way, we maintain the simplicity while preserving the type information. WDYT?
airbyte/cloud/connectors.py (1)
39-39
: Consider updating docstring to generalize connector type—wdyt?Since the
connector_url
method applies to both sources and destinations, would it make sense to update the docstring to "Get the URL of the connector"? Let me know your thoughts.airbyte/cloud/connections.py (1)
209-219
: Enhance the method docstring to reflect cascading deletions.Since
cascade_delete_source
andcascade_delete_destination
allow deleting the source and destination, should we update the method description to reflect this additional functionality? Wdyt?airbyte/_util/api_util.py (3)
104-105
: Suggestion: Clarify Error MessageThe error message could be more precise. Maybe rephrase it to "You can provide either 'name' or 'name_filter', but not both." to enhance clarity. Wdyt?
143-143
: Update Docstring inlist_sources
The docstring currently says "Get a connection." Should we update it to "List sources." to reflect the function's purpose? Wdyt?
179-179
: Update Docstring inlist_destinations
The docstring states "Get a connection." Consider updating it to "List destinations." to accurately describe the function. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- airbyte/_util/api_util.py (17 hunks)
- airbyte/_util/text_util.py (1 hunks)
- airbyte/caches/base.py (0 hunks)
- airbyte/cloud/connections.py (5 hunks)
- airbyte/cloud/connectors.py (1 hunks)
- airbyte/cloud/experimental.py (0 hunks)
- airbyte/cloud/workspaces.py (6 hunks)
- airbyte/exceptions.py (1 hunks)
- airbyte/sources/base.py (1 hunks)
💤 Files with no reviewable changes (2)
- airbyte/caches/base.py
- airbyte/cloud/experimental.py
🧰 Additional context used
🔇 Additional comments (16)
airbyte/_util/text_util.py (1)
1-7
: LGTM! The file header and imports look good.The copyright notice, module docstring, and imports are all appropriate for the functionality provided. Nice job using
from __future__ import annotations
for better type hinting support! 👍airbyte/exceptions.py (1)
489-489
: Thumbs up for the indentation fix!Nice catch on aligning the
resource_name_or_id
attribute. It's these little details that keep our codebase tidy and consistent. 👍airbyte/sources/base.py (2)
54-54
: Overall, good cleanup! Any other simplifications in mind?These changes look good overall! You've simplified the
connector_type
attribute and removed some unused private attributes. This kind of cleanup helps keep the codebase lean and maintainable.Do you have any other simplifications or cleanups in mind for this class? It might be worth taking a look at other attributes or methods that could benefit from similar treatment.
54-54
: Removed private attributes. Can we double-check their usage?I noticed that three private attributes (
_deployed_api_root
,_deployed_workspace_id
, and_deployed_source_id
) have been removed. This is great for code cleanliness if they were unused!Just to be on the safe side, could we run a quick search through the codebase to ensure these weren't used anywhere else? Maybe something like:
This would help us catch any lingering references. What do you think?
airbyte/cloud/connectors.py (1)
40-40
: Verify URL construction for correctness—wdyt?Just to double-check, the
connector_url
method constructs the URL by appending{self.connector_type}s
, resulting in paths like.../sources/{connector_id}
or.../destinations/{connector_id}
. Does this match the expected URL structure in Airbyte Cloud? Let me know if this aligns with the API endpoints.airbyte/cloud/connections.py (5)
9-9
: ImportingCloudDestination
andCloudSource
looks good.This addition enables the usage of these classes later in the code.
105-105
: Good handling ofNone
forstreams
.Using
or []
ensures that we don't encounter aTypeError
whenself._connection_info.configurations.streams
isNone
. Nice work!
113-113
: Properly handlingNone
forprefix
.Returning an empty string when
self._connection_info.prefix
isNone
prevents potential issues when using this value. Looks good!
174-176
: Adding_latest_job_info
toSyncResult
instantiation makes sense.This provides
SyncResult
with access to the latest job information, which can be useful for further processing. Good addition!
223-227
: Confirming safe deletion when cascading is enabled.When
cascade_delete_source
orcascade_delete_destination
areTrue
, the source or destination will be permanently deleted. Should we add safeguards or confirmations to prevent accidental deletions of these resources? Wdyt?airbyte/_util/api_util.py (6)
17-17
: Approved: Enhancement of Type HintsThe addition of
TYPE_CHECKING
andAny
from thetyping
module improves type hinting and code clarity.
27-27
: Approved: ImportingPyAirbyteInputError
Including
PyAirbyteInputError
in the exceptions enhances error handling for input validation.
31-33
: Approved: Conditional Import ofCallable
Using
if TYPE_CHECKING
to conditionally importCallable
optimizes runtime performance by avoiding unnecessary imports.
235-239
: Approved: Improved Exception Handling inget_connection
The updated exception handling provides more informative error messages, enhancing debugging and user experience.
563-569
: Approved: Refactoring Stream ConfigurationsWrapping
stream_configurations
inmodels.StreamConfigurations
ensures compatibility with the API requirements.
390-397
: Approved: Enhanced Error Handling inget_source
The updated error handling in
get_source
correctly raises anAirbyteMissingResourceError
when a source is not found, improving reliability.
def connector_type(self) -> Literal["source", "destination"]: | ||
"""Get the type of the connector.""" | ||
return "source" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider using class variables for connector_type
—wdyt?
Since connector_type
returns a constant value in both CloudSource
and CloudDestination
, perhaps defining it as a class variable instead of a property could simplify the code. What do you think?
Here's how it might look:
In CloudSource
:
class CloudSource(CloudConnector):
+ connector_type: Literal["source", "destination"] = "source"
@property
def source_id(self) -> str:
"""Get the ID of the source.
This is an alias for `connector_id`.
"""
return self.connector_id
- @property
- def connector_type(self) -> Literal["source", "destination"]:
- """Get the type of the connector."""
- return "source"
And in CloudDestination
:
class CloudDestination(CloudConnector):
+ connector_type: Literal["source", "destination"] = "destination"
@property
def destination_id(self) -> str:
"""Get the ID of the destination.
This is an alias for `connector_id`.
"""
return self.connector_id
- @property
- def connector_type(self) -> Literal["source", "destination"]:
- """Get the type of the connector."""
- return "destination"
Also applies to: 79-81
airbyte/cloud/connections.py
Outdated
@property | ||
def source(self) -> CloudSource: | ||
"""Get the source object.""" | ||
return CloudSource( | ||
workspace=self.workspace, | ||
connector_id=self.source_id, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider caching the CloudSource
instance.
Currently, each access to the source
property creates a new CloudSource
object. Should we cache this instance to avoid redundant object creation and improve performance? Wdyt?
airbyte/cloud/connections.py
Outdated
@@ -79,21 +88,29 @@ | |||
|
|||
return cast(str, self._destination_id) | |||
|
|||
@property | |||
def destination(self) -> CloudDestination: | |||
"""Get the source object.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update docstring to refer to 'destination' instead of 'source'.
The docstring says "Get the source object," but this property returns a CloudDestination
. Should we update it to "Get the destination object"? Wdyt?
airbyte/cloud/workspaces.py
Outdated
existing = self.list_sources(name=name) | ||
if existing: | ||
raise exc.AirbyteDuplicateResourcesError( | ||
resource_type="destination", | ||
resource_name=name, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible inconsistency in the error message for duplicate source names
In the deploy_source
method, when raising AirbyteDuplicateResourcesError
, the resource_type
is set to "destination"
:
raise exc.AirbyteDuplicateResourcesError(
resource_type="destination",
resource_name=name,
)
Since we're dealing with sources here, should resource_type
be "source"
instead? Wdyt?
airbyte/cloud/workspaces.py
Outdated
if not isinstance(source, str | CloudSource): | ||
raise exc.PyAirbyteInputError( | ||
message="Invalid source type.", | ||
input_value=type(source).__name__, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use tuple for multiple types in isinstance
check
In the permanently_delete_source
method, the isinstance
check uses the bitwise |
operator:
if not isinstance(source, str | CloudSource):
# ...
However, isinstance
expects a type or a tuple of types. Should this be updated to use a tuple instead?
if not isinstance(source, (str, CloudSource)):
# ...
Wdyt?
airbyte/cloud/workspaces.py
Outdated
if not isinstance(destination, str | CloudDestination): | ||
raise exc.PyAirbyteInputError( | ||
message="Invalid destination type.", | ||
input_value=type(destination).__name__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use tuple for multiple types in isinstance
check
Similarly, in the permanently_delete_destination
method, the isinstance
check uses str | CloudDestination
:
if not isinstance(destination, str | CloudDestination):
# ...
Should we change this to a tuple for proper type checking?
if not isinstance(destination, (str, CloudDestination)):
# ...
Wdyt?
airbyte/cloud/workspaces.py
Outdated
if not selected_streams: | ||
raise exc.PyAirbyteInputError( | ||
guidance="You must provide `selected_streams` when creating a connection." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making selected_streams
a required parameter
In the deploy_connection
method, selected_streams
has a default value of None
, but the code raises an error if it's not provided:
if not selected_streams:
raise exc.PyAirbyteInputError(
guidance="You must provide `selected_streams` when creating a connection."
)
Maybe it would be clearer to make selected_streams
a required parameter without a default value to reflect that it's mandatory. Wdyt?
airbyte/cloud/workspaces.py
Outdated
if not selected_streams: | ||
raise exc.PyAirbyteInputError( | ||
guidance="You must provide either a destination ID or a cache object." | ||
guidance=( | ||
"You must provide `selected_streams` when creating a connection " | ||
"from an existing destination." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Redundant check for selected_streams
There seems to be a second check for selected_streams
being provided:
if not selected_streams:
raise exc.PyAirbyteInputError(
guidance=(
"You must provide `selected_streams` when creating a connection "
"from an existing destination."
)
)
Since we already have this check earlier in the method, perhaps this block can be removed to avoid redundancy? Wdyt?
airbyte/_util/api_util.py
Outdated
if status_ok(response.status_code) and response.destination_response: | ||
# TODO: This is a temporary workaround to resolve an issue where | ||
# the destination API response is of the wrong type. | ||
# https://github.com/airbytehq/pyairbyte/issues/320 | ||
raw_response: dict[str, Any] = json.loads(response.raw_response.text) | ||
raw_configuration: dict[str, Any] = raw_response["configuration"] | ||
|
||
destination_type = raw_response.get("destinationType") | ||
if destination_type == "snowflake": | ||
response.destination_response.configuration = models.DestinationSnowflake.from_dict( | ||
raw_configuration, | ||
response.destination_response.configuration = models.DestinationSnowflake( | ||
**raw_configuration, | ||
) | ||
if destination_type == "bigquery": | ||
response.destination_response.configuration = models.DestinationBigquery.from_dict( | ||
raw_configuration, | ||
response.destination_response.configuration = models.DestinationBigquery( | ||
**raw_configuration, | ||
) | ||
if destination_type == "postgres": | ||
response.destination_response.configuration = models.DestinationPostgres.from_dict( | ||
raw_configuration, | ||
response.destination_response.configuration = models.DestinationPostgres( | ||
**raw_configuration, | ||
) | ||
if destination_type == "duckdb": | ||
response.destination_response.configuration = models.DestinationDuckdb.from_dict( | ||
raw_configuration, | ||
response.destination_response.configuration = models.DestinationDuckdb( | ||
**raw_configuration, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Question: Simplify Destination Type Handling
Currently, the code checks for each destination type individually. Would it be beneficial to use a mapping or factory pattern to reduce repetition and improve scalability? Wdyt?
Summary by CodeRabbit
Release Notes
New Features
CloudConnection
with new properties for better resource management.Bug Fixes
Refactor
CloudWorkspace
andCacheBase
classes.Source
class structure.Documentation