Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat[DatabricksConnector]: connector to connect to Databricks on Cloud #580
feat[DatabricksConnector]: connector to connect to Databricks on Cloud #580
Changes from 1 commit
a7da1a5
70c71ed
358c612
6ef5416
637348a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new hunk simplifies the SQL query for row count, which is a good change. However, it assumes that the table name in the configuration does not contain any special characters or reserved words that might need to be escaped. This could lead to SQL injection vulnerabilities if the table name is user-supplied and not properly sanitized. Consider using bind parameters as in the old hunk to mitigate this risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this could create a potential attack. We are using bindparams to validate the table name. Also the suggestion from @coderabbitai might work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gventuri: I'm glad you agree with the suggestion. Using bind parameters is indeed a good practice to prevent SQL injection attacks. It ensures that the table name is properly escaped, preventing malicious input from being executed as part of the SQL query. Here's how you can modify your code:
This way, SQLAlchemy will handle the proper escaping of the table name, making your code safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArslanSaleem: You're correct. My apologies for the confusion. SQLAlchemy's
bindparams
is indeed used to bind parameters, not table names or column names.To safely include a table name in a query, we need to use SQLAlchemy's
Table
construct which provides proper quoting around identifiers. Here's how you can modify your code:This way, SQLAlchemy will handle the proper quoting of the table name, making your code safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
columns_count
method now uses thelen
function on the columns returned by thehead
method instead of running a SQL query. This change could improve performance by reducing the number of database queries. However, make sure that thehead
method always returns the correct and complete set of columns.Large diffs are not rendered by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method to test
columns_count
property has been changed. Previously, it was directly fetching the count from the database, but now it's using thehead
method to get a dataframe and then counting the columns. This change might affect the performance if the table has a large number of rows. Consider reverting back to the old method if performance is a concern.