-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kunaljubce/add decorators for functions #253
Kunaljubce/add decorators for functions #253
Conversation
…a function and as a decorator
…d func definition of validate_schema()
…r code snippets relevant to quinn without making them visible to git
update column extension function names and desc in readme
I am looking into the pre-commit failures! |
quinn/dataframe_validator.py
Outdated
) -> function: | ||
required_schema: StructType, | ||
ignore_nullable: bool = False, | ||
_df: DataFrame = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using private variables (I mean _df
) naming convention for a public API (I mean function arguments)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duh! Sorry, I meant to change this and completely forgot. Let me fix this.
Meanwhile, can we not have ruff-format
as one of the pre-commit hooks? First - It's experimental and called out so; second - it seems to be reformatting a whole lot of files which are not part of this PR when I run it on local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SemyonSinchenko Renamed _df
to df_to_be_validated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fpgmaas May you take a look, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kunaljubce In the CI/CD pipeline I see that only 1 file is improperly formatted. I see your .pre-commit-config.yaml
contains unstaged changes, my guess that is causing your issue. Why is that changed, and what does the pre-commit-config.yaml
look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fpgmaas The unstaged changes are because I added - id: ruff-format
to my pre-commit-config.yaml
. Here's a screen recording of my changes and pre-commit succeeding before ruff-format
v/s pre-commit failing and fixing 9 files after ruff-format
- https://ufile.io/792yfg0c
I could not upload the video here itself due to size restrictions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should edit that file manually, did you try pulling or rebasing on top of planning-1.0-release
?
If I look at your branch, you are still using an outdated version of ruff, see here. That is likely causing the issue you are seeing. In order to prevent that you have to update your branch with the changes on planning-1.0-release
from this repo. e.g.
git fetch upstream
git rebase -i upstream/planning-1.0-release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kunaljubce Ah, that explains! Understandable mistake :) Good you were able to figure it out.
* According to one of the points in the issue mrpowers-io#199 by @MrPowers, this function should never have been created. * This particular commit removes this function and its references from the quinn repo.
Deprecate and remove `exists` and `forall` functions from the codebase. * Remove `exists` and `forall` functions from `quinn/functions.py`. * Remove import statements for `exists` and `forall` from `quinn/__init__.py`. * Remove tests related to `exists` and `forall` functions from `tests/test_functions.py`.
- Update deps - Update Ruff - Corresponding updates of pyproject - Slightly update Makefile - Drop Support of Spark2 - Drop Support of Spark 3.1-3.2 - Minimal python is 3.10 from now - Apply new ruff rules - Apply ruff linter On branch 202-drop-pyspark2 Changes to be committed: modified: .github/workflows/ci.yml modified: Makefile modified: poetry.lock modified: pyproject.toml modified: quinn/append_if_schema_identical.py modified: quinn/dataframe_helpers.py modified: quinn/keyword_finder.py modified: quinn/math.py modified: quinn/schema_helpers.py modified: quinn/split_columns.py modified: quinn/transformations.py
This commit introduces the following changes: * Updates the `ci.yml` file by introducing a new step under the `test` job to perform tests using Spark-Connect. * Creates a shell-script that downloads & installs Spark binaries and then runs the Spark-Connect server. * Creates a pytest module/file that tests a very simple function on Spark-Connect. * Updates the Makefile to add a new step for the Spark-Connect tests.
As per the review comment, the recently added dependencies such as Pyarrow, Pandas etc., are optional and not required for Spark-Classic. Update the pyproject.toml to reflect that and lock the poetry file
apply hotfix update lock file
Proposed changes
Took another stab at #140, as an extension to #144.
Types of changes
What types of changes does your code introduce to quinn?
Put an
x
in the boxes that applyFurther comments: Implementation details for
validate_schema
So we are implementing a decorator factory here so that our function can be used both as a decorator as well as a callable function. In this implementation:
The
validate_schema
function acts as both a decorator factory and a decorator. It takesrequired_schema
,ignore_nullable
, and an optional_df
argument._df
is None, it means the function is being used as a decorator factory, and it returns the decorator decorator._df
is not None, it means the function is being called directly with a DataFrame, and it applies the decorator to_df
immediately.When
validate_schema
is called directly with a DataFrame, the validation logic gets executed by wrapping the DataFrame in a lambda function and immediately calling the decorator.