Kunaljubce/add decorators for functions #253

Conversation


@kunaljubce kunaljubce commented Jul 15, 2024

Proposed changes

Took another stab at #140, as an extension to #144.

Types of changes

What types of changes does your code introduce to quinn?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Further comments: Implementation details for validate_schema

We are implementing a decorator factory here so that our function can be used both as a decorator and as a directly callable function. In this implementation:

The validate_schema function acts as both a decorator factory and a decorator. It takes required_schema, ignore_nullable, and an optional _df argument.

  • If _df is None, it means the function is being used as a decorator factory, and it returns the decorator decorator.
  • If _df is not None, it means the function is being called directly with a DataFrame, and it applies the decorator to _df immediately.

When validate_schema is called directly with a DataFrame, the validation logic is executed by wrapping the DataFrame in a lambda, decorating that lambda, and calling it immediately.
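
For concreteness, here is a minimal sketch of this pattern. The schema comparison and the error raised are illustrative placeholders, not the PR's actual implementation:

```python
from functools import wraps

from pyspark.sql import DataFrame
from pyspark.sql.types import StructType


def validate_schema(
    required_schema: StructType,
    ignore_nullable: bool = False,
    _df: DataFrame = None,
):
    """Validate a DataFrame's schema, as a decorator or as a direct call."""

    def matches(actual: StructType) -> bool:
        # Placeholder check: compare (name, dataType) pairs, optionally
        # ignoring nullability. The PR's real comparison may differ.
        if ignore_nullable:
            fields = lambda s: [(f.name, f.dataType) for f in s.fields]
            return fields(actual) == fields(required_schema)
        return actual == required_schema

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not matches(result.schema):
                raise ValueError("DataFrame schema does not match required schema")
            return result

        return wrapper

    if _df is None:
        # Decorator-factory path: @validate_schema(required_schema)
        return decorator
    # Direct-call path: wrap the DataFrame in a lambda, decorate it,
    # and call it immediately so the same validation logic runs.
    return decorator(lambda: _df)()
```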

@kunaljubce (Contributor Author) commented

I am looking into the pre-commit failures!

    required_schema: StructType,
    ignore_nullable: bool = False,
    _df: DataFrame = None,
) -> function:
A collaborator commented

Why are we using the private-variable naming convention (I mean _df) for a public API (I mean function arguments)?

@kunaljubce (Contributor Author) commented Jul 16, 2024

Duh! Sorry, I meant to change this and completely forgot. Let me fix this.

Meanwhile, can we not have ruff-format as one of the pre-commit hooks? First, it's experimental and called out as such; second, it seems to reformat a whole lot of files that are not part of this PR when I run it locally.


@kunaljubce (Contributor Author) commented

@SemyonSinchenko Renamed _df to df_to_be_validated
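
For illustration, both call styles would look like this after the rename. `build_report` and `some_df` are hypothetical names, and `required_schema` is assumed to be a StructType in scope:

```python
# As a decorator: validates the schema of the DataFrame the function returns.
@validate_schema(required_schema)
def build_report(df: DataFrame) -> DataFrame:
    return df.select("id", "amount")

# As a direct call on an existing DataFrame, via the renamed keyword argument.
validated_df = validate_schema(required_schema, df_to_be_validated=some_df)
```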

A collaborator commented

@fpgmaas Could you take a look, please?

@fpgmaas commented Jul 16, 2024

@kunaljubce In the CI/CD pipeline I see that only 1 file is improperly formatted. I also see that your .pre-commit-config.yaml contains unstaged changes; my guess is that this is causing your issue. Why was it changed, and what does your pre-commit-config.yaml look like?

@kunaljubce (Contributor Author) commented Jul 16, 2024

@fpgmaas The unstaged changes are because I added - id: ruff-format to my pre-commit-config.yaml. Here's a screen recording of my changes, showing pre-commit succeeding before ruff-format vs. pre-commit failing and fixing 9 files after ruff-format: https://ufile.io/792yfg0c

I could not upload the video here itself due to size restrictions.
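
For context, the ruff-format hook comes from the astral-sh/ruff-pre-commit repository; the entry under discussion would look roughly like this, with the rev pin purely illustrative:

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.0  # illustrative pin; match whatever version the repo standardizes on
    hooks:
      - id: ruff
      - id: ruff-format
```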

@fpgmaas commented Jul 16, 2024

I don't think you should edit that file manually. Did you try pulling or rebasing on top of planning-1.0-release?

Looking at your branch, you are still using an outdated version of ruff (see here); that is likely causing the issue you are seeing. To prevent it, update your branch with the changes on planning-1.0-release from this repo, e.g.

git fetch upstream
git rebase -i upstream/planning-1.0-release

@kunaljubce (Contributor Author) commented Jul 16, 2024

@fpgmaas That was my mistake. Out of instinct, I rebased this branch onto main, and now retrying to rebase it onto planning-1.0-release has corrupted this PR. 😭

Closing this and reopening a fresh PR - #255.


@kunaljubce Ah, that explains it! Understandable mistake :) Good that you were able to figure it out.

kunaljubce and others added 12 commits July 16, 2024 13:53
* According to one of the points in the issue mrpowers-io#199 by @MrPowers, this function should never have been created.
* This particular commit removes this function and its references from the quinn repo.
Deprecate and remove `exists` and `forall` functions from the codebase.
* Remove `exists` and `forall` functions from `quinn/functions.py`.
* Remove import statements for `exists` and `forall` from `quinn/__init__.py`.
* Remove tests related to `exists` and `forall` functions from `tests/test_functions.py`.
- Update deps
- Update Ruff
- Corresponding updates of pyproject
- Slightly update Makefile
- Drop Support of Spark2
- Drop Support of Spark 3.1-3.2
- Minimum Python version is 3.10 from now on
- Apply new ruff rules
- Apply ruff linter

 On branch 202-drop-pyspark2
 Changes to be committed:
	modified:   .github/workflows/ci.yml
	modified:   Makefile
	modified:   poetry.lock
	modified:   pyproject.toml
	modified:   quinn/append_if_schema_identical.py
	modified:   quinn/dataframe_helpers.py
	modified:   quinn/keyword_finder.py
	modified:   quinn/math.py
	modified:   quinn/schema_helpers.py
	modified:   quinn/split_columns.py
	modified:   quinn/transformations.py
This commit introduces the following changes:
* Updates the `ci.yml` file by introducing a new step under the `test` job to perform tests using Spark-Connect.
* Creates a shell-script that downloads & installs Spark binaries and then runs the Spark-Connect server.
* Creates a pytest module/file that tests a very simple function on Spark-Connect.
* Updates the Makefile to add a new step for the Spark-Connect tests.
@kunaljubce kunaljubce closed this Jul 16, 2024