Kunaljubce/add decorators for functions #253

kunaljubce · 2024-07-15T17:23:12Z

Proposed changes

Took another stab at #140, as an extension to #144.

Types of changes

What types of changes does your code introduce to quinn?
Put an x in the boxes that apply

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation Update (if none of the other choices apply)

Further comments: Implementation details for `validate_schema`

We implement a decorator factory here so that `validate_schema` can be used both as a decorator and as a directly callable function. In this implementation:

The `validate_schema` function acts as both a decorator factory and a directly callable validator. It takes `required_schema`, `ignore_nullable`, and an optional `_df` argument.

If `_df` is None, the function is being used as a decorator factory, and it returns the inner decorator (named `decorator`).
If `_df` is not None, the function is being called directly with a DataFrame, and it applies the decorator to `_df` immediately.

When `validate_schema` is called directly with a DataFrame, the validation logic is executed by wrapping the DataFrame in a lambda function, decorating that lambda, and calling the result immediately.
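The dual-use pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Field`, `Frame`, and `SchemaMismatchError` are hypothetical stand-ins for the PySpark schema and DataFrame types so the snippet runs without Spark, the optional argument uses the later name `df_to_be_validated`, and it assumes the decorator validates the DataFrame returned by the wrapped function.

```python
import functools
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (name, type, nullability)."""
    name: str
    dtype: str
    nullable: bool = True


@dataclass
class Frame:
    """Hypothetical stand-in for a DataFrame; it only carries a schema."""
    schema: list


class SchemaMismatchError(ValueError):
    """Raised when required fields are missing (name chosen for this sketch)."""


def validate_schema(
    required_schema: List[Field],
    ignore_nullable: bool = False,
    df_to_be_validated: Optional[Frame] = None,
):
    def key(field: Field):
        # Drop the nullable flag from the comparison when asked to ignore it.
        if ignore_nullable:
            return (field.name, field.dtype)
        return (field.name, field.dtype, field.nullable)

    def decorator(func: Callable[..., Frame]) -> Callable[..., Frame]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> Frame:
            df = func(*args, **kwargs)
            present = {key(f) for f in df.schema}
            missing = [f for f in required_schema if key(f) not in present]
            if missing:
                raise SchemaMismatchError(f"missing fields: {missing}")
            return df

        return wrapper

    # No DataFrame supplied: behave as a decorator factory.
    if df_to_be_validated is None:
        return decorator
    # DataFrame supplied: wrap it in a lambda and run the validation right away.
    return decorator(lambda: df_to_be_validated)()
```

With this shape, `@validate_schema(required)` decorates a function that returns a frame, while `validate_schema(required, df_to_be_validated=frame)` validates immediately and returns the frame unchanged, raising `SchemaMismatchError` if a required field is absent.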

kunaljubce · 2024-07-15T17:48:30Z

I am looking into the pre-commit failures!

SemyonSinchenko · 2024-07-16T07:59:05Z

quinn/dataframe_validator.py

-) -> function:
+    required_schema: StructType,
+    ignore_nullable: bool = False,
+    _df: DataFrame = None,


Why are we using the private-variable naming convention (`_df`) for part of a public API (a function argument)?

Duh! Sorry, I meant to change this and completely forgot. Let me fix this.

Meanwhile, can we drop ruff-format from the pre-commit hooks? First, it's experimental and called out as such; second, when I run it locally it reformats a whole lot of files that are not part of this PR.

@SemyonSinchenko Renamed _df to df_to_be_validated

@fpgmaas Could you take a look, please?

@kunaljubce In the CI/CD pipeline I see that only 1 file is improperly formatted. I also see that your .pre-commit-config.yaml contains unstaged changes; my guess is that this is causing your issue. Why was it changed, and what does your .pre-commit-config.yaml look like?

@fpgmaas The unstaged changes are there because I added `- id: ruff-format` to my .pre-commit-config.yaml. Here's a screen recording showing pre-commit succeeding before I added ruff-format versus pre-commit failing and fixing 9 files after I added it: https://ufile.io/792yfg0c

I could not upload the video here itself due to size restrictions.

I don't think you should edit that file manually. Did you try pulling or rebasing on top of planning-1.0-release?

If I look at your branch, you are still using an outdated version of ruff (see here), which is likely causing the issue you are seeing. To prevent that, update your branch with the changes on planning-1.0-release from this repo, e.g.:

    git fetch upstream
    git rebase -i upstream/planning-1.0-release

@fpgmaas That was my mistake. Out of instinct, I rebased this branch onto main, and then trying again to rebase it onto planning-1.0-release corrupted this PR. 😭

Closing this and reopening a fresh PR - #255.

@kunaljubce Ah, that explains it! Understandable mistake :) Good that you were able to figure it out.

Deprecate and remove `exists` and `forall` functions from the codebase.

* Remove `exists` and `forall` functions from `quinn/functions.py`.
* Remove import statements for `exists` and `forall` from `quinn/__init__.py`.
* Remove tests related to `exists` and `forall` functions from `tests/test_functions.py`.

- Update deps
- Update Ruff
- Corresponding updates of pyproject
- Slightly update Makefile
- Drop Support of Spark2
- Drop Support of Spark 3.1-3.2
- Minimal python is 3.10 from now
- Apply new ruff rules
- Apply ruff linter

On branch 202-drop-pyspark2
Changes to be committed:
    modified: .github/workflows/ci.yml
    modified: Makefile
    modified: poetry.lock
    modified: pyproject.toml
    modified: quinn/append_if_schema_identical.py
    modified: quinn/dataframe_helpers.py
    modified: quinn/keyword_finder.py
    modified: quinn/math.py
    modified: quinn/schema_helpers.py
    modified: quinn/split_columns.py
    modified: quinn/transformations.py

This commit introduces the following changes:

* Updates the `ci.yml` file by introducing a new step under the `test` job to perform tests using Spark-Connect.
* Creates a shell-script that downloads & installs Spark binaries and then runs the Spark-Connect server.
* Creates a pytest module/file that tests a very simple function on Spark-Connect.
* Updates the Makefile to add a new step for the Spark-Connect tests.

kunaljubce and others added 13 commits March 29, 2024 16:14

Adding decoratory factory to validate_schema to make it work both as a function and as a decorator

e642b86

Fix to execute the validation when func is called and replaced the old func definition of validate_schema()

3188b54

Changes to tests to conform to new validate_schema definition

a353a69

Updating README description for validate_schema

85cc47a

README fix

e38ba8e

Improved documentation in README

e58ccdd

Added success msg to be printed in case df schema matches the required schema

fe843b5

Added a uncommitted directory for developers to store their scripts or code snippets relevant to quinn without making them visible to git

a964e16

Minor README documentation update

c856f79

Moved uncommitted folder

151dcc2

Removing uncommitted dir

b9a6d08

update column extension function names and desc in readme

7c8ae16

Merge pull request mrpowers-io#240 from fatemetardasti96/main

e9f8948

update column extension function names and desc in readme

kunaljubce mentioned this pull request Jul 15, 2024

Add decorators for functions #221

Closed

4 tasks

kunaljubce added 2 commits July 16, 2024 13:24

Static type error fixes

21d87e5

Resolved merge conflicts

7ab9a42

SemyonSinchenko reviewed Jul 16, 2024

kunaljubce and others added 12 commits July 16, 2024 13:53

Changed _df param name to df_to_be_validated and associated tests changes

1d33b91

README changes for _df change

2fca007

Remove the print_athena_create_table function

c4cc8af

* According to one of the points in the issue mrpowers-io#199 by @MrPowers, this function should never have been created. * This particular commit removes this function and its references from the quinn repo.

Remove imported and unused Callable module to avoid ruff lint failure

0823158

Update linting CI

60c7fb7

Fix typo in CI

5b545ef

Fix failed tests

e505a21

Updates from review

7802545

Address the fixtures issue in the test file

dbd3f66

nijanthanvijayakumar and others added 18 commits July 16, 2024 18:23

Update the CI workflow to initiate the sparkconnect test on the 1.0

3ef4219

Update the poetry & pyproject with the dependencies for Spark-Connect

b1573b4

Update the CI workflow to run Spark-Connect tests only for v3.4+

fc85013

Update the script to check if Spark-Connect server is running or not

3e8776a

Remove the spark-connect server run check

8f76b0c

Update workflows & pytest to choose the Sparksession instance based on a condition

0fb197e

Add a TODO statement so that the spark-connect server check can be addressed later

b413920

Remove the 1.0 planning branch for the CI file

f3cf717

Attribute the original script that inspired this

0ab7493

Mark recently added deps as optional for Spark-Classic

3c669fc

As per the review comment, the recently added dependencies such as Pyarrow, Pandas etc., are optional and not required for Spark-Classic. Update the pyproject.toml to reflect that and lock the poetry file

Rename the spark-classic to connect & update makefile to install these deps

f62185f

update column extension function names and desc in readme

93f39d1

enable UP007

add acknowledgement

b9926fd

Fix the linting issues in the linting CI workflow

74545c3

remove .python-version

943918a

apply hotfix

0a71190

apply hotfix update lock file

run lint also on pr

ad45d31

update column extension function names and desc in readme

f04ab78

kunaljubce closed this Jul 16, 2024
