Deduplicate e2e scenarios
AnkitCLI committed Nov 22, 2024
1 parent 716daea commit ae18efe
Showing 6 changed files with 156 additions and 0 deletions.
@@ -49,3 +49,27 @@ Feature: Deduplicate - Verify Deduplicate Plugin Error scenarios
Then Select Deduplicate plugin property: filterOperation field function with value: "deduplicateFilterFunctionMax"
Then Click on the Validate button
Then Verify that the Plugin Property: "filterOperation" is displaying an in-line error message: "errorMessageDeduplicateInvalidFunction"

@GCS_DEDUPLICATE_TEST
Scenario: Verify Deduplicate plugin error for FilterOperation field with invalid field name
Given Open Datafusion Project to configure pipeline
When Select plugin: "File" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Analytics"
When Select plugin: "Deduplicate" from the plugins list as: "Analytics"
Then Connect plugins: "File" and "Deduplicate" to establish connection
Then Navigate to the properties page of plugin: "File"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "gcsDeduplicateTest"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Click plugin property: "skipHeader"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "deduplicateOutputSchema"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "Deduplicate"
Then Select dropdown plugin property: "uniqueFields" with option value: "fname"
Then Press ESC key to close the unique fields dropdown
Then Enter Deduplicate plugin property: filterOperation field name with value: "deduplicateInvalidFieldName"
Then Select Deduplicate plugin property: filterOperation field function with value: "deduplicateFilterFunctionMax"
Then Click on the Validate button
Then Verify that the Plugin Property: "filterOperation" is displaying an in-line error message: "errorMessageDeduplicateInvalidFieldName"
@@ -0,0 +1,79 @@
@Deduplicate
Feature: Deduplicate - Verify Deduplicate Plugin Runtime Error Scenarios

@GCS_DEDUPLICATE_TEST @FILE_SINK_TEST
Scenario: Verify the pipeline fails when the Unique Field column is empty
Given Open Datafusion Project to configure pipeline
When Select plugin: "File" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Analytics"
When Select plugin: "Deduplicate" from the plugins list as: "Analytics"
Then Connect plugins: "File" and "Deduplicate" to establish connection
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "File" from the plugins list as: "Sink"
Then Connect plugins: "Deduplicate" and "File2" to establish connection
Then Navigate to the properties page of plugin: "File"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "gcsDeduplicateTest"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Click plugin property: "skipHeader"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "deduplicateOutputSchema"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "Deduplicate"
Then Validate "Deduplicate" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "File2"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Save the pipeline
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Failed"

@GCS_DEDUPLICATE_TEST @FILE_SINK_TEST
Scenario: To verify that the pipeline fails from File to File using Deduplicate plugin with an invalid partition count and invalid unique field passed as macro arguments
Given Open Datafusion Project to configure pipeline
When Select plugin: "File" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Analytics"
When Select plugin: "Deduplicate" from the plugins list as: "Analytics"
Then Connect plugins: "File" and "Deduplicate" to establish connection
Then Navigate to the properties page of plugin: "File"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "gcsDeduplicateTest"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Click plugin property: "skipHeader"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "deduplicateOutputSchema"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "Deduplicate"
Then Click on the Macro button of Property: "deduplicateUniqueFields" and set the value to: "deduplicateUniqueFields"
Then Click on the Macro button of Property: "deduplicateNumPartitions" and set the value to: "deduplicateNumberOfPartitions"
Then Validate "Deduplicate" plugin properties
Then Close the Plugin Properties page
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "File" from the plugins list as: "Sink"
Then Connect plugins: "Deduplicate" and "File2" to establish connection
Then Navigate to the properties page of plugin: "File2"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Save the pipeline
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "invalidUniqueField" for key "deduplicateUniqueFields"
Then Enter runtime argument value "deduplicateInvalidNumberOfPartitions" for key "deduplicateNumberOfPartitions"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Failed"
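The macro scenario above expects the run to fail because `@#$%` (the `deduplicateInvalidNumberOfPartitions` value) is not a valid partition count. A minimal Python sketch of the kind of check that rejects such a runtime argument; this is an assumed illustration of the behavior, not the plugin's actual code:

```python
def resolve_num_partitions(macro_value: str) -> int:
    """Resolve a numPartitions macro/runtime value, rejecting non-numeric
    input such as '@#$%' (the invalid value used in the scenario above)."""
    try:
        n = int(macro_value)
    except ValueError:
        # Non-numeric runtime argument: fail the configuration step
        raise ValueError(
            f"Invalid number of partitions '{macro_value}': must be a positive integer"
        )
    if n <= 0:
        raise ValueError(
            f"Invalid number of partitions '{macro_value}': must be positive"
        )
    return n

print(resolve_num_partitions("2"))  # 2
```

A valid value like `2` (the `deduplicateNumberOfPartitions` property) resolves cleanly, while the invalid macro argument raises before the pipeline can run, which is why the scenario asserts a "Failed" status.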
@@ -260,3 +260,51 @@ Feature: Deduplicate - Verification of Deduplicate pipeline with File as source
Then Close the pipeline logs
Then Validate OUT record count of deduplicate is equal to IN record count of sink
Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "deduplicateTest6OutputFile"


@GCS_DEDUPLICATE_TEST @FILE_SINK_TEST @Deduplicate_Required @ITN_TEST
Scenario: To verify data transfer from File source to File sink using Deduplicate Plugin with only Unique field
Given Open Datafusion Project to configure pipeline
When Select plugin: "File" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Analytics"
When Select plugin: "Deduplicate" from the plugins list as: "Analytics"
Then Connect plugins: "File" and "Deduplicate" to establish connection
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "File" from the plugins list as: "Sink"
Then Connect plugins: "Deduplicate" and "File2" to establish connection
Then Navigate to the properties page of plugin: "File"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "gcsDeduplicateTest"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Click plugin property: "skipHeader"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "deduplicateOutputSchema"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "Deduplicate"
Then Select dropdown plugin property: "uniqueFields" with option value: "fname"
Then Press ESC key to close the unique fields dropdown
Then Validate "Deduplicate" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "File2"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Save the pipeline
Then Preview and run the pipeline
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Close the pipeline logs
Then Validate OUT record count of deduplicate is equal to IN record count of sink
Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "deduplicateTest7OutputFile"
@@ -23,3 +23,4 @@ errorMessageJoinerBasicJoinCondition=Join keys cannot be empty
errorMessageJoinerAdvancedJoinCondition=A join condition must be specified.
errorMessageJoinerInputLoadMemory=Advanced outer joins must specify an input to load in memory.
errorMessageJoinerAdvancedJoinConditionType=Advanced join conditions can only be used when there are two inputs.
errorMessageDeduplicateInvalidFieldName=Invalid filter MAX(abcd): Field 'abcd' does not exist in input schema
@@ -180,6 +180,7 @@ deduplicateFilterFunctionLast=Last
deduplicateFilterFunctionFirst=First
deduplicateFieldName=fname
deduplicateFilterOperation=cost:Max
deduplicateInvalidFieldName=abcd
deduplicateNumberOfPartitions=2
deduplicateInvalidNumberOfPartitions=@#$%
deduplicateFilterFieldName=cost
@@ -191,6 +192,7 @@ deduplicateTest3OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST3_Outp
deduplicateMacroOutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST4_Output.csv
deduplicateTest5OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST5_Output.csv
deduplicateTest6OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST6_Output.csv
deduplicateTest7OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST7_Output.csv
## Deduplicate-PLUGIN-PROPERTIES-END

## GROUPBY-PLUGIN-PROPERTIES-START
@@ -0,0 +1,2 @@
alice,smith,1.5,34567
bob,smith,50.23,12345
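The two expected-output rows above are what Deduplicate produces when `fname` is the only unique field: one record survives per distinct first name. A rough Python sketch of that semantics, including the optional `cost:Max` filter operation from the properties file; the input rows beyond the two expected ones are hypothetical, and this is an illustration rather than the plugin's implementation:

```python
from typing import Optional

def deduplicate(records, unique_fields, filter_op: Optional[str] = None):
    """Keep one record per unique-field combination.

    filter_op is 'field:Max' or 'field:Min'; without it, the first record
    seen for each key wins (the plugin does not guarantee which survives).
    """
    kept = {}
    for rec in records:
        key = tuple(rec[f] for f in unique_fields)
        if key not in kept:
            kept[key] = rec
        elif filter_op:
            field, _, func = filter_op.partition(":")
            current, new = kept[key][field], rec[field]
            if (func.lower() == "max" and new > current) or \
               (func.lower() == "min" and new < current):
                kept[key] = rec
    return list(kept.values())

rows = [  # hypothetical source rows; duplicate 'alice' by fname
    {"fname": "alice", "lname": "smith", "cost": 1.5, "zipcode": 34567},
    {"fname": "bob", "lname": "smith", "cost": 50.23, "zipcode": 12345},
    {"fname": "alice", "lname": "jones", "cost": 0.75, "zipcode": 34567},
]
print(deduplicate(rows, ["fname"]))  # one alice record, one bob record
```

With `cost:Max`, the `alice` record with the higher cost (1.5) would be kept; with only the unique field set, as in the TEST7 scenario, either duplicate may survive, which is why the expected output contains exactly one row per first name.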
