
Commit

Add ETL pipeline testing QS (e2e-python) (#985)
cristianhkr authored Feb 13, 2024
1 parent ce9e879 commit 023911a
Showing 78 changed files with 4,676 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .github/dependabot.yml
@@ -10,4 +10,4 @@ updates:
interval: "weekly"
labels:
- "dependencies"
- "skip changelog"
- "skip changelog"
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

### Added
- Rust Quickstarter with Axum web framework simple boilerplate ([#980](https://github.com/opendevstack/ods-quickstarters/issues/980))
- Added ETL pipeline testing QS (e2e-python) ([#985](https://github.com/opendevstack/ods-quickstarters/pull/985))
- Update gateway-Nginx quickstarter ([#983](https://github.com/opendevstack/ods-quickstarters/pull/983))
- Added secret scanning in docker plain ([#963](https://github.com/opendevstack/ods-quickstarters/pull/963))
- Added Nodejs20 agent ([#962](https://github.com/opendevstack/ods-quickstarters/issues/962))
1 change: 1 addition & 0 deletions docs/modules/quickstarters/nav.adoc
@@ -13,6 +13,7 @@
** xref:quickstarters:ds-rshiny.adoc[Data Science RShiny app]
** xref:quickstarters:ds-streamlit.adoc[Data Science Streamlit app]
** xref:quickstarters:e2e-cypress.adoc[Cypress E2E testing]
** xref:quickstarters:e2e-etl-python.adoc[ETL Python E2E testing]
** xref:quickstarters:e2e-spock-geb.adoc[Spock, Geb and Unirest E2E testing]
** xref:quickstarters:inf-terraform-aws.adoc[INF Terraform AWS]
** xref:quickstarters:inf-terraform-azure.adoc[INF Terraform AZURE]
46 changes: 46 additions & 0 deletions docs/modules/quickstarters/pages/e2e-etl-python.adoc
@@ -0,0 +1,46 @@
= End-to-end tests with Great Expectations and Pytest (e2e-etl-python)

End-to-end test quickstarter project for ETL pipelines

== Purpose of this quickstarter

This is a Python-based quickstarter intended for developing end-to-end tests for data pipelines.
To do that, it uses two testing technologies:

1. Great Expectations, for testing data transformations within relational tables.
e.g. you could test the schema of a database table, the number of rows, that a specific column has no null values, etc.
2. Pytest, together with Boto3, for testing ETL triggers, notification systems, the content of S3 buckets, etc.
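As a purely illustrative sketch of the kinds of checks described above (plain Python with made-up table data; the real quickstarter uses the Great Expectations and Boto3 APIs):

```python
# Hypothetical sketch: table rows as plain dicts, and two
# Great-Expectations-style checks expressed as simple functions.
# All names and data here are made up for illustration.

rows = [
    {"id": 1, "country": "DE", "amount": 10.5},
    {"id": 2, "country": "ES", "amount": 7.25},
]

def expect_column_values_to_not_be_null(rows, column):
    """Check that no row has a null (None/missing) value in `column`."""
    return all(row.get(column) is not None for row in rows)

def expect_row_count_to_equal(rows, expected):
    """Check that the table has exactly `expected` rows."""
    return len(rows) == expected

# Pytest-style assertions, as they would appear in a test module:
assert expect_column_values_to_not_be_null(rows, "amount")
assert expect_row_count_to_equal(rows, 2)
```

In the provisioned project, the same ideas are expressed as Great Expectations expectation suites (for the table checks) and pytest test functions using Boto3 (for the S3/trigger checks).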

== What files / architecture is generated?

----
├── Jenkinsfile - This file contains Jenkins stages.
├── README.md
├── environments
│ ├── dev.json - This file describes parameters for the development AWS environment.
│ ├── test.json - This file describes parameters for the test AWS environment.
│ └── prod.json - This file describes parameters for the production AWS environment.
├── tests - This folder is the root folder for the tests
│   ├── acceptance/great_expectations - This folder contains the Great Expectations tests
│   └── acceptance/pytest - This folder contains the pytest tests
----
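As a purely illustrative sketch (the keys shown are assumptions, apart from the account placeholder that also appears in the generated `dev.yml.template`), an `environments/dev.json` could look like:

[source,json]
----
{
  "account": "<your_aws_account_id>",
  "aws_region": "eu-west-1",
  "environment": "dev"
}
----

Note that the generated Jenkinsfile's `addVars2envJsonFile` step injects the `environment`, `projectId`, `aws_region`, `repository` and `branch_name` keys into this file at build time.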

== Frameworks used

* https://greatexpectations.io[Great Expectations]
* https://pytest.org[Pytest]


== Usage - how do you start after you provisioned this quickstarter

Check the README.md file at root level for further instructions after the quickstarter has been provisioned.


== Builder agent used

This quickstarter uses the https://github.com/opendevstack/ods-quickstarters/tree/master/common/jenkins-agents/terraform[terraform] Jenkins agent.

== Known limitations

Let us know if you find any, thanks!
1 change: 1 addition & 0 deletions docs/modules/quickstarters/pages/index.adoc
@@ -42,6 +42,7 @@ Quickstarters are used from the https://github.com/opendevstack/ods-provisioning
=== E2E Test Quickstarter
* xref::e2e-cypress.adoc[E2E test - Cypress]
* xref::e2e-spock-geb.adoc[E2E test - Spock / Geb]
* xref::e2e-etl-python.adoc[E2E test - ETL Python]

=== Infrastructure Terraform Quickstarter
* xref::inf-terraform-aws.adoc[AWS deployments utilizing terraform tooling]
48 changes: 48 additions & 0 deletions e2e-etl-python/Jenkinsfile
@@ -0,0 +1,48 @@
def odsNamespace = ''
def odsGitRef = ''
def odsImageTag = ''
def sharedLibraryRef = ''
def agentImageTag = ''

node {
odsNamespace = env.ODS_NAMESPACE ?: 'ods'
odsGitRef = env.ODS_GIT_REF ?: 'master'
odsImageTag = env.ODS_IMAGE_TAG ?: 'latest'
sharedLibraryRef = env.SHARED_LIBRARY_REF ?: odsImageTag
agentImageTag = env.AGENT_IMAGE_TAG ?: odsImageTag
}

library("ods-jenkins-shared-library@${sharedLibraryRef}")

odsQuickstarterPipeline(
imageStreamTag: "${odsNamespace}/jenkins-agent-base:${agentImageTag}",
) { context ->

odsQuickstarterStageCopyFiles(context)

odsQuickstarterStageRenderJenkinsfile(context)

odsQuickstarterStageRenderJenkinsfile(
context,
[source: 'dev.yml.template',
target: 'environments/dev.yml']
)

odsQuickstarterStageRenderJenkinsfile(
context,
[source: 'test.yml.template',
target: 'environments/test.yml']
)

odsQuickstarterStageRenderJenkinsfile(
context,
[source: 'prod.yml.template',
target: 'environments/prod.yml']
)

odsQuickstarterStageRenderJenkinsfile(
context,
[source: 'testing.yml.template',
target: 'environments/testing.yml']
)
}
183 changes: 183 additions & 0 deletions e2e-etl-python/Jenkinsfile.template
@@ -0,0 +1,183 @@
/* generated jenkins file used for building and deploying AWS-infrastructure in projects */

@Library('ods-jenkins-shared-library@@shared_library_ref@') _

node {
aws_region = env.AWS_REGION ?: 'eu-west-1'
dockerRegistry = env.DOCKER_REGISTRY
}

odsComponentPipeline(
podContainers: [
containerTemplate(
name: 'jnlp',
image: "${dockerRegistry}/ods/jenkins-agent-terraform-2306:@shared_library_ref@",
envVars: [
envVar(key: 'AWS_REGION', value: aws_region)
],
alwaysPullImage: true,
args: '${computer.jnlpmac} ${computer.name}'
)
],
branchToEnvironmentMapping: [
'*': 'dev',
// 'release/': 'test'
]
) { context ->
getEnvironment(context)
addVars2envJsonFile(context)
odsComponentStageInfrastructure(context, [cloudProvider: 'AWS'])

withEnv(["AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}",
"AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}"
])
{
stage ("AWS Testing Preparation"){
generateTerraformOutputsFile()
}

def outputNames = stageGetNamesFromOutputs()
def aws_pipelineName = outputNames.aws_codepipeline_name
def bitbuckets3_name = outputNames.bitbuckets3_name
def results3_name = outputNames.results3_name

stage ("Publish Bitbucket Code To AWS"){
publishBitbucketCodeToAWS(context, bitbuckets3_name)
}

stage ("Run Tests"){
awsCodePipelineTrigger(context, aws_pipelineName)
awsCodePipelineWaitForExecution(context, aws_pipelineName)
}

stage ("Test Results"){
retrieveReportsFromAWS(context, results3_name)
archiveArtifacts artifacts: "build/test-results/test/**", allowEmptyArchive: true
junit(testResults:'build/test-results/test/*.xml', allowEmptyResults: true)
stash(name: "acceptance-test-reports-junit-xml-${context.componentId}-${context.buildNumber}", includes: "build/test-results/test/acceptance*junit.xml", allowEmpty: true)
stash(name: "installation-test-reports-junit-xml-${context.componentId}-${context.buildNumber}", includes: "build/test-results/test/installation*junit.xml", allowEmpty: true)
stash(name: "integration-test-reports-junit-xml-${context.componentId}-${context.buildNumber}", includes: "build/test-results/test/integration*junit.xml", allowEmpty: true)
}
}

}

def getEnvironment(def context){
sh "echo Get Environment Variables"
AWS_ACCESS_KEY_ID = sh(returnStdout: true, script:"oc get secret aws-access-key-id-${context.environment} --namespace ${context.cdProject} --output jsonpath='{.data.secrettext}' | base64 -d")
AWS_SECRET_ACCESS_KEY = sh(returnStdout: true, script:"oc get secret aws-secret-access-key-${context.environment} --namespace ${context.cdProject} --output jsonpath='{.data.secrettext}' | base64 -d")

}


def generateTerraformOutputsFile() {
sh 'terraform output -json > terraform_outputs.json'
sh 'cat terraform_outputs.json'
}

def stageGetNamesFromOutputs() {
def outputNames = [:]
def terraformOutputJson = readJSON file: 'terraform_outputs.json'

outputNames.aws_codepipeline_name = terraformOutputJson.codepipeline_name.value
outputNames.bitbuckets3_name = terraformOutputJson.bitbucket_s3bucket_name.value
outputNames.results3_name = terraformOutputJson.e2e_results_bucket_name.value

return outputNames
}

def awsCodePipelineTrigger(def context, pipelineName) {
sh "aws codepipeline start-pipeline-execution --name ${pipelineName}"
}


def awsCodePipelineWaitForExecution(def context, pipelineName) {
def pipelineExecutionStatus = ''

while (true) {
pipelineExecutionStatus = ''
sleep(time: 40, unit: 'SECONDS')
def pipelineState = sh(
script: "aws codepipeline get-pipeline-state --name ${pipelineName} --query 'stageStates[*]' --output json",
returnStdout: true
).trim()

def pipelineStages = readJSON(text: pipelineState)

pipelineStages.each { stage ->
def stageName = stage.stageName
def stageStatus = stage.latestExecution.status
echo "Stage: ${stageName}, Status: ${stageStatus}"

if (stageStatus == 'InProgress') {
pipelineExecutionStatus = 'InProgress'
return
} else if (stageStatus == 'Failed') {
pipelineExecutionStatus = 'Failed'
echo "Pipeline execution failed at stage ${stageName}"
error("Pipeline execution failed at stage ${stageName}")
return
}
}

if (pipelineExecutionStatus == 'InProgress') {
continue
} else if (pipelineExecutionStatus == 'Failed') {
echo 'Pipeline execution failed.'
break
} else {
echo 'Pipeline execution completed successfully.'
break
}
}
}



def publishBitbucketCodeToAWS(def context, bitbuckets3_name) {
def branch = context.gitBranch
def repository = context.componentId
zip zipFile: "${repository}-${branch}.zip", archive: false, dir: '.'
sh "aws s3 cp ${repository}-${branch}.zip s3://${bitbuckets3_name}/${repository}-${branch}.zip"
}

def retrieveReportsFromAWS(def context, results3_name) {
sh "aws s3 cp s3://${results3_name}/junit/acceptance_GX_junit.xml ./build/test-results/test/acceptance_GX_junit.xml"
sh "aws s3 cp s3://${results3_name}/junit/acceptance_pytest_junit.xml ./build/test-results/test/acceptance_pytest_junit.xml"
sh "aws s3 cp s3://${results3_name}/junit/installation_pytest_junit.xml ./build/test-results/test/installation_pytest_junit.xml"
sh "aws s3 cp s3://${results3_name}/junit/integration_pytest_junit.xml ./build/test-results/test/integration_pytest_junit.xml"

sh "aws s3 cp s3://${results3_name}/GX_test_results ./build/test-results/test/artifacts/acceptance/acceptance_GX_report --recursive"
sh "aws s3 cp s3://${results3_name}/GX_jsons ./build/test-results/test/artifacts/acceptance/GX_jsons --recursive"
sh "aws s3 cp s3://${results3_name}/pytest_results/acceptance/acceptance_allure_report_complete.html ./build/test-results/test/artifacts/acceptance/acceptance_pytest_report.html"
sh "aws s3 cp s3://${results3_name}/pytest_results/installation/installation_allure_report_complete.html ./build/test-results/test/artifacts/installation/installation_pytest_report.html"
sh "aws s3 cp s3://${results3_name}/pytest_results/integration/integration_allure_report_complete.html ./build/test-results/test/artifacts/integration/integration_pytest_report.html"

sh "ls build/test-results/test"
}

def addVars2envJsonFile(def context) {
echo "Starting addVars2envJsonFile"
def environment = context.environment
def projectId = context.projectId
def branch_name = context.gitBranch
def repository = context.componentId
def filePath = "./environments/${environment}.json"

def existingJson = readFile file: filePath
def existingData = readJSON text: existingJson

existingData.environment = environment
existingData.projectId = projectId
existingData.aws_region = aws_region
existingData.repository = repository
existingData.branch_name = branch_name

echo "Environment: ${existingData}"

def updatedJson = groovy.json.JsonOutput.toJson(existingData)
writeFile file: filePath, text: updatedJson

echo "Finishing addVars2envJsonFile"
}

5 changes: 5 additions & 0 deletions e2e-etl-python/README.md
@@ -0,0 +1,5 @@
# e2e-etl-python Quickstarter (e2e-etl-python)

Documentation is located in our [official documentation](https://www.opendevstack.org/ods-documentation/opendevstack/latest/getting-started/index.html).

Please update the documentation in the [antora page directory](https://github.com/opendevstack/ods-quickstarters/tree/master/docs/modules/quickstarters/pages).
7 changes: 7 additions & 0 deletions e2e-etl-python/dev.yml.template
@@ -0,0 +1,7 @@
region: eu-west-1

credentials:
key: @project_id@-cd-aws-access-key-id-dev
secret: @project_id@-cd-aws-secret-access-key-dev

account: "<your_aws_account_id>"
19 changes: 19 additions & 0 deletions e2e-etl-python/files/.editorconfig
@@ -0,0 +1,19 @@
# EditorConfig is awesome: http://EditorConfig.org

# top-most EditorConfig file
root = true

[*]
charset = utf-8
end_of_line = lf
indent_size = 2
indent_style = space
insert_final_newline = true
trim_trailing_whitespace = true

[*.md]
trim_trailing_whitespace = false ; trimming trailing whitespace may break Markdown

[Makefile]
tab_width = 2
indent_style = tab
20 changes: 20 additions & 0 deletions e2e-etl-python/files/.gitignore
@@ -0,0 +1,20 @@
.bundle
.kitchen
.terraform
.terraform.lock.hcl
.terraform-data.json
.vscode
.devcontainer/devcontainer.json
*.auto.tfvars*
inspec.lock
outputs.json
terraform.tfvars*
terraform.tfstate*
tfplan
vendor
test/integration/*/files/*.json
test/integration/*/files/*.yml
reports/install/*
!reports/install/.gitkeep
Pipfile.lock
.venv
