Commit a8c4867: adding new post about gh actions
robertmitchellv committed May 9, 2024 (1 parent: c21e93e)
---
title: "GitHub Actions and `workflow_call`"
author: "Robert Mitchell"
date: "2024-05-08"
categories: [GitHub Actions, DevOps]
page-layout: article
toc: true
toc-location: left
code-copy: true
code-overflow: wrap
code-line-numbers: true
df-print: paged
---

::: {.callout-caution}
## Setting Some Expectations
While I enjoy learning about DevOps, my primary role is as a Data Engineer, though I often function more like a software engineer working on data-intensive applications. Much of what I've learned comes from examining existing workflows, reading documentation, exploring other blogs, and practical experimentation. If you notice any inaccuracies, please feel free to create an issue on the {{< fa brands github >}} repository. Your feedback is greatly appreciated!
:::

## How did I get here?

::: {.column-page-inset-right}
I've found working with GitHub Actions both empowering and intimidating. It's empowering because it automates tasks that would otherwise be manual, but intimidating due to the complexity it adds, often leading to tricky debugging scenarios. **A key takeaway is that while striving for simplicity can make code more manageable, over-optimization, especially in DevOps, can lead to diminishing returns and increased fragility.** Debugging a draft PR with numerous commits and failed runs can be mentally exhausting, particularly when you're making changes just to see what happens.

Recently, I explored how to trigger certain workflows based on the specific needs of the code changes for my team. This may not seem necessary for small projects where all tests run in under ten minutes, but in a monorepo with multiple services, including SDKs, containers running Python, JS, or databases, tests could take 15 minutes or more. This can significantly slow down the review process for PRs.

One solution I delved into was `workflow_call`.
I had to spend some time understanding how it works through examples in our repo, particularly how to manage inputs and outputs between workflow parts effectively.
:::

## `workflow_call` from the docs

::: {.column-page-inset-right}
Let's start by looking at the [docs](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_call).

The documentation introduces `workflow_call` as a method to allow a workflow to be triggered by another. It then directs you to the section on "[Reusing workflows](https://docs.github.com/en/actions/using-workflows/reusing-workflows)". This resource is thorough and detailed, but can be overwhelming initially, so I'll break down the basics first to build a solid foundation.
:::

## Basic workflows

::: {.column-page-inset-right}
GitHub Action workflows are located in the `.github/` directory of your repo, under the `workflows` folder:

````bash
.
└── .github
    └── workflows
        ├── lint.yaml
        └── test.yaml
````

Workflows are `.yaml` files that specify what actions should run based on certain triggers. Initially, I found this part of a repo daunting, often relying on the community for pre-built actions. However, workflows are actually quite straightforward once you understand the syntax and logic.

Here is an example of a simple linting workflow:

```{.yaml filename="lint.yaml"}
name: Lint

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install ruff
      - name: Lint with Ruff
        run: ruff check code/
```

This workflow includes three top-level elements:

1. `name`: The name of the workflow.
2. `on`: Specifies the triggers for the workflow, such as `push` and `pull_request` events on the `main` branch.
3. `jobs`: Lists the jobs that will run, typically in parallel.
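As an aside, when the goal is only "run this workflow when certain files change," GitHub also supports `paths:` filters directly on the trigger, which can be a lighter-weight option for monorepos than a full reusable-workflow setup. A minimal sketch (the `code/**/*.py` glob is a hypothetical path, not from this repo):

```yaml
name: Lint

# run only for pull requests against main that touch Python files;
# anything else skips the workflow entirely
on:
  pull_request:
    branches:
      - main
    paths:
      - "code/**/*.py"
```

This covers simple cases; `workflow_call` earns its keep when one workflow has to compute something and hand the result to several others.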
Let's continue with a unit testing workflow:

```{.yaml filename="test.yaml"}
name: Test

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install pytest
      - name: Run tests
        run: pytest code/
```

Similar to the linting workflow, this workflow contains the same three top-level elements, but with a different job configuration.

Understanding the components of `jobs:` and their keys was initially confusing. Each job lists the steps it runs, and a step can be:

- **A shell command:** an executable command under the `run:` key, just like what you would type in a terminal.
- **An action:** a reusable action referenced with the `uses:` key, like `actions/checkout@v3` for checking out repository code.
- **A composite action:** several steps bundled into a single reusable action (defined with `runs.using: composite` in the action's own metadata file) and invoked with the `uses:` key like any other action.
:::


## More complex workflows with `workflow_call`

::: {.column-page-inset-right}

Now let's return to `workflow_call` and how it can be used to trigger a workflow from another workflow.


````{yaml filename="changes.yaml"}
name: Determine Changes

on:
  workflow_call:
    outputs:
      changed-files-data:
        description: "JSON formatted list of changed files with metadata"
        value: ${{ jobs.determine-changes.outputs.changed-files-data }}

jobs:
  determine-changes:
    runs-on: ubuntu-latest
    outputs:
      changed-files-data: ${{ steps.create-changed-files-data.outputs.result }}
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Fetch base and head branches
        id: gather-branches
        run: |
          git fetch origin ${{ github.base_ref }}:${{ github.base_ref }}
          git fetch origin ${{ github.head_ref }}:${{ github.head_ref }}
      - name: Create changed files data
        id: create-changed-files-data
        run: |
          echo "Base reference: ${{ github.base_ref }}"
          echo "Head reference: ${{ github.head_ref }}"

          # get the list of changed files
          DIFF_OUTPUT=$(git diff --name-only ${{ github.base_ref }}...${{ github.head_ref }})
          mapfile -t CHANGED_FILES <<< "$DIFF_OUTPUT"
          echo "Changed files:"
          printf '%s\n' "${CHANGED_FILES[@]}"

          JSON_ARRAY="["

          # create a JSON array of the changed files
          for FILE in "${CHANGED_FILES[@]}"; do
            EXTENSION="${FILE##*.}"

            JSON_ENTRY=$(jq -nc \
              --arg file "$FILE" \
              --arg extension "$EXTENSION" \
              '{
                file: $file,
                extension: $extension
              }')

            JSON_ARRAY+="$JSON_ENTRY,"
          done

          JSON_ARRAY="${JSON_ARRAY%,}]"

          echo "Changed files data: $JSON_ARRAY"
          echo "result=$JSON_ARRAY" >> "$GITHUB_OUTPUT"
````


This workflow has the same three top-level elements, but there are some big differences. In the `on:` top-level element we see the `workflow_call` trigger with its own set of keys that are unfamiliar at first.

````yaml
on:
  workflow_call:
    outputs:
      changed-files-data:
        description: "JSON formatted list of changed files with metadata"
        value: ${{ jobs.determine-changes.outputs.changed-files-data }}
````

What we're demonstrating here is that this workflow is exclusively triggered by another workflow—the `workflow_call` trigger is the only way it activates. This setup is designed to ensure that the workflow can contribute data to subsequent processes.

We also define an output for this workflow. This output is named `changed-files-data` and it includes a description and a value specifying where in this workflow the data is produced. Whenever you encounter `${{ ... }}`, you're seeing what's known as a **variable expression**. These expressions help us dynamically reference data produced by the workflow.
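The heart of that step can be simulated outside of the Actions runtime. The sketch below builds the same JSON array in pure bash (using `printf` in place of `jq` so it has no dependencies) and appends it to a stand-in `GITHUB_OUTPUT` file the way the step's final `echo` does; the file list is made up:

```shell
# simulate the "Create changed files data" step locally;
# the file list is hard-coded instead of coming from `git diff`
CHANGED_FILES=("src/app.py" "docs/index.md")

# on a real runner GitHub provides this path; locally we fake it
GITHUB_OUTPUT=$(mktemp)

JSON_ARRAY="["
for FILE in "${CHANGED_FILES[@]}"; do
  # `${FILE##*.}` strips everything up to the last dot, leaving the extension
  EXTENSION="${FILE##*.}"
  JSON_ARRAY+=$(printf '{"file":"%s","extension":"%s"},' "$FILE" "$EXTENSION")
done
# drop the trailing comma and close the array
JSON_ARRAY="${JSON_ARRAY%,}]"

# `result=` becomes the step output read as steps.<id>.outputs.result
echo "result=$JSON_ARRAY" >> "$GITHUB_OUTPUT"
cat "$GITHUB_OUTPUT"
```

Running this prints `result=[{"file":"src/app.py","extension":"py"},{"file":"docs/index.md","extension":"md"}]`, which is the shape of string the downstream workflow receives.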
Let's break down the location of this variable:

- `jobs`: The value lives under the workflow's top-level `jobs` element.
  - `determine-changes`: This is the name of the specific job within our workflow where the output is produced.
    - `outputs`: This section within the job specifies where the output is generated.
      - `changed-files-data`: Here's the interesting part! We encounter another variable expression that indicates where in the job's steps the data is finalized.

Now, looking into the `changed-files-data` variable expression, `${{ steps.create-changed-files-data.outputs.result }}`:

- `steps`: This tells us that the value comes from one of the job's steps.
  - `create-changed-files-data`: This is the `id` of the step, labeled "Create changed files data", where the output is generated.
    - `outputs`: This subsection within the step delineates where the output is specifically created.
      - `result`: This is the identifier for the output produced by the step.

By structuring workflows in this manner, we enable modular, reusable components that can interact seamlessly within GitHub Actions.
:::


## How do we use the output from `workflow_call`?

::: {.column-page-inset-right}
Now that we've seen how to set up a workflow that can be called by another workflow, how do we actually call it?
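The shape of the answer is a top-level pipeline whose jobs point `uses:` at the reusable workflow files and wire outputs to inputs through `needs`. A minimal sketch of such a caller; the `ci.yaml` name is hypothetical, and it assumes the reusable workflows live at the paths shown:

```yaml
name: CI

on:
  pull_request:
    branches:
      - main

jobs:
  # call the reusable changes.yaml workflow from earlier
  changes:
    uses: ./.github/workflows/changes.yaml

  # wait for it, then feed its output in as an input
  lint:
    needs: changes
    uses: ./.github/workflows/lintChanges.yaml
    with:
      changed-files-data: ${{ needs.changes.outputs.changed-files-data }}
```

Note that a caller job uses `uses:` at the job level (no `runs-on:` or `steps:`), and reads the called workflow's outputs through the `needs` context.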
Let's return to a linting example that is slightly expanded to include the workflow from our previous step.


````{yaml filename="lintChanges.yaml"}
name: Lint Changed Files

on:
  workflow_call:
    inputs:
      changed-files-data:
        required: true
        type: string

jobs:
  lint-python:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install Ruff
        run: |
          pip install ruff

      - name: Lint Python files
        run: |
          # Extract Python files from JSON input
          CHANGED_FILES_JSON='${{ inputs.changed-files-data }}'
          CHANGED_PY_FILES=$(echo "$CHANGED_FILES_JSON" | jq -r '.[] | select(.extension == "py") | .file')
          if [[ -n "$CHANGED_PY_FILES" ]]; then
            echo "Changed Python Files: $CHANGED_PY_FILES"
            ruff check --output-format=github $CHANGED_PY_FILES
            ruff format --check $CHANGED_PY_FILES
          else
            echo "No Python files to lint."
          fi
````

Notice that the trigger is again `workflow_call`, but this time with an `inputs:` block instead of `outputs:`: the workflow declares a required string input, `changed-files-data`, and reads it with `${{ inputs.changed-files-data }}` inside the lint step.
:::

::: {.callout-tip}
## {{< fa puzzle-piece >}} Putting it together
You can think of `changes.yaml` as a scout that reports back which files changed, `lintChanges.yaml` as a specialist that only acts on the files it is handed, and the `workflow_call` outputs and inputs as the messengers carrying notes between them.
:::


## Conclusion

::: {.column-page-inset-right}
I think this is probably a good place to end things. There is obviously more I could cover in connection to reusable workflows, but as I mentioned at the outset, I just wanted to focus on the parts of GitHub Actions that were exciting and new to me. Feel free to get in touch with me if you have any questions or comments. Again, if something isn't correct, feel free to create an issue on the {{< fa brands github >}} repo.
:::