Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: save host run file output #1288

Closed
wants to merge 0 commits into from
Closed

Conversation

cwyl02
Copy link
Contributor

@cwyl02 cwyl02 commented Aug 2, 2023

Description, Motivation and Context

Address the collector part of #1078

  • added .env to allow user to specify which env vars to import into this host run
  • added .input to allow user to create input files(e.g., config file, sample data for this host run). User needs to utilize TS_INPUT_DIR env var in their script, with the filename they specify in .input , e.g., $TS_INPUT_DIR/dummy.yaml
  • added .outputDir to allow user to specify a name of a directory(within the directory of the collector in bundle) to host their host run command file output. Note that command run has no access to this directory

Example SupportBundle:

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: run
spec:
  hostCollectors:
    - run:
        collectorName: "some-magic-on-host"
        command: "/home/cwyl02/custom-host-run.sh"
        args: ["--timeout", "10m"]
        env:
            - AWS_REGION=us-west-2
        inheritParentEnvs: false  # Whether to inherit all the variable or not. If true, inheritEnvs would be made redundant
        inheritEnvs:   # list of env variables to inherit from the parent process
            - SHELL
            - LANG
            - PYTHON
        outputDir: "magic-outputs"
        input:
          dummy.yaml: |-
            username: postgres
            password: <my-pass>
            dbHost: <hostname>
            map:
              key: value
            list:
              - val1
              - val2
          config.toml: |-
             title = "TOML Example"
             [owner]
             name = "Tom Preston-Werner"

Result bundle:

├── analysis.json
├── execution-data
│   └── summary.txt
├── host-collectors
│   └── run-host
│       └── some-magic-on-host
│           ├── collector-info.json
│           ├── collector.txt
│           └── magic-outputs
│               ├── magic-dir-0
│               │   └── t1.txt
│               └── magic-dir-1
│                   └── t2.txt
└── version.yaml

Checklist

  • New and existing tests pass locally with introduced changes.
  • Tests for the changes have been added (for bug fixes / features)
  • The commit message(s) are informative and highlight any breaking changes
  • Any documentation required has been added/updated. For changes to https://troubleshoot.sh/ create a PR here

Does this PR introduce a breaking change?

  • Yes
  • No

@CLAassistant
Copy link

CLAassistant commented Aug 2, 2023

CLA assistant check
All committers have signed the CLA.

@xavpaice xavpaice added the type::feature New feature or request label Aug 3, 2023
@xavpaice
Copy link
Member

xavpaice commented Aug 4, 2023

Thanks for adding this. To pass tests, you'll want to run make schemas and commit the changes.

Is there any testing that can be added?

Also, it's likely this will need a docs change, particularly if this is going to change paths to output which we would usually consider a breaking change.

@cwyl02
Copy link
Contributor Author

cwyl02 commented Aug 4, 2023

hi @xavpaice!

Thanks for adding this. To pass tests, you'll want to run make schemas and commit the changes.

I will make sure to run that before flipping to ready for review.

Is there any testing that can be added?

I plan to add some unit test after we agree on the approach and implementation.

Also, it's likely this will need a docs change, particularly if this is going to change paths to output which we would usually consider a breaking change.

Agree. Definitely need to add more docs for this feature.

@banjoh
Copy link
Member

banjoh commented Sep 1, 2023

This change captures the feature well. My minor comments are

  • Prefix env variables e.g TS_WORKSPACE
  • The config map would lead to files being dropped on disk where the names of the files are the map keys and the contents are the map values. e.g the config below would lead to files named dummy.yaml, path/to/data and config.toml with the contents of their corresponding values. I would suggest we rename config to input or files which is generic and can hold any kind of input files
config:
  dummy.yaml: |-
    username: postgres
    password: <my-pass>
  path/to/data: "some sample data"
  config.toml: |-
    title = "TOML Example"
    [owner]
    name = "Tom Preston-Werner"

pkg/collect/host_run.go Outdated Show resolved Hide resolved
pkg/collect/host_run.go Outdated Show resolved Hide resolved
pkg/collect/host_run.go Outdated Show resolved Hide resolved
@cwyl02
Copy link
Contributor Author

cwyl02 commented Sep 19, 2023

This change captures the feature well. My minor comments are

  • Prefix env variables e.g TS_WORKSPACE
  • The config map would lead to files being dropped on disk where the names of the filed are the map keys and the contents are the map valued. e.g the config below would lead to a files named dummy.yaml, path/to/data and config.toml with the contents if their corresponding values. I would suggest we rename config to input or files which is generic and can hold any kind of input files
config:
  dummy.yaml: |-
    username: postgres
    password: <my-pass>
  path/to/data: "some sample data"
  config.toml: |-
    title = "TOML Example"
    [owner]
    name = "Tom Preston-Werner"

Like this idea, although maybe we should just not allow / in the file name to avoid the need of a logic to handle the different OS, and the logic to conditionally create a dir?

@cwyl02 cwyl02 marked this pull request as ready for review September 19, 2023 03:07
@cwyl02 cwyl02 requested a review from a team as a code owner September 19, 2023 03:07
@cwyl02 cwyl02 force-pushed the main branch 2 times, most recently from da8fb57 to 6ff150a Compare September 20, 2023 18:54
@banjoh
Copy link
Member

banjoh commented Sep 27, 2023

Like this idea, although maybe we should just not allow / in the file name to avoid the need of a logic to handle the different OS, and the logic to conditionally create a dir?

My intention with this is to reduce friction when landing files for programs/scripts to consume. This way, programs/scripts would not need to arrange files to directories before running. But I think we can leave this out since its a bit of an edge case. If need arises, we can introduce it

With regards to envs, I think the map should list additional environment variables to add to the command being launched instead of what to inherit from the parent process

Additionally, for the security focused folks, I think we need to be explicit on whether to inherit the entire environment of the parent process using a key such as inheritParentEnvs or something similar and another inheritEnv which explicitly lists what to inherit. By default, a process created using the exec command will inherit the parent's environment.

We will have some always present environment variables such as PATH, KUBECONFIG, TS_WORKSPACE etc though

Here is a complete example

       env:
            - AWS_REGION=us-east-1
            - AWS_ACCESS_KEY_ID={{ aws.key-id.template }}
            - AWS_SECRET_ACCESS_KEY={{ aws.secret.template }}
       inheritParentEnvs: false  # Whether to inherit all the variable or not. If true, inheritEnvs would be made redundant
       inheritEnvs:   # list of env variables to inherit from the parent process
            - SHELL
            - LANG
            - PYTHON

For backward compatibility inheritParentEnvs would need to be true by default

Copy link
Member

@banjoh banjoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good now. Could you please add some tests as well?

pkg/collect/host_run.go Outdated Show resolved Hide resolved
Copy link
Member

@banjoh banjoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are almost there. I managed to run the collector locally but could not collect anything since TS_WORKSPACE_DIR does not exist. I added a few more comments in the PR (see below).

Here are the files I used to test

  • custom script
#!/usr/bin/env bash

pwd # Should we ensure the command's CWD is the $TS_WORKSPACE_DIR? Its currently the dir where support bundle is launched
env | grep AWS
ls -al $TS_WORKSPACE_DIR    # This fails. Dir does not exist
ls -al $TS_INPUT_DIR    # This fails. Dir does not exist
cat $TS_INPUT_DIR/dummy.yaml
echo "Drop my data: $($TS_INPUT_DIR/dummy.yaml)" >> $TS_WORKSPACE_DIR/mydata.txt # I'll have a redactor redact my secret later on
  • Spec
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: run
spec:
  hostCollectors:
    - run:
        collectorName: "ping-google"
        command: "custom-host-run.sh"
        env:
          - AWS_REGION=us-east-1
          - AWS_ACCESS_KEY_ID={{ aws.key-id.template }}
          - AWS_SECRET_ACCESS_KEY={{ aws.secret.template }}
        inheritParentEnvs: false
        inheritEnvs:
          - SHELL
          - LANG
          - TERM
        outputDir: "magic-outputs"
        input:
          dummy.yaml: My sample data REDACTED SECRET
          # config:   # I cannot load this config
          #   dummy.yaml: |-
          #     username: postgres
          #     password: <my-pass>
          #   config.toml: |-
          #     title = "TOML Example"
          #     [owner]
          #     name = "Tom Preston-Werner"
---
apiVersion: troubleshoot.sh/v1beta2
kind: Redactor
metadata:
  name: redactor-spec
spec:
  redactors:
    - name: redact-text
      removals:
        values:
        -  REDACTED SECRET

pkg/apis/troubleshoot/v1beta2/hostcollector_shared.go Outdated Show resolved Hide resolved
pkg/collect/host_run.go Outdated Show resolved Hide resolved
pkg/collect/host_run.go Outdated Show resolved Hide resolved
pkg/collect/host_run.go Outdated Show resolved Hide resolved
if err != nil {
return "", errors.New(fmt.Sprintf("failed to created temp dir for: %s", runHostCollector.OutputDir))
}
cmd.Env = append(cmd.Env,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the CWD of the command to TS_WORKSPACE_DIR? The working dir is currently where support-bundle is launched from

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great question. TS_WORKSPACE_DIR is the temporary output directory so it only exists when user specify .outputDir.

The working dir is currently where support-bundle is launched from

I think this is the existing behavior. I thought it is confusing if the CWD changes when the .outputDir is specified vs not specified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm I think the spec i was using was with absolute path so masked the issue. let me fix that

ExitCode string `json:"exitCode"`
Error string `json:"error"`
OutputDir string `json:"outputDir"`
Input string `json:"input"`
Copy link
Member

@banjoh banjoh Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a map[string]string or map[string]any? I could not load this yaml. How is the input parameter meant to behave? Isn't the key meant to be the filename and the the value be the contents of the file, all stored as flat files in TS_INPUT_DIR directory?

        input:
          dummy.yaml: My sample data REDACTED SECRET
          config:   # I cannot load this config
            dummy.yaml: |-
              username: postgres
              password: <my-pass>
            config.toml: |-
              title = "TOML Example"
              [owner]
              name = "Tom Preston-Werner"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the input files will be written into the TS_INPUT_DIR, which is a temp directory. It will always be deleted. I previously also choose to output the input file to somewhere but then removed that part -- talked with Martin and we thought input might be sensitive and might be good to just skip storing it.

The user triggers this bundle will have input file on their hand anyways but anything output to that bundle might do harm when the bundle is given to someone else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or btw it should be

        input:
          dummy.yaml: My sample data REDACTED SECRET
          dummy2.yaml: |-
            username: postgres
            password: <my-pass>
          config.toml: |-
            title = "TOML Example"
            [owner]
            name = "Tom Preston-Werner"

that's why its map[string]string

Copy link
Member

@banjoh banjoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With regards to tests, please add

  • An example in examples directory similar to what I posted in my previous comment. The command part can be bash -c <some inline code e.g env, cat $TS_INPUT_DIR/dummy.yaml, ls -al...> so as not to be forced to add an external script
  • Unit tests around processEnvVars
    • Ensure directories that need to be created are created, and exist.
    • Ensure cmd has the correct parameters set i.e env vars, CWD etc
  • Add at least one level 2 (klog.V(2)) log, more if possible, that should show what the collector is doing like in this example.

@cwyl02
Copy link
Contributor Author

cwyl02 commented Oct 18, 2023

command: "custom-host-run.sh"

after fixing the issue this should work although you need to change command to: "./custom-host-run.sh"

@cwyl02
Copy link
Contributor Author

cwyl02 commented Oct 19, 2023

closed this PR accidentally 🙈 and opened a new one #1376

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type::feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants