Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provenance tracking when settings come from environment variables #2635

Open
infotroph opened this issue Jun 12, 2020 · 2 comments
Open

Provenance tracking when settings come from environment variables #2635

infotroph opened this issue Jun 12, 2020 · 2 comments

Comments

@infotroph
Copy link
Member

Description

There are now a couple of places in the PEcAn code where environment variables can override the values recorded in settings.xml. This is often very useful, but has high potential to cause confusion and dilutes the principle that what you see in the settings file is the behavior you'll get. I'd like us to have a unified system for ensuring that any behavior altered by environment variables is predictable to minimize runtime confusion and reliably recorded as part of the provenance chain to minimize confusion when reasoning about runs after the fact.

Proposed solution

  1. Establish a defined hierarchy of the sources a setting can come from and document which sources override which others.
  2. Any time a parameter is changed by an environment variable, record the value change somewhere permanent. I don't know exactly what this should look like, but maybe it could be as simple as a logger.info line.
  3. Adopt a coding practice of keeping the effects of env vars localized. Ideally, any time you're reading a chunk of code whose behavior might change because of an env var, the fact that an env var was consulted ought to be visible rather than hidden a few layers away.

Additional Context

The main place this matters today is in PEcAn.DB::get_postgres_envvars, which was implemented in #2541 and briefly discussed in #2632. Important points from those threads:

@infotroph in #2541:

The big ol' downside: This function flagrantly breaks PEcAn's usual rule that the values in settings.xml always win over any conflicting defaults, and it does it in a way (environment variables) that's famously nontransparent and hard to debug unless you know what you're looking for. I expect we will not want to use this function very much outside of tests, but for tests I don't see any other good options for balancing reusable settings files against machine-specific database configurations -- I'm open to alternate proposals if anyone has them.

@infotroph in #2632 (comment):

A follow-up thought that might be a separate PR: I know that overriding defaults is the entire reason I implemented get_postgres_envvars in the first place, but from a provenance-tracking perspective I'm still not thrilled that it lets us silently override values that are present in the settings. Should we consider making get_postgres_envvars log the values it assigns? If so, further considerations include how to avoid leaking secrets and whether logging on every call is too much log spam.

@robkooper in #2632 (comment):

I was thinking as well about the order of things. It be nice if there was some order that we want to use and if a variable is defined in one place that will be the value that sticks. For example docker-compose has a very explicit listing of how they find a variable:

When you set the same environment variable in multiple files, here’s the priority used by docker-compose to choose which value to use:

  1. Compose file
  2. Shell environment variables
  3. Environment file
  4. Dockerfile
  5. Variable is not defined
@github-actions
Copy link

This issue is stale because it has been open 365 days with no activity.

@harshagr70
Copy link
Contributor

Hi @infotroph,

I’d like to work on this issue next. While I understand the general problem, I might need your guidance with some parts, especially around how settings.xml and environment variables are currently handled. Could you point me to the key areas in the codebase to start?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants