You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There are now a couple of places in the PEcAn code where environment variables can override the values recorded in settings.xml. This is often very useful, but has high potential to cause confusion and dilutes the principle that what you see in the settings file is the behavior you'll get. I'd like us to have a unified system for ensuring that any behavior altered by environment variables is predictable to minimize runtime confusion and reliably recorded as part of the provenance chain to minimize confusion when reasoning about runs after the fact.
Proposed solution
Establish a defined hierarchy of the sources a setting can come from and document which sources override which others.
Any time a parameter is changed by an environment variable, record the value change somewhere permanent. I don't know exactly what this should look like, but maybe it could be as simple as a logger.info line.
Adopt a coding practice of keeping the effects of env vars localized. Ideally, any time you're reading a chunk of code whose behavior might change because of an env var, the fact that an env var was consulted ought to be visible rather than hidden a few layers away.
Additional Context
The main place this matters today is in PEcAn.DB::get_postgres_envvars, which was implemented in #2541 and briefly discussed in #2632. Important points from those threads:
The big ol' downside: This function flagrantly breaks PEcAn's usual rule that the values in settings.xml always win over any conflicting defaults, and it does it in a way (environment variables) that's famously nontransparent and hard to debug unless you know what you're looking for. I expect we will not want to use this function very much outside of tests, but for tests I don't see any other good options for balancing reusable settings files against machine-specific database configurations -- I'm open to alternate proposals if anyone has them.
A follow-up thought that might be a separate PR: I know that overriding defaults is the entire reason I implemented get_postgres_envvars in the first place, but from a provenance-tracking perspective I'm still not thrilled that it lets us silently override values that are present in the settings. Should we consider making get_postgres_envvars log the values it assigns? If so, further considerations include how to avoid leaking secrets and whether logging on every call is too much log spam.
I was thinking as well about the order of things. It be nice if there was some order that we want to use and if a variable is defined in one place that will be the value that sticks. For example docker-compose has a very explicit listing of how they find a variable:
When you set the same environment variable in multiple files, here’s the priority used by docker-compose to choose which value to use:
Compose file
Shell environment variables
Environment file
Dockerfile
Variable is not defined
The text was updated successfully, but these errors were encountered:
I’d like to work on this issue next. While I understand the general problem, I might need your guidance with some parts, especially around how settings.xml and environment variables are currently handled. Could you point me to the key areas in the codebase to start?
Description
There are now a couple of places in the PEcAn code where environment variables can override the values recorded in settings.xml. This is often very useful, but has high potential to cause confusion and dilutes the principle that what you see in the settings file is the behavior you'll get. I'd like us to have a unified system for ensuring that any behavior altered by environment variables is predictable to minimize runtime confusion and reliably recorded as part of the provenance chain to minimize confusion when reasoning about runs after the fact.
Proposed solution
logger.info
line.Additional Context
The main place this matters today is in
PEcAn.DB::get_postgres_envvars
, which was implemented in #2541 and briefly discussed in #2632. Important points from those threads:@infotroph in #2541:
@infotroph in #2632 (comment):
@robkooper in #2632 (comment):
The text was updated successfully, but these errors were encountered: