Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define separation of configuration in ~/settings.yml and the SQlite DB #1524

Closed
jotaen4tinypilot opened this issue Jul 24, 2023 · 6 comments · Fixed by #1692
Closed

Define separation of configuration in ~/settings.yml and the SQlite DB #1524

jotaen4tinypilot opened this issue Jul 24, 2023 · 6 comments · Fixed by #1692
Assignees
Labels

Comments

@jotaen4tinypilot
Copy link
Contributor

We will reach an de-Ansibilized state in the foreseeable future. One “relict” or by-product is the YAML-based config file, /home/tinypilot/settings.yml, which still contains important config parameters that we rely on (independent of Ansible).

However, we also have the SQlite database, which contains configuration logic as well – such as whether H.264 or MJPEG is enabled, or whether HTTPS should be required.

I think the current/previous division of concerns was “by consumer”:

  • If configuration is consumed by Ansible at install/update time, then it lives in ~/settings.yml
  • If configuration is managed by the Python app at “runtime”, then it lives in the SQlite

We should clearly define where things are supposed to go and how our configuration is structured. The SQlite and the ~/settings.yml also have different practical constraints, which we should be aware of – e.g. the SQlite has a more powerful structure, but it cannot easily be processed outside of the Python app.

@mtlynch mtlynch added the medium label Oct 30, 2023
@jotaen4tinypilot jotaen4tinypilot self-assigned this Nov 15, 2023
@jotaen4tinypilot
Copy link
Contributor Author

jotaen4tinypilot commented Nov 16, 2023

Background and Investigation

I did some digging in git, and tried to find out what rationale we used in the past for deciding where to put settings, in the hope that it helps us make an informed decision.

  • settings.yml was Ansible-land initially, exclusively used during the Ansible-based update process (hence its Python implementation being located in the Python update package).
  • tinypilot.db saw the light of day when we added user authentication back in late 2020. I couldn’t find any explicit discussion of the storage format choice, though.

We kept adding new settings to tinypilot.db, but our reasoning doesn’t seem super consistent or deliberate in hindsight:

  • For storing the last Wake-on-LAN address, we didn’t discuss storage options. Persisting in the DB basically was a requirement right from the start.
  • For storing the Pro enterprise license key, we didn’t want to put it in settings.yml to avoid mixing sensitive with insensitive data in the YAML. tinypilot.db seemed to feel better, as it provided stronger “encapsulation” or “privacy” of the data.
  • For storing the streaming mode, we briefly considered using settings.yml, but we eventually didn’t go down that route once we realized that we would have had to populate a new, non-Ansible parameter to the YAML (which – for some reason – seemed off the table).

In the earlier days of TinyPilot, we offered wider support for customization to users through settings.yml. E.g., users were able to specify the Linux user account for running the app, or reconfigure the home directory. As of the 2.6.1 release (Sept. 2023), however, we have restricted customization support to only a handful of settings.

In any event, as of 2023, we are officially Ansible-free, so the original Ansible-related workings around settings.yml no longer constraint us. If we had free choice for how we’d store settings based on the code as it is today, we might even consider consolidating everything into a single storage format. However, we might also deliberately decide not to do that, and use separate storage formats based on the purpose of what’s stored, or for other practical reasons.

Proposal

Unifying the storage format is not an option for us at the moment, because it would require too much refactoring effort. We therefore content ourselves with the current split-approach, and try to find the clearest-possible definition that gives us sufficient guidance when adding new settings in the future.

The proposed definition is:

  • settings.yml is for system-related settings, which are shared with external services on the system (e.g., systemd services, privileged scripts, etc.).
    • Usage Rules: The TinyPilot app is authoritative for writing. Other system services may read.
    • Reasoning: A plain YAML file is better suited for universal access than a SQLite DB, since it’s plain text (and therefore e.g. grep’able).
  • tinypilot.db is for settings that are fully owned and controlled by the Python app.
    • Usage Rules: External services should never access the DB file directly, not even for reading. If at all, they must invoke the Python app via the CLI interface. (The CLI interface has limitations, however, since it’s relatively slow, and it also emits a bunch of application-level logging to stdout.)
    • Reasoning: SQlite has a more powerful structure, and is easier to work with programmatically via Python. E.g., we have stronger typing, and structural migrations are safer and more straightforward.

When adding a new setting, we use the above definition for deciding where the setting should live. We also try to realistically anticipate whether usage of the setting might change in the foreseeable future, to prevent avoidable migration work (if reasonably possible).

If requirements for a setting change, we migrate the setting to the appropriate storage format. (We might make pragmatic exceptions for marginal use-cases.)

Tasks

  • Move settings.py from the update package to the app package.
    • By this, we acknowledge that settings.yml is no longer restricted to the update process, but it rather has become an independent storage mechanism, which is also used outside the update context (E.g., for configuring uStreamer and Janus.)
    • We could also consider renaming to system_settings.py in Python, to make the name more clear and specific to the designated purpose.
  • Document the purpose and usage rules of the app.settings and app.db packages in their respective package docstrings.

Discussion

Request for comments on the proposal and tasks, @mtlynch / @jdeanwallace. In case we want to discuss this more deeply, I’m happy to turn this into a PR with a “design document” – in this case, just let me know. (I thought we could avoid that overhead, though, in case we’d happen to agree anyways.)

@jdeanwallace
Copy link
Contributor

jdeanwallace commented Nov 20, 2023

Could we alter our definition of settings.yml to include the following 👇

According to our docstring, settings.yml is also for user-configurable TinyPilot settings:

# Define default values for user-configurable TinyPilot settings. The YAML data
# in _SETTINGS_FILE_PATH take precedence over these defaults.

Which includes settings that don't necessarily get used system wide. For example, tinypilot_keyboard_interface only gets used to populate KEYBOARD_PATH for our Python app:

'tinypilot_keyboard_interface': '/dev/hidg0',
'tinypilot_mouse_interface': '/dev/hidg1',

KEYBOARD_PATH = '{{ tinypilot_keyboard_interface }}'
MOUSE_PATH = '{{ tinypilot_mouse_interface }}'

@mtlynch
Copy link
Contributor

mtlynch commented Nov 21, 2023

Goal

I just want to make sure we're on the same page in terms of goals. I don't think it's important for us to retrofit rules onto our existing implementation. The more important thing to me is to make a decision about what's best going forward, and it's okay if legacy settings violate that as long as future settings adhere to our rules.

Considerations

The other elements I think we should cover are performance and sensitivity.

  • Settings that are performance-critical should go in the SQLite DB
    • For example, we gate socket messages on DB reads. We do some caching to speed things up, but it's still a slow operation when we refresh the cache, and I imagine it would be even slower if we were reading directly from the filesystem.
  • Nothing sensitive should go into settings.yml

Comparison

Advantages of SQLite over settings.yml

  • We have support for data migrations that would be hard to apply to a YAML file.
  • SQLite provides more type checking and support for complex settings.
  • Performance is faster reading from SQLite than from a YAML file.
  • We have finer control over what to put into the debug log.
  • Raspbian ships with a sqlite CLI client, whereas we have to pull in yq for reading YAML.

Advantages of settings.yml over SQLite

  • It's easier to debug because it's plaintext.
  • We dump the entire settings.yml in the debug log, so any settings we add, we get "for free" in the debug log.
  • It's a good middle ground for settings that are too niche to support in the UI, but we want to let the user manage them manually if they care enough.

I used to think settings.yaml was easier to share with other apps, but I don't think that's true anymore. It was easier to share with Ansible because Ansible natively consumed YAML. It's not that easy to share with anything else because any kind of sharing depends on yq, which is kind of a pain to support. sqlite is built-in to Raspbian, so it's already available if we wanted to use that for reading DB settings from another app.

Proposal

All in all, I feel like SQLite is a better option than settings.yml.

Given that, I feel like the rule should just be that everything in the future goes into the SQLite database unless we have a strong reason not to put it there. One exception is for settings that we tell users they can manually override in settings. Another exception is settings that are there for legacy reasons.

Documenting user-editable settings

Could we alter our definition of settings.yml to include the following

Yeah, if we're documenting it, it's a good opportunity to have an authoritative definition of user-configurable settings. The closest we currently have is here:

https://tinypilotkvm.com/pro/changes#261

@jotaen4tinypilot
Copy link
Contributor Author

Thanks providing your thoughts so comprehensively.

I’ll start to work on a PR with the documentation, where I incorporate the things that we’ve discussed so far.

Using the SQlite by default

Given that, the rule should just be that everything in the future goes into the SQLite database unless we have a strong reason not to put it there. One exception is for settings that we tell users they can manually override in settings. Another exception is settings that are there for legacy reasons.

We rely on reading from settings.yml in several places right now:

We don’t have a good mechanism to programmatically read from tinypilot.db at the moment. The only way right now is to invoke the Flask app via a dedicated CLI endpoint. This is relatively slow, though, (several seconds), and Flask currently emits a bunch of unrelated logging output. We also have to keep in mind that it might be tricky to access the tinypilot.db file directly, without going through the Flask app, because tinypilot.db might not be in the latest migration state.

I feel if we’d effectively “deprecate” settings.yml (or at least declare it exceptional), we should consider looking into a robust mechanism for reading from tinypilot.db programmatically, to promote its usage over reading from settings.yml in the future. For instance, if I were to implement the configure-janus privileged script by gating it on tinypilot.db instead of settings.yml, I currently wouldn’t know how to do that best.

Documenting user-editable settings

Related #1573.

@mtlynch
Copy link
Contributor

mtlynch commented Nov 21, 2023

Just to clarify, I didn't mean my opinion as an edict. I'm open to other solutions if we want a different strategy.

I feel if we’d effectively “deprecate” settings.yml (or at least declare it exceptional), we should consider looking into a robust mechanism for reading from tinypilot.db programmatically, to promote its usage over reading from settings.yml in the future. For instance, if I were to implement the configure-janus privileged script by gating it on tinypilot.db instead of settings.yml, I currently wouldn’t know how to do that best.

Yeah, that's a good idea. Can you create ticket for that?

It seems like there should be some way of making the Flask CLI faster because what the heck is it doing for several seconds just to do a single SQLite database call?

@jotaen4tinypilot
Copy link
Contributor Author

jotaen4tinypilot commented Nov 22, 2023

I think in the end it’s a bit of a strategic decision which – considering current requirements – is mostly up to our preference. I don’t feel super strongly here, and I think there are valid arguments for each approach, including the hybrid one.

In regards to making SQlite our primary option for storing settings, I feel that having a robust programmatic interface makes a big difference, as programmatic access would otherwise be an at least partially blocking problem that would hinder us going down that road. In any event, I see a lot of value in consolidating everything in one place, as that eases future decision-making where to put new settings.

I’ll work on a PR with a proposal for documentation, based on our discussion so far. It’s probably easier for us to continue the discussion on that basis.

Yeah, that's a good idea. Can you create ticket for that?

Yes → #1690

I just measured the Flask execution time on device, by the way: it’s ~1.5s per CLI invocation for the streaming-mode endpoint. That admittedly isn’t exactly “several seconds”, but still quite slow for what it does. (Especially considering that the pure SQlite-query is barely 5ms.)

jotaen4tinypilot added a commit that referenced this issue Nov 30, 2023
Resolves #1524.

The outcome of our discussion in the ticket was this:

- The SQlite DB (`tinypilot.db`) should be the default place for new
settings, whereas the YAML file (`settings.yml`) should only be used in
exceptional cases, and obviously for legacy reasons.
- We favor having a single storage solution over a dual approach, as
that simplifies future decision-making, and eliminates potential
migration burden between the two storage formats.
- The SQlite DB doesn’t easily allow programmatic access right now (in
contrast to the YAML file), so we [created this ticket for removing this
hurdle](#1690).
- We [will document the parameters in `settings.yml`
separately](#1573).

This PR adds a comment in `settings.py`, marking its usage as
quasi-deprecated. It also adds some historical context, references the
SQlite DB as default storage, and lists sample exceptional cases.

I realized that we probably don’t need to add any documentation to the
SQlite DB, since there is not really an alternative to it any longer. So
by deprecating `settings.yml`, the SQlite DB is the only storage option
anyway, and there shouldn’t be any ambiguity left. Contrary to [my
initial
proposal](#1524 (comment)),
I also don’t think we should move `settings.py` up to the `app` package,
as that might mistakenly promote its usage.
<a data-ca-tag
href="https://codeapprove.com/pr/tiny-pilot/tinypilot/1692"><img
src="https://codeapprove.com/external/github-tag-allbg.png" alt="Review
on CodeApprove" /></a>

---------

Co-authored-by: Jan Heuermann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants