Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB caches #3320

Merged
merged 55 commits into from
Nov 4, 2024
Merged

DB caches #3320

merged 55 commits into from
Nov 4, 2024

Conversation

aatkin
Copy link
Collaborator

@aatkin aatkin commented Aug 16, 2024

#2783

This change implements low-level caching for db entities. Motivation is greatly improved performance, especially for db entities that are often queried but rarely changed, due to the overhead in fetching data and serializing it. Most db entities have small memory footprint as well, so they can be cached entirely.

Most cache needs are very basic, but having a standard way of accessing the cache is probably beneficial, for example making sure that db entity cache is ready for use without explicit instructions in every db namespace. With this change nearly all db namespaces implement a common refreshable cache that uses clojure.core.cache.wrapped under hood. Interface is mostly similar, implementation takes care that cache is always up-to-date when accessed.

Renamed references to rems.db.* and rems.service.* so that they are always fully qualified. Many places used ambiguous naming when referring to either db or service functions.

Follow up

Possible follow ups

  • add db schemas where none exist (e.g. catalogue)
  • catalogue api ; only admins get enabled and disabled items
    • is this filtering still needed?
  • detect potential issues with caches (edit: transaction tests now implemented)
    • while all tests should pass, there could still be hidden bugs
  • analyze absolute performance
    • e.g. new events per second, created applications per second ...
    • detect bottlenecks and possibilities for improvement
  • write-up ADR on how it differs from pre-cache implementation
  • rename users db table into user attributes, and remove users db entity so that user service is the logical source for "full user"?
  • remove deprecated resource licenses db table
  • full-stack test that verifies cache works properly (e.g. 2 requests where one resets, one reloads, what is the result)

Other noteworthy changes

raw db access

  • should now be limited to db namespaces, with some exceptions. raw db querying is fine (in theory), but updates should be strictly through db namespace
  • db namespaces should implement mostly similar "copy-paste" db loading and parse functions

rems.db.catalogue / rems.service.catalogue

  • quite big changes due to db select that implements various query params, implemented "stupider" db select that takes in everything and formats before entering cache
  • service implements query filter and join that should be 100% compatible with existing API
  • added all various edit functions to db so that they update cache

rems.db.applications / rems.service.application

  • removed injections cache, since all db entities now implement their own cache. injections are still in place, not necessarily because they are needed but there is a lot of code that depends on those injections existing.
  • removed periodic applications cache reloader since injections cache is no more
  • created application service layer for serving API functions
  • removed rems.service.todos and moved all related functionality to application service

rems.common.dependency

  • new namespace that wraps com.stuartsierra.dependency (used in e.g. mondo, caches, dependencies)

rems.db.attachments / rems.service.attachment

  • relocated license attachment functions here (from license db/service namespace)
  • separation of concerns: moved several checks and "fix filename" from db level to service level

rems.kaocha

  • relocated namespace that implements kaocha test runner plugins
  • implemented second plugin rems.kaocha/cache-statistics-plugin that prints table of cache CRUD statistics after successful test run. usable from CLI like lein kaocha integration --plugin rems.kaocha/cache-statistics-plugin
  • added cache statistics plugin to both integration and browser test suites in CircleCI for tracking potential issues in cache usage

example table data from real test run

:id % :get :upsert :evict :reload :reset
:rems.db.user-settings/user-settings-cache 100.00 11676 13 1 1 0
> rems.test-browser/test-processed-applications-and-paging 84.77 9909 1 0 0 0
> rems.test-browser/test-applicant-member-invite-action 11.21 1309 2 0 0 0
> rems.test-browser/test-handling 2.02 236 0 0 0 0
> rems.test-browser/test-edit-catalogue-item 1.31 146 6 0 1 0
> rems.test-browser/test-workflow-create-edit 0.43 48 2 0 0 0
:rems.db.users/user-cache 100.00 11622 58 0 1 0
> rems.test-browser/test-processed-applications-and-paging 84.05 9813 5 0 0 0
> rems.test-browser/test-applicant-member-invite-action 11.50 1330 13 0 0 0
> rems.test-browser/test-handling 2.15 243 8 0 0 0
> rems.test-browser/test-edit-catalogue-item 1.61 167 20 0 1 0
> rems.test-browser/test-workflow-create-edit 0.36 41 1 0 0 0

Implemented DB caches

dependent cache = actual cache but uses cache DAG to reset when main cache updates
for time being = actual cache implemented but not sure if want to keep):

  • rems.db.attachments
    • rems.db.attachments/by-application-id dependent cache
    • rems.db.attachments/license-attachments-cache (for time being)
  • rems.db.blacklist
  • rems.db.catalogue
    • rems.db.catalogue/catalogue-item-localizations-cache (for time being)
  • rems.db.form
  • rems.db.licenses
    • rems.db.licenses/license-localizations-cache (for time being)
  • rems.db.organization
  • rems.db.resource
  • rems.db.roles
    • rems.db.roles/users-by-role dependent cache
  • rems.db.user-mappings
    • rems.db.user-mappings/by-extidattribute dependent cache
    • rems.db.user-mappings/by-extidvalue dependent cache
  • rems.db.user-settings (for time being)
  • rems.db.users
  • rems.db.workflow

Not yet implemented:

  • rems.db.entitlements
  • rems.db.invitation

Existing cache implementations:

  • rems.db.api-key/api-key-memo
  • rems.db.events/event-cache

Checklist for author

Remove items that aren't applicable, check items that are done.

Reviewability

  • Link to issue

Backwards compatibility

  • API is backwards compatible or completely new

Documentation

  • Update changelog if necessary
  • API is documented and shows up in Swagger UI
  • Update docs/ (if applicable)
  • Update manual/ (if applicable)
  • ADR for major architectural decisions or experiments

Testing

  • Complex logic is unit tested
  • Valuable features are integration / browser / acceptance tested automatically

Follow-up

  • New tasks are created for pending or remaining tasks

@aatkin aatkin changed the title Users cache 3 DB caches 3 Aug 16, 2024
@aatkin aatkin changed the base branch from cache-optimizations to users-cache-1 August 16, 2024 14:07
@aatkin aatkin force-pushed the users-cache-2 branch 3 times, most recently from 2c116a7 to 75b56ad Compare August 21, 2024 17:26
@aatkin aatkin changed the base branch from users-cache-1 to cache-optimizations August 21, 2024 17:27
@aatkin aatkin changed the title DB caches 3 Users DB caches Aug 21, 2024
@aatkin aatkin changed the title Users DB caches DB caches, part 2 Aug 21, 2024
@aatkin aatkin force-pushed the users-cache-2 branch 2 times, most recently from e8c679a to f5436bb Compare August 27, 2024 14:55
@aatkin aatkin changed the base branch from cache-optimizations to master August 27, 2024 14:57
@aatkin aatkin mentioned this pull request Aug 27, 2024
18 tasks
@aatkin aatkin changed the title DB caches, part 2 DB caches Aug 27, 2024
@aatkin aatkin force-pushed the users-cache-2 branch 3 times, most recently from 37a972a to c3d1689 Compare September 13, 2024 11:04
Dependent cache reset should happen as soon as the cache is updated, but cache update algorithm is not strictly dependent on *when* it happens. add-watch seems like a convenient utility here.
@@ -104,51 +104,70 @@
(not (:expired item))
(not (:archived item))))

(defn- plus-others [item-count]
(when (> item-count 5)
[:li (str "... plus " (- item-count 5) " more")]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Localization?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this later. Let's get it merged!

Comment on lines +112 to +118
(getx {1 {:resource-id 1}
2 {:resource-id 2}
3 {:resource-id 3 :formid 2}
4 {:resource-id 4 :wfid 2}
5 {:resource-id 5 :formid 2 :wfid 2}
6 {:resource-id 5 :formid nil}
7 {:resource-id 6 :enabled false}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe sometime later we can get to :resource/id, :form/id and :workflow/id for all these.

Comment on lines 33 to 40
(testing "user has different userid in userattrs"
;; NB: raw db call due testing backwards compatible behavior
(db/add-user! {:user "different-userid" :userattrs (json/generate-string {:userid "bad"})})
(cache/reset! rems.db.users/user-cache)
(is (= {:userid "bad"
:name nil
:email nil}
(rems.db.users/get-user "different-userid"))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should get rid of this behavior at some point. We could perhaps choose to have the master data in the blob, and the other column (e.g. user id) just be there for indexing / searching, but of secondary importance. Or just have the blob if it works fast enough (index and constraint functionally into blob)

:each
reset-caches-fixture
rollback-db-fixture)
(def ^:private owner "owner")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could just inline this.

Comment on lines +132 to +138
(let [response (archive-catalogue-item! false)]
(is (= {:success false
:errors [{:type :t.administration.errors/dependencies-archived
:forms [{:form/id form-id
:form/external-title {:en "form" :fi "form" :sv "form"}
:form/internal-name "catalogue-item-enabled-archived-test-form"}]}]}
response)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the assertion was simpler before so that the details of the response don't need to be tested here.

Comment on lines +211 to +213
(is (= []
(mapv :id (get-catalogue-items))
(mapv :id (get-catalogue-items {:archived false}))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe split into two asserts was better on the error diff?

@@ -255,7 +256,7 @@
api-key user-id)
(is (= {:success false
:errors [{:type "t.administration.errors/in-use-by"
:catalogue-items [{:id cat-id :localizations {}}]}]}
:catalogue-items [{:catalogue-item/id cat-id :localizations {}}]}]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically a change in the API? Perhaps worth a changelog entry.

Also remove tracking reset statistics, because they are not so interesting anymore.
Performance difference here is not very meaningful anymore, since dependency synchronization will always dominate overall execution time.
Cache dependencies can update at any given time. To ensure that caches work correctly, transactions test tries to run read/write operations concurrently, as fast as possible. Most of the time cache read/write operates in correct order, but certain race conditions existed (or may still exist). This change attempts to rectify those.

Dependencies are loaded in separate threads, handled by a separate thread pool. Two count down latches are used: one to signal main loader that all sub-dependencies are ready, and one to indicate that sub-dependency thread can release cache locks. Each latch is awaited with timeout, so at maximum they can cause thread locking for duration of the timeout.
aatkin and others added 9 commits October 10, 2024 14:46
When an exception is thrown, for example in middleware, we didn't log
anything in some cases. This makes it difficult to know in test what
happened. Let's make sure we log the error message also when API-key
is used (tests or external calls).
Re-frame and React seem to take a bit of time to update so we can't
immediately check the results. We can sleep for a bit.

Also let's switch the expected values to the right place.
- Try to simplify the locking strategy to make the code simpler and
locking more safe. We want to allow a cache read to happen to stale
data so we don't block reads.
- Add also a simple performance test with synthetic scenario.
Copy link
Collaborator

@Macroz Macroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to merge. Only localization issue really needs to be fixed before release. Others can be worked in the follow-up issues.

@Macroz Macroz merged commit 241d84b into master Nov 4, 2024
7 checks passed
@Macroz Macroz deleted the users-cache-2 branch November 4, 2024 08:28
@Macroz Macroz mentioned this pull request Nov 4, 2024
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants