-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: Add media retention policy to EventCacheStore #3918
base: main
Are you sure you want to change the base?
Conversation
1ff7a2a
to
b3b61fe
Compare
Sounds awesome but it's a biggie so I think it's safer to pass it over to somebody more familiar with the SDK, congrats Benji! 😁 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3918 +/- ##
==========================================
+ Coverage 84.18% 84.33% +0.15%
==========================================
Files 267 268 +1
Lines 28036 28554 +518
==========================================
+ Hits 23601 24082 +481
- Misses 4435 4472 +37 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Kévin Commaille <[email protected]>
84d83d1
to
02a3630
Compare
Rebased on main to solve a conflict. |
Hi, thanks for the PR, since this is a large commit it might take a bit of time to get to review it. I'll try my best. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main feedback is that I wouldn't want to have external users have to deal with a EventCacheStoreWrapper
, but rather with the EventCacheStore
; so EventCacheStore
(the trait) would likely need to be renamed into something else, maybe EventCacheStoreBackend
or EventCacheStoreImpl
, or something else that makes this clear enough?
I haven't looked at the code in details, so forwarding review to somebody else.
I know this is a big commit, so for reviewers I am going to reiterate what I said in the Matrix room to help reviewing part by part: I would recommend starting in Then in any order:
|
Before reviewing this PR, I'm wondering: don't we have an MSC about media retention policy? I see there is MSC1763. I don't know the state of this MSC, but I think we should link both (a media is an event after all). I don't know if the What do you think about this? |
This is on purpose a device-specific setting. The sole goal here is for the client or the user (if the client offers a setting for it) to be allowed to limit the size of the cache on the device. This kind of setting can usually change from one device to the other, e.g. the limitations would be more strict on an phone where disk space is limited than on a computer with a lot of free space. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the general idea and the MediaPolicy
type. I'm suggesting to rewrite the history to make it clearer. I'm also suggesting to introduce a new MediaService
API instead of adding a new event cache indirection with EventCacheStoreWrapper
. What do you think?
Sorry for the late review.
@@ -93,7 +92,7 @@ pub struct BaseClient { | |||
/// Database | |||
pub(crate) store: Store, | |||
/// The store used by the event cache. | |||
event_cache_store: Arc<DynEventCacheStore>, | |||
event_cache_store: EventCacheStoreWrapper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now an EventCacheStoreLock
:
matrix-rust-sdk/crates/matrix-sdk-base/src/event_cache/store/mod.rs
Lines 42 to 52 in 4724648
/// The high-level public type to represent an `EventCacheStore` lock. | |
#[derive(Clone)] | |
pub struct EventCacheStoreLock { | |
/// The inner cross process lock that is used to lock the `EventCacheStore`. | |
cross_process_lock: CrossProcessStoreLock<LockableEventCacheStore>, | |
/// The store itself. | |
/// | |
/// That's the only place where the store exists. | |
store: Arc<DynEventCacheStore>, | |
} |
which is a cross-process lock.
/// The retention policy for media content used by the `EventCacheStore`. | ||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] | ||
pub struct MediaRetentionPolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract the creation of this type inside its own commit please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this type and its implementation is quite long. It must be inside its own module.
pub max_cache_size: Option<usize>, | ||
/// The maximum authorized size of a single media content, in bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub max_cache_size: Option<usize>, | |
/// The maximum authorized size of a single media content, in bytes. | |
pub max_cache_size: Option<usize>, | |
/// The maximum authorized size of a single media content, in bytes. |
pub max_file_size: Option<usize>, | ||
/// The duration after which unaccessed media content is considered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub max_file_size: Option<usize>, | |
/// The duration after which unaccessed media content is considered | |
pub max_file_size: Option<usize>, | |
/// The duration after which unaccessed media content is considered |
/// If this is set, media content whose last access is older than this | ||
/// duration will be removed from the media cache during a cleanup. | ||
/// | ||
/// Defaults to 30 days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 60 days is theoretically a better default for this value.
|
||
/// The retention policy for media content used by the `EventCacheStore`. | ||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] | ||
pub struct MediaRetentionPolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add unit tests for this type please?
@@ -29,43 +29,81 @@ pub trait EventCacheStore: AsyncTraitDeps { | |||
/// The error type used by this event cache store. | |||
type Error: fmt::Debug + Into<EventCacheStoreError>; | |||
|
|||
/// Add a media file's content in the media store. | |||
/// The retention policy set to cleanup the media cache. | |||
async fn media_retention_policy(&self) -> Result<Option<MediaRetentionPolicy>, Self::Error>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This addition can go in a second commit:
- First commit: add
MediaRetentionPolicy
- Second commit: use
MediaRetentionPolicy
along with tests.
/// | ||
/// [`EventCacheStore`]: super::EventCacheStore | ||
#[derive(Debug, Clone)] | ||
pub struct EventCacheStoreWrapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why you've chosen this design.
I'm not comfortable with this though for several reasons, the major one being it introduces another type indirection for the EventCacheStore
itself. To reach the EventCacheStore
, the entry point is EventCacheStoreWrapper
, then DynEventCacheStore
, then EventCacheStore
. If we include the EventCacheStoreLock
, it introduces another level of indirection.
Note that this is only for media management for the moment.
Possible solutions:
-
Use standalone functions to be used by the multiple
EventCacheStore
implementations, liek we did for the cross-process lock andMemoryStore
: 94bd421 -
Use a standalone type to manage media, like
MediaService
, which takes aMediaRetentionPolicy
. TheEventCacheStore
would delegate the handling of media toMediaService
, such as:pub async fn add_media_content(&self, request: &MediaRequest, content: Vec<u8>) -> Result<()> { self.media_service.add_media_content(&self, request).await }
That way we can test MediaService
alone, it's a self-contained API, and it removes the need for another EventCacheWrapper
type + type indirection.
Bonus: MediaService
can be introduced inside its own commit, along with its tests. Another commit can use MediaService
inside the EventCacheStore
implementations.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the MediaService
access the data from the backend? Are we supposed to add a new trait for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know exactly. Maybe EventCacheStore
could be a super-trait of EventCacheStoreEvents
and EventCacheStoreMedias
, so that we can split it. First we can have:
trait EventCacheStore: EventCacheStoreMedia { … }
i.e. we keep all methods related to events in EventCacheStore
, but extract all methods related to media inside EventCacheStoreMedia
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry but this is still confusing to me. From what you are saying, my understanding is that an EventCacheStore
(or EventCacheStoreMedia
with what you just said) implementation would look like:
impl EventCacheStore for MemoryStore {
async fn add_media_content(&self, request: &MediaRequest, content: Vec<u8>) -> Result<()> {
self.media_service.add_media_content(&self, request, content).await
}
// other methods…
}
Which means the MediaService
would look like this:
pub struct MediaService {
media_retention_policy: Arc<StdMutex<Option<MediaRetentionPolicy>>>,
media_cleanup_guard: Arc<AsyncMutex<()>>,
}
impl MediaService {
// other methods…
async fn add_media_content(&self, cache: &impl InnerMediaApi, request: &MediaRequest, content: Vec<u8>) -> Result<()> {
let media_retention_policy = self.media_retention_policy();
// maybe do some pre-checks with the policy…
cache.add_media_content_inner(media_retention_policy, request, content).await
}
}
We need a trait for a private API that is only accessed by the MediaService
, right? The MediaService
cannot call the methods from the EventCacheStore
trait because they are calling the MediaService
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I propose is:
trait EventCacheStoreMedia {
fn add_media_content(…);
fn remove_media_content(…);
}
trait EventCacheStore: EventCacheStoreMedia { … }
impl EventCacheStoreMedia for MediaService { … }
impl EventCacheStore for MemoryStore {
fn add_media_content(…) {
self.media_service.add_media_content(…);
}
…
}
something roughly like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what API does the MediaService
call in its implementation of add_media_content
? Since we are storing/removing data from the backend, the MediaService
needs a generic API to access it somehow.
@@ -187,27 +188,27 @@ impl SqliteAsyncConnExt for SqliteAsyncConn { | |||
} | |||
|
|||
pub(crate) trait SqliteTransactionExt { | |||
fn chunk_large_query_over<Query, Res>( | |||
fn chunk_large_query_over<K, Query, Res>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn chunk_large_query_over<K, Query, Res>( | |
fn chunk_large_query_over<Key, Query, Res>( |
} | ||
|
||
impl<'a> SqliteTransactionExt for Transaction<'a> { | ||
fn chunk_large_query_over<Query, Res>( | ||
fn chunk_large_query_over<K, Query, Res>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn chunk_large_query_over<K, Query, Res>( | |
fn chunk_large_query_over<Key, Query, Res>( |
Allows to prevent the media cache from growing indefinitely by setting different limitations like size or last access time. I tried to select sensible defaults for those values.
Currently cleanups must be triggered manually, a future PR will add the possibility to run a regular cleanup task so they happen automatically.