-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove "async" from ImageCaching #386
Conversation
📲 You can test the changes from this Pull Request in Gravatar SwiftUI Prototype Build by scanning the QR code below to install the corresponding build.
|
📲 You can test the changes from this Pull Request in Gravatar UIKit Prototype Build by scanning the QR code below to install the corresponding build.
|
@@ -9,7 +9,7 @@ public protocol ImageCaching: Sendable { | |||
/// - Parameters: | |||
/// - image: The cache entry to set. | |||
/// - key: The entry's key, used to be found via `.getEntry(key:)`. | |||
func setEntry(_ entry: CacheEntry?, for key: String) async | |||
func setEntry(_ entry: CacheEntry?, for key: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I am not sure about removing async
from setEntry(...)
. I can't think of a similar problem happening related to this. Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has to be synchronous or it ends up with race conditions due to reentrancy. This seems like a good post on it, but I only briefly looked into it. You generally want at little await
calls as possible, especially inside actors (ImageDownloadService
should really be one or be synchronized on one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That post explains the solution we already applied here. We too have the .inProgress(...)
state and stuff.
What else could we be causing by making setEntry async? Any examples?
switch entry { | ||
case .inProgress(let task): | ||
if !task.isCancelled { | ||
task.cancel() | ||
await imageCache.setEntry(nil, for: url.absoluteString) | ||
imageCache.setEntry(nil, for: url.absoluteString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest addressing it separately, but I believe there is a race condition in this code. I would suggest synchronizing most of the Gravatar types on a custom global actor. It will ensure that this code runs in the background (good) and is synchronized with no reentrancy (great).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there is a race condition in this code.
Like what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an async function, fetchImage
runs on arbitrary threads, even if invoked from the main thread. The cancelTask
runs in parallel with it.
There are multiple potential race conditions:
// you start a new request
// func cachedImage(for url: URL) is on line 37 and it
// finds an `.inProgress` entry
guard let entry = await imageCache.getEntry(with: url.absoluteString)
// in parallel, cancelTask(for url: URL) is on line 81 and cancels the
// task cachedImage assumes is in progress
task.cancel()
It's hard to reason about it because it's a race condition as any line can be executed in any order in parallel to each other, which can lead to unpredictable results.
While both ImageDownloadService
and ImageCache
are Sendable
and can be passed between threads, the code is not safe from race conditions. I would suggest synchronizing the entire system that runs in the background on an actor.
The cancelTask
may also be not the right API because it cancels a download even if it was started from two places. So, as a consumer, you are better off never using it unless you are sure you start a download from only one image view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure the results would be "unpredictable" in the sense that it would be in a real race condition though. If you call these 2 methods in a parallel manner, one of them will get to access the cache first it's true, and we won't know which one. But it doesn't mean things will go wrong. For example, both scenarios are fine:
-
if
cancelTask()
gets to access to the cache when there's already an inProgress task then it'll cancel the task and set the value to "nil". In this casefetchImage()
will get a URLSession error "cancelled", it will also set the value to "nil". Which is fine. -
If the inProgress task of
fetchImage()
gets to finish first, it'll write the image to the cache, and there will be nothing left to cancel for thecancelTask()
.
No race condition here. This is just 2 tasks running in parallel. If the caller party wants to really make sure one executes before the other, they should just call them in a serial way.
The important bit is, each method should be able to access to the "correct" state of the cache. We ensure it by using a thread safe cache here (NSCache) so all the accesses to the cache are "synchronized".
If anything, with this change we are making the code more prone to race conditions. Because now if 3rd parties want to inject their own cache to the SDK, they are fully responsible for synchronizing the accesses to it. If they don't use any thread safe cache implementation like NSCache, then there will indeed be race conditions.
The cancelTask may also be not the right API because it cancels a download even if it was started from two places. So, as a consumer, you are better off never using it unless you are sure you start a download from only one image view.
Good point. I was thinking about introducing the ability of using a separate cache key than the URL itself. It could help with creating different tasks for the same URL if you need to. We need to discuss this one further inside the team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the caller party wants to really make sure one executes before the other, they should just call them in a serial way.
There is no way to do that since fetchImage
is an async function and Swift schedules it to be executed on arbitrary threads, potentially in parallel with each other.
Here's another simple example, even without cancellation:
- view A calls
fetchImage(with:)
from the main thread - view B calls
fetchImage(with:)
form the main thread - both async functions are scheduled by Swift Concurrency to run on two separate threads and they end up running in parallel
- A runs line 42, finds that
imageCache.getEntry(with: url.absoluteString)
returns nil and starts a new task - B runs line 42 and also does the same
- A runs line 53
setEntry(.inProgress(task), for: key)
and sets its task in the "cache" - B also reaches line 53 and overwrites it with its own cache
As a result, two tasks are running instead of one and there is no way to cancel the first one.
they are fully responsible for synchronizing accesses to it.
That was also the case with the previous design where the cache methods were async
. You could fairly easily write a non thread-safe cache with it. With non-async methods, ImageDownloader
now has a fighting chance to synchronize access to its internal state (.inProgress
) using an actor. It was impossible using async
.
ImageDownloader
does need to synchronize code that accessed its internal state (.inProgress
). It'd be easier to do if it was managed internally and not expose to the external cache. fetchImage
and cancelImage
both need to be isolated on an actor or any instructions in them could run in parallel with each other.
I was thinking about introducing the ability of using a separate cache key than the URL itself. It could help with creating different tasks for the same URL if you need to.
That's not exactly what I was alluding to. A better option would to either remove .inProgress
entirely or implement automatic coalescing.
Btw, it's be cool to support cancellation via tasks:
let task = Task {
let image = ImageDownloader.shared.fetchImage(with: url)
}
// At a later time
task.cancel()
I think the bigger issue is that the system doesn't automatically refresh avatars. I would suggest prioritizing it to ensure that if I change avatar on one device, it will eventually be updated on the other device without having to restart the app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But there's no mechanism on the server side
It's a bit of a different topic, but is kind of related. Gravatar seems to serve images with cache control headers to enable validation:
Cache-Control: max-age=300
Expires: Tue, 10 Sep 2024 19:39:07 GMT
Last-Modified: Mon, 10 Oct 2022 21:24:25 GMT
URLSession
and URLCache
handle cache validation automatically, but the framework needs to send the request every time the view is displayed. If it's on disk cache and is fresh, there won't be any requests. If it's stale, it'll send a quick request and get 304 Not Modified
, so no data is re-downloaded unless the avatar actually changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can run async functions in parallel using async let.
Or if you are simply invoke fetchImage from two different async contexts, they'll run in parallel.
Yes of course, there was no doubt about that... My objection was to "There is no way to do that". "That" being "running 2 async functions in a serial way if desired".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URLSession and URLCache handle cache validation automatically, but the framework needs to send the request every time the view is displayed
Gotcha. Yeah I guess we can just allow other parties to by-pass the ImageCaching
implementation and rely on URLCache also configure the cache policy as they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My objection was to
Ah, got you. Yeah I was a bit confused. It's a bit impractical to do it this way though. In reality, the scenario will be something like you open an iPad app with three columns where the same avatar is displayed in two or three different places, they all create separate async contexts/functions, and this code will likely run in parallel since async
functions all execute concurrently on a default set of threads.
ImageCaching implementation
You do need a memory cache, but yeah, it's a bit tricky to get right. I think a good default would be to synchronously display an image from a memory cache, even if it is stale, and automatically refresh it in the background.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -79,12 +79,12 @@ public struct ImageDownloadService: ImageDownloader, Sendable { | |||
} | |||
|
|||
public func cancelTask(for url: URL) async { | |||
if let entry = await imageCache.getEntry(with: url.absoluteString) { | |||
if let entry = imageCache.getEntry(with: url.absoluteString) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) cancelTask(for url: URL) async
no longer needs to be async
. It's generally best not to expose async
public APIs unless there is a clear reason to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. 👍
@@ -9,7 +9,7 @@ public protocol ImageCaching: Sendable { | |||
/// - Parameters: | |||
/// - image: The cache entry to set. | |||
/// - key: The entry's key, used to be found via `.getEntry(key:)`. | |||
func setEntry(_ entry: CacheEntry?, for key: String) async | |||
func setEntry(_ entry: CacheEntry?, for key: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has to be synchronous or it ends up with race conditions due to reentrancy. This seems like a good post on it, but I only briefly looked into it. You generally want at little await
calls as possible, especially inside actors (ImageDownloadService
should really be one or be synchronized on one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This individual change looks good, and is a great start!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been following the conversations, thank you @kean for the input! 🙏
Let's move forward with me and continue improving 👍
Solves a problem where a UI glitch occurs because of the async access to the cached image. The problem was first reported here: wordpress-mobile/WordPress-iOS#23534 (comment)
Check out the small "Me" icon:
Sep-06-2024.11-49-00.mp4
Description
Some History
We updated
ImageCaching
to comply with the new concurrency model with this PR: #252 . As described there, we discovered that making protocol functions async doesn’t dictate it on the implementations. Types conforming to ImageCaching could still have their non-asyncsetEntry(…)
,getEntry(..)
implementations. So keeping the protocol functions async was the more "inclusive" way.The problem
But here’s the downside, “Gravatar.ImageCache.shared” is public, so one can just directly get the cached image from there. However, since these methods are async, the system can suspend there to do sth else. So the UI change can’t happen immediately in a synchronous way and can cause UI glitches.
One should be able to just:
Is this critical/urgent?
Nope. WordPress injects its own cache to the Gravatar SDK, so they can access their cache synchronously inside the repo. This is more of a problem for cases where they rely on the SDK's cache.
Testing Steps
CI should be sufficient but let's also smoke test the UIKit demo app.