Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove "async" from ImageCaching #386

Merged
merged 4 commits into from
Sep 12, 2024
Merged

Conversation

pinarol
Copy link
Contributor

@pinarol pinarol commented Sep 6, 2024

Solves a problem where a UI glitch occurs because of the async access to the cached image. The problem was first reported here: wordpress-mobile/WordPress-iOS#23534 (comment)

Check out the small "Me" icon:

Sep-06-2024.11-49-00.mp4

Description

Some History

We updated ImageCaching to comply with the new concurrency model with this PR: #252 . As described there, we discovered that making protocol functions async doesn’t dictate it on the implementations. Types conforming to ImageCaching could still have their non-async setEntry(…), getEntry(..) implementations. So keeping the protocol functions async was the more "inclusive" way.

The problem

But here’s the downside, “Gravatar.ImageCache.shared” is public, so one can just directly get the cached image from there. However, since these methods are async, the system can suspend there to do sth else. So the UI change can’t happen immediately in a synchronous way and can cause UI glitches.

Task {
    if let cachedEntry = await Gravatar.ImageCache.shared.getEntry(with: url.absoluteString) {
        switch cachedEntry {
        case .ready(let image):
            await MainActor.run {
                completion(image) // here the image is updated in the UI
            }
        default:
            break
     }
 }

One should be able to just:

if let cachedEntry = Gravatar.ImageCache.shared.getEntry(with: url.absoluteString) {
      switch cachedEntry {
      case .ready(let image):
          completion(image) // the image is updated in the UI, the whole operation is done synchronously.
      default:
          break
      }
}

Is this critical/urgent?

Nope. WordPress injects its own cache to the Gravatar SDK, so they can access their cache synchronously inside the repo. This is more of a problem for cases where they rely on the SDK's cache.

Testing Steps

CI should be sufficient but let's also smoke test the UIKit demo app.

@pinarol pinarol self-assigned this Sep 6, 2024
@pinarol pinarol requested review from etoledom and kean September 6, 2024 12:48
@wpmobilebot
Copy link

wpmobilebot commented Sep 6, 2024

Gravatar SwiftUI Prototype Build📲 You can test the changes from this Pull Request in Gravatar SwiftUI Prototype Build by scanning the QR code below to install the corresponding build.
App NameGravatar SwiftUI Prototype Build Gravatar SwiftUI Prototype Build
Build Number1216
Version1.0
Bundle IDcom.automattic.gravatar-sdk-demo-swiftui.prototype-build
Commit564c1ac
App Center BuildGravatar SDK Demo - SwiftUI #104
Automatticians: You can use our internal self-serve MC tool to give yourself access to App Center if needed.

@wpmobilebot
Copy link

wpmobilebot commented Sep 6, 2024

Gravatar UIKit Prototype Build📲 You can test the changes from this Pull Request in Gravatar UIKit Prototype Build by scanning the QR code below to install the corresponding build.
App NameGravatar UIKit Prototype Build Gravatar UIKit Prototype Build
Build Number1216
Version1.0
Bundle IDcom.automattic.gravatar-sdk-demo-uikit.prototype-build
Commit564c1ac
App Center BuildGravatar SDK Demo - UIKit #105
Automatticians: You can use our internal self-serve MC tool to give yourself access to App Center if needed.

@@ -9,7 +9,7 @@ public protocol ImageCaching: Sendable {
/// - Parameters:
/// - image: The cache entry to set.
/// - key: The entry's key, used to be found via `.getEntry(key:)`.
func setEntry(_ entry: CacheEntry?, for key: String) async
func setEntry(_ entry: CacheEntry?, for key: String)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I am not sure about removing async from setEntry(...). I can't think of a similar problem happening related to this. Any thoughts?

Copy link
Contributor

@kean kean Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be synchronous or it ends up with race conditions due to reentrancy. This seems like a good post on it, but I only briefly looked into it. You generally want at little await calls as possible, especially inside actors (ImageDownloadService should really be one or be synchronized on one).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That post explains the solution we already applied here. We too have the .inProgress(...) state and stuff.

What else could we be causing by making setEntry async? Any examples?

@pinarol pinarol marked this pull request as ready for review September 6, 2024 13:05
switch entry {
case .inProgress(let task):
if !task.isCancelled {
task.cancel()
await imageCache.setEntry(nil, for: url.absoluteString)
imageCache.setEntry(nil, for: url.absoluteString)
Copy link
Contributor

@kean kean Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest addressing it separately, but I believe there is a race condition in this code. I would suggest synchronizing most of the Gravatar types on a custom global actor. It will ensure that this code runs in the background (good) and is synchronized with no reentrancy (great).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is a race condition in this code.

Like what?

Copy link
Contributor

@kean kean Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an async function, fetchImage runs on arbitrary threads, even if invoked from the main thread. The cancelTask runs in parallel with it.

There are multiple potential race conditions:

// you start a new request
// func cachedImage(for url: URL) is on line 37 and it
// finds an `.inProgress` entry
guard let entry = await imageCache.getEntry(with: url.absoluteString)

// in parallel, cancelTask(for url: URL) is on line 81 and cancels the
// task cachedImage assumes is in progress
task.cancel()

It's hard to reason about it because it's a race condition as any line can be executed in any order in parallel to each other, which can lead to unpredictable results.

While both ImageDownloadService and ImageCache are Sendable and can be passed between threads, the code is not safe from race conditions. I would suggest synchronizing the entire system that runs in the background on an actor.


The cancelTask may also be not the right API because it cancels a download even if it was started from two places. So, as a consumer, you are better off never using it unless you are sure you start a download from only one image view.

Copy link
Contributor Author

@pinarol pinarol Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the results would be "unpredictable" in the sense that it would be in a real race condition though. If you call these 2 methods in a parallel manner, one of them will get to access the cache first it's true, and we won't know which one. But it doesn't mean things will go wrong. For example, both scenarios are fine:

  • if cancelTask() gets to access to the cache when there's already an inProgress task then it'll cancel the task and set the value to "nil". In this case fetchImage() will get a URLSession error "cancelled", it will also set the value to "nil". Which is fine.

  • If the inProgress task of fetchImage() gets to finish first, it'll write the image to the cache, and there will be nothing left to cancel for the cancelTask().

No race condition here. This is just 2 tasks running in parallel. If the caller party wants to really make sure one executes before the other, they should just call them in a serial way.

The important bit is, each method should be able to access to the "correct" state of the cache. We ensure it by using a thread safe cache here (NSCache) so all the accesses to the cache are "synchronized".

If anything, with this change we are making the code more prone to race conditions. Because now if 3rd parties want to inject their own cache to the SDK, they are fully responsible for synchronizing the accesses to it. If they don't use any thread safe cache implementation like NSCache, then there will indeed be race conditions.

The cancelTask may also be not the right API because it cancels a download even if it was started from two places. So, as a consumer, you are better off never using it unless you are sure you start a download from only one image view.

Good point. I was thinking about introducing the ability of using a separate cache key than the URL itself. It could help with creating different tasks for the same URL if you need to. We need to discuss this one further inside the team.

Copy link
Contributor

@kean kean Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the caller party wants to really make sure one executes before the other, they should just call them in a serial way.

There is no way to do that since fetchImage is an async function and Swift schedules it to be executed on arbitrary threads, potentially in parallel with each other.

Here's another simple example, even without cancellation:

  • view A calls fetchImage(with:) from the main thread
  • view B calls fetchImage(with:) form the main thread
  • both async functions are scheduled by Swift Concurrency to run on two separate threads and they end up running in parallel
  • A runs line 42, finds that imageCache.getEntry(with: url.absoluteString) returns nil and starts a new task
  • B runs line 42 and also does the same
  • A runs line 53 setEntry(.inProgress(task), for: key) and sets its task in the "cache"
  • B also reaches line 53 and overwrites it with its own cache

As a result, two tasks are running instead of one and there is no way to cancel the first one.

they are fully responsible for synchronizing accesses to it.

That was also the case with the previous design where the cache methods were async. You could fairly easily write a non thread-safe cache with it. With non-async methods, ImageDownloader now has a fighting chance to synchronize access to its internal state (.inProgress) using an actor. It was impossible using async.

ImageDownloader does need to synchronize code that accessed its internal state (.inProgress). It'd be easier to do if it was managed internally and not expose to the external cache. fetchImage and cancelImage both need to be isolated on an actor or any instructions in them could run in parallel with each other.

I was thinking about introducing the ability of using a separate cache key than the URL itself. It could help with creating different tasks for the same URL if you need to.

That's not exactly what I was alluding to. A better option would to either remove .inProgress entirely or implement automatic coalescing.

Btw, it's be cool to support cancellation via tasks:

let task = Task {
    let image = ImageDownloader.shared.fetchImage(with: url)
}

// At a later time
task.cancel()

I think the bigger issue is that the system doesn't automatically refresh avatars. I would suggest prioritizing it to ensure that if I change avatar on one device, it will eventually be updated on the other device without having to restart the app.

Copy link
Contributor

@kean kean Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there's no mechanism on the server side

It's a bit of a different topic, but is kind of related. Gravatar seems to serve images with cache control headers to enable validation:

Cache-Control: max-age=300
Expires: Tue, 10 Sep 2024 19:39:07 GMT
Last-Modified: Mon, 10 Oct 2022 21:24:25 GMT

URLSession and URLCache handle cache validation automatically, but the framework needs to send the request every time the view is displayed. If it's on disk cache and is fresh, there won't be any requests. If it's stale, it'll send a quick request and get 304 Not Modified, so no data is re-downloaded unless the avatar actually changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can run async functions in parallel using async let.

Or if you are simply invoke fetchImage from two different async contexts, they'll run in parallel.

Yes of course, there was no doubt about that... My objection was to "There is no way to do that". "That" being "running 2 async functions in a serial way if desired".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URLSession and URLCache handle cache validation automatically, but the framework needs to send the request every time the view is displayed

Gotcha. Yeah I guess we can just allow other parties to by-pass the ImageCaching implementation and rely on URLCache also configure the cache policy as they want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My objection was to

Ah, got you. Yeah I was a bit confused. It's a bit impractical to do it this way though. In reality, the scenario will be something like you open an iPad app with three columns where the same avatar is displayed in two or three different places, they all create separate async contexts/functions, and this code will likely run in parallel since async functions all execute concurrently on a default set of threads.

ImageCaching implementation

You do need a memory cache, but yeah, it's a bit tricky to get right. I think a good default would be to synchronously display an image from a memory cache, even if it is stale, and automatically refresh it in the background.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created 2 different issues to track these topics. We can continue discussing these issues there and unblock this PR I think.

#397
#398

@@ -79,12 +79,12 @@ public struct ImageDownloadService: ImageDownloader, Sendable {
}

public func cancelTask(for url: URL) async {
if let entry = await imageCache.getEntry(with: url.absoluteString) {
if let entry = imageCache.getEntry(with: url.absoluteString) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) cancelTask(for url: URL) async no longer needs to be async. It's generally best not to expose async public APIs unless there is a clear reason to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. 👍

@@ -9,7 +9,7 @@ public protocol ImageCaching: Sendable {
/// - Parameters:
/// - image: The cache entry to set.
/// - key: The entry's key, used to be found via `.getEntry(key:)`.
func setEntry(_ entry: CacheEntry?, for key: String) async
func setEntry(_ entry: CacheEntry?, for key: String)
Copy link
Contributor

@kean kean Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be synchronous or it ends up with race conditions due to reentrancy. This seems like a good post on it, but I only briefly looked into it. You generally want at little await calls as possible, especially inside actors (ImageDownloadService should really be one or be synchronized on one).

@kean kean self-requested a review September 11, 2024 13:49
Copy link
Contributor

@kean kean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This individual change looks good, and is a great start!

Copy link
Contributor

@etoledom etoledom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been following the conversations, thank you @kean for the input! 🙏

Let's move forward with me and continue improving 👍

@pinarol pinarol merged commit c8708e7 into trunk Sep 12, 2024
9 checks passed
@pinarol pinarol deleted the wppinar/remove-async-from-cache branch September 12, 2024 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants