Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Argus] Improves 'HEAD /assets/{id}' requests latency by implementing caching QN requests #4834

Merged

Conversation

zeeshanakram3
Copy link
Contributor

addresses #4814

This PR implements max-age based cache strategy for StorageDataObject QN entity, whose corresponding data object asset is missing on the distributor-node & for which a QN request is need to be made.

So whenever a HEAD /assets/{id} request is made, it could be satisfied in the following ways:

  • If the object, for which a QN request is supposed to be made, does not exist in the Apollo's in-memory cache, the network call would be made to fetch the object and update the cache.
  • If object exists in the in-memory cache and is older than the CACHED_OBJECT_MAX_AGE value, a network call would still be made to fetch the object and update the cache.
  • If object exists in the in-memory cache and is NOT old then CACHED_OBJECT_MAX_AGE, the object would be served from the cache

Copy link
Member

@mnaamani mnaamani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good. I left just a clarification question.

I have to say I was surprised that there is no builtin cache expiry config, and I presume that is why you had to add way to track the age of cached object.

Lets make the max age configurable in config.yml (or do you think it makes more sense to hardcode it?) and bump the distributor node version.

return defaultDataIdFromObject(object)
typePolicies: {
ProcessorState: {
keyFields: [],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this tell apollo client explicitly NOT to cache processor state ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out. This is actually incorrect, basically, this is just the alternative & newer way (introduced in Apollo 3.0) to assign the cache IDs to objects. We used dataIdFromObject previously. I have fixed it

@@ -59,6 +63,8 @@ type CustomVariables<T> = Omit<T, keyof PaginationQueryVariables>
export class QueryNodeApi {
private apolloClient: ApolloClient<NormalizedCacheObject>
private logger: Logger
private CachedObjectsAge: Map<string, Date> = new Map()
private CACHED_OBJECT_MAX_AGE = 1000 * 60 // 1 min
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1min sounds good, but we probably want to make this a configurable in the config.yml file

): FragmentT | null {
// Get the fragment name, usually first element of the definitions array is the name of the top level fragment
const fragmentName = (fragment.definitions[0] as FragmentDefinitionNode).name.value
if (!fragmentName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to also check that the fragmentName === "id"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I checked that Id isn't necessarily same as the fragment name. id refers to the cached object ID (e.g. for the StorageDataObject entity type it can be StorageDataObject:1) however the fragment name refers to the specific fragment of the entity type that we want to query from the cache, e.g. DataObjectDetails

@mnaamani
Copy link
Member

I have to say I was surprised that there is no builtin cache expiry config, and I presume that is why you had to add way to track the age of cached object.

@zeeshanakram3 Found a third party cache implementation for apollo client, is it something we can use to make our code a bit cleaner? https://github.com/NerdWalletOSS/apollo-cache-policies

@zeeshanakram3
Copy link
Contributor Author

I have to say I was surprised that there is no builtin cache expiry config, and I presume that is why you had to add way to track the age of cached object.

@zeeshanakram3 Found a third party cache implementation for apollo client, is it something we can use to make our code a bit cleaner? https://github.com/NerdWalletOSS/apollo-cache-policies

Thanks for sharing, will test it.

@zeeshanakram3
Copy link
Contributor Author

zeeshanakram3 commented Sep 14, 2023

  1. I reimplemented & tested apollo-cache-policies npm package (using OpenTelemetry traces) and the TTL seems to be working as expected.
  2. Also made the TTL configurable using interval.queryNodeCacheTTL
  3. Bumped package version + added CHNAGELOG.md

@mnaamani mnaamani self-requested a review September 14, 2023 12:52
@mnaamani
Copy link
Member

mnaamani commented Sep 22, 2023

Works as expected.
To test I commented out the code that checks for file in local fs before trying query:

  // distributor-node/src/services/content/ContentService.ts
  public async objectStatus(
       objectId: string,
       qnFetchPolicy: QueryFetchPolicy = 'no-cache'
  ): Promise<ObjectStatus> {
    /*
    const pendingDownload = this.stateCache.getPendingDownload(objectId)

    if (!pendingDownload && this.exists(objectId)) {
      return { type: ObjectStatusType.Available, path: this.path(objectId) }
    }

    if (pendingDownload) {
      return { type: ObjectStatusType.PendingDownload, pendingDownload }
    }
    */
   const objectInfo = await this.networking.dataObjectInfo(objectId, qnFetchPolicy)
...

then I tested (crudely)

# first request ~ 400ms
time http HEAD http://localhost:3334/api/v1/assets/1
docker stop graphql-server
# confirm we get a cached response, ~200ms response
time http HEAD http://localhost:3334/api/v1/assets/1
.. try a few more times, until we get an error response because distributor tried to do a query to QN
# restart query-node
docker start query-node
time http HEAD http://localhost:3334/api/v1/assets/1
# successful request as expected.

Tried with different TTL values also.

Copy link
Member

@mnaamani mnaamani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.
This will also benefit GET requests.
Not clear how much impact this will have but if there is any kind of repeated requests to assets the node doesn't have this fix will cover it.

@mnaamani mnaamani merged commit ba7218a into Joystream:master Sep 22, 2023
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants