Allow JITServer AOT cache to work without a shared class cache at the client #16721

Open
mpirvu opened this issue Feb 14, 2023 · 30 comments
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project

Comments

@mpirvu
Contributor

mpirvu commented Feb 14, 2023

Currently, the JITServer AOT cache feature requires the client JVM to use a shared class cache (SCC) with some empty space and write permissions. This is because the AOT code fetched from the server is first stored in the SCC and then loaded from there, relocated and stored in the code cache. This solution was chosen to simplify the AOT cache implementation by relying as much as possible on the existing SCC mechanism. However, this solution comes with some usability constraints when an SCC is embedded in a container.
Embedding an SCC in a container (to speed up start-up of applications) is typically done in layers, with one SCC layer per container image layer. Such SCC layers are typically trimmed to just the right size and marked read-only.
This issue proposes to change the JITServer AOT cache mechanism such that the AOT code sent by the server is deserialized and relocated in-place, without going through the SCC first.

@mpirvu added the comp:jitserver label Feb 14, 2023
@mpirvu
Contributor Author

mpirvu commented Feb 14, 2023

Attn: @cjjdespres

@cjjdespres
Contributor

First, I noticed in remoteCompilationEnd in JITClientCompilationThread.cpp that when a method is AOT-compiled at the server and sent to the client, it's always stored in the local SCC, and then an attempt to relocate and use it immediately is made if canRelocateMethod is true:

      bool canRelocateMethod = TR::CompilationInfo::canRelocateMethod(comp);

That function comes with the following warning that seems relevant:

   // Delay relocation if this is a deserialized AOT method using SVM received from the JITServer AOT cache.
   // Such methods cannot be immediately relocated in the current implementation. An immediate AOT+SVM load
   // uses the ID-symbol mapping created during compilation. This mapping is missing when the client receives
   // a serialized AOT method from the server, and trying to load the deserialized method immediately
   // triggers fatal assertions in SVM validation in certain cases. As a workaround, we delay the AOT load
   // until the next interpreted invocation of the method; see CompilationInfo::replenishInvocationCounter().
   //
   //TODO: Avoid the overhead of rescheduling this compilation request by handling the deserialized AOT load as if
   //      the method came from the local SCC, rather than as if it was freshly AOT-compiled at the JITServer.
   if (comp->isDeserializedAOTMethodUsingSVM())
      return false;

Something to keep in mind, though I can't believe it's that difficult to fix.


As for solving this issue, I think most of the relocation work happens in SymbolValidationManager.cpp, RelocationRecord.cpp, RelocationRuntime.cpp, and JITServerAOTDeserializer.cpp. Perhaps there is more, but all of these are involved and they rely heavily on there being a shared class cache available through a TR_J9VMBase pointer in the relocation runtime or the symbol validation manager. What suggests itself first, then, is to modify these components (with a new front-end, perhaps, or by adding an explicit reference to a shared cache to the relevant classes that we can set ourselves) so we can have them use a distinct, in-memory SCC if there isn't a suitable local one available. This could be an actual non-persistent SCC (that could be kept around between compilations) or something that acts like it - the deserializer and its records currently stick around for the length of a particular server-client connection, so maybe we could write an interface to it that makes it act like an SCC for the purposes of relocation?

That's the only way I can think of that would let us avoid modifying or duplicating the actual relocation logic that currently exists. There is quite a lot of it, and based on the description of the system in https://github.com/eclipse-openj9/openj9/tree/master/doc/compiler/aot I can't see why it wouldn't be possible.

@cjjdespres
Contributor

To lay out the problem in more detail: the relocation runtime and SVM rely on the sharedCache() of the frontend passed in at the start of relocation for a couple of things:

  1. Offsets into the shared class cache (opaque uintptr_t values) that are in the relocation records of a method must be converted into concrete entities (class chains, ROM classes)
  2. Classes and class loaders must be resolved and validated, with lookupClassFromChainAndLoader() and classMatchesCachedVersion(), as well as through queries to the persistent class loader table accessed through the sharedCache()

The function of the JITServer AOT deserializer is to patch up the relocation records received from the server so that the SCC offsets they contain point to valid entities in the local SCC. It also stores class data in the local SCC, again so that the relocation runtime will function properly, and caches all records received from the server for the duration of the connection to that server.

At relocation time at the client, we should be able to detect (probably via an explicit option) that we should be ignoring any existing local SCC. If that is the case, we need to change how we currently handle SCC offsets in order to deal with problem (1) above:

  1. The deserializer should not store class data in the local SCC or query it for the offsets of stored entities. Due to the lack of legitimate SCC offsets, it will need to be able to obtain "fake" offsets to update the relocation records. My initial idea was to increment an offset counter and add an entry to a fake-offset-to-entity map that would be maintained by the deserializer, but @dsouzai mentioned that the fake offsets could simply be direct pointers to the relevant entities in the JVM itself.
  2. The relo runtime and SVM need to be able to convert our fake offsets back into legitimate entities during relocation. We could pass in a new frontend at the start of relocation with an overridden sharedCache() that implements the queries they need correctly - those queries would consult the deserializer's offset-to-pointer map (in my original approach) or do nothing (if we can refer to the entities directly).
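The two ways of producing "fake" offsets mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not OpenJ9 code: `Entity`, `FakeOffsetTable`, `pointerAsOffset` and `offsetAsPointer` are hypothetical names, and `Entity` stands in for whatever JVM entity (class, ROM class, class chain) a relocation record refers to.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a JVM entity referenced by a relocation record.
struct Entity { int id; };

// Option A: a counter-based table owned by the deserializer. Each stored
// entity gets a fresh opaque "offset" that can be written into a relocation
// record and resolved again during relocation.
class FakeOffsetTable
   {
public:
   uintptr_t store(const Entity *entity)
      {
      uintptr_t offset = _nextOffset++;
      _entities[offset] = entity;
      return offset;
      }

   // Returns nullptr if the offset was never handed out by store().
   const Entity *resolve(uintptr_t offset) const
      {
      auto it = _entities.find(offset);
      return it != _entities.end() ? it->second : nullptr;
      }

private:
   uintptr_t _nextOffset = 1; // 0 is reserved to mean "no offset"
   std::unordered_map<uintptr_t, const Entity *> _entities;
   };

// Option B: the "offset" is simply the entity's address, so no table is
// needed and the conversion back is a cast.
inline uintptr_t pointerAsOffset(const Entity *entity)
   {
   return reinterpret_cast<uintptr_t>(entity);
   }

inline const Entity *offsetAsPointer(uintptr_t offset)
   {
   return reinterpret_cast<const Entity *>(offset);
   }
```

Option B avoids the extra map entirely, at the cost of the values no longer being checkable against a table of offsets that were actually issued.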

We also need to implement the queries related to classes and class loaders. The deserializer does cache information received from the server related to class chains and class loaders, but the information that's stored in the class loader table by the TR_PersistentClassLoaderTable::associateClassLoaderWithClass function may also be necessary. (I'm not sure, but if it is, then that function might need to be changed so that it stores class loader information even if a suitable _sharedCache does not exist. Otherwise the relo records might not have the necessary information around at relocation time.).

I'm not sure what can be done to split this up into smaller pieces for implementation. I could add the frontend and have it return a sharedCache() that always fails to find cached data (and raises an error on any attempt to store data in it), have the client use it when an appropriate option is set, then gradually fill in the interface. My worry is that relocations would mostly just fail until the entire interface is implemented, but I might be wrong about that.
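A rough sketch of that first incremental step might look like the following. The interface shown is an assumption made for illustration: `SharedCacheQueries`, `AlwaysMissSharedCache`, `pointerFromOffset`, `storeData` and `relocateOne` are hypothetical names, and the real shared cache interface used by the relocation runtime is considerably larger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical, heavily reduced view of the queries the relocation runtime
// makes against a shared cache.
class SharedCacheQueries
   {
public:
   virtual ~SharedCacheQueries() = default;
   virtual const void *pointerFromOffset(uintptr_t offset) const = 0;
   virtual uintptr_t storeData(const void *data, size_t size) = 0;
   };

// Starting point for filling in the interface gradually: every lookup
// misses and every store is an error.
class AlwaysMissSharedCache : public SharedCacheQueries
   {
public:
   const void *pointerFromOffset(uintptr_t) const override { return nullptr; }

   uintptr_t storeData(const void *, size_t) override
      {
      throw std::logic_error("store attempted while bypassing the local SCC");
      }
   };

// A relocation step that depends on a lookup fails cleanly on a miss.
inline bool relocateOne(const SharedCacheQueries &cache, uintptr_t offset)
   {
   return cache.pointerFromOffset(offset) != nullptr;
   }
```

With a stub like this every relocation that needs a lookup would fail, which matches the worry above; the benefit is only that each query can then be implemented and tested one at a time.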

I think that should be all that's needed for clients to relocate received AOT methods while bypassing the existing SCC. I believe @AlexeyKhrabrov has thought about this issue before, so I'm interested to know if there are other major implementation issues that I haven't considered. Thoughts from @mpirvu and @dsouzai would also be appreciated.

@AlexeyKhrabrov
Contributor

I generally agree with the approach of implementing a SharedCache subclass to act as a "front-end" for the deserializer. I haven't looked into the relocation and SVM infrastructure in a while, but from what I remember this seems like a reasonable approach. It may or may not be easier to swap out the _sharedCache field in the front-end rather than implementing a new front-end subclass for this; exception safety is one of the things to consider.

You'll also need to decouple the TR_AOTHeader struct from the SCC.

I agree with Irwin's suggestion about simply storing "RAM" entity pointers in updated relocation/validation records since the AOT body will be thrown out anyway. The mapping of entity types would be:

  • class chain offset identifying loader -> J9ClassLoader *;
  • ROMClass offset -> J9Class *;
  • ROMMethod offset -> J9Method *;
  • class chain offset -> J9Class *;
  • well-known class chain offset -> nothing (already validated by the deserializer).

While we do lose some type information in this approach compared to the fake offset maps, if we can guarantee that "RAM" pointers always fit into e.g. 32 bits (not sure if that applies to J9ClassLoader *), you could store the entity type in the unused higher order bits for consistency checks/assertions.
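The tagging idea can be sketched like this, purely under the assumption stated above that the address fits in 32 bits (which is not confirmed for every entity type). `EntityType`, `encodeEntity`, `entityType`, `entityAddress` and `hasType` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical entity kinds, following the mapping listed above.
enum class EntityType : uint64_t { ClassLoader = 1, Class = 2, Method = 3, ClassChain = 4 };

const unsigned kTypeShift = 32;
const uint64_t kAddressMask = (uint64_t(1) << kTypeShift) - 1;

// Packs the type into the bits above the low 32. Only valid if the address
// fits in 32 bits, which is asserted here rather than guaranteed.
inline uint64_t encodeEntity(uint64_t address, EntityType type)
   {
   assert((address & ~kAddressMask) == 0);
   return (static_cast<uint64_t>(type) << kTypeShift) | address;
   }

inline EntityType entityType(uint64_t encoded)
   {
   return static_cast<EntityType>(encoded >> kTypeShift);
   }

inline uint64_t entityAddress(uint64_t encoded)
   {
   return encoded & kAddressMask;
   }

// Consistency check a relocation record handler could run before using a value.
inline bool hasType(uint64_t encoded, EntityType expected)
   {
   return entityType(encoded) == expected;
   }
```

A handler expecting a method would check `hasType(value, EntityType::Method)` before casting `entityAddress(value)` back to a pointer.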

Please ping me about significant changes to the deserializer as you work on the implementation. I've been basically rewriting large parts of it for my current work on early remote compilations (which I plan to contribute later, but won't be able to in the next few months at least), and I'd prefer to avoid having to rewrite much of it again in case our changes are too incompatible.

While you have the relocation part mostly figured out, what is the plan for AOT compilations on the server? The AOT codegen infrastructure also assumes that the client has a local SCC. You should be able to start with keeping this assumption while you work on the relocation part, but eventually you'll need to enable remote AOT compilations for clients without a local SCC.

@cjjdespres
Contributor

Sure, I'll let you know if I make large changes to the deserializer.

I've been focusing on this initial problem for now, but fresh server AOT compilations would be good to support. I planned on testing my work with JITServer AOT caches that have already been populated (by a previous run or a loaded persistent cache). I'd have to look at the client-server messages in more detail to see what would be needed. I think local AOT compilations are disabled when the local SCC is read-only (and definitely when the local SCC doesn't exist) - is that right?

@AlexeyKhrabrov
Contributor

I've been focusing on this initial problem for now, but fresh server AOT compilations would be good to support.

Not just good, but necessary for this feature to be practically useful (unless I'm missing something). Otherwise we're just moving the cache pre-population step to the server side instead of eliminating it (which is basically the main point of the server AOT cache).

@AlexeyKhrabrov
Contributor

I planned on testing my work with JITServer AOT caches that have already been populated (by a previous run or a loaded persistent cache).

Using a pre-populated persistent AOT cache is a good idea; it's probably the easiest way to test things.

I'd have to look at the client-server messages in more detail to see what would be needed.

I think the main things to deal with are:

  • SCC offsets in AOT relocation metadata on the server side, which could be substituted with AOT cache IDs. This will also allow us to eventually get rid of the list of SerializedSCCOffsets in serialized methods.
  • Class chains are no longer SCC entities but rather "RAMClass chains" in this new design. See rememberClass() implementation on the server side. There is already code for fetching RAMClass chains from the client, but you'll probably need to modify quite a bit of that whole infrastructure.
  • Same for the "well-known classes" SCC entity.

I think local AOT compilations are disabled when the local SCC is read-only (and definitely when the local SCC doesn't exist) - is that right?

Sounds right, @dsouzai should be able to confirm.

@cjjdespres
Contributor

@mpirvu mentioned using the current Liberty image and its read-only embedded SCC with the JITServer AOT cache in connection with this issue, but maybe he was imagining fresh server AOT compilations in the ultimate solution. I could imagine generating a server image with an embedded AOT cache in a similar way to the Liberty image, but that would certainly be harder and less generally useful than simply allowing fresh AOT compilations.

@mpirvu
Contributor Author

mpirvu commented Jun 21, 2023

Re: fresh server AOT compilations
Starting with a brand new JITServer instance (with JITServer AOT cache feature enabled), it's important to allow the server to perform AOT compilations, serialize them and store them in its cache. At this point the client can receive a serialized version of the AOT body and everything should work well (I think).

@AlexeyKhrabrov
Contributor

@mpirvu mentioned using the current Liberty image and its read-only embedded SCC with the JITServer AOT cache in connection with this issue, but maybe he was imagining fresh server AOT compilations in the ultimate solution.

Please correct me if I'm wrong, but I think it's still necessary in this specific use case. At least when there is no writable SCC layer on top of the read-only SCC, e.g. if the container doesn't have any writable storage (which may be a common setup).

The server AOT cache is only useful if it can store methods not present in the read-only local SCC. For the server to be able to AOT-compile these new methods, the clients need to store new ROMClasses and class chains in their local SCCs, which is impossible if they're read-only. A way around that would be to pre-populate the server AOT cache using a client with a writable SCC, but then the user may as well pre-populate an extra layer of the local SCC instead.

Otherwise an easier workaround would be to have a writable top SCC layer which would store additional ROMClasses (which consume memory regardless of being in the SCC or not) and class chains (which add little extra footprint), but not the AOT methods received from the server.

@mpirvu
Contributor Author

mpirvu commented Jun 21, 2023

One complication could be the fact that the client already has a local SCC (marked read-only). So, some bits of information need to be extracted from this SCC and some other bits from the map/fake-scc that we are going to create.

How about this: create an in-memory SCC layer on top of the existing one for storing the deserialized AOT methods. By doing so we break the "read-only" specification, but I don't think that is worse than maintaining some other data structure. It's important for this new SCC layer to be in-memory (not backed by any file) because otherwise "read-only" means nothing.

@AlexeyKhrabrov
Contributor

One complication could be the fact that the client already has a local SCC (marked read-only). So, some bits of information need to be extracted from this SCC and some other bits from the map/fake-scc that we are going to create.

If we do implement AOT deserialization without local SCC, the deserializer shouldn't need to access the local SCC at all, even if it's present. All the lookups and validations can be done using only class loaders (identified by name of 1st loaded class) and RAMClasses (looked up by class name).

How about this: create an in-memory SCC layer on top of the existing one for storing the deserialized AOT methods. By doing so we break the "read-only" specification, but I don't think that is worse than maintaining some other data structure. It's important for this new SCC layer to be in-memory (not backed by any file) because otherwise "read-only" means nothing.

Does the SCC support using an anonymous mapping instead of a file-backed one? Even if it doesn't, it might be easier to do that for the writable top SCC layer rather than implementing SCC-less remote AOT loads and compilations. At least if the main goal is to support containers without a writable SCC layer.

But I do believe that SCC-less remote AOT would be a cleaner design. I implemented it on top of SCC only because it seemed easier at the time. I don't think there would be any new data structures to maintain, but it'll probably still require a lot of code changes.

Another advantage would be performance. Current implementation performs most lookups twice (during deserialization and during load), but I don't know if this overhead is significant. There would also be a footprint reduction, but probably not that significant.

@AlexeyKhrabrov
Contributor

The deserializer does cache information received from the server related to class chains and class loaders, but the information that's stored in the class loader table by the TR_PersistentClassLoaderTable::associateClassLoaderWithClass function may also be necessary. (I'm not sure, but if it is, then that function might need to be changed so that it stores class loader information even if a suitable _sharedCache does not exist. Otherwise the relo records might not have the necessary information around at relocation time.).

Yes, you'll need to modify the class loader table such that it only maintains the loader-name two-way mapping when there is no SCC.
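A two-way loader-name mapping of that kind could look roughly like this. It is a simplified sketch, not the actual TR_PersistentClassLoaderTable: `ClassLoader` and `LoaderNameTable` are hypothetical stand-ins, and the policy of ignoring a name already claimed by another loader is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct ClassLoader { }; // hypothetical stand-in for J9ClassLoader

// Two-way mapping between a loader and the name of the first class it loaded.
class LoaderNameTable
   {
public:
   void associate(const ClassLoader *loader, const std::string &firstClassName)
      {
      // Only the first class loaded by a loader identifies it. A name that is
      // already taken by another loader is ignored in this sketch.
      if (_nameByLoader.count(loader) || _loaderByName.count(firstClassName))
         return;
      _nameByLoader[loader] = firstClassName;
      _loaderByName[firstClassName] = loader;
      }

   const ClassLoader *lookupLoader(const std::string &name) const
      {
      auto it = _loaderByName.find(name);
      return it != _loaderByName.end() ? it->second : nullptr;
      }

   const std::string *lookupName(const ClassLoader *loader) const
      {
      auto it = _nameByLoader.find(loader);
      return it != _nameByLoader.end() ? &it->second : nullptr;
      }

   // Called when a loader is unloaded; removes both directions.
   void removeLoader(const ClassLoader *loader)
      {
      auto it = _nameByLoader.find(loader);
      if (it == _nameByLoader.end())
         return;
      _loaderByName.erase(it->second);
      _nameByLoader.erase(it);
      }

private:
   std::unordered_map<const ClassLoader *, std::string> _nameByLoader;
   std::unordered_map<std::string, const ClassLoader *> _loaderByName;
   };
```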

@dsouzai
Contributor

dsouzai commented Jun 22, 2023

EDIT: Leaving the text below as is for posterity, but it isn't a valid solution for bypassing the need to send the client offsets; see #16721 (comment).

One approach that could be taken with JITServer is leveraging the SVM rather than traditional AOT, especially in this context.

From what I understand, when a remote AOT compilation is performed on the server, anything to do with ROMClasses is dealt with using names. As such, the class chain is not a chain of offsets but rather a chain of names. When it's sent to the client, in order to use the existing relocation infrastructure, we have to translate the name into an offset.

This still requires some notion of a client SCC. Instead, the class chain validation for a remotely loaded AOT body should probably just use a different function than TR_J9SharedCache::validateClassChain since that involves using the SCC APIs. When we get the chain of names from the server, we instead construct a chain of J9ROMClass *s; validateClassChain should then just do a pointer comparison of the newly constructed chain and a candidate J9Class's chain of rom classes.

Next, almost all addresses should be materialized using the SVM, namely using the TR_SymbolFromManager relocation record. The SVM maintains a mapping between IDs and values. As part of validation, the SVM

  1. Materializes a value for an ID; if there already is a value for said ID it validates that the newly materialized value matches the existing one
  2. Validates the class chain if a new ID represents a J9Class

Once validation succeeds, it will have built up a map of IDs to symbols. It is trivial to now get the value associated with an ID. None of this now requires dealing with a local SCC.
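The ID bookkeeping described in step 1 can be modelled in a few lines. This is a reduced sketch, not the real SymbolValidationManager: `SymbolIdMap`, `validate` and `symbolFor` are hypothetical names, the ID type is an arbitrary choice, and the class chain validation of step 2 is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Reduced model of an ID-to-value map built up during validation.
class SymbolIdMap
   {
public:
   // Records `value` for `id` the first time the ID is seen. On later calls
   // the newly materialized value must match the recorded one.
   bool validate(uint16_t id, const void *value)
      {
      if (value == nullptr)
         return false;
      auto result = _values.emplace(id, value);
      return result.second || result.first->second == value;
      }

   // Lookup performed once validation has succeeded; returns nullptr for an
   // ID that was never validated.
   const void *symbolFor(uint16_t id) const
      {
      auto it = _values.find(id);
      return it != _values.end() ? it->second : nullptr;
      }

private:
   std::unordered_map<uint16_t, const void *> _values;
   };
```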

This would require going through each of the relocation records and determining if it needs to be replaced with a TR_SymbolFromManager or not. Luckily, we don't need to go to all the various locations the records are generated to conditionally replace them with TR_SymbolFromManager under a JITServer remote AOT compilation; we can just use J9::AheadOfTimeCompile::interceptAOTRelocation to intercept the relevant relocation records and transmute them into a TR_SymbolFromManager.

I'm not necessarily suggesting that this approach be taken right now, but it should be considered when dealing with the final solution.

@cjjdespres
Contributor

@dsouzai, just so I understand - how would that work with, say, the TR_ProfiledInlinedMethodRelocation? (I'm assuming that this type of record can be created at the server during AOT compilation).

There is some initialization of the relocation record here:

case TR_ProfiledInlinedMethodRelocation:

and then there's some here:

TR_RelocationRecordProfiledInlinedMethod::preparePrivateData(TR_RelocationRuntime *reloRuntime, TR_RelocationTarget *reloTarget)

Would the idea be that I could replace those reloRuntime->fej9()->sharedCache() accesses with SVM queries, if the SVM were active (and the method came from the JITServer AOT cache)? Those queries would then succeed because the deserializer would have already resolved the name-based class chains and handed that information to the SVM.

Or is that an example of a record that could be intercepted?

@cjjdespres
Contributor

cjjdespres commented Sep 6, 2023

Sorry, I forgot that the initializeCommonAOTRelocationHeader was in the compile-time part of relocation, so I suppose the first part of the code I linked to wouldn't really need to change for client-side relocation - we'd still just need to make sure that the place where an offset is used is noted, so we can fix it (or bypass it) later.

@dsouzai
Contributor

dsouzai commented Sep 6, 2023

I had an offline chat with @cjjdespres and I don't think what I wrote in #16721 (comment) is valid for the purpose of this issue.

I originally thought that having the SVM would prevent the need for dealing with sending the client offsets because the SVM maintains a mapping of IDs to Values. However, that mapping has to be built somehow. If there was a way for the server to remember all the J9Method, J9Class, etc pointers from the first client that asked for a relocatable compile (that gets cached on the server) and then map it to the new client's set of J9Method, J9Class, etc pointers, then the server could just send the client a populated map that could be fed to the SVM such that the TR_SymbolFromManager record could be used during relocation.

However, until we figure out a way to do that (if it is even feasible to do so in a performant way), the server still needs to send offsets to the client which needs to then convert those offsets to J9ROMMethod, J9ROMClass, etc pointers. As such, there isn't any additional benefit of using the SVM (aside from the established ones such as having more performant code).

@cjjdespres
Contributor

I've been dealing with the persistent class loader table and class loader deserialization portion of this, and was wondering about the proper handling of some of this data.

Right now we look up class loaders by the name of their first loaded class in the persistent class loader table. This retrieves the pointer to the loader and the local SCC class chain (array of local SCC offsets) for the class. That local class chain is cached (via its local SCC offset) and used as a fallback to try to look up a new loader when it's been marked as unloaded in the relevant deserializer map. The offset to this chain is also used to update the relocation records, of course, and then subsequently used during relocation to materialize the class loader.

It's that fallback that I'm not sure about, since we're not relying on the local SCC being present anymore. If the class loader has been unloaded, is there anything wrong with looking up the class loader by name again?

@mpirvu
Contributor Author

mpirvu commented Sep 26, 2023

is there anything wrong with looking up the class loader by name again?

I don't see a problem with that. Maybe @AlexeyKhrabrov can offer his opinion as well.

@AlexeyKhrabrov
Contributor

It's that fallback that I'm not sure about, since we're not relying on the local SCC being present anymore. If the class loader has been unloaded, is there anything wrong with looking up the class loader by name again?

There is nothing wrong with looking it up by name, as long as you have the name. Current implementation doesn't keep the identifying name around once the corresponding record is cached, and relies on the underlying local SCC to support class (and class loader) "re-loading" after unload. The same applies to classes: the deserializer doesn't keep the class name and hash once the class record is cached, and uses the ROMClass SCC offset to reestablish the mapping if the same class is ever loaded again after being unloaded. Same for class chains.

I don't know if class re-loading is worth supporting, it should be rare in practice. I implemented it this way because it was relatively easy, assuming there is a local SCC. If we do want to support it without a local SCC, there are a couple options I can think of:

  • Remove the corresponding IDs from the set of known IDs in the client session on the server when classes/loaders are unloaded. This could be prone to race conditions.
  • Remember names, hashes etc. when classes are unloaded. This would add a bit of footprint and complicate the implementation. Or keep the data around when records are cached, with more extra footprint (which may or may not be significant).
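The variant of the second option that keeps the name around when a record is cached could be sketched as below. Everything here is hypothetical (`RamClass`, `ClassRecordCache`, the resolver callback), and a real implementation would presumably also re-check the class hash after a lookup by name, which this sketch omits.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

struct RamClass { }; // hypothetical stand-in for J9Class

class ClassRecordCache
   {
public:
   // The resolver looks up a currently loaded class by name, or returns nullptr.
   using Resolver = std::function<RamClass *(const std::string &)>;

   explicit ClassRecordCache(Resolver resolver) : _resolve(std::move(resolver)) { }

   // The name is kept with the record so it survives an unload.
   void cache(uintptr_t id, const std::string &name, RamClass *clazz)
      {
      _entries[id] = Entry{name, clazz};
      }

   // On unload only the pointer is dropped; the ID and name remain known.
   void onUnload(RamClass *clazz)
      {
      for (auto &kv : _entries)
         if (kv.second.clazz == clazz)
            kv.second.clazz = nullptr;
      }

   // If the class was unloaded, try to re-establish the mapping by name.
   RamClass *find(uintptr_t id)
      {
      auto it = _entries.find(id);
      if (it == _entries.end())
         return nullptr;
      Entry &entry = it->second;
      if (entry.clazz == nullptr)
         entry.clazz = _resolve(entry.name); // may still be nullptr
      return entry.clazz;
      }

private:
   struct Entry { std::string name; RamClass *clazz; };
   Resolver _resolve;
   std::unordered_map<uintptr_t, Entry> _entries;
   };
```

The extra footprint mentioned above is visible here as the `name` string stored per record.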

There might be a simpler solution, I haven't really thought this through.

@cjjdespres
Contributor

While testing my non-deserializer changes, I ran into the issue where I couldn't get any cache hits when a client without an SCC requested an AOT load corresponding to a cached method that had been compiled for a different client that had an SCC. Part of the issue was a couple of differences in the TR_AOTHeader, but that was solved by enabling the portable AOT flag at the client. After still not getting any cache hits, I realized that the ROMClass values corresponding to certain classes were different in the two cases - the hashes and the romSize reported by the server were different, at least, with the no-SCC romSize being a little higher than the value with the SCC. I haven't checked if there are other differences.

The size parameter is set in

ROMClassBuilder::prepareAndLaydown( BufferManager *bufferManager, ClassFileParser *classFileParser, ROMClassCreationContext *context )

based on the results of getSizeInfo here
ROMClassBuilder::getSizeInfo(ROMClassCreationContext *context, ROMClassWriter *romClassWriter, SRPOffsetTable *srpOffsetTable, bool *countDebugDataOutOfLine, SizeInformation *sizeInformation)

That method uses writeROMClass to pretend (I think) to write the ROMClass information to its internal buffer, then gets the size based on the final positions of the different cursors. I looked at what happened with java/lang/Object at the client to start, and saw that the difference was entirely with the debug info for that class - if canPossiblyStoreDebugInfoOutOfLine() is true
bool canPossiblyStoreDebugInfoOutOfLine() const {

then the debug info is stored out of line (or at least the writer pretends to do so), which affects sizeInformation.rcWithOutUTF8sSize, which ultimately changes the romSize. You can see that canPossiblyStoreDebugInfoOutOfLine() can vary depending on whether or not the ROMClass might be shared or if the local SCC has space or not.

I'm not sure what a safe option here is yet. It's possible that we could just have canPossiblyStoreDebugInfoOutOfLine() be false when the JITServer AOT cache is active (or specified in the options), which would force the debug data inline (I think) after the context->forceDebugDataInline() call in getSizeInfo, and lead to a uniformly slightly higher romSize in the ROMClass. That query is also affected by the _allocationStrategy of the ROM class creation context, so maybe we could somehow force a strategy that allows us to store the debug information out of line.

I haven't looked at other classes to see if there are other ways in which the romSize might vary.

@cjjdespres
Contributor

Setting canPossiblyStoreDebugInfoOutOfLine() to false does seem to fix the size issue, but the hash is still different. Looking at the actual bytes of the ROMClass of java/lang/Object, I think that the code emitted in

} else {
    UDATA count = cursor->getCount();
    writeByteCodes(cursor, &iterator);
    count = cursor->getCount() - count;
    Trc_BCU_Assert_Equals(count, byteCodeSize);
}

is different in a few places, at least. (The ROMClass header is the same with or without an SCC). For example, the method

ROMClassWriter::writeByteCodes(Cursor* cursor, ClassFileOracle::MethodIterator *methodIterator)
starts with the bytes 2a b8 1 0 e5 for the code in both cases, but by the time the actual ROMClass is being written that has become 2a b8 1 0 ad in the SCC case. It appears to remain the same in the no-SCC case. Maybe something in that fixup table or the later return fixup pass differs between the two?

I should probably print this data at the server to make sure that no other differences have cropped up between ROMClass creation and when the server tries to calculate the hash.

@cjjdespres
Contributor

cjjdespres commented Oct 11, 2023

Incidentally, that's all the difference seems to be - there are e5 bytes in a few places in the without-SCC case where there are variously bytes ad, ef, ac in the with-SCC case. That's true at the server as well, as I've now confirmed. I think those are codes for method returns - are the e5 and ef bytes placeholders, or do they mean something?

@cjjdespres
Contributor

I've found the difference - when isROMClassShareable() is true in the rom class creation context, the code path we go down will eventually call fixReturnBytecodes on the ROMClass that's been written, which fixes up the return bytecodes in methods. (The bytecodes are enumerated in bcnames.h, as you probably know, so e5 is JBgenericReturn and ef is JBreturnZ, for example). That is what converts the e5 to the different returns that I saw in the with-SCC case. This function isn't called otherwise, so the without-SCC case will always have these JBgenericReturn bytecodes. There's actually a comment here that anticipates that fixReturnBytecodes will not be called when a ROMClass isn't in the SCC:

/*
 * An exception must be made for JBgenericReturn bytecodes to allow comparison
 * against ROMClasses that have been added to the shared cache. Such ROMClasses
 * will have had their return bytecodes "fixed" with fixReturnBytecodes, so any
 * of the equivalent return bytecodes is allowed to match an attempt to write a
 * JBgenericReturn.
 */

so I suppose this is intentional.
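The kind of relaxed comparison that comment describes can be illustrated with the bytes seen in this thread. This is a sketch under assumptions, not the actual ROMClass comparison code: the function names are hypothetical, and the set of "fixed" return opcodes contains only the values observed above (ac, ad, ef), whereas the full set would have to come from bcnames.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// JBgenericReturn, as identified in this thread.
const uint8_t JB_GENERIC_RETURN = 0xe5;

// Assumption: only the fixed-up return opcodes observed in this thread.
inline bool isFixedReturn(uint8_t opcode)
   {
   return opcode == 0xac || opcode == 0xad || opcode == 0xef;
   }

// Compares a freshly written method body against a stored one.
// `opcodeOffsets` lists the positions that hold opcodes, so operand bytes
// that happen to equal 0xe5 are still compared exactly.
inline bool bytecodesMatch(const std::vector<uint8_t> &fresh,
                           const std::vector<uint8_t> &stored,
                           const std::vector<size_t> &opcodeOffsets)
   {
   if (fresh.size() != stored.size())
      return false;

   std::vector<bool> isOpcode(fresh.size(), false);
   for (size_t offset : opcodeOffsets)
      if (offset < isOpcode.size())
         isOpcode[offset] = true;

   for (size_t i = 0; i < fresh.size(); ++i)
      {
      if (fresh[i] == stored[i])
         continue;
      // A generic return in the fresh body may match a fixed return in the
      // stored body, but not the other way around.
      bool relaxed = isOpcode[i] && fresh[i] == JB_GENERIC_RETURN && isFixedReturn(stored[i]);
      if (!relaxed)
         return false;
      }
   return true;
   }
```

A hash computed over the raw bytes has no such tolerance, which is consistent with the hash mismatch between the with-SCC and without-SCC ROMClasses.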

@cjjdespres
Contributor

@AlexeyKhrabrov I'd just like to confirm something about deserialization. When the deserializer deals with class chain records from the server currently, it builds up a RAM class chain from the already-cached record IDs, then gets a class chain from the local SCC for the first class in the chain, then verifies that this local class chain is equal to what we expect (in that the ROM classes that the local chain references are equal to those of the RAM class chain). This class chain (once retrieved from the local SCC by the relocation runtime) is subsequently used to make sure that particular J9Class * entities match what the local class chain refers to.

I don't think the local SCC part of the process contains any validation that is not already done during deserialization. Is that right? Once we've gotten to the point of processing a class chain record, all of the class records in the chain will have been associated to full J9Class * entities in the JVM using their class records, and invalidateClass will ensure that class unloading or redefinition won't have invalidated any of the validation that's done during deserialization. The SCC part of the process throws away information, so to speak, in that we take a RAM class chain and convert it into a local SCC class chain that only knows about ROM classes, not about how classes are actually loaded in the current JVM.

In that case, as long as we check that all of the classes in the RAM class chain remain valid (not unloaded or redefined), the overridden versions of SCC functions like J9SharedCache::classMatchesCachedVersion(J9Class *clazz, uintptr_t *chain) won't have to do any validation, other than ensuring that the clazz is equal to whatever pointer is associated to the chain in the deserializer. That assumes that we don't support class reloading (at least initially).

@cjjdespres
Contributor

Of course we will still have to check during deserialization that the RAM class chain constructed from the class chain serialization record is compatible with the RAM class chain derived from the J9Class * corresponding to the first entry of the class serialization record.
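
A rough sketch of the check described above, under heavy simplification: `RamClass` stands in for J9Class and only models the superclass link (interfaces are ignored), and `ClassRecordCache` with its method names is hypothetical, not the real deserializer API. It shows the two conditions that have to hold: every class ID in the record still maps to a valid class, and the resulting RAM class chain agrees with the chain derived from the first class.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-in for J9Class; only the superclass link is modelled here.
struct RamClass
{
    explicit RamClass(const RamClass *super = nullptr) : superclass(super) {}
    const RamClass *superclass;
};

// Hypothetical cache of class record ID -> resolved class.
class ClassRecordCache
{
public:
    void cache(uintptr_t id, const RamClass *clazz) { _classes[id] = clazz; }

    // Called on class unloading or redefinition: forget every ID mapped to clazz.
    void invalidateClass(const RamClass *clazz)
    {
        for (auto it = _classes.begin(); it != _classes.end();)
        {
            if (it->second == clazz)
                it = _classes.erase(it);
            else
                ++it;
        }
    }

    // Builds the RAM class chain named by a class chain record and compares it
    // with the chain derived from the first class. Returns false if any class is
    // missing or the chains differ; `chain` is unspecified on failure.
    bool resolveChain(const std::vector<uintptr_t> &recordIds,
                      std::vector<const RamClass *> &chain) const
    {
        chain.clear();
        for (uintptr_t id : recordIds)
        {
            auto it = _classes.find(id);
            if (it == _classes.end())
                return false; // never cached, or invalidated since
            chain.push_back(it->second);
        }
        if (chain.empty())
            return false;

        size_t index = 0;
        for (const RamClass *c = chain[0]; c != nullptr; c = c->superclass, ++index)
        {
            if (index >= chain.size() || chain[index] != c)
                return false;
        }
        return index == chain.size();
    }

private:
    std::unordered_map<uintptr_t, const RamClass *> _classes;
};
```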

@AlexeyKhrabrov
Contributor

Yes, I think that is correct.

@cjjdespres
Contributor

Will a J9Method * (corresponding to a MethodSerializationRecord) ever be invalidated like a J9Class * or J9ClassLoader *? At the moment the deserializer only stores the local SCC offset to the J9ROMMethod *, but I think we'll need to keep around the full J9Method * that gets resolved during deserialization.

@AlexeyKhrabrov
Contributor

Yes, I think you'll need to keep a two-way mapping between method IDs and J9Method pointers, and invalidate all methods defined by unloaded classes.
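
Something along these lines, as a minimal sketch only. `ClassStandIn`, `MethodStandIn` and `MethodMap` are hypothetical placeholders for J9Class, J9Method and whatever structure the deserializer ends up using; locking and the real lookup of a method's defining class are left out.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for J9Class and J9Method.
struct ClassStandIn {};
struct MethodStandIn { const ClassStandIn *definingClass; };

class MethodMap
{
public:
    // Records the association in both directions.
    // Fails if either the ID or the method is already mapped.
    bool cache(uintptr_t id, const MethodStandIn *method)
    {
        if (_byId.count(id) != 0 || _byMethod.count(method) != 0)
            return false;
        _byId[id] = method;
        _byMethod[method] = id;
        return true;
    }

    const MethodStandIn *findMethod(uintptr_t id) const
    {
        auto it = _byId.find(id);
        return (it != _byId.end()) ? it->second : nullptr;
    }

    bool findId(const MethodStandIn *method, uintptr_t &id) const
    {
        auto it = _byMethod.find(method);
        if (it == _byMethod.end())
            return false;
        id = it->second;
        return true;
    }

    // Drops every method defined by the unloaded class from both maps.
    void invalidateClass(const ClassStandIn *unloaded)
    {
        for (auto it = _byMethod.begin(); it != _byMethod.end();)
        {
            if (it->first->definingClass == unloaded)
            {
                _byId.erase(it->second);
                it = _byMethod.erase(it);
            }
            else
            {
                ++it;
            }
        }
    }

private:
    std::unordered_map<uintptr_t, const MethodStandIn *> _byId;
    std::unordered_map<const MethodStandIn *, uintptr_t> _byMethod;
};
```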

@cjjdespres
Contributor

cjjdespres commented Feb 1, 2024

Here is a high-level explanation of the JITServer AOT cache changes. To frame the discussion and define some terms, the first few sections give some background on how AOT and the JITServer AOT cache currently function; the actual changes are explained in the final section. Also, when I say that a client "doesn't have an SCC", I mean that we're choosing to bypass the local SCC entirely, perhaps because it doesn't exist, or because it exists but is unsuitable (read-only or full) for normal JITServer AOT cache compilations.

How do locally-compiled AOT methods refer to data?

To save space, the relocation records of locally-compiled AOT methods refer to a lot of data via offsets into the local SCC. An offset is a uintptr_t value that can be (and, unless you're working on low-level implementation details, should be) considered an opaque "key" for retrieving data from a particular local SCC. This data is stored in the local SCC during the course of an AOT compilation, and the offsets to that data are stored in the relocation records. During a load, the relocation runtime uses the stored offsets to get the data from the local SCC for its own use: to validate some property, say, or to look up something in the running JVM.
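
As a toy illustration of the "offset as opaque key" idea (this is not how the real SCC lays out or looks up data; `ToySharedCache` and its methods are made up for this sketch), storing data hands back a key during compilation, and the same key retrieves the data during a load:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy model only: strings stand in for the persistent data kept in an SCC.
class ToySharedCache
{
public:
    // "Compilation" side: store the data (once) and return its offset.
    uintptr_t store(const std::string &data)
    {
        for (const Entry &entry : _entries)
        {
            if (entry.data == data)
                return entry.offset;
        }
        uintptr_t offset = _nextOffset;
        _entries.push_back({offset, data});
        _nextOffset += data.size() + 1; // pretend the data occupies this many bytes
        return offset;
    }

    // "Load" side: use the offset purely as a key to get the data back.
    bool fromOffset(uintptr_t offset, std::string &data) const
    {
        for (const Entry &entry : _entries)
        {
            if (entry.offset == offset)
            {
                data = entry.data;
                return true;
            }
        }
        return false;
    }

private:
    struct Entry
    {
        uintptr_t offset;
        std::string data;
    };
    std::vector<Entry> _entries;
    uintptr_t _nextOffset = 8; // arbitrary starting offset
};
```

An offset obtained from one cache says nothing about where the same data lives in another cache, which is why it only makes sense relative to the cache that produced it.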

Non-shared JITServer AOT compilation

Traditionally, the JITServer compiled AOT code on a per-client basis, and required a local SCC in order to perform these compilations. Whenever a relocation record needed a client offset, it would send the data to the client, have it store the data in its local SCC, get the offset to the data back, and store that offset in the relocation record. The resulting compiled method could be sent to the client and stored in its local SCC without issue, and the client could relocate it normally. This is still what happens when the client does not request that the compilation involve the JITServer AOT cache.

Sharing JITServer AOT compilations between clients that have local SCCs

The only real barrier to sharing AOT code between clients with local SCCs is the offset problem - we cannot guarantee that a specified piece of data will be at the same offset in every local SCC, so the offsets in relocation records are only valid with respect to a particular client's SCC. To address this problem, during the course of a compilation intended to be stored in the JITServer AOT cache we build up a collection of records that do two things:

  1. Specify every place in a relocation record where an offset might be used
  2. Specify, for every offset, enough data that we can reconstruct what information that offset refers to in any local SCC.

These records are the JITServer AOT cache serialization records, and they act as relocation records for (normal, local AOT) relocation records. They are stored alongside the cached method in the JITServer AOT cache and sent to clients when an AOT load is requested. A client will use these serialization records to find the data it needs in the local SCC, get the offsets to that data, then update the relocation records for the method that was sent by the server. This is called deserialization, and is performed by the JITServerAOTDeserializer. The resulting method and its records can then be given to the relocation runtime, which can consult the local SCC as usual as it relocates the method.
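
A very simplified sketch of that patching step follows. The real serialization records and relocation record layout are considerably richer; here `SerializationRecord`, `deserialize`, the string descriptions and the flat vector of offset slots are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, flattened view of one serialization record.
struct SerializationRecord
{
    size_t offsetSlot;       // (1) which offset in the relocation records to patch
    std::string description; // (2) enough data to identify the entity in any SCC
};

// `localOffsets` models "find the data in the local SCC and get its offset".
// `offsetSlots` models the offsets embedded in the method's relocation records.
// Either every record is applied or the slots are left untouched.
bool deserialize(const std::vector<SerializationRecord> &records,
                 const std::unordered_map<std::string, uintptr_t> &localOffsets,
                 std::vector<uintptr_t> &offsetSlots)
{
    std::vector<uintptr_t> patched = offsetSlots;
    for (const SerializationRecord &record : records)
    {
        auto it = localOffsets.find(record.description);
        if (record.offsetSlot >= patched.size() || it == localOffsets.end())
            return false; // data not available locally: the load fails
        patched[record.offsetSlot] = it->second;
    }
    offsetSlots.swap(patched);
    return true;
}
```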

Removing the dependency on the local SCC for the JITServer AOT cache

The description of the process of local relocation and deserialization I gave above is incomplete in two crucial ways:

  1. The data in the SCC that the offsets in the relocation records refer to are the persistent representations of individual dynamic JVM entities that were recorded during an AOT compilation, say a particular J9Class or J9Method. The relocation runtime will use an individual offset to get that persistent representation from the local SCC, use it to look up a corresponding candidate dynamic JVM entity, perform some direct validation of it, then hand that dynamic entity to the next step of the relocation process. That's all it uses the data for - any other validation it needs to do is encoded in other ways in the relocation records.
  2. The deserializer effectively duplicates this process. What it actually does right now is use its serialization records to look up a dynamic JVM entity corresponding to a particular offset that is stored in a method's relocation records, perform some direct validation of it, look up its persistent representation in the local SCC, get the offset to that data, and then update the relocation records of the method with that local offset. Its validation provides the exact same guarantees that the relocation runtime provides when it materializes JVM entities.

For AOT loads, then, the SCC can be cut out of the process entirely. In the new method of JITServer AOT cache loading, what we do is:

  1. Use the existing caches in the JITServer AOT deserializer to build up an association between whatever dynamic JVM entities the deserializer looked up during deserialization and "deserializer offsets", which are uintptr_t keys that can be used to retrieve these JVM entities from the deserializer.
  2. Update the offsets in a method's relocation records with these deserializer offsets.
  3. Temporarily override the SCC interface used when relocating these methods with a new "deserializer SCC" that acts as a frontend to the deserializer, and will return the cached dynamic JVM entities directly when the relocation runtime queries the SCC interface it's using for information.

It's important to note here that the interface to the local SCC that the relocation runtime uses is at a high enough level that it never actually examines the structure of the data that it retrieves from the local SCC - it just performs various SCC API calls on the offsets and the results of those API calls, and so on. That's how we can get away with returning the entities directly in (3). Otherwise the implementation would need to be much more complicated; we'd have to duplicate a lot of the implementation details of the SCC itself.
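
The shape of this can be sketched as follows. None of these names are the real OpenJ9 classes: `CacheInterface` is a one-method stand-in for the SCC interface the relocation runtime uses, `UnusableLocalCache` models a client without a usable SCC, and `CacheOverride` is just one possible way (an RAII guard) to make the override temporary.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct ClassStandIn { std::string name; }; // stand-in for a J9Class

// Hypothetical, minimal version of the SCC interface seen by the relocation runtime.
class CacheInterface
{
public:
    virtual ~CacheInterface() = default;
    virtual const ClassStandIn *classFromOffset(uintptr_t offset) const = 0;
};

// Models a client that has no usable local SCC: nothing can be looked up.
class UnusableLocalCache : public CacheInterface
{
public:
    const ClassStandIn *classFromOffset(uintptr_t) const override { return nullptr; }
};

// Frontend to the deserializer: "deserializer offsets" are keys into its own cache.
class DeserializerCache : public CacheInterface
{
public:
    uintptr_t add(const ClassStandIn *clazz)
    {
        _entities.push_back(clazz);
        return _entities.size(); // keys start at 1
    }
    const ClassStandIn *classFromOffset(uintptr_t offset) const override
    {
        return (offset >= 1 && offset <= _entities.size()) ? _entities[offset - 1] : nullptr;
    }

private:
    std::vector<const ClassStandIn *> _entities;
};

// The runtime only ever talks to the interface, never to the data behind it.
struct RelocationRuntime
{
    const CacheInterface *cache;
    const ClassStandIn *relocate(uintptr_t offset) const { return cache->classFromOffset(offset); }
};

// Swaps the interface for the duration of one relocation, then restores it.
class CacheOverride
{
public:
    CacheOverride(RelocationRuntime &runtime, const CacheInterface *replacement)
        : _runtime(runtime), _saved(runtime.cache)
    {
        _runtime.cache = replacement;
    }
    ~CacheOverride() { _runtime.cache = _saved; }
    CacheOverride(const CacheOverride &) = delete;
    CacheOverride &operator=(const CacheOverride &) = delete;

private:
    RelocationRuntime &_runtime;
    const CacheInterface *_saved;
};
```

Because `RelocationRuntime::relocate` never inspects what sits behind the offset, swapping the implementation is enough; no SCC data layout has to be reproduced.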

As for AOT stores (generating AOT code for a method and storing it in the JITServer AOT cache), the main problem to solve is that relocation records expect the offsets to be consistent with each other, in that two equal offsets are expected to refer to the same data, and two distinct offsets are expected (at least in effect, by the relocation runtime) to refer to distinct data. Fortunately, the JITServer AOT cache itself provides a way of obtaining such offsets: its own records have uintptr_t identifiers, and these can be used directly in the relocation records in place of the non-existent local SCC's offsets. The remaining changes involve generating serialization records and looking them up without a local SCC being present, but this only requires some straightforward refactoring of the existing code, as the local SCC was never an integral part of that process.
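
The consistency property is easy to state in code. In this sketch, `RecordIds` and the string keys are hypothetical; the real AOT cache identifies its records differently. The only point is that the same record always yields the same uintptr_t and different records yield different ones, which is all that is needed from something used in place of an offset.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical ID table: one identifier per distinct record.
class RecordIds
{
public:
    uintptr_t idFor(const std::string &recordKey)
    {
        auto it = _ids.find(recordKey);
        if (it != _ids.end())
            return it->second; // equal records -> equal "offsets"
        uintptr_t id = _nextId++;
        _ids.emplace(recordKey, id);
        return id; // new record -> a fresh, distinct "offset"
    }

private:
    std::unordered_map<std::string, uintptr_t> _ids;
    uintptr_t _nextId = 1;
};
```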
