Allow JITServer AOT cache to work without a shared class cache at the client #16721
Comments
Attn: @cjjdespres |
First, I noticed in
bool canRelocateMethod = TR::CompilationInfo::canRelocateMethod(comp);
That function comes with the following warning that seems relevant:
// Delay relocation if this is a deserialized AOT method using SVM received from the JITServer AOT cache.
// Such methods cannot be immediately relocated in the current implementation. An immediate AOT+SVM load
// uses the ID-symbol mapping created during compilation. This mapping is missing when the client receives
// a serialized AOT method from the server, and trying to load the deserialized method immediately
// triggers fatal assertions in SVM validation in certain cases. As a workaround, we delay the AOT load
// until the next interpreted invocation of the method; see CompilationInfo::replenishInvocationCounter().
//
//TODO: Avoid the overhead of rescheduling this compilation request by handling the deserialized AOT load as if
// the method came from the local SCC, rather than as if it was freshly AOT-compiled at the JITServer.
if (comp->isDeserializedAOTMethodUsingSVM())
   return false;

Something to keep in mind, though I can't believe it's that difficult to fix.

As for solving this issue, I think most of the relocation work happens in That's the only way I can think of that would let us avoid modifying or duplicating the actual relocation logic that currently exists. There is quite a lot of it, and based on the description of the system in https://github.com/eclipse-openj9/openj9/tree/master/doc/compiler/aot I can't see why it wouldn't be possible. |
To lay out the problem in more detail: the relocation runtime and SVM rely on the
The function of the JITServer AOT deserializer is to patch up the relocation records received from the server so that the SCC offsets they contain point to valid entities in the local SCC. It also stores class data in the local SCC, again so that the relocation runtime will function properly, and caches all records received from the server for the duration of the connection to that server. At relocation time at the client, we should be able to detect (probably via an explicit option) that we should be ignoring any existing local SCC. If that is the case, we need to change how we currently handle SCC offsets in order to deal with problem (1) above:
We also need to implement the queries related to classes and class loaders. The deserializer does cache information received from the server related to class chains and class loaders, but the information that's stored in the class loader table by the I'm not sure what can be done to split this up into smaller pieces for implementation. I could add the frontend and have it return a I think that should be all that's needed for clients to relocate received AOT methods while bypassing the existing SCC. I believe @AlexeyKhrabrov has thought about this issue before, so I'm interested to know if there are other major issues in implementing this that I haven't considered. Thoughts from @mpirvu and @dsouzai would also be appreciated. |
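To make the deserializer's role described above more concrete, here is a minimal C++ sketch of patching relocation records so that they refer to client-local offsets. Every name in it (`ReloRecord`, `IdToLocalOffset`, `patchRecords`) is hypothetical and chosen for illustration; it is not OpenJ9's actual API, and the real deserializer handles many record kinds and failure modes that are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified relocation record: a single slot that holds a
// server-side serialization record ID before patching and a client-local
// offset afterwards.
struct ReloRecord {
    std::uint64_t slot;
};

// Hypothetical map from server record ID to the offset of the matching entity
// on this client (filled in as serialization records are processed and cached).
using IdToLocalOffset = std::unordered_map<std::uint64_t, std::uint64_t>;

// Rewrites every record in place. Returns false and leaves the records
// untouched if any ID has no local counterpart, in which case the load of the
// received method would have to be rejected.
inline bool patchRecords(std::vector<ReloRecord> &records, const IdToLocalOffset &known) {
    std::vector<std::uint64_t> patched;
    patched.reserve(records.size());
    for (const ReloRecord &record : records) {
        auto it = known.find(record.slot);
        if (it == known.end())
            return false; // unknown entity: do not partially patch
        patched.push_back(it->second);
    }
    for (std::size_t i = 0; i < records.size(); ++i)
        records[i].slot = patched[i];
    return true;
}
```

The two-pass structure is a deliberate choice in this sketch: a method whose records cannot all be resolved is left exactly as received rather than half-patched.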
I generally agree with the approach of implementing a You'll also need to decouple the I agree with Irwin's suggestion about simply storing "RAM" entity pointers in updated relocation/validation records since the AOT body will be thrown out anyway. The mapping of entity types would be:
While we do lose some type information in this approach compared to the fake offset maps, if we can guarantee that "RAM" pointers always fit into e.g. 32 bits (not sure if that applies to Please ping me about significant changes to the deserializer as you work on the implementation. I've been basically rewriting large parts of it for my current work on early remote compilations (which I plan to contribute later, but won't be able to in the next few months at least), and I'd prefer to avoid having to rewrite much of it again in case our changes are too incompatible. While you have the relocation part mostly figured out, what is the plan for AOT compilations on the server? The AOT codegen infrastructure also assumes that the client has a local SCC. You should be able to start with keeping this assumption while you work on the relocation part, but eventually you'll need to enable remote AOT compilations for clients without a local SCC. |
Sure, I'll let you know if I make large changes to the deserializer. I've been focusing on this initial problem for now, but fresh server AOT compilations would be good to support. I planned on testing my work with JITServer AOT caches that have already been populated (by a previous run or a loaded persistent cache). I'd have to look at the client-server messages in more detail to see what would be needed. I think local AOT compilations are disabled when the local SCC is read-only (and definitely when the local SCC doesn't exist) - is that right? |
Not just good, but necessary for this feature to be practically useful (unless I'm missing something). Otherwise we're just moving the cache pre-population step to the server side instead of eliminating it (which is basically the main point of the server AOT cache). |
Using a pre-populated persistent AOT cache is a good idea; it's probably the easiest way to test things.
I think the main things to deal with are:
Sounds right, @dsouzai should be able to confirm. |
@mpirvu mentioned using the current Liberty image and its read-only embedded SCC with the JITServer AOT cache in connection with this issue, but maybe he was imagining fresh server AOT compilations in the ultimate solution. I could imagine generating a server image with an embedded AOT cache in a similar way to the Liberty image, but that would certainly be harder and less generally useful than simply allowing fresh AOT compilations. |
Re: fresh server AOT compilations |
Please correct me if I'm wrong, but I think it's still necessary in this specific use case. At least when there is no writable SCC layer on top of the read-only SCC, e.g. if the container doesn't have any writable storage (which may be a common setup). The server AOT cache is only useful if it can store methods not present in the read-only local SCC. For the server to be able to AOT-compile these new methods, the clients need to store new ROMClasses and class chains in their local SCCs, which is impossible if they're read-only. A way around that would be to pre-populate the server AOT cache using a client with a writable SCC, but then the user may as well pre-populate an extra layer of the local SCC instead. Otherwise an easier workaround would be to have a writable top SCC layer which would store additional ROMClasses (which consume memory regardless of being in the SCC or not) and class chains (which add little extra footprint), but not the AOT methods received from the server. |
One complication could be the fact that the client already has a local SCC (marked read-only). So, some bits of information need to be extracted from this SCC and some other bits from the map/fake-scc that we are going to create. How about this: create an in-memory SCC layer on top of the existing one for storing the deserialized AOT methods. By doing so we break the "read-only" specification, but I don't think that is worse than maintaining some other data structure. It's important for this new SCC layer to be in-memory (not backed by any file) because otherwise "read-only" means nothing. |
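The layering idea above can be pictured with a small C++ sketch: a read-only base layer plus a process-private top layer that receives all new stores. The `Layer` and `LayeredCache` names are hypothetical, and a real SCC layer is a memory-mapped region with its own metadata rather than a `std::map`; this only illustrates the lookup and store order.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for one cache layer: key -> stored blob.
using Layer = std::map<std::string, std::string>;

class LayeredCache {
public:
    explicit LayeredCache(Layer readOnlyBase) : _base(std::move(readOnlyBase)) {}

    // Stores always go to the in-memory top layer; the base is never modified.
    void store(const std::string &key, const std::string &value) { _top[key] = value; }

    // Lookups consult the top layer first, then fall back to the read-only base.
    // Returns nullptr when the key is in neither layer.
    const std::string *find(const std::string &key) const {
        auto top = _top.find(key);
        if (top != _top.end())
            return &top->second;
        auto base = _base.find(key);
        if (base != _base.end())
            return &base->second;
        return nullptr;
    }

    const Layer &base() const { return _base; }

private:
    const Layer _base; // read-only layer, e.g. one embedded in a container image
    Layer _top;        // not backed by any file; discarded when the process exits
};
```

Because the top layer lives only in memory, nothing written by the client ever reaches the file backing the read-only layer.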
If we do implement AOT deserialization without local SCC, the deserializer shouldn't need to access the local SCC at all, even if it's present. All the lookups and validations can be done using only class loaders (identified by name of 1st loaded class) and RAMClasses (looked up by class name).
Does the SCC support using an anonymous mapping instead of a file-backed one? Even if it doesn't, it might be easier to do that for the writable top SCC layer rather than implementing SCC-less remote AOT loads and compilations. At least if the main goal is to support containers without a writable SCC layer. But I do believe that SCC-less remote AOT would be a cleaner design. I implemented it on top of SCC only because it seemed easier at the time. I don't think there would be any new data structures to maintain, but it'll probably still require a lot of code changes. Another advantage would be performance. Current implementation performs most lookups twice (during deserialization and during load), but I don't know if this overhead is significant. There would also be a footprint reduction, but probably not that significant. |
Yes, you'll need to modify the class loader table such that it only maintains the loader-name two-way mapping when there is no SCC. |
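As an illustration of the loader-name two-way mapping mentioned here, the following C++ sketch keeps both directions in sync, keyed by the name of the loader's first loaded class. `ClassLoader` and `LoaderNameTable` are hypothetical stand-ins, not the actual persistent class loader table.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Opaque stand-in for a VM class loader; only its address is used here.
struct ClassLoader {};

// Hypothetical two-way table between loaders and the name of their first
// loaded class.
class LoaderNameTable {
public:
    // Returns false if either the loader or the name is already registered,
    // so each direction of the mapping stays one-to-one.
    bool add(const ClassLoader *loader, const std::string &firstClassName) {
        if (_nameOf.count(loader) != 0 || _loaderOf.count(firstClassName) != 0)
            return false;
        _nameOf.emplace(loader, firstClassName);
        _loaderOf.emplace(firstClassName, loader);
        return true;
    }

    // Name -> loader, e.g. when a loader has to be materialized for a load.
    const ClassLoader *findLoader(const std::string &firstClassName) const {
        auto it = _loaderOf.find(firstClassName);
        return it == _loaderOf.end() ? nullptr : it->second;
    }

    // Loader -> name, e.g. when a compilation needs to identify a loader.
    const std::string *findName(const ClassLoader *loader) const {
        auto it = _nameOf.find(loader);
        return it == _nameOf.end() ? nullptr : &it->second;
    }

    // Drops both directions when the loader is unloaded.
    void remove(const ClassLoader *loader) {
        auto it = _nameOf.find(loader);
        if (it == _nameOf.end())
            return;
        _loaderOf.erase(it->second);
        _nameOf.erase(it);
    }

private:
    std::unordered_map<const ClassLoader *, std::string> _nameOf;
    std::unordered_map<std::string, const ClassLoader *> _loaderOf;
};
```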
EDIT: Leaving the text below as is for posterity, but it isn't a valid solution for bypassing the need to send the client offsets; see #16721 (comment).

One approach that could be taken with JITServer is leveraging the SVM rather than traditional AOT, especially in this context. From what I understand, when a remote AOT compilation is performed on the server, anything to do with ROMClasses is dealt with using names. As such, the class chain is not a chain of offsets but rather a chain of names. When it's sent to the client, in order to use the existing relocation infrastructure, we have to translate the name into an offset. This still requires some notion of a client SCC.

Instead, the class chain validation for a remotely loaded AOT body should probably just use a different function than Next, almost all addresses should be materialized using the SVM, namely using the
Once validation succeeds, it will have built up a map of IDs to symbols. It is trivial to now get the value associated with an ID. None of this now requires dealing with a local SCC. This would require going through each of the relocation records and determining if it needs to be replaced with a I'm not necessarily suggesting that this approach be taken right now, but it should be considered when dealing with the final solution. |
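The ID-to-symbol map described above can be sketched as follows. This is a deliberately reduced model with a single hypothetical record kind (`ClassByNameRecord`) and a caller-supplied lookup function standing in for the running VM; the real SVM has many record kinds and its own data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using SymbolId = std::uint16_t;
using Symbol = const void *;

// Hypothetical validation record: "the symbol with this ID is the class
// with this name".
struct ClassByNameRecord {
    SymbolId id;
    std::string className;
};

// Replays the records against the running VM, modelled here by lookupClass.
// On success idToSymbol holds a binding for every ID that was validated;
// on failure it is cleared and the AOT body must not be used.
inline bool validate(const std::vector<ClassByNameRecord> &records,
                     const std::function<Symbol(const std::string &)> &lookupClass,
                     std::unordered_map<SymbolId, Symbol> &idToSymbol) {
    idToSymbol.clear();
    for (const ClassByNameRecord &record : records) {
        Symbol found = lookupClass(record.className);
        if (found == nullptr) {
            idToSymbol.clear();
            return false;
        }
        auto inserted = idToSymbol.emplace(record.id, found);
        // An ID that appears twice must resolve to the same symbol both times.
        if (!inserted.second && inserted.first->second != found) {
            idToSymbol.clear();
            return false;
        }
    }
    return true;
}
```

After a successful call, materializing the value for an ID is a plain map lookup, which is the "trivial" step the comment refers to.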
@dsouzai, just so I understand - how would that work with, say, the There is some initialization of the relocation record here:
and then there's some here:
Would the idea be that I could replace those Or is that an example of record that could be intercepted? |
Sorry, I forgot that the |
I had an offline chat with @cjjdespres and I don't think what I wrote in #16721 (comment) is valid for the purpose of this issue. I originally thought that having the SVM would prevent the need for dealing with sending the client offsets because the SVM maintains a mapping of IDs to Values. However, that mapping has to be built somehow. If there was a way for the server to remember all the J9Method, J9Class, etc pointers from the first client that asked for a relocatable compile (that gets cached on the server) and then map it to the new client's set of J9Method, J9Class, etc pointers, then the server could just send the client a populated map that could be fed to the SVM such that the However, until we figure out a way to do that (if it is even feasible to do so in a performant way), the server still needs to send offsets to the client which needs to then convert those offsets to J9ROMMethod, J9ROMClass, etc pointers. As such, there isn't any additional benefit of using the SVM (aside from the established ones such as having more performant code). |
I've been dealing with the persistent class loader table and class loader deserialization portion of this, and was wondering about the proper handling of some of this data. Right now we look up class loaders by the name of their first loaded class in the persistent class loader table. This retrieves the pointer to the loader and the local SCC class chain (array of local SCC offsets) for the class. That local class chain is cached (via its local SCC offset) and used as a fallback to try to look up a new loader when it's been marked as unloaded in the relevant deserializer map. The offset to this chain is also used to update the relocations records, of course, and then subsequently used during relocation to materialize the class loader. It's that fallback that I'm not sure about, since we're not relying on the local SCC being present anymore. If the class loader has been unloaded, is there anything wrong with looking up the class loader by name again? |
I don't see a problem with that. Maybe @AlexeyKhrabrov can offer his opinion as well. |
There is nothing wrong with looking it up by name, as long as you have the name. Current implementation doesn't keep the identifying name around once the corresponding record is cached, and relies on the underlying local SCC to support class (and class loader) "re-loading" after unload. The same applies to classes: the deserializer doesn't keep the class name and hash once the class record is cached, and uses the ROMClass SCC offset to reestablish the mapping if the same class is ever loaded again after being unloaded. Same for class chains. I don't know if class re-loading is worth supporting, it should be rare in practice. I implemented it this way because it was relatively easy, assuming there is a local SCC. If we do want to support it without a local SCC, there are a couple options I can think of:
There might be a simpler solution, I haven't really thought this through. |
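One way to picture "looking it up by name again" without a local SCC is a cache entry that keeps its identifying name after the record is cached. The C++ sketch below is an assumption-laden illustration, not the deserializer's design: `RamClass`, `ClassRecordCache` and the lookup callback are hypothetical, and the class hash that the real records also carry is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for a loaded class; only the name matters in this sketch.
struct RamClass {
    std::string name;
};

class ClassRecordCache {
public:
    using Lookup = std::function<const RamClass *(const std::string &)>;

    explicit ClassRecordCache(Lookup lookupByName) : _lookupByName(std::move(lookupByName)) {}

    // Caches the class for a server record ID and retains its identifying name.
    void cache(std::uint64_t recordId, const std::string &name, const RamClass *clazz) {
        _entries[recordId] = Entry{name, clazz};
    }

    // Called when a class is unloaded: the pointer is dropped, the name is kept.
    void invalidate(const RamClass *clazz) {
        for (auto &entry : _entries)
            if (entry.second.clazz == clazz)
                entry.second.clazz = nullptr;
    }

    // Returns the current class for the record, re-resolving it by name if the
    // earlier instance was unloaded. Returns nullptr if it cannot be found.
    const RamClass *get(std::uint64_t recordId) {
        auto it = _entries.find(recordId);
        if (it == _entries.end())
            return nullptr;
        if (it->second.clazz == nullptr)
            it->second.clazz = _lookupByName(it->second.name);
        return it->second.clazz;
    }

private:
    struct Entry {
        std::string name;
        const RamClass *clazz;
    };
    Lookup _lookupByName;
    std::unordered_map<std::uint64_t, Entry> _entries;
};
```

The cost of this variant is the extra memory for the retained names, which is the trade-off raised in the comment above.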
While testing my non-deserializer changes, I ran into the issue where I couldn't get any cache hits when a client without an SCC requested an AOT load corresponding to a cached method that had been compiled for a different client that had an SCC. Part of the issue was a couple of differences in the The size parameter is set in openj9/runtime/bcutil/ROMClassBuilder.cpp Line 502 in 7e24d9a
based on the results of getSizeInfo here: openj9/runtime/bcutil/ROMClassBuilder.cpp Line 449 in 7e24d9a
That method uses writeROMClass to pretend (I think) to write the ROMClass information to its internal buffer, then gets the size based on the final positions of the different cursors. I looked at what happened with java/lang/Object at the client to start, and saw that the difference was entirely with the debug info for that class - if canPossiblyStoreDebugInfoOutOfLine() is true
then the debug info is stored out of line (or at least the writer pretends to do so), which affects sizeInformation.rcWithOutUTF8sSize , which ultimately changes the romSize . You can see that canPossiblyStoreDebugInfoOutOfLine() can vary depending on whether or not the ROMClass might be shared or if the local SCC has space or not.
I'm not sure what a safe option here is yet. It's possible that we could just have I haven't looked at other classes to see if there are other ways in which the |
Setting openj9/runtime/bcutil/ROMClassWriter.cpp Lines 1385 to 1390 in 8a817fe
is different in a few places, at least. (The openj9/runtime/bcutil/ROMClassWriter.cpp Line 2125 in 8a817fe
2a b8 1 0 e5 for the code in both cases, but by the time the actual ROMClass is being written that has become 2a b8 1 0 ad in the SCC case. It appears to remain the same in the no-SCC case. Maybe something in that fixup table or the later return fixup pass differs between the two?
I should probably print this data at the server to make sure that no other differences have cropped up between |
Incidentally, that's all the difference seems to be - there are |
I've found the difference - when openj9/runtime/bcutil/ComparingCursor.cpp Lines 325 to 331 in 7463bc9
so I suppose this is intentional. |
@AlexeyKhrabrov I'd just like to confirm something about deserialization. When the deserializer deals with class chain records from the server currently, it builds up a RAM class chain from the already-cached record IDs, then gets a class chain from the local SCC for the first class in the chain, then verifies that this local class chain is equal to what we expect (in that the ROM classes that the local chain references are equal to those of the RAM class chain). This class chain (once retrieved from the local SCC by the relocation runtime) is subsequently used to make sure that particular I don't think the local SCC part of the process contains any validation that is not already done during deserialization. Is that right? Once we've gotten to the point of processing a class chain record, all of the class records in the chain will have been associated to full In that case, as long as we check that all of the classes in the RAM class chain remain valid (not unloaded or redefined), the overridden versions of SCC functions like |
Of course we will still have to check during deserialization that the RAM class chain constructed from the class chain serialization record is compatible with the RAM class chain derived from the |
Yes I think that is correct. |
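The check discussed in this exchange (the RAM class chain must match what the server's records describe, and every class in it must still be valid) could look roughly like the C++ sketch below. `ExpectedClass`, `LoadedClass` and `chainMatches` are hypothetical names, and representing the ROMClass hash as a string is an assumption made only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// What a class chain record from the server says about one class in the chain.
struct ExpectedClass {
    std::string name;
    std::string hash; // assumed: digest of the ROMClass contents
};

// Stand-in for a class currently loaded at the client.
struct LoadedClass {
    std::string name;
    std::string hash;
    bool valid; // false once the class has been unloaded or redefined
};

// Returns true only if the chains have the same length and every position
// refers to a still-valid class with the expected name and hash.
inline bool chainMatches(const std::vector<ExpectedClass> &expected,
                         const std::vector<const LoadedClass *> &ramChain) {
    if (expected.size() != ramChain.size())
        return false;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const LoadedClass *clazz = ramChain[i];
        if (clazz == nullptr || !clazz->valid)
            return false;
        if (clazz->name != expected[i].name || clazz->hash != expected[i].hash)
            return false;
    }
    return true;
}
```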
Will a |
Yes, I think you'll need to keep a two-way mapping between method IDs and |
Here is a high-level explanation of the JITServer AOT cache changes. I gave some background/useful information in the first few sections about how AOT and the JITServer AOT cache currently function to frame the discussion and define some terms, but the actual changes are explained in the final section. Also, when I say that a client "doesn't have an SCC", what I mean is that we're in a situation where we're choosing to bypass the local SCC entirely, perhaps because it doesn't exist, or because it exists but is unsuitable (read-only, or otherwise full) for normal JITServer AOT cache compilations.

How do locally-compiled AOT methods refer to data?

To save space, the relocation records of locally-compiled AOT methods refer to a lot of data via offset into the local SCC. An offset is a

Non-shared JITServer AOT compilation

Traditionally, the JITServer compiled AOT code on a per-client basis, and required a local SCC in order to perform these compilations. Whenever a relocation record needed a client offset, it would send the data to the client, have it store the data in its local SCC, get the offset to the data back, and store that offset in the relocation record. The resulting compiled method could be sent to the client and stored in its local SCC without issue, and the client could relocate it normally. This is still what happens when the client does not request that the compilation involve the JITServer AOT cache.

Sharing JITServer AOT compilations between clients that have local SCCs

The only real barrier to sharing AOT code between clients with local SCCs is the offset problem - we cannot guarantee that a specified piece of data will be at the same offset in every local SCC, so the offsets in relocation records are only valid with respect to a particular client's SCC. To address this problem, during the course of a compilation intended to be stored in the JITServer AOT cache we build up a collection of records that do two things:
These records are the JITServer AOT cache serialization records, and they act as relocation records for (normal, local AOT) relocation records. They are stored alongside the cached method in the JITServer AOT cache and sent to clients when an AOT load is requested. A client will use these serialization records to find the data it needs in the local SCC, get the offsets to that data, then update the relocation records for the method that was sent by the server. This is called deserialization, and is performed by the

Removing the dependency on the local SCC for the JITServer AOT cache

The description of the process of local relocation and deserialization I gave above is incomplete in two crucial ways:
For AOT loads, then, the SCC can be cut out of the process entirely. In the new method of JITServer AOT cache loading, what we do is:
It's important to note here that the interface to the local SCC that the relocation runtime uses is at a high enough level that it never actually examines the structure of the data that it retrieves from the local SCC - it just performs various SCC API calls on the offsets and the results of those API calls, and so on. That's how we can get away with returning the entities directly in (3). Otherwise the implementation would need to be much more complicated; we'd have to duplicate a lot of the implementation details of the SCC itself.

As for AOT stores (generating AOT code for a method and storing it in the JITServer AOT cache), the main problem to solve is that relocation records expect the offsets to be consistent with each other, in that two equal offsets are expected to refer to the same data, and two distinct offsets are (effectively, by the relo runtime) expected to refer to distinct data. Fortunately, the JITServer AOT cache itself provides a way of obtaining such offsets - its own records have |
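The consistency requirement on offsets (equal offsets mean the same data, distinct offsets mean distinct data) can be met by any injective encoding of a record's identity. The C++ sketch below shows one such encoding under explicit assumptions: that each cached record has a type and a numeric identifier unique within that type, and that packing the type into the low bits is acceptable. The `RecordType` values, `makePseudoOffset`, `typeOf` and `idOf` are all hypothetical and are not taken from the OpenJ9 sources.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative record kinds; the thread mentions class loaders, classes,
// methods and class chains.
enum class RecordType : std::uint8_t {
    ClassLoader = 0,
    Class = 1,
    Method = 2,
    ClassChain = 3
};

// Number of low bits reserved for the record type (assumption of this sketch).
constexpr unsigned kTypeBits = 3;

// Packs (type, id) into one pointer-sized value. The mapping is injective as
// long as id fits in the remaining high bits, which the assertion checks.
inline std::uintptr_t makePseudoOffset(RecordType type, std::uintptr_t id) {
    assert((id >> (sizeof(std::uintptr_t) * 8 - kTypeBits)) == 0);
    return (id << kTypeBits) | static_cast<std::uintptr_t>(type);
}

// Recovers the record type from a pseudo-offset.
inline RecordType typeOf(std::uintptr_t offset) {
    return static_cast<RecordType>(offset & ((std::uintptr_t(1) << kTypeBits) - 1));
}

// Recovers the record identifier from a pseudo-offset.
inline std::uintptr_t idOf(std::uintptr_t offset) {
    return offset >> kTypeBits;
}
```

With an encoding like this, the same record always yields the same value and two different records never collide, which is all the relocation records need from an "offset" in this setting.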
Currently, the JITServer AOT cache feature requires the client JVM to use a shared class cache (SCC) with some empty space and write permissions. This is because the AOT code fetched from the server is first stored in the SCC and then loaded from there, relocated and stored in the code cache. This solution was chosen to simplify the AOT cache implementation by relying as much as possible on the existing SCC mechanism. However, this solution comes with some usability constraints when an SCC is embedded in a container.
Embedding an SCC in a container (to speed up start-up of applications) is typically done in layers, with one SCC layer per container image layer. Such SCC layers are typically trimmed to just the right size and marked read-only.
This issue proposes to change the JITServer AOT cache mechanism such that the AOT code sent by the server is deserialized and relocated in-place, without going through the SCC first.