
JDK-8243669: Improve library loading for Panama libraries #132

Closed
wants to merge 3 commits into foreign-abi

Conversation

mcimadamore
Collaborator

@mcimadamore mcimadamore commented Apr 27, 2020

The code for loading native libraries has recently been cleaned up so as to allow the same library to be loaded by multiple loaders, in case the library being loaded is NOT a JNI library - see JDK-8240975.

This patch relaxes the Panama library loading mechanism so that the classloader restriction is dropped; in fact, Panama library loading is now orthogonal to the class loader in which the loading operation occurs.

The main issue with this enhancement is to decide how libraries should be unloaded, given that the same library might be referred to by many different lookup objects in many different threads.

If we aim for fully explicit library unloading (e.g. LibraryLookup::close) this raises similar issues to those we have for foreign memory access: we now have a library lookup which can be closed while a method handle is operating on an address generated by it in some other thread.

We could solve the problem in the same way we solved the memory segment problem - that is, making library lookup objects thread-confined, and having each address be invalidated when a lookup is closed (but then looked-up addresses are only usable within the confinement thread). While doable, this seems to go against how clients will use SystemABI to generate native method handles, where they will probably want to stash a bunch of method handles in static final constants - and if these handles depend on a confined address, it means they can't be shared.

A saner solution is to let the GC manage library unloading - that is, we set up a reference counter for each loaded library; each time a new lookup is created for the same underlying native library, the counter is incremented; as lookup instances become unreachable, the counter is decremented. When the counter reaches zero, the library is unloaded.

To prevent races between library loading/unloading, the two routines in charge of loading/unloading have been marked as synchronized. This also means that the lookup on the nativeLibraries instance has to be performed inside the synchronized block (we don't want to accidentally try to use a NativeLibrary instance which has concurrently been unloaded by the cleaner).

This is a simple strategy, but also a very effective one: the lifetime of a LibraryLookup controls that of the native library it is associated with. In addition, all memory addresses generated by a lookup keep a strong reference to the lookup - and a native method handle generated by a SystemABI::downcallHandle call will also keep a strong reference to the address it refers to. This means that if you store a method handle in a static final field, you don't have to worry about the LibraryLookup becoming unreachable, as it will be kept alive by the method handles.
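
For illustration only, here is a minimal sketch of this strategy; the names (NativeLibraryCache, RawLibrary, LibraryLookupImpl) are invented for this example and do not correspond to the actual classes touched by the patch:

import java.lang.ref.Cleaner;
import java.util.HashMap;
import java.util.Map;

final class NativeLibraryCache {
    private static final Cleaner CLEANER = Cleaner.create();
    private static final Map<String, RawLibrary> LIBRARIES = new HashMap<>();

    // Loading and unloading are guarded by the same lock, so a library cannot be
    // unloaded by the cleaner while a new lookup for it is being created.
    static synchronized LibraryLookupImpl lookup(String path) {
        RawLibrary lib = LIBRARIES.computeIfAbsent(path, RawLibrary::load);
        lib.refCount++;                                  // one more lookup refers to this library
        LibraryLookupImpl lookup = new LibraryLookupImpl(lib);
        CLEANER.register(lookup, () -> release(path));   // drop the reference when the lookup becomes unreachable
        return lookup;
    }

    private static synchronized void release(String path) {
        RawLibrary lib = LIBRARIES.get(path);
        if (lib != null && --lib.refCount == 0) {
            LIBRARIES.remove(path);
            lib.unload();                                // actually dlclose/FreeLibrary the handle
        }
    }

    static final class RawLibrary {
        int refCount;
        static RawLibrary load(String path) { /* dlopen/LoadLibrary */ return new RawLibrary(); }
        void unload() { /* dlclose/FreeLibrary */ }
    }

    static final class LibraryLookupImpl {
        private final RawLibrary lib;                    // strong reference keeps the library alive
        LibraryLookupImpl(RawLibrary lib) { this.lib = lib; }
    }
}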


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

  • JDK-8243669: Improve library loading for Panama libraries

Reviewers

  • Athijegannathan Sundararajan (sundar - Committer) ⚠️ Review applies to a49192f
  • Jorn Vernee (jvernee - Committer) ⚠️ Review applies to 0307fd4
  • Mandy Chung (mchung - Committer)

Download

$ git fetch https://git.openjdk.java.net/panama-foreign pull/132/head:pull/132
$ git checkout pull/132

@bridgekeeper

bridgekeeper bot commented Apr 27, 2020

👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into foreign-abi will be added to the body of your pull request.

@openjdk openjdk bot added the rfr Ready for review label Apr 27, 2020
@mlbridge

mlbridge bot commented Apr 27, 2020

Webrevs

@mlbridge

mlbridge bot commented Apr 27, 2020

Mailing list message from Samuel Audet on panama-dev:

If reference counting gets used for both memory allocations and library
loading, what about extracting that functionality into a public API that we
could use with any other resources out there, such as GPU resources?
JavaCPP is already doing that and it works great, but I don't see any
reason why this kind of feature shouldn't be part of the JDK, something
I've mentioned before...

Samuel

On Mon, Apr 27, 2020 at 23:32 Maurizio Cimadamore <mcimadamore at openjdk.java.net> wrote:


-------------

Commit messages:
- Add support for library loading mechanism that is not dependent on class loaders.

Changes: https://git.openjdk.java.net/panama-foreign/pull/132/files
Webrev: https://webrevs.openjdk.java.net/panama-foreign/132/webrev.00
Issue: https://bugs.openjdk.java.net/browse/JDK-8243669
Stats: 213 lines in 15 files changed: 174 ins; 10 del; 29 mod
Patch: https://git.openjdk.java.net/panama-foreign/pull/132.diff
Fetch: git fetch https://git.openjdk.java.net/panama-foreign pull/132/head:pull/132

PR: https://git.openjdk.java.net/panama-foreign/pull/132

@mlbridge

mlbridge bot commented Apr 28, 2020

Mailing list message from Maurizio Cimadamore on panama-dev:

On 27/04/2020 23:13, Samuel Audet wrote:

If reference counting gets used for both memory allocations and library
loading, what about extracting that functionality into a public API that we
could use with any other resources out there, such as GPU resources?
JavaCPP is already doing that and it works great, but I don't see any
reason why this kind of feature shouldn't be part of the JDK, something
I've mentioned before...

There's no _explicit_ reference counting mechanism anywhere - not in the
memory segment API, not in the library lookup API (which is what changed
here). Not sure what you have in mind - but with the foreign API we have
been trying to stay clear of explicit 'retain'/'release' calls. Is
there a specific reason why you think that the JDK would be in a unique
position to provide a better solution for something like this?

Maurizio


@mlbridge

mlbridge bot commented Apr 28, 2020

Mailing list message from Samuel Audet on panama-dev:

On 4/28/20 10:39 AM, Maurizio Cimadamore wrote:


There's no _explicit_ reference counting mechanism anywhere - not in the
memory segment API, not in the library lookup API (which is what changed
here). Not sure what you have in mind - but with the foreign API we have
been trying to stay clear of explicit 'retain'/'release' calls. Is
there a specific reason why you think that the JDK would be in a unique
position to provide a better solution for something like this?

I understand that there is no "explicit reference counting", but it is
nonetheless exactly what it is doing "under the hood". Let's put aside
the question of offering a standard API for reference counting, and
start with a simpler problem first. Since you wish to completely abstract
reference counting away from users when it comes to memory
allocations and library loading, what are you planning to do about its
limitations, mainly when dealing with circular references?

Samuel

@openjdk

openjdk bot commented Apr 28, 2020

@mcimadamore This change now passes all automated pre-integration checks. To proceed, type /integrate in a new comment. After integration, the commit message will be:

JDK-8243669: Improve library loading for Panama libraries

Reviewed-by: sundar, jvernee, mchung
  • If you would like to add a summary, use the /summary command.
  • To credit additional contributors, use the /contributor command.
  • To add additional solved issues, use the /solves command.

Since the source branch of this PR was last updated there have been 4 commits pushed to the foreign-abi branch:

  • f84874e: Automatic merge of foreign-memaccess into foreign-abi
  • 98ad8aa: 8244128: Allocations larger than MAX_ALIGN can fail to be sliced to proper size.
  • 63c69a0: Automatic merge of foreign-memaccess into foreign-abi
  • 7fad4ad: 8244127: "memory stomping error" when running java/foreign/TestNative.java on a debug build

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid automatic rebasing, please merge foreign-abi into your branch, and then specify the current head hash when integrating, like this: /integrate f84874ef87b82b70105534f24c08fca7e5cada82.

➡️ To integrate this PR with the above commit message to the foreign-abi branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Ready to be integrated label Apr 28, 2020
@mlbridge

mlbridge bot commented Apr 28, 2020

Mailing list message from Samuel Audet on panama-dev:

On 4/28/20 11:15 AM, Samuel Audet wrote:


I understand that there is no "explicit reference counting", but it is
nonetheless exactly what it is doing "under the hood". Let's put aside
the question of offering a standard API for reference counting, and
start with a simpler problem first. Since you wish to abstract away
completely reference counting from users when it comes to memory
allocations and library loading, what are you planning to do about its
limitations, mainly when dealing with circular references?

After rereading your proposal a second time, it's clear that you are
planning simply to fall back on the GC.

Well, I do agree that we shouldn't be doing reference counting manually,
but as you know, since you've already started experimenting with it,
there is a smarter way of doing it, aka "scopes".

The idea with a standard API for reference counting would be to offer a
framework that could be used for any native resources, not just memory
segments or libraries or whatever next Panama is going to decide is
"important", and that could be shared across any number of native
libraries that are often used together to manage their resources in a
sane way. Just for reference, here's an example with OpenCV and
TensorFlow using JavaCPP's PointerScope:
http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/

It works well, and I do see the JDK offering something like that, but
more ironed out, etc simply because it also works fine with C++, Python,
Swift, etc as part of the language itself!

Do you disagree and if so, why?

Samuel
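
For reference, here is a rough sketch of the scope idiom being described. This is a generic, hypothetical ResourceScope class, not JavaCPP's actual PointerScope API, and NativeBuffer in the usage comment is likewise made up:

import java.util.ArrayDeque;
import java.util.Deque;

// Resources register themselves with the innermost open scope and are
// released when that scope closes (typically via try-with-resources).
final class ResourceScope implements AutoCloseable {
    private static final ThreadLocal<Deque<ResourceScope>> SCOPES =
            ThreadLocal.withInitial(ArrayDeque::new);
    private final Deque<AutoCloseable> resources = new ArrayDeque<>();

    ResourceScope() { SCOPES.get().push(this); }

    // Called by resource constructors to attach themselves to the current scope.
    static void attach(AutoCloseable resource) {
        Deque<ResourceScope> scopes = SCOPES.get();
        if (!scopes.isEmpty()) scopes.peek().resources.push(resource);
    }

    @Override
    public void close() {
        SCOPES.get().pop();                      // assumes scopes are properly nested
        for (AutoCloseable r : resources) {      // released in reverse attachment order
            try { r.close(); } catch (Exception ignored) { }
        }
    }
}

// Usage: everything attached inside the try block is released on exit,
// with no explicit retain/release calls in user code.
// try (ResourceScope scope = new ResourceScope()) {
//     NativeBuffer buf = NativeBuffer.allocate(1024);  // constructor calls ResourceScope.attach(this)
//     // ... use buf ...
// }  // buf released here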

Member

@JornVernee JornVernee left a comment

Looks good, but one test is failing on Windows. Left an inline comment.

test/jdk/java/foreign/TestLibraryLookup.java (inline review comments, now outdated and resolved)
@mlbridge

mlbridge bot commented Apr 28, 2020

Mailing list message from Maurizio Cimadamore on panama-dev:

On 28/04/2020 09:24, Samuel Audet wrote:


After rereading your proposal a second time, it's clear that you are
planning simply to fall back on the GC.
Yes :-) That's why I wasn't getting your claim

Well, I do agree that we shouldn't be doing reference counting
manually, but as you know, since you've already started experimenting
with it, there is smarter way of doing it, aka "scopes".

The idea with a standard API for reference counting would be to offer
a framework that could be used for any native resources, not just
memory segments or libraries or whatever next Panama is going to
decide is "important", and that could be shared across any number of
native libraries that are often used together to manage their
resources in a sane way. Just for reference, here's an example with
OpenCV and TensorFlow using JavaCPP's PointerScope:
http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/

My take on scopes is that they work generally well - but, to make them
fully safe, you have to start throwing in assumptions about thread
confinement (which is described in this proposal). Your pointer API has
explicit retain/release methods - if you call release() and the refCount
== 0 you deallocate. Right? (sorry if I got the library names wrong).
So, how do you solve problems like this:

Thread A accessing a pointer while thread B is 'releasing' it, where A
is not well-behaved and did NOT perform a retain() ?

So, in my mental model, refcounts and scopes are _tools_. If the _goal_ is
to write a safe API, these tools, alone, are not going to save the day.
A scope-like abstraction can of course be made to work, if you bring
together _other_ restrictions (e.g. pointers in a scope can only be used
by one thread). Assuming your API (and your clients) are ok with that
restriction, of course. Otherwise, we're basically discussing ways on
how to build an _unsafe_ API, which is a much simpler problem and not
what the Foreign Memory Access API is trying to do.

As for the claim that library and memory resources are in the same
league, I think even that claim is questionable. Memory can be short
lived - you allocate something on the stack, pass it on a function and
then clear the memory. But a library has (typically) a much longer
lifespan. So, while it is in general not great to rely on the GC to
auto-clean the memory allocated off-heap (there are many war stories as
to why this fails to scale at some point), I see very little gain in
adding a lot of complexity to allow for deterministic library unloading,
when the GC is probably going to do fine for such longer-lived objects -
at the same time avoiding the "same thread" restrictions, and providing
a guarantee that all native method handles derived from a library will
keep the library alive.

Maurizio

It works well, and I do see the JDK offering something like that, but
more ironed out, etc simply because it also works fine with C++,
Python, Swift, etc as part of the language itself!

Do you disagree and if so, why?

Samuel

* enhanced test to also try loading the same library from multiple class loaders
@mlbridge

mlbridge bot commented Apr 30, 2020

Mailing list message from Samuel Audet on panama-dev:

On 4/28/20 8:56 PM, Maurizio Cimadamore wrote:


My take on scope is that they work generally well - but, to make them
fully safe, then you have to start throwing in assumption about thread
confinement (which is described in this proposal). Your pointer API has
explicit retain/release method - if you call release() and the refCount
== 0 you deallocate. Right? (sorry if I got the library names wrong).

Yes, that's the basic idea, but the point is that we can do everything
via "scopes". We don't usually need to know that it's doing reference
counting in the background, sort of like ARC in Swift.

So, how do you solve problems like this:

Thread A accessing a pointer while thread B is 'releasing' it, where A
is not well-behaved and did NOT perform a retain() ?

Right, it's not perfect, but I think these kinds of issues are solvable,
if we're willing to spend time and work on them. For example, if
something like `PointerScope` could be integrated into the Java language
itself, we would be able to guarantee that what you describe above never
happens, making everything thread-safe. I don't see any limitations in
that regard, but I may be missing something. Could you provide an
example that fails? Or is there just concern about the performance hit
that could be incurred (in which case I'd still say "let's work on it")?


My point is that this can all be part of a standard API to deal with all
these issues, *in one place*, instead of forcing users to come up with
their own heuristics, over and over again, leaving us with a large
number of mental models to deal with. JavaCPP also relies on the GC to
clean up things around, and as I mentioned in another thread, it even
tries to call System.gc and malloc_trim(0) a few times as a last-ditch
effort to not throw OutOfMemoryError. Why not put that kind of thing in
the JDK where everyone can agree on something that makes sense instead
of having them come up with different heuristics for these things?
That's what I think we should agree on first, not the technical details
of what should ultimately be done, but that there is a need to agree on
doing something to standardize this, as has been done with varying
levels of success in other languages like C++, Python, and Swift. If
Java is different in that sense, could you explain why it needs to *not*
provide a standard way of doing these kinds of things?

Samuel

@mlchung
Member

mlchung commented Apr 30, 2020

I like this simpler solution to let GC manage the native library unloading as in System::loadLibrary. I'm happy to see that LibraryLookup::ofLibrary and other factory methods no longer need the Lookup parameter as the context.

@mlbridge

mlbridge bot commented Apr 30, 2020

Mailing list message from Maurizio Cimadamore on panama-dev:

Right, it's not perfect, but I think these kinds of issues are
solvable, if we're willing to spend time and work on them. For
example, if something like `PointerScope` could be integrated into the
Java language itself, we would be able to guarantee that what you
describe above never happens, making everything thread-safe. I don't
see any limitations in that regards, but I may be missing something.
Could you provide an example that fails? Or is there just concern
about the performance hit that could be incurred (in which case I'd
still say "let's work on it")?

This is the very central issue we're trying to address with memory
segments: how do you allow access to a segment from multiple threads while:

* retaining access performance
* retaining deterministic deallocation guarantees

It's not a theoretical problem. If you want to make your JavaCPP code
safe you need to add a mutex so that threads can either access memory OR
deallocate. Then I'm sure you won't be happy with the numbers that will
come out of your benchmarks.

Other solutions "include" something like reference counting, but not the
same reference counting you do in your API. That is, if a thread wants
to use a "pointer" (in your API) you must create a new instance just for
that thread, you can't just increment some shared counter on the
original pointer. That is, the _new_ thread must _not_ have access to
the original pointer. Otherwise it is possible for people to write code
where they don't call "retain" and they happily access the pointer from
multiple threads, but the pointer doesn't know about it.

While something like that might be made to work (we had something
similar with our MemorySegment::acquire), it is not very appealing from
an API perspective, as it creates "trees" of associated
segments/pointers where the parent cannot be deallocated until all
children are.

All this to say what I was trying to say before: wrapping up
AtomicInteger inside some ARC abstraction is _not_ a solution to the
problem. First, it doesn't really take that long to implement, but,
most importantly, having a class which can do "retain"/"release" doesn't
save you from uncooperative clients trying to use an instance from a
different thread w/o calling "retain".

So, I don't see a lot of value for providing such an abstraction in the
JDK. The fact that there are libraries out there which might rely on
reference counting to provide some sort of perceived safety doesn't
automatically make this a candidate for providing something with the
degree of safety that would (and should) be expected from a Java SE API.

Maurizio

@mcimadamore
Collaborator Author

/integrate

@openjdk openjdk bot closed this Apr 30, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Ready to be integrated rfr Ready for review labels Apr 30, 2020
@openjdk

openjdk bot commented Apr 30, 2020

@mcimadamore The following commits have been pushed to foreign-abi since your change was applied:

  • f84874e: Automatic merge of foreign-memaccess into foreign-abi
  • 98ad8aa: 8244128: Allocations larger than MAX_ALIGN can fail to be sliced to proper size.
  • 63c69a0: Automatic merge of foreign-memaccess into foreign-abi
  • 7fad4ad: 8244127: "memory stomping error" when running java/foreign/TestNative.java on a debug build

Your commit was automatically rebased without conflicts.

Pushed as commit 7b6d993.

@mlbridge

mlbridge bot commented May 2, 2020

Mailing list message from Samuel Audet on panama-dev:


Thank you for bearing with me, but I must seriously be missing
something. Please explain why the following doesn't work:

Main Thread:
{
// some "global" scope
Pointer p = // some constructor
// code emitted by the compiler increments counter of p
p.initSomeMore()
// start threads, etc
// code emitted by the compiler decrements counter of p
}

Thread 1:
// some "local" scope
{
// code emitted by the compiler increments counter of p
p.doSomething()
// code emitted by the compiler decrements counter of p
}

Thread 2:
// some other "local" scope
{
// code emitted by the compiler increments counter of p
p.doSomethingElse()
// code emitted by the compiler decrements counter of p
}

We have only one instance of the object, while locks, if they are
required, only need to happen when accessing the reference counter. The
user *does not* have access to the counter, it *cannot* be manually
incremented or decremented. I'm not saying this is easy to implement
into something meaningful, but I don't see where the roadblocks are.

Samuel

@mlbridge

mlbridge bot commented May 2, 2020

Mailing list message from Maurizio Cimadamore on panama-dev:

On 01/05/2020 10:57, Samuel Audet wrote:


Thank you for bearing with me, but I must seriously be missing
something. Please explain why the following doesn't work:

Are you now proposing a _language_ feature - not just an API?

Maurizio


@mlbridge

mlbridge bot commented May 2, 2020

Mailing list message from Samuel Audet on panama-dev:

On 5/1/20 7:34 PM, Maurizio Cimadamore wrote:


Are you now proposing a _language_ feature - not just an API?

Yes, that's what I said above. Do you agree this could work?

I'm not saying that it's going to be easy, but if the Java community
doesn't get onto this train, it will be left behind at some point, and
things will just move on to something else like Python or whatever
platform where applications that require GPUs, accelerators, etc are
welcome. I would really hate it if the best humanity can come up with is
Python.

Samuel

@mlbridge

mlbridge bot commented May 5, 2020

Mailing list message from Maurizio Cimadamore on panama-dev:

On 02/05/2020 10:34, Samuel Audet wrote:


Yes, that's what I said above. Do you agree this could work?

Changing the language to support reference counting in a language that
is, essentially, garbage-collected, doesn't seem like a great idea.

On top of that, moving the problem onto the language typically amounts
to 'kicking the can down the road'. Constructs like try-with-resources
are already quite convoluted in their attempt to detect all the
possible cases in which things could go wrong - see the code that javac
generates for them here

And, despite all that, there are still cases where the close() method in
a TWR can be skipped - and where, if we had to support something
similar, a language change alone would not be sufficient, but a deeper
change to the VM (and GC) would be required if we were to ensure that
the semantics of these 'auto-closeable' entities are correctly managed.
Project Loom explored many of these topics and concluded that TWR alone
was not sufficient to handle the use cases they had (as there was, for
instance, no guarantee that a thread could not be stopped in the middle
of a close()).

To which you might say "why don't you just do all that?" - and the
answer (which you won't like) is that it feels to me that we are in a
land of diminishing returns, where you have to put in a lot of effort in
order to do something that is only marginally better than what you could
achieve _today_ with a slightly different API: if your pointer access
API is separated from the pointer API itself, then doing what you
describe can be achieved like this:

Pointer p = ...
p.withHandle(PointerHandle ph -> {
    ph.get(...)
    ph.set(...)
});
p.deallocate() // will fail if there are any pending handles

This is a relatively pleasing API - which has the added benefit that it
makes it extra obvious which operations are allowed on which
entities (e.g. you cannot just call `get` on a pointer, it just isn't
there!). This is much simpler for the user to understand (IMHO) than an
API which says "there is a `get` method here, but to call it safely you
must wrap access inside some kind of try-pointer block provided by the
language".

Maurizio
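
A slightly expanded, hypothetical version of the withHandle sketch above; the Pointer and PointerHandle types are invented here for illustration (and, as the rest of the thread notes, this alone is not fully safe without some form of confinement):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class Pointer {
    private final AtomicInteger pendingHandles = new AtomicInteger();
    private volatile boolean live = true;

    // Access only happens through a handle whose lifetime is the lambda body.
    void withHandle(Consumer<PointerHandle> action) {
        pendingHandles.incrementAndGet();
        try {
            if (!live) throw new IllegalStateException("already deallocated");
            action.accept(new PointerHandle(this));
        } finally {
            pendingHandles.decrementAndGet();
        }
    }

    // Fails if any handle is still in use; making this check race-free is
    // exactly where the thread-confinement discussion comes in.
    void deallocate() {
        if (pendingHandles.get() > 0) throw new IllegalStateException("pending handles");
        live = false;
        // free the underlying native memory here
    }
}

final class PointerHandle {
    private final Pointer owner;
    PointerHandle(Pointer owner) { this.owner = owner; }
    long get(long offset) { /* read from native memory */ return 0; }
    void set(long offset, long value) { /* write to native memory */ }
}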


@mlbridge

mlbridge bot commented May 5, 2020

Mailing list message from Samuel Audet on panama-dev:

Hi, Maurizio,


That actually looks good, I agree, but I still don't see why that kind
of pattern could not be made (more) standard. You're using different
APIs for memory layouts and library loading, why? In technical terms,
what prevents us from using something similar for memory layouts,
library loading, or anything else users would like to support, such as
GPU resources?

It sounds like you're saying that this pattern is so easy to implement
that everyone should reimplement their own custom versions, but I don't
consider this to be obvious given that everyone else on other platforms
standardizes resource management, either at the language level or by
using some sort of standard API. However, even as part of the JDK,
you're not even using the same API for memory layouts and library
loading! In other words, why not make it (even) easier to implement for
*any* type of resources? What are the roadblocks that we should start to
look at?

That's basically my stance. I'm not trying to push solutions, I'm just
trying to drive the point home that we need some sort of solution. GPUs,
FPGAs, DSPs, and other accelerators in general are not going to become
magically irrelevant simply because OpenJDK does not consider them
important! They are important, they are here to stay, and their
importance is only going to continue to grow.

Again, I'm not saying that all of this is going to be easy to figure
out. What I would really like to do is to get a discussion started about
this.

Samuel

@mlbridge

mlbridge bot commented May 5, 2020

Mailing list message from Maurizio Cimadamore on panama-dev:

On 05/05/2020 23:56, Samuel Audet wrote:

you're not even using the same API for memory layouts and library
loading! In other words, why not make it (even) easier to implement
for *any* type of resources? What are the roadblocks that we should
start to look at?

I think we're going in circles. The PR summary explains why we can't use
the same mechanism for segment deallocation (not layouts, as you say,
which have no such needs) and native libraries. It all goes back to
confinement.

Even with the API I proposed like this:

Pointer p = ...
p.withHandle(PointerHandle ph -> {
    ph.get(...)
    ph.set(...)
});
p.deallocate() // will fail if there are any pending handles

There is a big caveat; this idiom only guarantees 100% safety if the
pointer handle you obtain is unusable from other threads. This
restriction might be fine for memory access - after all, a thread might
get a pointer to some struct, might want to read/write contents, and do
that in a lexically scoped way. That's the principle behind scopes.

But I simply don't buy that what works for memory works for libraries.
If you keep pulling on this string that means that, to be able to use a
native library, each thread would have to create its own private version
of the library method it wants to use, which seems overkill.

I'm all for unification and minimizing abstractions... where it makes
sense. In this case I just don't see it. If we come across another
abstraction which shares similar aspects to the way in which memory
segments are handled (and as I noted, Project Loom is exploring _a lot_
of this stuff), we might provide some common abstraction in that
direction, but I can say with certainty that it won't look like a
ref-counting API abstraction (which is what you asked for at the
beginning of this thread).

Maurizio

@mlbridge

mlbridge bot commented May 5, 2020

Mailing list message from Maurizio Cimadamore on panama-dev:

On 05/05/2020 23:56, Samuel Audet wrote:

I'm just trying to drive the point home that we need some sort of
solution. GPUs, FPGAs, DSPs, and other accelerators in general are not
going to become magically irrelevant simply because OpenJDK does not
consider them important! They are important, they are here to stay,
and their importance is only going to continue to grow.

We are aware of that, and nobody has really mentioned that said devices
are not considered as important (and I think you should really stop
making absurd claims without any evidence to back them up). I think the
memory access API makes it fairly easy to create an ad-hoc memory
segment backed by e.g. GPU memory - I've demonstrated how easy it is to
wire things up and create your own memory sources:

https://gist.github.com/mcimadamore/128ee904157bb6c729a10596e69edffd

Now, replace mmap/munmap with cudaMalloc/cudaFree and you will have a
MemorySegment that can be used to model GPU memory. All the lifecycle
aspects of "traditional", off-heap memory segments can in fact translate
onto this ad-hoc segment, so that its use can be made safe.

Of course the memory access API is a building block - together with ABI
support (another building block) it allows you to model and manipulate
memory sources (of all kinds, provided you have some native library to
interact with it); if you are looking for a high-end CUDA-like GPU
library port written in Java, Panama simply isn't the place to look for
it. But it should be possible (and hopefully easier) to build one given
the tools we're building.

Maurizio

@mlbridge

mlbridge bot commented May 9, 2020

Mailing list message from Samuel Audet on panama-dev:

On 5/6/20 8:36 AM, Maurizio Cimadamore wrote:


We are aware of that, and nobody has really mentioned that said devices
are not considered as important (and I think you should really stop
making absurd claims without any evidence to back them up). I think the

I'm sorry if I'm making absurd claims about information that you're not
making available publicly :) It would be nice to get a roadmap of some
sort, even if it's just to mention: "Hey, we're actually not ignoring
these things!"

memory access API makes it fairly easy to create an ad-hoc memory
segment backed by e.g. GPU memory - I've demonstrated how easy it is to
wire things up and create your own memory sources:

https://gist.github.com/mcimadamore/128ee904157bb6c729a10596e69edffd

Now, replace mmap/munmap with cudaMalloc/cudaFree and you will have a
MemorySegment that can be used to model GPU memory. All the lifecycle
aspects of "traditional", off-heap memory segments can in fact translate
onto this ad-hoc segment, so that its use can be made safe.

That looks like a good starting point, yes. Are you saying that this is
intended to be a public API that end users can use to replace
mmap/munmap with not only cudaMalloc/cudaFree but whatever they might wish?

Let's assume this is going to be all public. The next thing that worries
me is about simultaneous access from multiple threads. We have no such
restrictions in C++, so that is bound to cause issues down the road.
Does OpenJDK intend to force this onto the Java community in a similar
fashion to JPMS? Or are you open for debate on this, and other points?

Of course the memory access API is a building block - together with ABI
support (another building block) it allows you to model and manipulate
memory sources (of all kinds, provided you have some native library to
interact with it); if you are looking for an high-end Cuda-like GPU
library port written in Java, Panama simply isn't the place to look for
it. But it should be possible (and hopefully easier) to build one given
the tools we're building.

Right, that's how I see it, but your lack of reply to my query about the
intended usability of these APIs here concerns me:
https://github.com/bytedeco/javacpp/issues/391#issuecomment-623030899

Samuel

@mlbridge
Copy link

mlbridge bot commented May 11, 2020

Mailing list message from Maurizio Cimadamore on panama-dev:

On 09/05/2020 03:16, Samuel Audet wrote:

On 5/6/20 8:36 AM, Maurizio Cimadamore wrote:

On 05/05/2020 23:56, Samuel Audet wrote:

I'm just trying to drive the point home that we need some sort of
solution. GPUs, FPGAs, DSPs, and other accelerators in general are
not going to become magically irrelevant simply because OpenJDK does
not consider them important! They are important, they are here to
stay, and their importance is only going to continue to grow.

We are aware of that, and nobody has really mentioned that said
devices are not considered as important (and I think you should
really stop making absurd claims without any evidence to back them
up). I think the

I'm sorry if I'm making absurd claims about information that you're
not making available publicly :) It would be nice to get a roadmap of
some sort, even if it's just to mention: "Hey, we're actually not
ignoring these things!"

memory access API makes it fairly easy to create an ad-hoc memory
segment backed by e.g. GPU memory - I've demonstrated how easy it is
to wire things up and create your own memory sources:

https://gist.github.com/mcimadamore/128ee904157bb6c729a10596e69edffd

Now, replace mmap/munmap with cudaMalloc/cudaFree and you will have a
MemorySegment that can be used to model GPU memory. All the lifecycle
aspects of "traditional", off-heap memory segments can in fact
translate onto this ad-hoc segment, so that its use can be made safe.

That looks like a good starting point, yes. Are you saying that this is
intended to be a public API that end users can use to replace
mmap/munmap with not only cudaMalloc/cudaFree but whatever they might
wish?
That's the spirit, yes. We have to figure out how to make this piece of
"more unsafe API" coexist with the rest of the API, but that's the
direction.

Let's assume this is going to be all public. The next thing that
worries me is about simultaneous access from multiple threads. We have
no such restrictions in C++, so that is bound to cause issues down the
road. Does OpenJDK intend to force this onto the Java community in a
similar fashion to JPMS? Or are you open for debate on this, and other
points?
The above method already allows you to create unconfined segments. We
are also exploring (in parallel, and quite actively) ways to make these
restrictions either disappear completely (by using some sort of GC-based
handshake), or be less intrusive (by using a broader definition of
confinement which spans not a single thread, but multiple,
logically related, threads).

Of course the memory access API is a building block - together with
ABI support (another building block) it allows you to model and
manipulate memory sources (of all kinds, provided you have some
native library to interact with them); if you are looking for a
high-end CUDA-like GPU library port written in Java, Panama simply
isn't the place to look for it. But it should be possible (and
hopefully easier) to build one given the tools we're building.

Right, that's how I see it, but your lack of reply to my query about
the intended usability of these APIs here concerns me:
https://github.com/bytedeco/javacpp/issues/391#issuecomment-623030899

I didn't see that comment. In general you can attach whatever index
pre-processing capability you want with MemoryHandles.filterCoordinates.
Once you have a function that goes from a logical index (or a tuple of
indices) to an index into the basic memory segment, you can insert that
function as a filter of the coordinate - and you will get back a var
handle which features the desired access coordinates, with the right
behavior.

In your example the filtering function could be something like this
(taken from your example):

@Override
public long index(long i, long j, long k) {
    return (offsets[0] + hyperslabStrides[0] * (i / blocks[0]) + (i % blocks[0])) * strides[0]
         + (offsets[1] + hyperslabStrides[1] * (j / blocks[1]) + (j % blocks[1])) * strides[1]
         + (offsets[2] + hyperslabStrides[2] * (k / blocks[2]) + (k % blocks[2])) * strides[2];
}

So, assuming you have a plain indexed var handle whose only coordinate
is a `long` (the offset of the element in the segment to be addressed),
if you attach a method handle wrapping the above method to such a var
handle, you will get back a var handle that takes three longs - in other
words you will go from

VarHandle(MemoryAddress, long)

to

VarHandle(MemoryAddress, long, long, long)

where, on each access, the above function will be computed, yielding a long
index value which can then be used to access the underlying memory region.
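
To make the shape of that transformation concrete, here is a rough sketch
against the incubator API as it looked around this time (names and signatures
have been moving around; the combinator that actually merges three index
coordinates into a single offset coordinate is MemoryHandles.collectCoordinates,
a sibling of filterCoordinates, and Hyperslab is a hypothetical class exposing
the index() mapping above as a static method):

// imports assumed: jdk.incubator.foreign.MemoryHandles, java.lang.invoke.*, java.nio.ByteOrder
// (inside a method that declares ReflectiveOperationException)

// plain var handle: coordinates (MemoryAddress, long offset), element type double
VarHandle flat = MemoryHandles.withStride(
        MemoryHandles.varHandle(double.class, ByteOrder.nativeOrder()), 8);

// (long i, long j, long k) -> long offset, wrapping the index() mapping above
MethodHandle idx = MethodHandles.lookup().findStatic(
        Hyperslab.class, "index",
        MethodType.methodType(long.class, long.class, long.class, long.class));

// replace the single long coordinate with three logical indices:
// VarHandle(MemoryAddress, long) -> VarHandle(MemoryAddress, long, long, long)
VarHandle vh3d = MemoryHandles.collectCoordinates(flat, 1, idx);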

Maurizio

Samuel

@mlbridge
Copy link

mlbridge bot commented May 12, 2020

Mailing list message from Samuel Audet on panama-dev:

On 5/11/20 7:11 PM, Maurizio Cimadamore wrote:

That looks like a good starting point, yes. Are you saying that this is
intended to be a public API that end users can use to replace
mmap/munmap with not only cudaMalloc/cudaFree but whatever they might
wish?
That's the spirit, yes. We have to figure out how to make this piece of
"more unsafe API" coexist with the rest of the API, but that's the
direction.

Ok, good to hear that. Please do bounce your ideas about that around here
if you can. Right now, for interop with native libraries, with the
current state of MemorySegment, it still wouldn't bring anything more
than what we can already do with sun.misc.Unsafe. It would be pretty sad
if it stayed like that.

Let's assume this is going to be all public. The next thing that
worries me is about simultaneous access from multiple threads. We have
no such restrictions in C++, so that is bound to cause issues down the
road. Does OpenJDK intend to force this onto the Java community in a
similar fashion to JPMS? Or are you open for debate on this, and other
points?
The above method already allows you to create unconfined segments. We
are also exploring (in parallel, and quite actively) ways to make these
restrictions either disappear completely (by using some sort of GC-based
handshake), or be less intrusive (by using a broader definition of
confinement which spans not a single thread, but multiple,
logically related, threads).

What about offering an option to do more or less the same thing as what
you've decided to do for library loading? That is the kind of thing I
meant when I talked about unifying resource management a bit more. There
are use cases where we would like to treat a large buffer the same way as
a loaded library (that is, long lived, probably never deallocated),
which is precisely how ByteBuffer is being handled. It sounds reasonable
to me to surface that as an option to the user. For example, something
like this:

MemorySegment segment = MemorySegment.allocateCloseableNative(...);
// will get crappy performance trying to use that across threads
segment.close(); // OK

MemorySegment segment = MemorySegment.allocateLongLivedNative(...);
// will get normal performance trying to use that across threads
segment.close(); // error
// ...
// the GC will clean that up eventually, maybe, maybe not...

Not exactly perfect, but still a step in the right direction.
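
For what it's worth, the "GC will clean that up eventually" half of this can
already be approximated with java.lang.ref.Cleaner - a rough sketch, where
NativeBuffer and the passed-in free action are placeholders for whatever
allocation mechanism is actually in use:

import java.lang.ref.Cleaner;

final class NativeBuffer {
    private static final Cleaner CLEANER = Cleaner.create();
    private final long address;

    NativeBuffer(long address, Runnable free) {
        this.address = address;
        // runs the free action at most once, after this wrapper becomes unreachable
        CLEANER.register(this, free);
    }

    long address() { return address; }
}

What this cannot give you is the cheap, deterministic close() of the confined
case - which is exactly the trade-off under discussion.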

I didn't see that comment. In general you can attach whatever index
pre-processing capability you want with MemoryHandles.filterCoordinates.
Once you have a function that goes from a logical index (or a tuple of
indices) to an index into the basic memory segment, you can insert that
function as a filter of the coordinate - and you will get back a var
handle which features the desired access coordinates, with the right
behavior.

Ok, thank you, I've replied there:
https://github.com/bytedeco/javacpp/issues/391#issuecomment-627044880

It would be nice if you were able to make more information public about
what you're planning to do. Leaving the community in the dark about
things like potential avenues for resource management and the "rich
VarHandle combinator API" isn't IMO the best way to build an API that's
supposed to be useful to as many people as possible.

Samuel

@mlbridge
Copy link

mlbridge bot commented May 12, 2020

Mailing list message from Maurizio Cimadamore on panama-dev:

I'm closing down this thread, sorry.

We're going in circles and this has absolutely nothing to do with the
original RFR (yes, you did it again).

Maurizio


@mlbridge
Copy link

mlbridge bot commented May 13, 2020

Mailing list message from Samuel Audet on panama-dev:

I don't feel that we're going in circles; I thought we were actually
making progress. I fully disagree that this has nothing to do with the
original RFR. I still firmly stand by my opinion that managing
resources, whether they are loaded libraries, memory segments, or
anything else, should be unified in a central framework of sorts. I'm
sorry you feel differently. Your acrimony towards me, though, reinforces
the feeling I have that OpenJDK is going the wrong way and that it is
unwilling to deal with the needs of the community at large...

Samuel


@mcimadamore mcimadamore deleted the libraryLoad branch May 15, 2020 10:38
Labels
integrated Pull request has been integrated

4 participants