Default 256 for bnd.executor.maximumPoolSize is excessive #6175

tjwatson · 2024-07-08T16:33:15Z

See:

bnd/biz.aQute.bndlib/src/aQute/bnd/osgi/ExecutorGroup.java

Lines 70 to 71 in c9ea777

    
           this(2, Integer.getInteger("bnd.executor.maximumPoolSize", 256) 
        
           	.intValue());

For a very large workspace, like Open Liberty, this can cause massive contention on the Memoize<Workspace> in bndtools.central.Central.workspace when launching the Eclipse workspace, in some cases it can take hours. If I set bnd.executor.maximumPoolSize to something more reasonable (e.g. 16) then my workspace can launch much faster by allowing the threads to make progress without the massive contention.

Here is a thread dump that shows the contention on aQute.bnd.memoize.MemoizingSupplier@a4f5f834 out.threads.txt

It is unclear to me if the contention can somehow be removed, or if a better approach is to give a more reasonable default, like the number of cores available on the system.

The text was updated successfully, but these errors were encountered:

chrisrueger · 2024-07-08T18:48:16Z

Related #5112 PR #5116

chrisrueger · 2024-07-08T19:30:25Z

more reasonable default, like the number of cores available on the system.

I think this is a reasonable suggestion. I tried it out locally, and could not observe a noticable negative difference for our project. Since you could set it via System.property one can set a higher value if needed.

tjwatson · 2024-07-08T20:20:14Z

I think this is a reasonable suggestion. I tried it out locally, and could not observe a noticable negative difference for our project. Since you could set it via System.property one can set a higher value if needed.

I can get a PR going but will likely be next week, unless someone gets to it before then.

pkriens · 2024-07-09T06:56:15Z

@bjhargrave any feedback on this?

pkriens · 2024-07-09T06:58:07Z

I always felt the enormous number was kind of annoying. But I recall vaguely something with download deadlocks. We download stuff massively with a new workspace. Since we use a thread per download, and these threads are suspended while waiting for input, we might have to give them a separate pool?

bjhargrave · 2024-07-09T10:16:16Z

But I recall vaguely something with download deadlocks. We download stuff massively with a new workspace. Since we use a thread per download, and these threads are suspended while waiting for input

This was the case. We did not have non-blocking I/O and we fanned out the many downloads across multiple threads. The repo code uses Promises but not always well. Sometimes the downloads blocked the the Promise callbacks (which generally should complete quickly) on I/O. 256 is a number I made up which was big enough that I did not hang because I was out of threads. I was primarily using the bnd Workspace as my "test" workspace.

You will note that 256 is the MAX pool size not the default pool size.

bnd/biz.aQute.bndlib/src/aQute/bnd/osgi/ExecutorGroup.java

Lines 77 to 79 in c9ea777

    
           executor = new ThreadPoolExecutor(corePoolSize, maximumPoolSize, 60L, TimeUnit.SECONDS, 
        
           	new SynchronousQueue<>(), new ExecutorThreadFactory(defaultThreadFactory, "Bnd-Executor,"), 
        
           	rejectedExecutionHandler);

So you only get 256 threads if something needs that many and this is probably downloads because you have massive repos. As Promise callbacks start blocking, new threads are requested from the pool to make progress. On a smaller workspace, you will probably never use that many (at least after startup).

If you have a problem in Open Liberty, I would suggest you have all the developers set the bnd.executor.maximumPoolSize system property to some value lower than 256 and run for a few months to gather experience on what lower number actually works as you may get deadlocks in the Promises used by the repos if you do not have enough threads to make progress. I would be apprehensive about lowering the default from 256 to some new value until you have some time using this new value. You may be trading one problem for another.

we might have to give them a separate pool

Perhaps moving to the new virtual threads for the executors will help. Then the repo impls could replace Promise usage (async programming) with boring old sync programming. Alternate would be a rewrite of the repo logic around removing blocking in Promise callbacks.

Replacing the memoizing of the workspace to remove contention would also be nice but I am sure the contention would move somewhere deeper into the workspace startup.

The Open Liberty workspace is absurdly large (>2000 projects). One of the repo .maven files has >970 entries in it. Starting this workspace will always take forever since there is sooooo much to do including initializing the repos. Max threads of 16 may work on a previously used workspace but may not on a fresh workspace where the repo's/.m2's caches are empty and all the 970+ bundles needs downloading upon workspace init.

pkriens · 2024-07-09T11:46:48Z

I also recall that there was a solution to steal the current thread to run the callbacks? This tends to even out the load.

I agree Liberty is insanely large but it is a wonderful testbed :-)

bjhargrave · 2024-07-09T11:54:14Z

I also recall that there was a solution to steal the current thread to run the callbacks

This is the default behavior:

https://github.com/osgi/osgi/blob/b32032ea3bb23de7d3b27622dc3865dfc803cef5/org.osgi.util.promise/src/org/osgi/util/promise/PromiseFactory.java#L74-L81

But it does not help when the callback blocks on I/O.

It also only applies when a callback is added to an already resolved Promise. If the Promise is not yet resolved, callbacks are added to a queue and when the Promise is resolved, the callbacks on the queue are dispatched to the executor so they can all end up in different threads so the callback execution is not serialized.

https://github.com/osgi/osgi/blob/b32032ea3bb23de7d3b27622dc3865dfc803cef5/org.osgi.util.promise/src/org/osgi/util/promise/PromiseImpl.java#L145-L151

bjhargrave · 2024-07-09T12:01:53Z

Another idea may be to use an InlineExecutor for some Promises to force their callbacks to be run inline (and serialized). So top-level Promises would use the normal executor, but deeper Promises could use the InlineExecutor. But any Promise callback that blocks on I/O is still going to tie up the thread. Maybe virtual threads can help?

https://github.com/osgi/osgi/blob/b32032ea3bb23de7d3b27622dc3865dfc803cef5/org.osgi.util.promise/src/org/osgi/util/promise/PromiseFactory.java#L368-L381

chrisrueger · 2024-07-11T18:54:17Z

based on

We download stuff massively with a new workspace. Since we use a thread per download, and these threads are suspended while waiting for input, we might have to give them a separate pool?

and

But it does not help when the callback blocks on I/O.

i dug a bit in the code:

It seems all HTTPClients gets their PromiseFactory from Processor.getPromiseFactory(); (see here and here )

bnd/biz.aQute.bndlib/src/aQute/bnd/http/HttpClient.java

Line 100 in 75d7743

promiseFactory = Processor.getPromiseFactory();

And this promiseFactory is passed e.g. to repos:

What about introducing a Processor.getPromiseFactoryForDownloads(); (or something like this) which we call there instead?
And this returns a PromiseFactory which is initialized with a differently sized Executor via

bnd/biz.aQute.bndlib/src/aQute/bnd/osgi/ExecutorGroup.java

Line 74 in 75d7743

public ExecutorGroup(int corePoolSize, int maximumPoolSize) {

Wouldn't that give all "Downloads via HTTPClient" to run in a separate threadpool?

bjhargrave · 2024-07-11T19:38:33Z

So your idea is to use a different, smaller executor pool for HttpClient?

How would you handle running out of threads? The current executors use a SynchronousQueue which will cause a rejection of work when there are no free threads and then the work runs on the caller's thread (RejectedExecutionHandler). This means you are still blocking the caller's thread for I/O. So this new executor for HttpClient would need to use a different queuing model to handle running out of threads which does not block the caller.

chrisrueger · 2024-07-11T20:41:58Z

So your idea is to use a different, smaller executor pool for HttpClient?

Yes, but let's just stay with "different". I shouldn't have mentioned the sizing aspect.

Maybe i misunderstood.
I understood the discussion so far as if the "long running" downloads share the same threadpool as other more "short lived" processes. So separating the downloads from the rest could help... I thought.

Thus i brought up the places in the code up for discussion.

But maybe I'm thinking to simple or haven't understood the problem.

bjhargrave · 2024-07-11T21:05:09Z

Here is a thread dump that shows the contention on aQute.bnd.memoize.MemoizingSupplier@a4f5f834 out.threads.txt

In looking through this, it seems most all of the issues are centered around use of Central.onCnfWorkspace/onAnyWorkspace.

bnd/bndtools.core/src/bndtools/central/Central.java

Lines 353 to 359 in 75d7743

    
           public static Promise<Workspace> onAnyWorkspace(Consumer<? super Workspace> callback) { 
        
           	return callback(anyWorkspaceDeferred.getPromise(), callback, "onAnyWorkspace callback failed"); 
        
           } 
        
           public static Promise<Workspace> onCnfWorkspace(Consumer<? super Workspace> callback) { 
        
           	return callback(cnfWorkspaceDeferred.getPromise(), callback, "onCnfWorkspace callback failed"); 
        
           }

	at org.bndtools.builder.classpath.BndContainerInitializer.lambda$initialize$1(BndContainerInitializer.java:103)
	at org.osgi.util.promise.DeferredPromiseImpl$ThenAccept.accept(DeferredPromiseImpl.java:433)
	at org.osgi.util.promise.DeferredPromiseImpl.result(DeferredPromiseImpl.java:151)
	at org.osgi.util.promise.DeferredPromiseImpl$ThenAccept.run(DeferredPromiseImpl.java:426)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at [email protected]/java.lang.Thread.run(Thread.java:1595)

There are many callbacks queued up (one for each of the 2000 projects) and when the workspace promise is resolved, they all rush to execute consuming threads. And when the threads are all in use, they steal the caller's thread.

	at org.bndtools.builder.classpath.BndContainerInitializer.lambda$initialize$1(BndContainerInitializer.java:103)
	at org.osgi.util.promise.DeferredPromiseImpl$ThenAccept.accept(DeferredPromiseImpl.java:433)
	at org.osgi.util.promise.DeferredPromiseImpl.result(DeferredPromiseImpl.java:151)
	at org.osgi.util.promise.DeferredPromiseImpl$ThenAccept.run(DeferredPromiseImpl.java:426)
	at aQute.bnd.osgi.ExecutorGroup$RejectedExecution.rejectedExecution(ExecutorGroup.java:47)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:841)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1376)

Changing the workspace promises to use a PromiseFactory with a custom executor will help. This could be an InlineExecutor to serialize all the callbacks onto the workspace promise resolver's thread. (Care then needs to be taken that the workspace promise resolver is not the main UI thread as this seems to be the case.) Or it could be an executor with a queue which has a smaller number of threads (16?)

I would recommend pursuing a fix like this as I think it is fairly safe and easy to implement.

bjhargrave · 2024-07-11T21:10:46Z

It would also be good if it could be made so that the workspace promise is resolved AFTER the workspace is memoized. Not sure if that would be thread safe?

bjhargrave · 2024-07-11T21:19:07Z

This is a fun problem. Sorry I am no longer working on Bnd...

kriegfrj · 2024-07-11T22:09:46Z

On Tue, 9 July 2024, 7:46 pm BJ Hargrave, ***@***.***> wrote: we might have to give them a separate pool Perhaps moving to the new virtual threads for the executors will help. Then the repo impls could replace Promise usage (async programming) with boring old sync programming. Alternate would be a rewrite of the repo logic around removing blocking in Promise callbacks.

This seems like an easy win. I have been following Project Loom (though not so much recently) and it seemed like (in theory) it would solve many thread contention issues simply, without the complexity of asynchronous programming and callback chains. I am unsure as to the current state of maturity of this project though - I don't think it made it into core Java 17? —

…

Reply to this email directly, view it on GitHub <#6175 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABXDWF3BBRMKSPGRVALNXPLZLO2AVAVCNFSM6AAAAABKRF6LUGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJXGI2DSNBZGM> . You are receiving this because you are subscribed to this thread. Message ID: ***@***.***>

pkriens · 2024-07-12T08:32:35Z

This is a fun problem. Sorry I am no longer working on Bnd...

So are we!!!

tjwatson · 2024-07-12T18:25:55Z

Catching up a bit while on vacation this week. I'm willing to try something out once I get back next week. I can also call on the Open Liberty developers to set the max pool size to something lower. I will also try importing into a fresh workspace with an empty .m2 cache to see how if that causes issues with a smaller pool size.

tjwatson · 2024-07-16T12:21:12Z

I tried starting a new Eclipse workspace against a fresh clone of open-liberty and a fresh empty, local .m2 repository. With the max pool size set to 16 I saw no deadlock. Looking at the jstack during the initial load (which did take a while to download all the maven dependencies) I noticed the downloads didn't even use this thread pool at all. Seemed to be downloaded from an Eclipse workspace jobs thread. And it seemed like only a single thread was doing all the downloads:

"ModalContext" prio=6 Id=78 RUNNABLE
	at [email protected]/java.net.URI.needsNormalization(URI.java:2349)
	at [email protected]/java.net.URI.normalize(URI.java:2566)
	at [email protected]/java.net.URI.relativize(URI.java:2279)
	at [email protected]/java.net.URI.relativize(URI.java:1152)
	at org.eclipse.core.internal.localstore.FileSystemResourceManager.allPathsForLocationNonCanonical(FileSystemResourceManager.java:150)
	at org.eclipse.core.internal.localstore.FileSystemResourceManager.allPathsForLocation(FileSystemResourceManager.java:120)
	at org.eclipse.core.internal.localstore.FileSystemResourceManager.allResourcesFor(FileSystemResourceManager.java:278)
	at org.eclipse.core.internal.resources.WorkspaceRoot.findFilesForLocationURI(WorkspaceRoot.java:110)
	at org.eclipse.core.internal.resources.WorkspaceRoot.findFilesForLocationURI(WorkspaceRoot.java:103)
	at bndtools.central.Central.toFullPath(Central.java:510)
	at bndtools.central.Central.toPath(Central.java:488)
	at org.bndtools.builder.classpath.BndContainerInitializer$Updater.fileToPath(BndContainerInitializer.java:695)
	at org.bndtools.builder.classpath.BndContainerInitializer$Updater.calculateContainersClasspath(BndContainerInitializer.java:358)
	at org.bndtools.builder.classpath.BndContainerInitializer$Updater.calculateProjectClasspath(BndContainerInitializer.java:311)
	at aQute.bnd.build.WorkspaceLock.locked(WorkspaceLock.java:168)
	at aQute.bnd.build.Workspace.readLocked(Workspace.java:1500)
	at org.bndtools.builder.classpath.BndContainerInitializer$Updater.updateClasspathContainer(BndContainerInitializer.java:248)
	at org.bndtools.builder.classpath.BndContainerInitializer.initialize(BndContainerInitializer.java:91)
	at org.eclipse.jdt.internal.core.JavaModelManager.initializeContainer(JavaModelManager.java:3277)
	at org.eclipse.jdt.internal.core.JavaModelManager$11.run(JavaModelManager.java:3165)
	at org.eclipse.jdt.internal.core.JavaModelManager.initializeAllContainers(JavaModelManager.java:3221)
	at org.eclipse.jdt.internal.core.JavaModelManager.getClasspathContainer(JavaModelManager.java:2200)
	at org.eclipse.jdt.core.JavaCore.getClasspathContainer(JavaCore.java:4000)
	at org.eclipse.jdt.internal.core.JavaProject.resolveClasspath(JavaProject.java:3151)
	at org.eclipse.jdt.internal.core.JavaProject.resolveClasspath(JavaProject.java:3315)
	at org.eclipse.jdt.internal.core.JavaProject.getResolvedClasspath(JavaProject.java:2429)
	at org.eclipse.jdt.internal.core.DeltaProcessingState.getRootInfos(DeltaProcessingState.java:334)
	at org.eclipse.jdt.internal.core.DeltaProcessingState.initializeRoots(DeltaProcessingState.java:282)
	at org.eclipse.jdt.internal.core.DeltaProcessor.processResourceDelta(DeltaProcessor.java:1947)
	at org.eclipse.jdt.internal.core.DeltaProcessor.resourceChanged(DeltaProcessor.java:2143)
	at org.eclipse.jdt.internal.core.DeltaProcessingState.resourceChanged(DeltaProcessingState.java:501)
	at org.eclipse.core.internal.events.NotificationManager$1.run(NotificationManager.java:321)
	at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:47)
	at org.eclipse.core.internal.events.NotificationManager.notify(NotificationManager.java:311)
	at org.eclipse.core.internal.events.NotificationManager.broadcastChanges(NotificationManager.java:174)
	at org.eclipse.core.internal.resources.Workspace.broadcastPostChange(Workspace.java:459)
	at org.eclipse.core.internal.resources.Workspace.endOperation(Workspace.java:1587)
	at org.eclipse.core.internal.resources.Resource.refreshLocal(Resource.java:1607)
	at org.eclipse.egit.core.op.ConnectProviderOperation.connectProject(ConnectProviderOperation.java:174)
	at org.eclipse.egit.core.op.ConnectProviderOperation.execute(ConnectProviderOperation.java:115)
	at org.eclipse.egit.ui.internal.clone.ProjectUtils$1.run(ProjectUtils.java:133)
	at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2452)
	at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2472)
	at org.eclipse.egit.ui.internal.clone.ProjectUtils.createProjects(ProjectUtils.java:138)
	at org.eclipse.egit.ui.internal.clone.ProjectUtils.createProjects(ProjectUtils.java:62)
	at org.eclipse.egit.ui.internal.clone.GitImportWizard.importProjects(GitImportWizard.java:254)
	at org.eclipse.egit.ui.internal.clone.GitImportWizard$4.run(GitImportWizard.java:206)
	at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:124)

pkriens · 2024-07-16T12:38:03Z

You can see the parallelism in the Progress monitor.

pkriens · 2024-08-16T15:44:19Z

@tjwatson do we need to do anything or can this be closed?

tjwatson · 2024-08-16T16:38:32Z

I'm still facing issues, even when I reduce the maximumPoolSize. It appears that is not the solution. It may involve one of the other suggestions to fix this. Unfortunately I am not familiar enough with the way BND behaves here to give something a try quickly.

juergen-albert · 2024-08-30T15:53:12Z

I will try to follow up on @bjhargrave suggestion from July 11th, to give central its own threadpool with a reasonable default.

pkriens · 2024-11-04T14:04:55Z

@juergen-albert since we're talking about releasing, will you fix this or should we close this?

pkriens assigned juergen-albert Aug 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default 256 for bnd.executor.maximumPoolSize is excessive #6175

Default 256 for bnd.executor.maximumPoolSize is excessive #6175

tjwatson commented Jul 8, 2024

chrisrueger commented Jul 8, 2024

chrisrueger commented Jul 8, 2024

tjwatson commented Jul 8, 2024

pkriens commented Jul 9, 2024

pkriens commented Jul 9, 2024

bjhargrave commented Jul 9, 2024

pkriens commented Jul 9, 2024

bjhargrave commented Jul 9, 2024 •

edited

Loading

bjhargrave commented Jul 9, 2024 •

edited

Loading

chrisrueger commented Jul 11, 2024 •

edited

Loading

bjhargrave commented Jul 11, 2024 •

edited

Loading

chrisrueger commented Jul 11, 2024

bjhargrave commented Jul 11, 2024 •

edited

Loading

bjhargrave commented Jul 11, 2024

bjhargrave commented Jul 11, 2024

kriegfrj commented Jul 11, 2024 via email

pkriens commented Jul 12, 2024

tjwatson commented Jul 12, 2024

tjwatson commented Jul 16, 2024

pkriens commented Jul 16, 2024

pkriens commented Aug 16, 2024

tjwatson commented Aug 16, 2024

juergen-albert commented Aug 30, 2024

pkriens commented Nov 4, 2024

Default 256 for bnd.executor.maximumPoolSize is excessive #6175

Default 256 for bnd.executor.maximumPoolSize is excessive #6175

Comments

tjwatson commented Jul 8, 2024

chrisrueger commented Jul 8, 2024

chrisrueger commented Jul 8, 2024

tjwatson commented Jul 8, 2024

pkriens commented Jul 9, 2024

pkriens commented Jul 9, 2024

bjhargrave commented Jul 9, 2024

pkriens commented Jul 9, 2024

bjhargrave commented Jul 9, 2024 • edited Loading

bjhargrave commented Jul 9, 2024 • edited Loading

chrisrueger commented Jul 11, 2024 • edited Loading

bjhargrave commented Jul 11, 2024 • edited Loading

chrisrueger commented Jul 11, 2024

bjhargrave commented Jul 11, 2024 • edited Loading

bjhargrave commented Jul 11, 2024

bjhargrave commented Jul 11, 2024

kriegfrj commented Jul 11, 2024 via email

pkriens commented Jul 12, 2024

tjwatson commented Jul 12, 2024

tjwatson commented Jul 16, 2024

pkriens commented Jul 16, 2024

pkriens commented Aug 16, 2024

tjwatson commented Aug 16, 2024

juergen-albert commented Aug 30, 2024

pkriens commented Nov 4, 2024

bjhargrave commented Jul 9, 2024 •

edited

Loading

bjhargrave commented Jul 9, 2024 •

edited

Loading

chrisrueger commented Jul 11, 2024 •

edited

Loading

bjhargrave commented Jul 11, 2024 •

edited

Loading

bjhargrave commented Jul 11, 2024 •

edited

Loading