Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugin possible cause of RejectedExecutionException - unhandled events #53

Open
carlmot opened this issue Sep 29, 2016 · 7 comments
Open

Comments

@carlmot
Copy link

carlmot commented Sep 29, 2016

Bitbucket Data Center v4.5.1 (2 nodes).
Issue occurs during peak usage hours.

In logs a few entries for 'The event queue is full. Stacks for the processing threads follow'
Then many for 'RejectedExecutionException' after the previous occurs.

Atlassian has documented this issue here:
https://confluence.atlassian.com/bitbucketserverkb/rejectedexecutionexception-779171377.html
It points to "The event queue is full. Stacks for the processing threads follow:"
Looks like PR Harmony may be the cause judging by the log (snippet attached) and the code.
*Attempting to confirm issue is due to this plugin (e.g. by disabling the plugin during error events) - unable to do so as of today.

Searched plugins on GitHub for these issues.
Pull Request Notifier had one issue logged and had a fix (Processing on Bitbucket Server's event threads):
tomasbjerre/pull-request-notifier-for-bitbucket#78
Fix added (v 2.10/1.35) use of import java.util.concurrent.ExecutorService and handleEventAsync() instead of handleEvent().

Explanation and fix/proper way to handle events that can take longer than 'a second or two' here:
https://bitbucket.org/izymesdev/workzone/issues/79/buildeventlistener-is-doing-its-processing

PRHarmony_Log.txt

@monitorjbl
Copy link
Owner

Interesting, I wasn't aware the Datacenter version worked that way. So the BuildEventListener can certainly be handled asynchronously but I don't see how the merge checks could. Hopefully that's never an issue, but if it is you can always increase the queue depth.

@bturner
Copy link

bturner commented Jan 3, 2018

@monitorjbl

I'm the principal developer for Bitbucket Server. I just wanted to note that there's nothing special about Data Center with regards to event processing; all versions of Bitbucket Server are the same. They all apply a limit to the number of events that can be queued before they start dropping them, to prevent OutOfMemoryErrors.

From looking over the source of your plugin, it looks like PullRequestListener.findPRByCommitId could be significantly improved to make it more performant. A couple of suggestions:

  • Use a larger page size than 10 when searching for open pull requests. 25 or 50 would produce a significant reduction in requests to the database without an undue impact on memory usage
  • Instead of using start += 10 to advance, use requests.getNextPageRequest()
    • If that returns null, it means there are no more pages
    • Based on that, I'd use a do/while loop instead, where the PageRequest is declared outside the loop instead of the Page<PullRequest>
    • You can also check requests.getIsLastPage() to know when you're at the last page
    • Regardless of exactly how you do it, these checks will prevent you from making a request beyond the last page to detect that there aren't any more results, saving another set of database queries
  • On the PullRequestSearchRequest.Builder, call withProperties(false) before build()ing
    • You're not using PullRequest.getProperties(), so loading them only makes your search slower
    • With this set to false, you could probably increase your page size even further; I might go as high as 100
  • Instead of using PullRequestService.getCommits, use PullRequestService.streamCommits and provide a CommitCallback
    • In your callback, you can check the commit ID against what you're looking for and, if you find it, you can return false; from onCommit. That allows the system to eagerly drop out of the git rev-list process once a match is found, rather than listing up to a million commits (and holding them all in memory to do so!) only to have the code find a match on the second one and discard all the others
    • Using a callback means you might need to use something like a MutableBoolean to allow you to set a flag that can be seen "outside" the callback to indicate that the pull request matched

You're welcome to make some changes and mention me, if you'd like a review.

Best regards,
Bryan Turner
Atlassian Bitbucket

monitorjbl added a commit that referenced this issue Jan 4, 2018
* Using larger page size for open PR searches
* Using PR commit stream instead of list
@monitorjbl
Copy link
Owner

@bturner I've made the changes you suggested. Is there a way I can test this out on a local cluster of Bitbucket Data Center? My biggest problem with supporting that platform is the SDK doesn't seem to support it so I can't test the plugin on it.

@bturner
Copy link

bturner commented Jan 5, 2018

@monitorjbl,

In general you shouldn't need to do anything specific to support Data Center. The only time you need to tailor for it specifically is when the code needs to not execute in parallel when multiple nodes are present (scheduled jobs, for example, that need to only run on one node at most) or when you have cached data that needs to remain consistent across multiple nodes.

Our Building cluster-safe plugins document has a section (toward the bottom) about how to start a local cluster, but I'm not sure it's really clear enough. (Note that while it mentions AMPS 6.1.0, in order to test with Bitbucket Server 5+ you need to be using AMPS 6.3.0+, which includes all the cluster-relevant changes from AMPS 6.1.) Another alternative might be our full Data Center installation guide.

If it proves too difficult to run up a cluster using any of those resources, let me know and I'll see if I can find/write a better walkthrough.

Best regards,
Bryan Turner
Atlassian Bitbucket

@Pistahh
Copy link

Pistahh commented Feb 14, 2018

Hi, we are hit by this issue too. I can see that a fix is already committed but I'm wondering if there is a new add-on release available that has this fix? We are on version 2.4.0.

Thanks,
Istvan

@monitorjbl
Copy link
Owner

v2.5.0 is available now

@Pistahh
Copy link

Pistahh commented Feb 23, 2018

We use v2.5.0 in production now since a couple of days now, it works correctly and we no longer see the "event queue if full" error anymore. Thanks. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants