Content tracking with configurable txn batch size #427

Open
wants to merge 1 commit into
base: master


@AFaust AFaust commented Jun 9, 2023

This pull request adds two optional configuration properties to the content tracking process that allow users / customers to

  • set a transaction ID lookup offset instead of relying on a hard-coded 500 offset
  • set a transaction ID processing batch size for collecting documents to be content-indexed

The purpose of these configurations is to allow optimisations for (re-)indexation processes over Alfresco systems with extremely sparse transaction / content update distributions. This affects e.g. systems undergoing a lot of fine-grained updates, where transactions with indexable content updates may be spread far apart. In such systems, the costly getDocsWithUncleanContent operation may be invoked often while yielding only a single- to low double-digit number of content-containing nodes to index each time. This may significantly prolong content indexation, as each phase of concurrent content indexation is very short and may not even be able to use all allowed concurrent threads in the fork-join pool.

As for default values, both options default to the previously hard-coded value of 500 transactions, so there is no difference in behaviour compared to previous versions. Users / customers with sparse transaction / content update distributions may configure substantially higher values as needed. I personally would recommend that Alfresco consider setting a default value for alfresco.content.txnIdLookupBatchSize that is maybe an order of magnitude larger than the one for alfresco.content.txnProcessingBatchSize. Due to backwards compatibility concerns, I have not included that in the PR.
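
As a rough sketch of how the two options might be set for a system with sparse content updates: the property names are the ones given above, but the file they belong in (assumed here to be a core's solrcore.properties, like other alfresco.* settings) and the values are illustrative assumptions, not tested recommendations.

```properties
# Assumed location: solrcore.properties of the affected core (not confirmed by this PR description).
# Both options default to 500, the previously hard-coded value.

# Transaction ID processing batch size used when collecting documents to be content-indexed.
# The value is illustrative only.
alfresco.content.txnProcessingBatchSize=5000

# Transaction ID lookup offset. Set roughly an order of magnitude above the
# processing batch size, following the suggestion above. The value is illustrative only.
alfresco.content.txnIdLookupBatchSize=50000
```

Leaving both properties unset should keep the previous behaviour, since each falls back to 500.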


CLAassistant commented Jun 9, 2023

CLA assistant check
All committers have signed the CLA.
