Content tracking with configurable txn batch size #427
This pull request adds two optional configurations to the content tracking process that allow users / customers to
The purpose of these configurations is to allow optimisations for (re-)indexation processes over Alfresco systems with extremely sparse transaction / content update distributions. This affects e.g. systems undergoing a lot of fine-grained updates, where transactions with indexable content updates may be spread far apart. In such systems, the costly `getDocsWithUncleanContent` operation may often be invoked only to yield a single- to low double-digit number of content-containing nodes to be indexed. This may significantly prolong content indexation, as the phases of concurrent content indexation are very short and may not even be able to use all allowed concurrent threads in the fork-join pool.

As for default values, both options use the previously hard-coded 500 txn offset, so that there is no difference in behaviour to previous versions. Users / customers with sparse transaction / content update distributions may configure substantially higher values as needed. I personally would recommend that Alfresco consider setting a default value for `alfresco.content.txnIdLookupBatchSize` that is maybe an order of magnitude larger than for `alfresco.content.txnProcessingBatchSize`. Due to backwards consistency concerns I have not included that in the PR.
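
For illustration, a configuration for a system with sparse content updates might look like the fragment below. This is a sketch, not a recommendation: the two property names come from this PR, but the value 5000 is purely hypothetical (chosen to show the "order of magnitude larger" idea), and it is an assumption that the options are set alongside the other `alfresco.*` properties of the Solr core.

```properties
# Illustrative values only; both options default to 500 (the previously hard-coded txn offset).
# Assumption: these sit next to the other alfresco.* settings of the Solr core.

# Hypothetical larger window for looking up transactions with content updates
alfresco.content.txnIdLookupBatchSize=5000

# Left at the default, so processing behaves as in previous versions
alfresco.content.txnProcessingBatchSize=500
```

Leaving both properties unset should keep the behaviour of previous versions, since both default to 500.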