ADBDEV-6339 Implement files tracking for arenadata_toolkit #1079
base: adb-6.x-dev
Allure report https://allure.adsw.io/launch/83009
Allure report https://allure.adsw.io/launch/83030
Allure report https://allure.adsw.io/launch/83066
Allure report https://allure.adsw.io/launch/83098
Allure report https://allure.adsw.io/launch/86273
Allure report https://allure.adsw.io/launch/86294
Allure report https://allure.adsw.io/launch/86309
gpcontrib/arenadata_toolkit/expected/arenadata_toolkit_tracking.out
Allure report https://allure.adsw.io/launch/86353
Allure report https://allure.adsw.io/launch/86430
510d87e
The latest changes fix the slow size calculation for AO tables.
Allure report https://allure.adsw.io/launch/86831
Implement relfilenode tracking for arenadata_toolkit
This patch enhances the arenadata_toolkit extension for GPDB, primarily by
tracking relfilenode changes. The key components of the implementation are the
tracking API, hook processing, and background workers that perform
initialization tasks on both the master and the segments. The core idea is to
use Bloom filters to track relfilenode changes efficiently.
The main purpose of this code is to make database size calculation fast and to
track file changes at the relation level. The extension implements a
probabilistic tracking system that uses Bloom filters to monitor file changes
across Greenplum segments. It keeps its state in shared memory and employs
background workers to maintain consistency.
The original code of the toolkit extension has been reorganized because of the
increased complexity of its logic. The extension is now built from several
units stored in the gpcontrib/arenadata_toolkit/src folder.
The previous code for calculating relation sizes is in dbsize.c. New API
functions related to tracking are in track_files.c. The other units serve
utility purposes and provide supporting infrastructure for the extension.
In general, the extension uses storage manager hooks that catch changes to
relation files (create, extend, truncate, unlink) and record them in a Bloom
filter. We use our own implementation of the Bloom filter, which is stored in
shared memory. The main hash function used in the Bloom filter is a wyhash
function adapted to our specific case.
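
To make the idea concrete, here is a minimal sketch of a Bloom filter keyed on a
relfilenode. It is not the code from this patch: the names (bloom_t, bloom_add,
bloom_maybe_contains), the filter size, and the number of probes are
hypothetical, and the hash is a splitmix64-style mixer used as a stand-in for
the adapted wyhash. A real filter would live in shared memory rather than in an
ordinary struct.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BYTES  1024   /* hypothetical filter size in bytes */
#define BLOOM_HASHES 3      /* hypothetical number of probes per key */

typedef struct
{
    uint8_t bits[BLOOM_BYTES];
} bloom_t;

/* Stand-in 64-bit mixer (splitmix64 finalizer), NOT the patch's wyhash variant. */
static uint64_t
mix64(uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Bit position of the i-th probe, derived by double hashing. */
static size_t
bloom_pos(uint32_t relfilenode, int i)
{
    uint64_t h1 = mix64(relfilenode);
    uint64_t h2 = mix64(h1) | 1;    /* odd step so probes differ */

    return (size_t) ((h1 + (uint64_t) i * h2) % (BLOOM_BYTES * 8));
}

static void
bloom_clear(bloom_t *b)
{
    memset(b->bits, 0, sizeof(b->bits));
}

/* Called from a (hypothetical) smgr hook when a relation file changes. */
static void
bloom_add(bloom_t *b, uint32_t relfilenode)
{
    for (int i = 0; i < BLOOM_HASHES; i++)
    {
        size_t p = bloom_pos(relfilenode, i);

        b->bits[p / 8] |= (uint8_t) (1u << (p % 8));
    }
}

/* False means "definitely unchanged"; true means "possibly changed". */
static bool
bloom_maybe_contains(const bloom_t *b, uint32_t relfilenode)
{
    for (int i = 0; i < BLOOM_HASHES; i++)
    {
        size_t p = bloom_pos(relfilenode, i);

        if (!(b->bits[p / 8] & (1u << (p % 8))))
            return false;
    }
    return true;
}
```

The filter never reports a changed relfilenode as unchanged; it can only report
false positives, which cost an unnecessary size recalculation rather than a
wrong result.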
Transactional semantics work as follows. Each Bloom filter is allocated with
twice arenadata_toolkit.tracking_bloom_size in order to preserve the filter
state in case the track acquisition is rolled back. Each Bloom filter is also
assigned a "version" variable, which is a simple counter. The main
tracking_get_track function takes the version value from the master as an
argument, and after the function is dispatched to the segments, it compares the
filter version on the segment with the incoming master version. If they are
equal, the track is acquired normally: the current active filter is copied to
local memory, and the active filter is then switched to the other filter (which
is allocated next to it in shared memory). The new active filter is then
cleared, and the filter version is bumped. If the transaction with the track
acquisition is committed, the master version is bumped as well, which restores
consistency with the segments. In case of an abort, the master version stays
the same, and on the next tracking_get_track call there will be a version
conflict, which indicates that the function needs to use the previously
preserved filter in the current track acquisition.
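
The version handshake can be sketched roughly as follows. This is an
illustration under stated assumptions, not the patch's code: trk_state,
trk_get_track, and TRK_BYTES are hypothetical names, locking is omitted, and
the conflict branch (folding the active half into the preserved half and
returning their union) is one plausible reading of "use the previously
preserved filter", not a confirmed detail.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for arenadata_toolkit.tracking_bloom_size. */
#define TRK_BYTES 64

typedef struct
{
    uint8_t  halves[2][TRK_BYTES];  /* double allocation: active + preserved */
    int      active;                /* index of the half the hooks write to */
    uint32_t version;               /* segment-side version counter */
} trk_state;

/* Segment side of a track acquisition; the acquired track is copied to out. */
static void
trk_get_track(trk_state *s, uint32_t master_version, uint8_t *out)
{
    if (s->version == master_version)
    {
        /* Normal path: hand out the active filter and keep it as the
         * preserved copy, then switch to the other half and clear it. */
        memcpy(out, s->halves[s->active], TRK_BYTES);
        s->active ^= 1;
        memset(s->halves[s->active], 0, TRK_BYTES);
        s->version++;
    }
    else
    {
        /* Version conflict: the previous acquisition was rolled back.
         * Assumed handling: merge changes recorded since then into the
         * preserved half and return the union. */
        uint8_t *preserved = s->halves[s->active ^ 1];
        uint8_t *active = s->halves[s->active];

        for (int i = 0; i < TRK_BYTES; i++)
            preserved[i] |= active[i];
        memcpy(out, preserved, TRK_BYTES);
        memset(active, 0, TRK_BYTES);
        /* The version is left as-is: it stays ahead of the master
         * until a commit bumps the master version. */
    }
}
```

In this sketch the master bumps its own counter only on commit, so after an
abort the next call passes the old version, takes the conflict branch, and
still receives every change recorded before and after the rolled-back
acquisition.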
Relation sizes are calculated via stat calls on the segment files. However,
locks on the relation are not acquired, in order to avoid a performance
penalty. If something happens to the files in the meantime, we simply ignore
the failed stat call and return a size of zero.
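
The lock-free size lookup amounts to something like the sketch below. The
function name file_size_or_zero is hypothetical, and the real code presumably
iterates over all segment files of a relation; only the "failed stat yields
zero" behaviour described above is shown.

```c
#include <stdint.h>
#include <sys/stat.h>

/*
 * Return the size of one file in bytes, or 0 if stat() fails for any
 * reason (for example, the file was unlinked concurrently because no
 * relation lock is held).
 */
static int64_t
file_size_or_zero(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;               /* ignore the error, report zero size */
    return (int64_t) st.st_size;
}
```

The trade-off is that a concurrently dropped or rewritten relation may briefly
report a size of zero instead of raising an error.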