Expand file rotation conditions in HKAggregator #400

Open
BrianJKoopman opened this issue Aug 14, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@BrianJKoopman
Member

Originally from https://github.com/simonsobs/ocs-deployment-configs/issues/172.

For HK specifically, we're aiming for ~1 GB files that do not cross ctime % 100000 boundaries. It would be good if the aggregator had a mode where it rotates files based on several conditions, e.g. if any of the following holds:

  • file size reaches some threshold
  • some time has passed (current implementation)
  • some ctime boundary (e.g. ctime % 100000 == 0) is crossed.
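
A minimal sketch of how the "any of" check above might look. This is not the existing HKAggregator code: the function name, parameter names, and default thresholds are all hypothetical, chosen only to illustrate combining the three conditions.

```python
from typing import Optional


def rotation_reason(start_time: float, now: float, file_size: int,
                    max_size: int = 1_000_000_000,  # ~1 GB target (assumed default)
                    max_age: float = 600.0,         # seconds (assumed default)
                    boundary: int = 100_000) -> Optional[str]:
    """Return why the current file should be rotated, or None to keep writing.

    Hypothetical helper: checks size, elapsed time, and ctime boundary in turn.
    """
    if file_size >= max_size:
        return "size"
    if now - start_time >= max_age:
        return "age"
    # A boundary is crossed when the file's start time and the current time
    # fall into different `boundary`-second blocks.
    if int(now) // boundary != int(start_time) // boundary:
        return "boundary"
    return None
```

Comparing the block index of the start time against that of the current time avoids having to sample the clock exactly when ctime % 100000 == 0.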

BrianJKoopman added the enhancement label Aug 14, 2024

@jlashner
Collaborator

What requirement is the ctime boundary coming from here? I've always thought that this boundary was soft and it was ok for files to go over. Is this just to make sure small files are eventually written within a day?

@BrianJKoopman
Member Author

> What requirement is the ctime boundary coming from here? I've always thought that this boundary was soft and it was ok for files to go over. Is this just to make sure small files are eventually written within a day?

I'll defer to @mhasself on that.

@mhasself
Member

> > What requirement is the ctime boundary coming from here? I've always thought that this boundary was soft and it was ok for files to go over. Is this just to make sure small files are eventually written within a day?
>
> I'll defer to @mhasself on that.

Yes, it definitely helps in making sure small files are eventually flushed. If you're going to do that, it makes sense to do it on the ctime5 boundaries, because a lot of our data organization is by ctime5. For example, HK books can be finalized and archived if we make a point of finalizing data files around ctime5 boundaries.
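
For illustration, a small sketch of working with these boundaries, assuming "ctime5" means the Unix timestamp integer-divided by 100000 (the first five digits of a ten-digit ctime), which matches the ctime % 100000 condition in the original post. The helper names are hypothetical.

```python
def ctime5(t: float) -> int:
    """ctime5 label for a Unix timestamp (assumed: timestamp // 100000)."""
    return int(t) // 100_000


def next_boundary(t: float) -> int:
    """First timestamp strictly after t at which ctime % 100000 == 0."""
    return (ctime5(t) + 1) * 100_000


def seconds_until_boundary(t: float) -> float:
    """Time left before a file opened or written at t should be closed."""
    return next_boundary(t) - t
```

Under that assumption, boundaries are 100000 s apart, so even a low-rate file would be flushed within about 27.8 hours.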
