Add shuffle_dataset_upfront as a config option to CacheActivationsRunnerConfig.
Motivation
Currently, users can cache activations using the CacheActivationsRunner class. However, when caching these activations, an enormous portion of runtime is spent shuffling data pairwise within buffers. In my (highly unscientific) experiments, shuffling (with default values) took >50% of the runtime, and consistently triggered OOM errors on my GPU.
While it's currently possible to configure the CacheActivationsRunner to avoid shuffling altogether, this might affect training of the resulting SAE, depending on how the initial dataset is organized.
As such, it would be ideal to allow users to:
Disable all pairwise shuffling between different saved activation tensors.
Shuffle input token sequences upfront. On top of being less data to move around, we only need to shuffle once in this case.
Both of these changes could be enabled with backward-compatible extensions to CacheActivationsRunnerConfig. Simply add a new param, shuffle_dataset_upfront (or something) - which requires streaming=False.
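To make the proposal concrete, here is a minimal sketch of what the option could look like. This is not the actual SAELens code: `CacheConfigSketch`, `prepare_sequences`, and the `seed` field are hypothetical stand-ins, and plain Python lists stand in for the tokenized dataset. Only the `shuffle_dataset_upfront` name and the `streaming=False` requirement come from this issue.

```python
import random
from dataclasses import dataclass


@dataclass
class CacheConfigSketch:
    """Hypothetical stand-in for CacheActivationsRunnerConfig (not the real class)."""

    streaming: bool = True
    shuffle_dataset_upfront: bool = False  # proposed option; off by default for backward compatibility
    seed: int = 42  # assumed field, used only to make the shuffle reproducible

    def __post_init__(self) -> None:
        # A streamed dataset has no known length, so it cannot be fully shuffled upfront.
        if self.shuffle_dataset_upfront and self.streaming:
            raise ValueError("shuffle_dataset_upfront requires streaming=False")


def prepare_sequences(sequences, cfg):
    """Return the token sequences in the order they would be fed to the model."""
    ordered = list(sequences)
    if cfg.shuffle_dataset_upfront:
        # One shuffle of the (small) token sequences, instead of repeated
        # pairwise shuffles of the (large) cached activation buffers.
        random.Random(cfg.seed).shuffle(ordered)
    return ordered
```

With a Hugging Face `datasets.Dataset` loaded with streaming disabled, the same step would presumably be a single `dataset.shuffle(seed=...)` call before tokens are fed to the model.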
Pitch
I made this change - and it resulted in me being able to cache activations! Previously, I would just get OOM errors on my GPU during shuffling (which might be a related bug where old buffers are not cleaned up).
Alternatives
Not sure there are too many: if you're aiming for a random order of activations, you need to shuffle either before or during caching. This adds shuffling before.
Alternatively, the user could be responsible for shuffling the dataset, re-uploading it to Hugging Face, and then re-downloading it - but this is a lot of extra work that we could totally avoid and handle easily with a single extra param.
Checklist
I have checked that there is no similar issue in the repo (required)
Happy to take a shot at adding this, btw (would be a good early contribution for me) -- but let me know if there's an appetite for it, before I go for it.
I don't think shuffling tokens up front would help. We need activations from different contexts to get mixed. I'd be open to a PR which makes the shuffling less frequent or turns it off so people can move more quickly sometimes (though the shuffling is supposed to be important according to Anthropic).
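A toy illustration of the concern above, under simplifying assumptions: each context yields a fixed number of activations, and only context IDs are tracked rather than real tensors. All function names here are made up for the sketch. It compares how often neighbouring cached activations come from the same context when whole sequences are shuffled upfront versus when individual activations are shuffled.

```python
import random


def ids_after_sequence_shuffle(n_contexts, tokens_per_context, seed=0):
    """Shuffle whole contexts, then emit one context ID per activation."""
    order = list(range(n_contexts))
    random.Random(seed).shuffle(order)
    return [ctx for ctx in order for _ in range(tokens_per_context)]


def ids_after_activation_shuffle(n_contexts, tokens_per_context, seed=0):
    """Emit one context ID per activation, then shuffle the activations themselves."""
    ids = [ctx for ctx in range(n_contexts) for _ in range(tokens_per_context)]
    random.Random(seed).shuffle(ids)
    return ids


def same_context_neighbor_fraction(ids):
    """Fraction of adjacent activation pairs that share a context."""
    pairs = list(zip(ids, ids[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)


upfront = same_context_neighbor_fraction(ids_after_sequence_shuffle(50, 8))
mixed = same_context_neighbor_fraction(ids_after_activation_shuffle(50, 8))
print(f"sequence shuffle: {upfront:.2f}, activation shuffle: {mixed:.2f}")
```

With 50 contexts of 8 activations each, the upfront sequence shuffle leaves about 0.88 of neighbouring pairs in the same context (350 of 399), whatever the seed, so any batch read in order is still dominated by a handful of contexts. Shuffling the activations themselves brings that fraction close to zero.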