I have some datasets and transformations that unfortunately won't fit in memory on n1-highmem-16 instances (the largest machine type FlexRS supports). The features are fairly standard: scalar features with the tft.quantiles analyzer and string features with the tft.vocabulary analyzer (but there are a lot of each type of feature). Generally the analyze step runs fine up until the final combine, which typically runs on a very small number of machines and causes them to repeatedly OOM.
Of course I could use a larger machine type, or even a custom machine type, but those don't work with FlexRS and would be more expensive. I'm curious whether either of the following two options would be viable:
Shard the analyze step by features. That is, split the set of features into separate groups and run multiple analyze steps sequentially, which should reduce peak memory usage. The challenge would be how to merge the outputs of the analyze steps together at the end.
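A minimal sketch of the sharding idea, assuming the feature set can be partitioned up front. `shard_feature_names` is a hypothetical helper, not a TFT API, and the merging note in the comments is an assumption about how the per-group outputs could be stitched together:

```python
# Hypothetical helper: round-robin partition of feature names so that each
# group gets its own preprocessing_fn and its own AnalyzeDataset pass.
def shard_feature_names(feature_names, num_shards):
    ordered = sorted(feature_names)
    return [ordered[i::num_shards] for i in range(num_shards)]

groups = shard_feature_names(["f0", "f1", "f2", "f3", "f4"], 2)
print(groups)  # [['f0', 'f2', 'f4'], ['f1', 'f3']]

# Each group would then be analyzed in a separate pipeline run, e.g. by
# building a per-group preprocessing_fn that only references that group's
# features and running tft_beam.AnalyzeDataset on it. For the merge step,
# one option to explore: vocabularies land on disk as asset files, so a
# final pass could read the precomputed files back via tft.apply_vocabulary
# instead of re-running the analyzers.
```

The trade-off is that N sequential groups mean N passes over the input data, so the memory savings come at the cost of extra reads.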
Add Beam resource hints specifically to the problematic combine tasks so that they do not get scheduled on the same machine.
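A hedged sketch of the resource-hints route: Beam supports a pipeline-level `min_ram` hint, and Dataflow honors resource hints on Runner v2. Since TFT's internal combines aren't exposed as named transforms that you could call `.with_resource_hints()` on directly, a pipeline-wide hint may be the only practical lever; `my_pipeline.py` and the `64GB` value below are placeholders:

```shell
# Pipeline-wide hint: applies to every transform, not just the combines.
python my_pipeline.py \
  --runner=DataflowRunner \
  --experiments=use_runner_v2 \
  --resource_hints=min_ram=64GB
```

Note that a pipeline-wide `min_ram` hint changes worker provisioning for the whole job, which may cut against the cost goal; per-transform hints via `ptransform.with_resource_hints(min_ram="64GB")` would be preferable if the offending combine were addressable.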
Are either of these two options viable or is there a solution that I have not considered yet?
Quick initial questions:
a. What version of TFT is your pipeline running with?
b. Same question about Beam.
c. How many analyzers are defined in the pipeline, and of what types (how many quantiles, vocabulary, etc.)?
a. TFT 1.5.0
b. Beam 2.35.0
c. 1 tft.quantiles analyzer with dimension 840 (and reduce_instance_dims=False), 15 tft.vocabulary analyzers, and 96 tft.experimental.approximate_vocabulary analyzers.
Also, I should add that based on experiments with these analyzers under DirectRunner, it's not the quantiles analyzer that consumes most of the memory; it's the tft.vocabulary analyzers. (I tested this by disabling different analyzers and measuring the amount of memory allocated.)