I wonder if there's a way to iteratively train over chunks of input data (or even row by row), manually. We deal with data much larger than RAM, and it also doesn't fit the table interface -- in short, each "row" can contain many variables, some of which are vectors of variable length, so we need to compute the input to EvoTrees on the fly.
Moelf changed the title to "How to train over input that is >> larger than RAM?" on Mar 9, 2022
Support for out-of-memory data is something I'd like to see added.
Do you have constraints with regard to the storage format of the data? Off the top of my head, I'd think of working out of a DTable: https://juliaparallel.github.io/Dagger.jl/stable/dtable/ and perhaps integrating with a DataLoader interface if needed. I understand your source data is in another format, yet I can hardly imagine a totally arbitrary data loader, as the boosted trees algorithm assumes that all variables/features are consistently available for all data points.
Would it be reasonable to perform a preprocessing step on your data to bring it into a more structured form like a DTable?
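For illustration, here is a minimal sketch of that preprocessing route, assuming the `DTable(loader_function, files)` constructor shown in the linked Dagger docs. The file names and the body of `load_chunk` are placeholders; in practice the loader would read one raw file and compute the derived features for it.

```julia
using Dagger

# Placeholder loader: reads one raw file and returns a Tables.jl-compatible
# chunk (here, a NamedTuple of vectors filled with fabricated values).
function load_chunk(path::String)
    n = 1_000
    return (x1 = rand(n), x2 = rand(n), y = rand(n))
end

files = ["part_001.bin", "part_002.bin"]   # placeholder file names
dt = DTable(load_chunk, files)             # one partition per source file
# fetch(dt) would materialize everything; the point is to realize only one
# partition at a time when feeding a trainer.
```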
Yeah, I don't think I can just make a DTable, because the variables I'd like to use for the BDT are not available in the file, and it takes non-trivial selection/transformation to build them on the fly. (But we still need to make them on the fly -- staging intermediate files is just too cumbersome.)
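As a rough sketch of what "making them on the fly" could look like (all names here -- `RawRow`, `flatten`, `make_chunk`, the `pt`/`met` fields -- are made up for illustration): collapsing each variable-length row into a fixed set of scalar features yields a Tables-compatible chunk that a loader like the one above could return, without staging files to disk.

```julia
using Statistics

# One raw "row": a few scalars plus a vector of arbitrary length.
struct RawRow
    pt::Vector{Float64}   # variable-length
    met::Float64
end

# Collapse each raw row into a fixed-width set of scalar features.
flatten(r::RawRow) = (
    n_pt    = length(r.pt),
    pt_sum  = sum(r.pt; init = 0.0),
    pt_max  = isempty(r.pt) ? 0.0 : maximum(r.pt),
    pt_mean = isempty(r.pt) ? 0.0 : mean(r.pt),
    met     = r.met,
)

# Build one chunk's feature table without materializing the whole dataset;
# a Vector of NamedTuples is a valid Tables.jl row table.
make_chunk(rows) = [flatten(r) for r in rows]

rows  = [RawRow(rand(rand(0:5)), rand()) for _ in 1:1_000]
chunk = make_chunk(rows)
```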