Add the ability to read directly from hdf5 files (for large datasets) as well as numpy arrays. #12
Awesome! I didn't know about the shuffle='batch' trick - very neat!
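For context, Keras's `fit` accepts `shuffle='batch'`, which shuffles in batch-sized chunks so that each batch remains a contiguous slice, as HDF5-backed data requires. The following is a minimal sketch of that idea in plain Python. The helper name `batch_shuffle_indices` is hypothetical, and this is an illustration rather than Keras's own code.

```python
import random


def batch_shuffle_indices(n_samples, batch_size, seed=None):
    """Return sample indices shuffled in batch-sized chunks.

    Hypothetical helper illustrating the idea behind shuffle='batch':
    the order of the batches is randomized, but the indices inside each
    batch stay contiguous and increasing, so each batch can be read
    from disk as a single slice.
    """
    rng = random.Random(seed)
    n_full = n_samples // batch_size
    # Split the full batches into contiguous chunks of indices.
    chunks = [list(range(i * batch_size, (i + 1) * batch_size))
              for i in range(n_full)]
    rng.shuffle(chunks)  # shuffle whole chunks, not single samples
    # The incomplete last batch (if any) is kept at the end.
    tail = list(range(n_full * batch_size, n_samples))
    return [i for chunk in chunks for i in chunk] + tail
```

Every sample still appears exactly once per epoch; only the order in which the batches are visited changes.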
In the current PR, the data are not loaded using multiple workers, right? Regarding the tests: can you

Old:

    def test_compilefn_train_test_split(tmpdir):
        fn = CompileFN(db_name, exp_name,
                       data_fn=data.data,
                       model_fn=model.build_model,
                       ....)

New:

    import pytest

    @pytest.mark.parametrize("data_fn", [data.data, data.data_hdf5])
    def test_compilefn_train_test_split(data_fn, tmpdir):
        fn = CompileFN(db_name, exp_name,
                       data_fn=data_fn,
                       model_fn=model.build_model,
                       ....)
My apologies: by multiple workers I meant running several MongoDB workers through KMongoTrials. I'll generate some sample data and add your suggested changes.
Okay, so I finally got around to this. I could not reuse the dataset that the tests currently use because of its data format, so I took the cifar10 dataset from keras and wrote some of it to disk. This also made it hard to incorporate the tests properly, so please have a look and see if things are okay.
@kryczko do the tests work for you locally? It seems that they fail when checking fn_test. Can you fix these to make them work with hdf5?
How should I run them locally?
Kevin
On Jan 3, 2019, at 1:55 PM, Žiga Avsec wrote:
> @kryczko do the tests work for you locally? Seems that they fail when checking fn_test. Can you fix these to make them work with hdf5?
Install pytest and run
Dear author,
I have found your module quite useful, and I think my changes, which allow reading directly from hdf5 files, make it much more impactful for deep learning applications with larger datasets.
With only numpy arrays, you're restricted to loading everything in memory. With hdf5 and Keras, this is not the case.
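To illustrate the difference, here is a minimal sketch of reading one batch at a time from a sliceable store. The generator name `iter_batches` is hypothetical. It relies only on `len()` and slicing, which both a NumPy array and an `h5py.Dataset` support; with an `h5py.Dataset`, each slice is read from disk only when it is requested. The demo uses a plain list as a stand-in so that it runs without extra dependencies.

```python
def iter_batches(store, batch_size):
    """Yield consecutive slices of `store`, one batch at a time.

    Hypothetical helper: `store` is anything that supports len() and
    slicing. With an on-disk dataset, only the requested slice has to
    be held in memory at once.
    """
    for start in range(0, len(store), batch_size):
        yield store[start:start + batch_size]


if __name__ == "__main__":
    data = list(range(10))  # stand-in for an on-disk dataset
    for batch in iter_batches(data, batch_size=4):
        print(batch)
```

Because the batches are contiguous slices, this access pattern pairs naturally with batch-wise shuffling instead of per-sample shuffling.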
Please let me know if there are tests that I should run. I have already tested some of my code locally and have successfully read directly from hdf5 files with multiple workers concurrently.
Thanks,
Kevin Ryczko