Problem running the training script #10

Open
jucab opened this issue Sep 27, 2017 · 10 comments

Comments

@jucab

jucab commented Sep 27, 2017

Hi,
Congrats on your nice work. I am trying to run the training script, but I get the following error:
"TypeError: Value passed to parameter 'paddings' has DataType float32 not in list of allowed values: int32, int64"
in gcnetwork.py, when calling
cv = Lambda(getCostVolume, arguments = {'max_d':d/2}, output_shape = (d/2, None, None, num_filters * 2))(unifeature)
Any suggestion?
Thanks,
Julian

@LinHungShi
Owner

LinHungShi commented Sep 28, 2017

Did you change any hyperparameters in the hyperparam.json file? Which Python version are you using?
If you use Python 3.x, the division operator / always returns a float, even when both operands are integers. You have to explicitly convert the result of the division to an integer.
Replace

cv = Lambda(getCostVolume, arguments = {'max_d':d/2}, output_shape = (d/2, None, None, num_filters * 2))(unifeature)

with

cv = Lambda(getCostVolume, arguments = {'max_d':int(d/2)}, output_shape = (int(d/2), None, None, num_filters * 2))(unifeature)
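
For reference, here is a minimal sketch of the Python 3 behaviour behind this error. The value `d = 192` is only an assumption for illustration and is not taken from hyperparam.json. Floor division (`//`) is shown as an alternative to `int(d/2)` that never produces a float in the first place.

```python
# Python 3: "/" is true division and always returns a float,
# even when both operands are integers.
d = 192  # hypothetical maximum disparity, chosen only for illustration

half_true = d / 2        # float
half_cast = int(d / 2)   # int, the fix suggested above
half_floor = d // 2      # int, floor division avoids the float entirely

print(type(half_true).__name__, type(half_cast).__name__, type(half_floor).__name__)
```

Either integer form can be passed where TensorFlow expects int32/int64 values such as paddings.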

@jucab
Author

jucab commented Oct 3, 2017

Yes, I am using Python 3.5 and I have not changed the hyperparameters. I have replaced the code and the previous error has been corrected. However, now I get
"ValueError: Operands could not be broadcast together with shapes (12, None, None, 64) (96, None, None, 64)" in
  File "train.py", line 100, in <module>
    trainSceneFlowData(hp, tp, up, env, callbacks, weight_path = weight_path)
  File "train.py", line 64, in trainSceneFlowData
    model = createGCNetwork(hp, tp, pre_weight)
  File "src/gcnetwork.py", line 154, in createGCNetwork
    disp_map = LearnReg(cv, num_filters, ksize, ds_stride, resnet, padding, highway_func, num_down_conv)
  File "src/gcnetwork.py", line 127, in LearnReg
    up_convs = add([deconv, down_convs[i+1]])
Thanks
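
To illustrate what this ValueError means, here is a simplified sketch of a shape-compatibility check for an element-wise merge. The function `merged_shape` is hypothetical and written only for this example; it is not the code Keras runs. It assumes `None` marks a dimension whose size is unknown until run time.

```python
def merged_shape(shape_a, shape_b):
    """Return the element-wise merged shape, or raise ValueError.

    Simplified illustration of a broadcast-compatibility check;
    None stands for a dimension that is unknown until run time.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("rank mismatch: %s vs %s" % (shape_a, shape_b))
    out = []
    for a, b in zip(shape_a, shape_b):
        if a is None or b is None:
            out.append(None)      # unknown size, cannot be checked here
        elif a == 1:
            out.append(b)         # size-1 dimensions broadcast
        elif b == 1:
            out.append(a)
        elif a == b:
            out.append(a)
        else:
            raise ValueError(
                "Operands could not be broadcast together with shapes %s %s"
                % (shape_a, shape_b))
    return tuple(out)


# The two shapes reported in the traceback differ in the first dimension.
try:
    merged_shape((12, None, None, 64), (96, None, None, 64))
except ValueError as err:
    print(err)
```

In the traceback above, the first dimension is 12 for one tensor and 96 for the other, and neither is 1, so the two tensors cannot be added.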

@LinHungShi
Owner

Hi, I have updated the code. Please download the new version and run "python train.py" to see if it works.

@jucab
Author

jucab commented Oct 9, 2017

Hi. I had to unify the use of tabs and spaces in the files (Python 3.5 complains a lot ...) and make some minor modifications to the load_pfm file. It is now running, although it is very slow. I am running it on a TITAN X, but the program does not seem to be using it properly: Volatile GPU-Util is at 0% most of the time while it runs. Any idea what I am missing? Thanks
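
For anyone hitting the same tab/space complaints, here is a small sketch of how Python 3 reacts to mixed indentation. The helper `has_tab_error` and the two sample snippets are hypothetical and exist only for this illustration; they are not part of the repository.

```python
def has_tab_error(source):
    """Return True if Python 3 rejects the source for mixing tabs and spaces."""
    try:
        compile(source, "<snippet>", "exec")
    except TabError:
        return True
    return False


# Same visual indentation, but one line uses 8 spaces and the next a tab.
mixed = "if True:\n        x = 1\n\ty = 2\n"
# Spaces only.
clean = "if True:\n    x = 1\n    y = 2\n"

print(has_tab_error(mixed), has_tab_error(clean))
```

The standard-library module tabnanny performs a similar check on whole files, for example `python -m tabnanny src/`. Note that neither check catches a line that ends up at a valid but different indentation level after conversion.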

@LinHungShi
Owner

LinHungShi commented Oct 9, 2017 via email

@jucab
Author

jucab commented Oct 9, 2017

I think so. With the log_device_placement flag I can see that the tasks are assigned to the GPU. The GPU memory is indeed allocated.

@LinHungShi
Owner

I don't know how that happened, but you can look at a similar issue here: tensorflow/tensorflow#543

@jucab
Author

jucab commented Oct 10, 2017

Thanks. Since I have been changing tabs and spaces, I may have unintentionally changed the code. Could you check that the GCnetwork is right in this file? Thanks
gcnetwork.py.tar.gz

@jucab
Author

jucab commented Oct 10, 2017

I found the problem. It was in the generator.py file: I messed up the tabs and spaces. It is now running properly on the GPU.
By the way, I get this warning: "UserWarning: Update your fit_generator call to the Keras 2 API: fit_generator(<generator..., validation_steps=880, callbacks=[<keras.ca..., validation_data=<generator..., steps_per_epoch=3520, epochs=50, max_queue_size=1)", but I think you are already calling fit_generator on Keras 2, aren't you?

@LinHungShi
Owner

It seems they changed the API in Keras 2.0. This is just a warning, though.
