Problem running the training script #10

jucab · 2017-09-27T14:26:28Z

Hi,
Congrats on your nice work. I am trying to run the training script, but I get the following error:
"TypeError: Value passed to parameter 'paddings' has DataType float32 not in list of allowed values: int32, int64"
in gcnetwork.py, when calling
cv = Lambda(getCostVolume, arguments = {'max_d':d/2}, output_shape = (d/2, None, None, num_filters * 2))(unifeature)
Any suggestion?
Thanks,
Julian

LinHungShi · 2017-09-28T15:47:16Z

Did you change any hyperparameters in the hyperparam.json file? Which Python version are you using?
In Python 3.x, the / operator always returns a float, even when both operands are integers. You have to explicitly convert the result of the division to an integer.
Replace

cv = Lambda(getCostVolume, arguments = {'max_d':d/2}, output_shape = (d/2, None, None, num_filters * 2))(unifeature)

with

cv = Lambda(getCostVolume, arguments = {'max_d':int(d/2)}, output_shape = (int(d/2), None, None, num_filters * 2))(unifeature)
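A minimal sketch of the division behaviour described above. The value of d is hypothetical here (the real one comes from hyperparam.json); floor division with // is shown as an alternative to int(d/2), which was not part of the suggested fix.

```python
d = 192  # hypothetical value; the real one is read from hyperparam.json

half_float = d / 2      # true division: always a float in Python 3
half_cast = int(d / 2)  # the fix suggested above: cast the result back to int
half_floor = d // 2     # floor division: stays an int when both operands are ints

for name, value in [("d / 2", half_float),
                    ("int(d / 2)", half_cast),
                    ("d // 2", half_floor)]:
    print(name, "->", type(value).__name__)
```

Either int(d/2) or d//2 yields an integer that can be used as a shape or padding value; for non-negative d the two give the same result.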

jucab · 2017-10-03T10:29:52Z

Yes, I am using Python 3.5 and I have not changed the hyperparameters. I have replaced the code and the previous error has been corrected. However, now I get
"ValueError: Operands could not be broadcast together with shapes (12, None, None, 64) (96, None, None, 64)" in
File "train.py", line 100, in
trainSceneFlowData(hp, tp, up, env, callbacks, weight_path = weight_path)
File "train.py", line 64, in trainSceneFlowData
model = createGCNetwork(hp, tp, pre_weight)
File "src/gcnetwork.py", line 154, in createGCNetwork
disp_map = LearnReg(cv, num_filters, ksize, ds_stride, resnet, padding, highway_func, num_down_conv)
File "src/gcnetwork.py", line 127, in LearnReg
up_convs = add([deconv, down_convs[i+1]])
Thanks

LinHungShi · 2017-10-06T22:28:38Z

Hi, I have updated the code. Please download the new version and run "python train.py" to see if it works.

jucab · 2017-10-09T15:27:17Z

Hi. I had to unify the use of tabs and spaces in the files (Python 3.5 complains a lot...) and make some minor modifications to the load_pfm file. It is now running, although very slowly. I am running it on a TITAN X, but the program does not seem to be using it properly: Volatile GPU-Util is at 0% most of the time while it runs. Any idea what I am missing? Thanks
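Python 3 rejects indentation that uses tabs and spaces inconsistently, which is why the files had to be unified. A small sketch for locating such lines before editing them by hand follows; the function name and the sample source are hypothetical, and the standard library's tabnanny module (python -m tabnanny file.py) performs a similar check.

```python
import re


def indentation_report(source):
    """Group 1-based line numbers by the kind of leading whitespace they use."""
    report = {"tabs": [], "spaces": [], "mixed": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        indent = re.match(r"[ \t]*", line).group(0)
        if not indent:
            continue  # unindented line: nothing to classify
        has_tab = "\t" in indent
        has_space = " " in indent
        if has_tab and has_space:
            report["mixed"].append(lineno)
        elif has_tab:
            report["tabs"].append(lineno)
        else:
            report["spaces"].append(lineno)
    return report


# Hypothetical snippet: line 2 is indented with a tab, line 3 with spaces.
sample = "def f():\n\tx = 1\n        return x\n"
print(indentation_report(sample))
```

A file in which both the "tabs" and "spaces" lists are non-empty is a candidate for conversion to a single style. Re-indenting by hand can silently move a statement into or out of a loop, so it is worth diffing against the original afterwards.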

LinHungShi · 2017-10-09T15:55:19Z

Are you sure that you’re running the job on GPU?

-- Hung Shi Lin Data Science Institute, Columbia University, New York, New York, U.S

jucab · 2017-10-09T16:13:20Z

I think so. With the log_device_placement flag I can see that the operations are assigned to the GPU, and the GPU memory is indeed allocated.

LinHungShi · 2017-10-09T16:32:24Z

I don't know how that happened, but you can look at a similar issue here: tensorflow/tensorflow#543

jucab · 2017-10-10T10:03:24Z

Thanks. Since I have been changing tabs and spaces, I may have unintentionally changed the code. Could you check that the GC network is right in this file? Thanks
gcnetwork.py.tar.gz

jucab · 2017-10-10T11:04:41Z

I found the problem. It was in the generator.py file: I had messed up the tabs and spaces. It is now running properly on the GPU.
By the way, I get this warning: "UserWarning: Update your fit_generator call to the Keras 2 API: fit_generator(<generator..., validation_steps=880, callbacks=[<keras.ca..., validation_data=<generator..., steps_per_epoch=3520, epochs=50, max_queue_size=1)", but I think you are already calling fit_generator on Keras 2, aren't you?

LinHungShi · 2017-10-10T23:39:01Z

It seems they changed the API in Keras 2.0. This is just a warning, though.
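For reference, the warning is triggered when fit_generator is called with Keras 1 keyword names, which Keras 2 maps to their new names. The sketch below lists the renames as I understand them and applies them to a dictionary of keyword arguments. The helper name and the example arguments are hypothetical; which old keywords train.py actually passes is not shown in this thread.

```python
# Keras 1 -> Keras 2 keyword names for fit_generator (to the best of my knowledge).
KERAS2_RENAMES = {
    "samples_per_epoch": "steps_per_epoch",
    "nb_epoch": "epochs",
    "nb_val_samples": "validation_steps",
    "max_q_size": "max_queue_size",
    "nb_worker": "workers",
    "pickle_safe": "use_multiprocessing",
}


def to_keras2_kwargs(kwargs):
    """Return a copy of kwargs with Keras 1 names replaced by Keras 2 names."""
    return {KERAS2_RENAMES.get(key, key): value for key, value in kwargs.items()}


# Hypothetical call arguments written in the Keras 1 style.
old_style = {"nb_epoch": 50, "max_q_size": 1}
print(to_keras2_kwargs(old_style))
```

Renaming alone is not always enough: steps_per_epoch and validation_steps count batches, whereas the Keras 1 arguments counted samples, so the values may need to be divided by the batch size.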
