training error: OutOfRangeError: End of sequence #60

muxizju · 2018-09-27T03:20:59Z

I used this code to train on my own dataset, but it raised this error at sess.run(). The detailed log is printed below; I changed some args such as net_input_height and batch_p. My TensorFlow version is 1.7. I don't know what's wrong here.

Instructions for updating:
Use the retry module or similar alternatives.
2018-09-27 11:12:06,474 [INFO] train: Training using the following parameters:
2018-09-27 11:12:06,474 [INFO] train: batch_k: 4
2018-09-27 11:12:06,474 [INFO] train: batch_p: 8
2018-09-27 11:12:06,474 [INFO] train: checkpoint_frequency: 1000
2018-09-27 11:12:06,474 [INFO] train: crop_augment: False
2018-09-27 11:12:06,474 [INFO] train: decay_start_iteration: 100000
2018-09-27 11:12:06,474 [INFO] train: detailed_logs: False
2018-09-27 11:12:06,474 [INFO] train: embedding_dim: 128
2018-09-27 11:12:06,475 [INFO] train: experiment_root: F:/projector/GestureClassification/TripletBasedGestureRecognition/experiment_root/20180926/
2018-09-27 11:12:06,475 [INFO] train: flip_augment: False
2018-09-27 11:12:06,475 [INFO] train: head_name: fc1024
2018-09-27 11:12:06,475 [INFO] train: image_root: F:/projector/GestureClassification/data/img/20180919/triplet_data/img/
2018-09-27 11:12:06,475 [INFO] train: initial_checkpoint: None
2018-09-27 11:12:06,475 [INFO] train: learning_rate: 0.0003
2018-09-27 11:12:06,475 [INFO] train: loading_threads: 4
2018-09-27 11:12:06,475 [INFO] train: loss: batch_hard
2018-09-27 11:12:06,476 [INFO] train: margin: soft
2018-09-27 11:12:06,476 [INFO] train: metric: euclidean
2018-09-27 11:12:06,476 [INFO] train: model_name: resnet_v1_50
2018-09-27 11:12:06,476 [INFO] train: net_input_height: 64
2018-09-27 11:12:06,476 [INFO] train: net_input_width: 64
2018-09-27 11:12:06,476 [INFO] train: pre_crop_height: 64
2018-09-27 11:12:06,476 [INFO] train: pre_crop_width: 64
2018-09-27 11:12:06,476 [INFO] train: resume: False
2018-09-27 11:12:06,476 [INFO] train: train_iterations: 250000
2018-09-27 11:12:06,476 [INFO] train: train_set: F:/projector/GestureClassification/data/img/20180919/triplet_data/gesture_train.csv
2018-09-27 11:12:07,403 [INFO] tensorflow: Scale of 0 disables regularizer.
2018-09-27 11:12:07,403 [INFO] tensorflow: Scale of 0 disables regularizer.
2018-09-27 11:12:08,569 [WARNING] tensorflow: From F:\projector\GestureClassification\TripletBasedGestureRecognition\triplet-reid\nets\resnet_v1.py:219: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-09-27 11:12:08,569 [WARNING] tensorflow: From F:\projector\GestureClassification\TripletBasedGestureRecognition\triplet-reid\nets\resnet_v1.py:219: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\ops\gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-09-27 11:12:11.533610: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-27 11:12:11.936193: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1060 5GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:01:00.0
totalMemory: 5.00GiB freeMemory: 4.12GiB
2018-09-27 11:12:11.936710: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2018-09-27 11:12:14.388590: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-27 11:12:14.388811: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2018-09-27 11:12:14.388948: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2018-09-27 11:12:14.415769: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3871 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 5GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-09-27 11:12:16.275624: I T:\src\github\tensorflow\tensorflow\core\kernels\cuda_solvers.cc:159] Creating CudaSolver handles for stream 000001A50E54E080
2018-09-27 11:12:20,572 [INFO] tensorflow: F:/projector/GestureClassification/TripletBasedGestureRecognition/experiment_root/20180926/checkpoint-0 is not in all_model_checkpoint_paths. Manually adding it.
2018-09-27 11:12:20,572 [INFO] tensorflow: F:/projector/GestureClassification/TripletBasedGestureRecognition/experiment_root/20180926/checkpoint-0 is not in all_model_checkpoint_paths. Manually adding it.
2018-09-27 11:12:23,207 [INFO] train: Starting training from iteration 0.

Traceback (most recent call last):
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
return fn(*args)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1312, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1420, in _call_tf_sessionrun
status, run_metadata)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?], [?]], output_types=[DT_FLOAT, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 439, in <module>
main()
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 393, in main
prec_at_k, endpoints['emb'], losses, fids])
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
run_metadata_ptr)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1140, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run
run_metadata)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?], [?]], output_types=[DT_FLOAT, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'IteratorGetNext', defined at:
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 439, in <module>
main()
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 280, in main
images, fids, pids = dataset.make_one_shot_iterator().get_next()
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 366, in get_next
name=name)), self._output_types,
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 1484, in iterator_get_next
output_shapes=output_shapes, name=name)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\framework\ops.py", line 3290, in create_op
op_def=op_def)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\framework\ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

OutOfRangeError (see above for traceback): End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?], [?]], output_types=[DT_FLOAT, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Process finished with exit code 1

muxizju · 2018-09-27T04:46:38Z

I just found the reason. I have only 7 classes (persons) in my dataset, but I set batch_p to 8.

# Constrain the dataset size to a multiple of the batch-size, so that
# we don't get overlap at the end of each epoch.
dataset = dataset.take((len(unique_pids) // args.batch_p) * args.batch_p)

With 7 classes and batch_p = 8, this step evaluates to take(0), so the dataset is empty and the iterator ends on the very first iteration, which raises the error above.

It's a silly mistake, but I suggest adding an if-else check that warns about this condition.
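
A minimal sketch of such a check, assuming it would run just before the `dataset.take(...)` line quoted above. The helper name `usable_pid_count` is hypothetical and not part of the repository; only `unique_pids` and `batch_p` come from this thread.

```python
def usable_pid_count(num_unique_pids, batch_p):
    """Return how many PIDs survive the take() truncation, or raise if none do.

    Hypothetical helper for illustration; not part of triplet-reid.
    """
    if batch_p <= 0:
        raise ValueError("batch_p must be positive, got {}".format(batch_p))

    # Same arithmetic as the quoted take() expression.
    usable = (num_unique_pids // batch_p) * batch_p
    if usable == 0:
        raise ValueError(
            "batch_p ({}) is larger than the number of unique PIDs ({}); "
            "dataset.take(0) would yield an empty dataset.".format(
                batch_p, num_unique_pids))
    return usable
```

It could be called as `usable_pid_count(len(unique_pids), args.batch_p)`, so the misconfiguration fails early with a readable message instead of an OutOfRangeError inside sess.run().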

lucasb-eyer · 2018-10-28T16:03:39Z

Thanks for updating with the reason. Indeed we could add code catching this mistake, I'd happily accept a PR doing so!

duyanfang123 · 2019-04-09T08:59:29Z

You did a good job.

mazatov · 2020-03-12T15:19:37Z

@muxizju Just came across this as I also have few classes. My question is: what happens to the remaining classes if, say, I have 7 classes and batch_p is 4? What happens to the other 3 classes? Do they get fed into future batches, or are they just ignored?
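
For reference, the arithmetic of the quoted `take(...)` expression can be checked in plain Python. This is only a sketch of how many PIDs are kept in one pass; the function name `pids_kept` is hypothetical. Whether the left-out PIDs show up in later passes depends on how the dataset is shuffled and repeated in train.py, which this thread does not show.

```python
def pids_kept(num_unique_pids, batch_p):
    # Same expression as in the quoted take() call.
    return (num_unique_pids // batch_p) * batch_p


for batch_p in (8, 4):
    kept = pids_kept(7, batch_p)
    print("batch_p={}: kept={}, left out={}".format(batch_p, kept, 7 - kept))
# prints:
# batch_p=8: kept=0, left out=7
# batch_p=4: kept=4, left out=3
```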

