
GTX1070:CUDA Error: out of memory #17

Open
Megatron2032 opened this issue Aug 28, 2017 · 10 comments

@Megatron2032

Megatron2032 commented Aug 28, 2017

GTX 1070: 7.9GB memory.
When I run run_optimizerset.sh, train_9180_18360.log shows the following errors.

train_9180_18360.log:
2017-08-28 17:31:50.229531: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-28 17:31:50.351945: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-28 17:31:50.352224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.8225
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.31GiB
2017-08-28 17:31:50.352235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-08-28 17:31:50.352240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-08-28 17:31:50.352249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
layer filters size input output
0 CUDA Error: out of memory: File exists
CUDA Error: out of memory

@Megatron2032 Megatron2032 changed the title IOError: [Errno 2] No such file or directory: 'jackson-town-square.npy' GTX1070:CUDA Error: out of memory Aug 28, 2017
@Arsey

Arsey commented Aug 29, 2017

I'm getting the same error even with a 1000-frame video.

@ddkang
Collaborator

ddkang commented Aug 29, 2017

The system is optimized for a P100 GPU with 16GB of memory. This diff is confirmed to work on a K80; you may need to change 0.8 to something much lower:

diff --git a/tensorflow/noscope/noscope.cc b/tensorflow/noscope/noscope.cc
index 4cd6a14..98b80e2 100644
--- a/tensorflow/noscope/noscope.cc
+++ b/tensorflow/noscope/noscope.cc
@@ -60,7 +60,7 @@ static tensorflow::Session* InitSession(const std::string& graph_fname) {
   tensorflow::SessionOptions opts;
   tensorflow::GraphDef graph_def;
   // YOLO needs some memory
-  opts.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.9);
+  opts.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.8);
   // opts.config.mutable_gpu_options()->set_allow_growth(true);
   tensorflow::Status status = NewSession(opts, &session);
   TF_CHECK_OK(status);

I'd be happy to merge a pull request that automatically detects the amount of memory necessary for YOLOv2 as a fraction of the available GPU memory.
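
For anyone who wants to take a stab at that, here is a rough sketch of the idea. This is not code from the repo: the helper name and the ~2GiB estimate for YOLOv2 are my own assumptions. It queries the device with cudaMemGetInfo and leaves a fixed budget for YOLO instead of hard-coding the fraction:

#include <algorithm>
#include <cuda_runtime_api.h>  // cudaMemGetInfo

// Hypothetical helper: leave `yolo_bytes` of GPU memory for YOLOv2 and give
// TensorFlow the rest, instead of hard-coding 0.9 / 0.8.
static double GpuMemoryFractionForTF(size_t yolo_bytes) {
  size_t free_bytes = 0, total_bytes = 0;
  if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess || total_bytes == 0)
    return 0.8;  // fall back to the value from the diff above
  double frac = 1.0 - static_cast<double>(yolo_bytes) / static_cast<double>(total_bytes);
  return std::min(0.9, std::max(0.1, frac));  // clamp to a sane range
}

// In InitSession(), roughly:
//   opts.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(
//       GpuMemoryFractionForTF(2UL << 30));  // assumes ~2GiB for YOLOv2

Alternatively, the set_allow_growth(true) line that is already commented out in the diff makes TensorFlow allocate GPU memory on demand instead of grabbing a fixed fraction up front.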

@Arsey

Arsey commented Aug 29, 2017

0.8 works fine for the GTX 1070 and the memory error is gone, but now I'm getting Segmentation fault (core dumped). What could it be?

Update:
The same issue for yolo9000 and tiny-yolo

@ddkang
Collaborator

ddkang commented Aug 29, 2017

Please paste the full output log from the run

@Arsey

Arsey commented Aug 29, 2017

(noscope) arsey@ml-machine:~/noscope/data/experiments/jackson-town-square/train/jackson-town-square_convnet_128_32_2.pb-non_blocked_mse.src$ ./run_optimizerset.sh 1
2017-08-29 20:17:53.261228: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-29 20:17:53.390616: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-29 20:17:53.391159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7465
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.83GiB
2017-08-29 20:17:53.391171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-08-29 20:17:53.391175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-08-29 20:17:53.391180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:05:00.0)
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   608 x 608 x   3   ->   608 x 608 x  32
    1 max          2 x 2 / 2   608 x 608 x  32   ->   304 x 304 x  32
    2 conv     64  3 x 3 / 1   304 x 304 x  32   ->   304 x 304 x  64
    3 max          2 x 2 / 2   304 x 304 x  64   ->   152 x 152 x  64
    4 conv    128  3 x 3 / 1   152 x 152 x  64   ->   152 x 152 x 128
    5 conv     64  1 x 1 / 1   152 x 152 x 128   ->   152 x 152 x  64
    6 conv    128  3 x 3 / 1   152 x 152 x  64   ->   152 x 152 x 128
    7 max          2 x 2 / 2   152 x 152 x 128   ->    76 x  76 x 128
    8 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256
    9 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128
   10 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256
   11 max          2 x 2 / 2    76 x  76 x 256   ->    38 x  38 x 256
   12 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512
   13 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256
   14 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512
   15 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256
   16 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512
   17 max          2 x 2 / 2    38 x  38 x 512   ->    19 x  19 x 512
   18 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024
   19 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512
   20 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024
   21 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512
   22 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024
   23 conv   1024  3 x 3 / 1    19 x  19 x1024   ->    19 x  19 x1024
   24 conv   1024  3 x 3 / 1    19 x  19 x1024   ->    19 x  19 x1024
   25 route  16
   26 conv     64  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x  64
   27 reorg              / 2    38 x  38 x  64   ->    19 x  19 x 256
   28 route  27 24
   29 conv   1024  3 x 3 / 1    19 x  19 x1280   ->    19 x  19 x1024
   30 conv    425  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 425
   31 detection
Loading weights from /home/arsey/projects/darknet/yolo.weights...Done!
Dumping video
./run_optimizerset.sh: line 36: 12270 Segmentation fault      (core dumped) /home/arsey/noscope/tensorflow-noscope/bazel-bin/tensorflow/noscope/noscope --diff_thresh=0 --distill_thresh_lower=0 --distill_thresh_upper=0 --skip_small_cnn=0 --skip_diff_detection=0 --skip=30 --avg_fname=/home/arsey/noscope/data/cnn-avg/jackson-town-square.txt --graph=/home/arsey/noscope/data/cnn-models/jackson-town-square_convnet_128_32_2.pb --video=/home/arsey/noscope/data/videos/jackson-town-square.mp4 --yolo_cfg=/home/arsey/projects/darknet/cfg/yolo.cfg --yolo_weights=/home/arsey/projects/darknet/yolo.weights --yolo_class=2 --confidence_csv=/home/arsey/noscope/data/experiments/jackson-town-square/train/jackson-town-square_convnet_128_32_2.pb-non_blocked_mse.src/train_${START_FRAME}_${END_FRAME}.csv --start_from=${START_FRAME} --nb_frames=$LEN --dumped_videos=/home/arsey/noscope/data/video-cache/jackson-town-square_0_250_1.bin --diff_detection_weights=/dev/null --use_blocked=0 --ref_image=0

real    0m2.665s
user    0m2.176s
sys     0m0.620s

@Arsey

Arsey commented Aug 30, 2017

Any thoughts?

@Megatron2032
Author

Thanks, 0.8 works for the 1070. But there is still a problem: it runs out of memory. My computer has 8GB of RAM. When I run motherdog.py, the problem appears if I choose a high number of frames or a low target_fp. I want to run motherdog with 918000 frames and a low target_fp; how should I change the code?

@Arsey

Arsey commented Aug 30, 2017

The segmentation fault was related to the wrong number of frames set for training (250) in noscope_motherdog.py, while the video had 30 frames per second. So the error was appearing in noscope_data.cc inside this for loop:

  for (size_t i = 0; i < kNbFrames; i++) {
    cap >> frame;
    if (i % kSkip_ == 0) {
      std::cout << "frame: " << i << "\n";
      const size_t ind = i / kSkip_;
      cv::resize(frame, yolo_frame, NoscopeData::kYOLOResol_, 0, 0, cv::INTER_NEAREST);
      cv::resize(frame, diff_frame, NoscopeData::kDiffResol_, 0, 0, cv::INTER_NEAREST);
      cv::resize(frame, dist_frame, NoscopeData::kDistResol_, 0, 0, cv::INTER_NEAREST);
      dist_frame.convertTo(dist_frame_f, CV_32FC3);

      if (!yolo_frame.isContinuous()) {
        throw std::runtime_error("yolo frame is not continuous");
      }
      if (!diff_frame.isContinuous()) {
        throw std::runtime_error("diff frame is not continuous");
      }
      if (!dist_frame.isContinuous()) {
        throw std::runtime_error("dist frame is not conintuous");
      }
      if (!dist_frame_f.isContinuous()) {
        throw std::runtime_error("dist frame f is not continuous");
      }

      memcpy(&yolo_data_[ind * kYOLOFrameSize_], yolo_frame.data, kYOLOFrameSize_);
      memcpy(&diff_data_[ind * kDiffFrameSize_], diff_frame.data, kDiffFrameSize_);
      memcpy(&dist_data_[ind * kDistFrameSize_], dist_frame_f.data, kDistFrameSize_ * sizeof(float));
    }
  }
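
If the crash really does come from asking for more frames than the video (or the cached dump) contains, a minimal guard, sketched here under that assumption, is to stop at the first empty frame instead of letting it reach the resize/memcpy calls:

  for (size_t i = 0; i < kNbFrames; i++) {
    cap >> frame;
    // cv::VideoCapture hands back an empty Mat once it reads past the end of
    // the video; bail out here rather than feeding it to resize/memcpy below.
    if (frame.empty()) {
      throw std::runtime_error("video ended before kNbFrames frames were read");
    }
    // ... rest of the loop body as above ...
  }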

@ddkang
Collaborator

ddkang commented Aug 31, 2017

Unfortunately, the codebase currently assumes videos are 30 FPS.

@Megatron2032
Author

I have 8GB of memory, so I used 270000 frames and ran run_optimizerset.sh separately in four steps. In the end, it works.
