Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch-prediction across multiple GPUs and more efficient patch-prediction #48

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

apacha
Copy link
Contributor

@apacha apacha commented Aug 30, 2022

In order to batch-binarize thousands of images, I've rewritten the prediction script to allow us to predict around 1500-2000 images per hour on a decent machine with two GPUs.

The proposed changes include:

  • An efficient way to compute the image patches instead of a very inefficient loop
  • Complete removal of the prediction on the down-scaled image as the results are pretty much always worse
  • Batch-prediction code that can binarize an entire directory into a given output directory while preserving the folder structure and skipping images that have already been binarized, to allow stopping and continuing the conversion
  • Multiprocessing batch-prediction across multiple GPUs using the mpire library
  • A fix for the memory-leak that caused mass-binarization to very quickly crash because we were running out of memory on the GPU. With this fix, we are already running the conversion for 16 hours without any crash.
  • Simplified loading of the model removing obsolete session-handling code

Please note:
I know that the code looks completely different now (hopefully more readable) and is probably not 1:1 compatible with the remaining code in your repository, but I tried to put all the relevant changes into this PR and make the code as self-contained as possible to allow you to update the solution as you see fit.

Thanks for sharing the code-base with us. I hope that this PR is of some help to you.

apacha added 3 commits August 30, 2022 10:58
…w type-hints and improved the code-style a little bit by running an auto-formatter on the entire file.
… efficient way and adding support for batch-conversion with multiple GPUs.
@vahidrezanezhad
Copy link
Member

Dear @apacha ,

Few things like possible OOM error , binarization with collection of models and exact processing time improvement will be tested before merging your PR. By the way as Qurator team we would like to thank you for your efforts for improving sbb_binarization tool.

@vahidrezanezhad
Copy link
Member

In order to batch-binarize thousands of images, I've rewritten the prediction script to allow us to predict around 1500-2000 images per hour on a decent machine with two GPUs.

The proposed changes include:

  • An efficient way to compute the image patches instead of a very inefficient loop
  • Complete removal of the prediction on the down-scaled image as the results are pretty much always worse
  • Batch-prediction code that can binarize an entire directory into a given output directory while preserving the folder structure and skipping images that have already been binarized, to allow stopping and continuing the conversion
  • Multiprocessing batch-prediction across multiple GPUs using the mpire library
  • A fix for the memory-leak that caused mass-binarization to very quickly crash because we were running out of memory on the GPU. With this fix, we are already running the conversion for 16 hours without any crash.
  • Simplified loading of the model removing obsolete session-handling code

Please note: I know that the code looks completely different now (hopefully more readable) and is probably not 1:1 compatible with the remaining code in your repository, but I tried to put all the relevant changes into this PR and make the code as self-contained as possible to allow you to update the solution as you see fit.

Thanks for sharing the code-base with us. I hope that this PR is of some help to you.

Dear @apacha ,

I got this error with your pull request

Traceback (most recent call last):
File "/home/vahid/Documents/sbb_binarization/bin_envnew/bin/sbb_binarize", line 8, in
sys.exit(main())
File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/click/core.py", line 1128, in call
return self.main(*args, **kwargs)
File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/sbb_binarize/cli.py", line 15, in main
SbbBinarizer(model_dir).run(image_path=input_image, use_patches=patches, save=output_image)
TypeError: init() takes 1 positional argument but 2 were given

@apacha
Copy link
Contributor Author

apacha commented Sep 15, 2022

You get this exception as I've rewritten the SbbBinarizer class to split loading the model from initializing the object, see
https://github.com/qurator-spk/sbb_binarization/pull/48/files#diff-3b757c0dba09b4add9cf8c0566db9834fdb19e6af8e5aaf538e8810b1ac613eeR164
I'll update the cli.py as well

@bertsky
Copy link
Contributor

bertsky commented Feb 14, 2023

See apacha#1 for an update of the PR.

@bertsky
Copy link
Contributor

bertsky commented Feb 14, 2023

Not sure if this is the right place for a discussion, but IMO this is not the right approach for efficient prediction yet. We should define a tf.data pipeline, allowing pipelining between our (intensive!) CPU pre- and postprocessing and the GPU side. On that occasion, multithreading on the CPU side (reshaping and contour finding with OpenCV) should be attempted, too...

Also, perhaps at least some of the OpenCV stuff can be ported to cv2.cuda calls – I know, it would still involve multiple additional CPU-GPU transfers (since the cv2.cuda is not in the same graph as the tf/keras part). But at least the GPU would be utilised a bit more.

@bertsky
Copy link
Contributor

bertsky commented Feb 15, 2023

Sorry, in my previous comment I was thinking more about Eynollah than the Binarizer (hence the heavy CPU part). And @apacha's PR does already speed up by an order of magnitude. I can see minor differences between result from tf.image.extract_patches and the old numpy tiling. But the quality is still very good. (I can provide examples before/after, both for the old ensemble model and the 2021 version – let me know if you are interested.) My argument was only meant about the direction of future work we should undertake.

@bertsky
Copy link
Contributor

bertsky commented Feb 15, 2023

tf.data pipelining with heavy CPU processing itself seems to be hard to get right: to get true parallelisation, one probably needs tfaip...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants