Connection Error Fix #607

Closed
wants to merge 1 commit into from

Saurav-D commented Oct 14, 2020

Follow-up to PR #555.
Context: `.flush()` in `_EventLoggerThread` creates a new connection each time it is called. If the connection fluctuates, S3 or GCS throws an error. Because that error is not handled, the thread hangs, and once the queue fills up, training hangs as well. The added try block keeps the thread from getting stuck: it waits for the connection to come back instead. Because the retry runs in a while loop, training won't resume until the connection is re-established. A connection variable makes sure the message is printed only once.

See issue #606 for more details.

I'm unsure whether this is the right place to catch the error; it could instead be handled individually in the GCS and S3 writers.
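
The retry behaviour described above can be sketched roughly as follows. This is a minimal, standalone illustration, not the code in this PR: `flush_with_retry`, its parameters, and the `connected` flag are hypothetical names, and catching bare `Exception` is an assumption standing in for whatever errors the S3 or GCS writers actually raise.

```python
import time


def flush_with_retry(flush, retry_delay=1.0, sleep=time.sleep, log=print):
    """Call flush() until it succeeds, logging the outage only once.

    Returns the number of attempts that were needed.
    """
    connected = True  # hypothetical flag: False while the backend is unreachable
    attempts = 0
    while True:
        attempts += 1
        try:
            flush()
        except Exception as exc:  # assumption: real code may catch narrower errors
            if connected:
                # Print only on the first failure of an outage.
                log("Connection lost ({}); waiting to retry...".format(exc))
                connected = False
            sleep(retry_delay)
        else:
            return attempts
```

Note that the loop blocks until `flush()` succeeds, which matches the trade-off stated above: the writer thread no longer dies on a transient error, but training stays paused for as long as the connection is down.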

codecov-io commented Oct 14, 2020

Codecov Report

Merging #607 into master will decrease coverage by 0.17%.
The diff coverage is 56.25%.

@@            Coverage Diff             @@
##           master     #607      +/-   ##
==========================================
- Coverage   80.73%   80.56%   -0.18%     
==========================================
  Files          39       39              
  Lines        2824     2835      +11     
==========================================
+ Hits         2280     2284       +4     
- Misses        544      551       +7     
Impacted Files | Coverage Δ
tensorboardX/event_file_writer.py | 89.74% <56.25%> (-5.54%) ⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 34d1616...7cab42f.

Saurav-D closed this Oct 14, 2020