Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle rate limiter & github server-side errors #63

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 29 additions & 10 deletions tap_github.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import singer.bookmarks as bookmarks
import singer.metrics as metrics
import collections
import time

from singer import metadata

Expand Down Expand Up @@ -33,6 +34,9 @@ class AuthException(Exception):
class NotFoundException(Exception):
pass

class HttpException(Exception):
pass

def translate_state(state, catalog, repositories):
'''
This tap used to only support a single repository, in which case the
Expand Down Expand Up @@ -83,19 +87,34 @@ def get_bookmark(state, repo, stream_name, bookmark_key):
return None

def authed_get(source, url, headers={}):
with metrics.http_request_timer(source) as timer:
session.headers.update(headers)
resp = session.request(method='get', url=url)
if resp.status_code == 401:
raise AuthException(resp.text)
if resp.status_code == 403:
raise AuthException(resp.text)
if resp.status_code == 404:
raise NotFoundException(resp.text)
for _ in range(0, 3): # 3 attempts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this be replaced with the backoff library?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @luandy64 i'm not familiar with that library, but i'll look into it as soon as possible and tidy up the PR.

Thanks and sorry for the delay!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look at this tonight and see if I can put the branch up to speed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luandy64 @osterman could it be that the backoff library doesn't provide an easy way to use headers like: X-RateLimit-Remaining, X-RateLimit-Reset to determine how long to back off: litl/backoff#38?

with metrics.http_request_timer(source) as timer:
session.headers.update(headers)
resp = session.request(method='get', url=url)

# Handle github's rate limited responses
remaining = resp.headers.get('X-RateLimit-Remaining')
time_to_reset = resp.headers.get('X-RateLimit-Reset', time.time() + 60)
if remaining is not None and remaining == 0:
time.sleep(time.time() - time_to_reset)
continue # next attempt

# Handle github's possible failures as retries
if resp.status_code == 502 or resp.status_code == 503:
continue # next attempt

if resp.status_code == 401:
raise AuthException(resp.text)
if resp.status_code == 403:
raise AuthException(resp.text)
if resp.status_code == 404:
raise NotFoundException(resp.text)

timer.tags[metrics.Tag.http_status_code] = resp.status_code
return resp

raise HttpException(resp.text)

def authed_get_all_pages(source, url, headers={}):
while True:
r = authed_get(source, url, headers)
Expand Down Expand Up @@ -187,7 +206,7 @@ def do_discover():

def get_all_releases(schemas, repo_path, state, mdata):
# Releases doesn't seem to have an `updated_at` property, yet can be edited.
# For this reason and since the volume of release can safely be considered low,
# For this reason and since the volume of release can safely be considered low,
# bookmarks were ignored for releases.

with metrics.record_counter('releases') as counter:
Expand Down