Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix missing error page and improper redirects #303

Merged
merged 4 commits into from
Jul 23, 2021

Conversation

lordlabuckdas
Copy link
Contributor

As encountered in #230. some websites do not have a 404 template configured or redirect to the home page. To fix, this

  1. An empty page is created
  2. Or if the website redirects, it is caught by comparing the requested and returned URL (ref feature: redirects #302).

Other fixes:

  • The get_headers(response_headers: CIMultiDict) -> headers: list method now returns the content type as well. This was done to optimize the working of Cloner and prevent a call to get_content_type which iterates through the list of headers to fetch the content type. From
    get_headers(response_headers: CIMultiDict) -> headers: list to
    get_headers(response_headers:CIMultiDict) -> list[headers: list, content_type: str]

  • Moved the Content-Type header from headers key in meta info to a separate key
    From

    "/about": {
        "hash": "abcd",
        "headers": [{"Server": "Apache"}, {"Content-Type": "text/html"}]
    }

    to

    "/about": {
        "hash": "abcd",
        "headers": [{"Server": "Apache"}],
        "content_type": "text/html"
    }

    This was done because tanner_handler.py#L111 and snare_helpers.py#L96 already check for it and overwrite the Content-Type header.

snare/cloner.py Outdated
@@ -299,6 +294,13 @@ async def run(self):
async def close(self):
if not self.runner:
raise Exception("Error initializing cloner!")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just spot we have initialization error in close function 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strangely enough, that check is there just to be thorough and satisfy linter warnings. If you are referring to an instance where the error was actually raised, could you please share the steps to replicate it? I've tried it with multiple websites and it seems to work fine.

snare/cloner.py Outdated
@@ -299,6 +294,13 @@ async def run(self):
async def close(self):
if not self.runner:
raise Exception("Error initializing cloner!")
error_file_name, error_file_hash = self.runner._make_filename(self.runner.error_page)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need an empty page? Why not letting aiohttp handle this?
https://docs.aiohttp.org/en/latest/web_exceptions.html

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree instead of generating a new empty page we should just let aiohttp handle that

@afeena afeena requested a review from mzfr July 20, 2021 21:11
Comment on lines +72 to 75
if status_code == 404:
raise web.HTTPNotFound(headers=headers)

return web.Response(body=content, status=status_code, headers=headers)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Special if-check for status code 404 because returning web.Response with status_code=404 did not work as intended.

@lordlabuckdas lordlabuckdas requested a review from afeena July 23, 2021 06:44
@afeena afeena merged commit 8f4919e into mushorg:develop Jul 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants