Stop checking main entry processability when it is already found #425

Merged: 2 commits, Nov 27, 2024
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Upgrade to wombat 3.8.6 (#334)
 - Fix wombat setup settings (especially `isSW`) (#293)

+### Fixed
+
+- Stop checking main entry processability when it is already found (#424)
+
 ## [2.1.3] - 2024-11-01

 ### Changed
6 changes: 5 additions & 1 deletion src/warc2zim/converter.py
@@ -478,7 +478,11 @@ def gather_information_from_warc(self):

             status_code = get_status_code(record)
             if not can_process_status_code(status_code):
-                if record.rec_type == "response" and self.main_path == zim_path:
+                if (
+                    not main_page_found
+                    and record.rec_type == "response"
+                    and self.main_path == zim_path
+                ):
                     raise UnprocessableWarcError(
                         f"Main URL returned an unprocessable HTTP code: {status_code}"
                     )
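The effect of the new `not main_page_found` guard can be sketched in isolation. This is a minimal illustration, not the converter's actual code: `Record`, `gather_main_page`, the 2xx-only `can_process_status_code`, the `continue`, and the way `main_page_found` gets set are all assumptions made for the example. Only the guard condition and the error message come from the diff above.

```python
from dataclasses import dataclass


class UnprocessableWarcError(Exception):
    """Stand-in for the exception raised in the diff above."""


@dataclass
class Record:
    """Hypothetical, simplified stand-in for a WARC record."""

    rec_type: str
    zim_path: str
    status_code: int


def can_process_status_code(status_code: int) -> bool:
    # Assumption for this sketch only: treat 2xx as processable.
    return 200 <= status_code < 300


def gather_main_page(records, main_path):
    """Return True if a processable main page was seen (hypothetical helper)."""
    main_page_found = False
    for record in records:
        if not can_process_status_code(record.status_code):
            # Guard from the diff: only fail while the main page is still missing.
            if (
                not main_page_found
                and record.rec_type == "response"
                and main_path == record.zim_path
            ):
                raise UnprocessableWarcError(
                    "Main URL returned an unprocessable HTTP code: "
                    f"{record.status_code}"
                )
            continue  # assumption: unprocessable records are skipped
        if record.rec_type == "response" and record.zim_path == main_path:
            main_page_found = True  # assumption: how the flag gets set
    return main_page_found
```

With this sketch, a 200 response followed by a 400 response on the same path returns `True`, while the reverse order still raises `UnprocessableWarcError`, since no main page has been found when the 400 is seen.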
8 changes: 8 additions & 0 deletions test-website/Caddyfile
@@ -73,6 +73,14 @@

 respond /502-response 502

+respond // "Hello you" 400
+
+respond /double-slash/test1 "Hello you" 200
+respond /double-slash//test1 400
+
+respond /double-slash/test2 "Hello you v1" 200
+respond /double-slash//test2 "Hello you v2" 200
+
 redir /301-internal-redirect-ok /internal_redirect_target.html 301
 redir /301-external-redirect-ok https://www.example.com 301
 redir /302-internal-redirect-ok /internal_redirect_target.html 302
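The double-slash routes appear designed to produce several responses that differ only by a repeated slash. The sketch below shows how such URLs could end up on the same ZIM path, assuming a normalizer that collapses repeated slashes. That behavior is an assumption, since this diff does not show the normalization code. `to_zim_path`, `group_by_zim_path`, and the `test-website` host name are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit


def to_zim_path(url: str) -> str:
    """Hypothetical normalizer: host plus path, repeated slashes collapsed."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return f"{parts.netloc}{path}"


def group_by_zim_path(responses):
    """Group (url, status) pairs by their normalized path, in crawl order."""
    groups = defaultdict(list)
    for url, status in responses:
        groups[to_zim_path(url)].append(status)
    return dict(groups)


# Status codes taken from the Caddyfile routes above; the host is made up.
responses = [
    ("http://test-website/double-slash/test1", 200),
    ("http://test-website/double-slash//test1", 400),
    ("http://test-website/double-slash/test2", 200),
    ("http://test-website/double-slash//test2", 200),
]
groups = group_by_zim_path(responses)
```

Under that assumption, the `test1` pair collapses into one path carrying both a 200 and a 400 response, which is the situation where a main entry that was already found could later be matched again by an unprocessable record.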
2 changes: 1 addition & 1 deletion test-website/Dockerfile
@@ -1,5 +1,5 @@
 FROM caddy:2.6.1-alpine
-LABEL org.opencontainers.image.source https://github.com/openzim/warc2zim
+LABEL org.opencontainers.image.source=https://github.com/openzim/warc2zim

 COPY Caddyfile /etc/caddy/Caddyfile

26 changes: 26 additions & 0 deletions test-website/content/double-slash.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html>
+<html lang="en">
+
+<head>
+<meta charset="utf-8">
+<title>Test website</title>
+<link rel="apple-touch-icon" sizes="180x180" href="./icons/apple-touch-icon.png">
+<link rel="icon" type="image/png" sizes="32x32" href="./icons/favicon-32x32.png">
+<link rel="icon" type="image/png" sizes="16x16" href="./icons/favicon-16x16.png">
+<link rel="manifest" href="./icons/site.webmanifest">
+<link rel="shortcut icon" href="./icons/favicon.ico">
+</head>
+
+<body>
+
+<h2>Double slash in URLs</h2>
+
+<a href=".//">.//</a>
+<a href="./double-slash//test1">./double-slash//test1</a>
+<a href="./double-slash/test1">./double-slash/test1</a>
+<a href="./double-slash/test2">./double-slash/test2</a>
+<a href="./double-slash//test2">./double-slash//test2</a>
+
+</body>
+
+</html>
1 change: 1 addition & 0 deletions test-website/content/index.html
@@ -52,6 +52,7 @@
 <li><a href="./http-equiv-redirect.html">Redirect with http-equiv meta directive</a></li>
 <li><a href="./image-srcset.html">Image with srcset</a></li>
 <li><a href="./form-get.html">Form GET</a></li>
+<li><a href="./double-slash.html">Double Slash</a></li>
 </ul>
 </body>
