Request returning no error code and only Index.html file #79

Closed
pedromerry opened this issue Mar 17, 2024 · 2 comments

Comments

pedromerry commented Mar 17, 2024

I'm trying to use the waybackpack command to download the 3 July 2023 snapshot of the "https://projects.ttlexceeded.com/" web page, without success. The command returns no errors but downloads only a single index.html file. When I view the snapshot in a browser through the Wayback Machine, I can see the full web page perfectly. Can you help me out? I'm using the '--follow-redirects' switch and don't understand what's happening. Thanks!!

jsvine (owner) commented Mar 23, 2024

Hi @pedromerry, I'm not sure I understand the specific issue being raised. How does what you see differ from what you'd expect to see?

When I run these commands, I get what I would expect to see:

❯ waybackpack https://projects.ttlexceeded.com/ --follow-redirects --from-date 20230702 --to-date 20230704 --dir wb-test
INFO:waybackpack.pack: Fetching https://projects.ttlexceeded.com/ @ 20230703013039
INFO:waybackpack.pack: Writing to wb-test/20230703013039/projects.ttlexceeded.com/index.html
❯ tree wb-test/
wb-test/
└── 20230703013039
    └── projects.ttlexceeded.com
        └── index.html

2 directories, 1 file
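The log above shows the layout waybackpack used here: one file per snapshot, at <dir>/<timestamp>/<host>/index.html. The sketch below reproduces that path for a single URL, which makes it clear why only one file appears. `snapshot_path` is a hypothetical helper, not part of waybackpack, and the handling of a bare host (no trailing slash) is an assumption; only the trailing-slash case is confirmed by the log.

```shell
# Hypothetical helper (not part of waybackpack): rebuild the output path
# seen in the log above, <dir>/<timestamp>/<host>/<path>, for one snapshot.
snapshot_path() {
  dir=$1
  timestamp=$2
  url=$3
  # Drop the scheme, e.g. "https://".
  rest=${url#*://}
  case $rest in
    # Trailing slash: saved as index.html (matches the log above).
    */) rest="${rest}index.html" ;;
    # Explicit path: keep it as-is.
    */*) ;;
    # Bare host: ASSUMED to be saved as index.html as well.
    *) rest="${rest}/index.html" ;;
  esac
  printf '%s/%s/%s\n' "$dir" "$timestamp" "$rest"
}

snapshot_path wb-test 20230703013039 https://projects.ttlexceeded.com/
# prints wb-test/20230703013039/projects.ttlexceeded.com/index.html
```

One snapshot of one URL maps to exactly one file; the pages that index.html links to are separate URLs with their own snapshots.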

Opening index.html:

[Screenshot: the downloaded index.html opened in a browser]

Or are you expecting waybackpack to recursively spider every page on that subdomain? If so, unfortunately that's not one of waybackpack's features; you could, however, try the code in this pull request/fork.
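To illustrate what "recursively spidering" would add: a spider reads each downloaded page, collects its links, and fetches those too. The sketch below covers only the first step, listing the href targets in a downloaded index.html. `list_links` is a hypothetical helper, not a waybackpack feature, and it assumes links appear as double-quoted href="..." attributes; a regex is not a real HTML parser and will miss other forms.

```shell
# Hypothetical sketch (not part of waybackpack): print the unique href
# targets found in an HTML file. A recursive spider would then fetch each
# of these and repeat; waybackpack stops after the single page.
# ASSUMPTION: links are written as href="..." with double quotes.
list_links() {
  grep -oE 'href="[^"]+"' "$1" |
    sed -e 's/^href="//' -e 's/"$//' |
    sort -u
}
```

Usage, against the file from the session above: `list_links wb-test/20230703013039/projects.ttlexceeded.com/index.html`. Whether the listed links are relative, absolute, or rewritten to Wayback Machine URLs depends on how the snapshot was fetched, so treat the output as a starting point only.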

pedromerry (author) commented

Hello,
Thank you very much for the response. I think the phrase "download the entire Wayback Machine archive for a given URL" confused me: as you say, I understood it to mean that waybackpack would recursively download every file linked from index.html within the subdomain. I will close the issue, then.
Many thanks,
Pedro
