Failed to open TCP connection to web.archive.org:443 (No connection could be made because the target machine actively refused it. - connect(2) for "web.archive.org" port 443) (Errno::ECONNREFUSED) #304

Open
ghiathkamel opened this issue Aug 1, 2024 · 5 comments

Comments

@ghiathkamel

Hello

The tool is not working anymore.

Getting snapshot pages....................
C:/Ruby26-x64/lib/ruby/2.6.0/net/http.rb:949:in `rescue in block in connect': Failed to open TCP connection to web.archive.org:443 (No connection could be made because the target machine actively refused it. - connect(2) for "web.archive.org" port 443) (Errno::ECONNREFUSED)
from C:/Ruby26-x64/lib/ruby/2.6.0/net/http.rb:946:in `block in connect'
from C:/Ruby26-x64/lib/ruby/2.6.0/timeout.rb:93:in `block in timeout'
from C:/Ruby26-x64/lib/ruby/2.6.0/timeout.rb:103:in `timeout'
from C:/Ruby26-x64/lib/ruby/2.6.0/net/http.rb:945:in `connect'
from C:/Ruby26-x64/lib/ruby/2.6.0/net/http.rb:930:in `do_start'
from C:/Ruby26-x64/lib/ruby/2.6.0/net/http.rb:919:in `start'
from C:/Ruby26-x64/lib/ruby/2.6.0/open-uri.rb:337:in `open_http'
from C:/Ruby26-x64/lib/ruby/2.6.0/open-uri.rb:756:in `buffer_open'
from C:/Ruby26-x64/lib/ruby/2.6.0/open-uri.rb:226:in `block in open_loop'
from C:/Ruby26-x64/lib/ruby/2.6.0/open-uri.rb:224:in `catch'
from C:/Ruby26-x64/lib/ruby/2.6.0/open-uri.rb:224:in `open_loop'
from C:/Ruby26-x64/lib/ruby/2.6.0/open-uri.rb:165:in `open_uri'
from C:/Ruby26-x64/lib/ruby/2.6.0/open-uri.rb:736:in `open'
from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:131:in `get_file_list_all_timestamps'
from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:158:in `get_file_list_by_timestamp'
from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'

@phil-hudson

+1

@niclake

niclake commented Aug 6, 2024

Came here to see if anyone else had the same issues. Did web.archive.org nuke the endpoint recently?

@ghiathkamel
Author

> Came here to see if anyone else had the same issues. Did web.archive.org nuke the endpoint recently?

I think they blocked the downloader

@SHiLLySiT

SHiLLySiT commented Aug 9, 2024

This tool is listed on the Archive Wiki so I'd be interested to hear if this was an intended blocking of the tool.

EDIT: The tool hasn't been blocked; rather, I think it hasn't been updated to reflect changes on the Wayback Machine. This other issue provides instructions on how to use a fork that has the necessary fixes applied.

@GregLeonhardt

GregLeonhardt commented Aug 18, 2024

It appears that the Wayback Machine server has been overwhelmed by download activity and that they are actively trying to reduce traffic. I made the following modifications to wayback_machine_downloader to slow it down, which significantly reduces but does not eliminate the problem. So that all pages can still be downloaded, I added a retry for the few connection-refused errors that still occur. I suspect that slowing it down even more would eliminate the errors entirely, but this is a compromise between speed and playing nice.

First, locate the Ruby file by running the following command:
gem env

The source file "wayback_machine_downloader.rb" should be located under one of the directories listed in the GEM PATHS section of the output.
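
As an alternative to scanning the `gem env` output by hand, here is a minimal sketch that asks RubyGems for the file directly. The helper name `installed_gem_file` is made up for illustration; the relative path `lib/wayback_machine_downloader.rb` is taken from the stack trace at the top of this issue and may differ in other versions of the gem.

```ruby
require "rubygems"

# Hypothetical helper (not part of the gem): returns the absolute path of a
# file inside an installed gem, or nil if the gem or the file is missing.
def installed_gem_file(gem_name, relative_path)
  spec = Gem::Specification.find_by_name(gem_name)
  path = File.join(spec.gem_dir, relative_path)
  File.exist?(path) ? path : nil
rescue Gem::LoadError
  # find_by_name raises a Gem::LoadError subclass when the gem is not installed.
  nil
end

path = installed_gem_file("wayback_machine_downloader",
                          "lib/wayback_machine_downloader.rb")
puts(path || "wayback_machine_downloader does not appear to be installed")
```

If several versions of the gem are installed, this should point at the one RubyGems would activate by default, which may not be the one you intend to patch.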

With your editor of choice, open wayback_machine_downloader.rb and add the three lines marked "<<< INSERT" below:

    unless File.exist? file_path
      begin
        structure_dir_path dir_path
        open(file_path, "wb") do |file|
          begin
            URI("https://web.archive.org/web/#{file_timestamp}id_/#{file_url}").open("Accept-Encoding" => "plain") do |uri|
              file.write(uri.read)
            end
          rescue OpenURI::HTTPError => e
            puts "(1) - #{file_url} # #{e}"
            if @all
              file.write(e.io.read)
              puts "(2) - #{file_path} saved anyway."
            end
          rescue StandardError => e
            puts "(3) - #{file_url} # #{e}"
            sleep(30)                                                # <<< INSERT (wait before retrying)
            retry                                                    # <<< INSERT (re-run the download)
          end
        end
      rescue StandardError => e
        puts "(4) - #{file_url} # #{e}"
      ensure
        if not @all and File.exist?(file_path) and File.size(file_path) == 0
          File.delete(file_path)
          puts "(5) - #{file_path} was empty and was removed."
        end
      end
      semaphore.synchronize do
        @processed_file_count += 1
        puts "(6) - #{file_url} -> #{file_path} (#{@processed_file_count}/#{file_list_by_timestamp.size})"
      end
      sleep(2)                                                       # <<< INSERT (throttle between files)
    else
      semaphore.synchronize do
        @processed_file_count += 1
        puts "(7) - #{file_url} # #{file_path} already exists. (#{@processed_file_count}/#{file_list_by_timestamp.size})"
      end
    end
  end
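
Note that the patch above retries forever on any StandardError, so a URL that never succeeds would stall the download. Below is a rough sketch of a capped alternative, assuming you only want to retry connection-level failures. The helper `with_retries` and its parameters (`max_attempts`, `base_delay`, `sleeper`) are hypothetical and not part of wayback_machine_downloader; the 30-second base delay simply mirrors the value used in the patch.

```ruby
# Hypothetical helper (not part of the gem): runs the block, retrying a
# limited number of times on connection-level errors, and waits a little
# longer before each new attempt.
def with_retries(max_attempts: 5, base_delay: 30, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ETIMEDOUT => e
    # Give up and re-raise once the attempt budget is used up.
    raise if attempt >= max_attempts

    delay = base_delay * attempt # linear backoff: 30s, 60s, 90s, ...
    puts "attempt #{attempt} failed (#{e.class}); retrying in #{delay}s"
    sleeper.call(delay)
    retry
  end
end
```

In the patched method, the `URI(...).open` call could be wrapped in `with_retries { ... }` in place of the bare `sleep(30)` / `retry` pair; the `sleeper` argument exists only so the delay can be stubbed out when trying the helper without waiting.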
