A web scraper built with Golang. It downloads the content of a website or blog and allows you to read it offline.
Features and advantages over existing tools like wget, httrack, or Teleport Pro:
- Free and open source
- Available for all platforms that Golang supports
- JPEG and PNG images can be re-encoded at a lower quality to save disk space (see the example after this list)
- Excluded URLs will not be fetched (unlike with wget)
- No incomplete temp files are left on disk
- Asset files that were already downloaded are skipped on subsequent scraper runs
- Assets from external domains are downloaded automatically
- Sane default values
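Both the image re-encoding and the URL exclusion map directly to command line flags. A minimal sketch, in which the quality value and the regular expression are purely illustrative:

```
# re-encode images at quality 85 and skip every URL matching /ads/
goscrape -i 85 -x "/ads/" http://website.com
```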
Limitations:
- No GUI version, console only
You need to have Golang installed; if you do not, follow the guide at https://golang.org/doc/install. Then fetch and run goscrape:
```
go get github.com/cornelk/goscrape
goscrape http://website.com
```
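A typical run narrows the crawl and picks a target directory; the depth and directory name here are arbitrary examples, and all flags are listed in the usage output below:

```
# follow links at most 5 levels deep and write the result to ./offline-copy
goscrape -d 5 -o offline-copy http://website.com
```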
```
Scrape a website and create an offline browsable version on the disk

Usage:
  goscrape http://website.com [flags]

Flags:
      --config string            config file (default is $HOME/.goscrape.yaml)
  -d, --depth uint               download depth, 0 for unlimited (default 10)
  -x, --exclude stringArray      exclude URLs that match these Perl-style regular expressions
  -h, --help                     help for goscrape
  -i, --imagequality int         image quality, 0 to disable re-encoding
  -n, --include stringArray      only include URLs that match these Perl-style regular expressions
  -o, --output string            output directory to write files to
  -t, --timeout uint             time limit in seconds for each HTTP request to connect and read the request body
  -u, --user string              user[:password] to use for authentication
  -v, --verbose                  verbose output
```
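The flags compose freely. As a sketch, a password-protected site could be fetched with settings read from a custom config file plus HTTP authentication; the file path and credentials are placeholders:

```
# read settings from a custom config file and authenticate as user "alice"
goscrape --config ./goscrape.yaml -u alice:secret http://website.com
```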
Dependencies:
- github.com/gorilla/css - CSS file tokenizer
- github.com/h2non/filetype - image format identification
- github.com/hashicorp/go-multierror - multi error wrapping
- github.com/headzoo/surf - virtual web browser
- github.com/PuerkitoBio/goquery - HTML document traversal
- github.com/spf13/cobra - command line handling
- github.com/spf13/viper - configuration
- go.uber.org/zap - logging