A job scheduler and analysis tool for webscraping (and other) tasks.
Currently the following data sources are implemented:
- tiktok: get video metadata per hashtag, download the videos, and analyse the text using easyOCR (see the sketch after this list)
- gab (nazi-twitter): crawl posts per user
- onionlist: download the Tor catalogue from onionlist.org
- google dorking: find interesting files and download them
- facebook: scrape Facebook posts, comments, and reactions (like, heart, etc.)
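
The tiktok source's analysis step uses easyOCR to read text out of the downloaded videos. A minimal sketch of that step, assuming frames have already been extracted from a video (the file name is illustrative, not the project's actual pipeline):

```python
# Minimal sketch: run easyOCR over one extracted video frame.
# 'frame_0001.png' is an illustrative file name.
import easyocr

reader = easyocr.Reader(['en'])               # load detection/recognition models once
results = reader.readtext('frame_0001.png')   # list of (bbox, text, confidence) tuples

for bbox, text, confidence in results:
    if confidence > 0.5:                      # keep reasonably confident detections
        print(text)
```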
- simple configuration of actions/data sources, also from 3rd-party modules/repos
- job monitoring and scheduling
- SQLite, CSV and JSON browser
- separation of datasets/artifacts (one archive per crawl)
- scalable number of workers (also on other machines)
- GUI to create and schedule jobs
- displays pending, running and finished jobs
- displays CSV and SQLite datasets
- can be distributed (workers and command & control on different locations/servers)
- jobs are managed through JSON files and can be distributed with an adapter like PouchDB (see the sketch after this list)
- multithreaded
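
The exact job schema is defined by the project; purely as a hypothetical illustration (all field names are assumptions), a JSON job file could look like this:

```json
{
  "id": "job-0001",
  "datasource": "tiktok",
  "params": { "hashtag": "example" },
  "schedule": "0 * * * *",
  "status": "pending"
}
```

Because jobs are plain JSON documents, a replicating store such as PouchDB can synchronise the job queue between machines without a central database server.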
Install using docker-compose by running:

```sh
docker-compose up
```
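
The repository's compose file defines the actual services; purely as an illustration of the command & control / worker split described above, a hypothetical minimal docker-compose.yml could look like this (all service and image names are assumptions):

```yaml
# Hypothetical illustration only; service and image names are assumptions.
version: "3"
services:
  scheduler:                       # command & control: GUI, job queue, dataset browser
    image: example/scheduler:latest
    ports:
      - "8080:8080"
  worker:                          # runs crawl/analysis jobs
    image: example/worker:latest
    depends_on:
      - scheduler
```

With a layout like this, workers can be scaled with `docker-compose up --scale worker=4`, and the worker service can equally be started on a different machine, matching the distributed setup listed above.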