API Crawler & Processor

This project contains two programs:

  • A crawler (or "scraper") which schedules regular execution of the API crawler included in this project.

  • A processor which can be executed on demand to extract all source-destination relations from the crawled data.

These programs were used to monitor the locations of rentable e-scooters in Dresden, Germany, and to extract the source-destination relations from that data.

Changes between versions are found in the Release Section.

The project uses Gradle as the build system.

Deployment

Using Docker

Both programs can be executed in a similar way using docker-compose:

  • Change to the executable folder (e.g. ./executables/crawler)

  • Adjust the settings in docker-compose.yml. In order to crawl data from the API you need to generate an API token.

  • Execute ../../gradlew copyToDockerBuildFolder to build the jar and arrange the docker files locally.

  • Execute docker-compose build to build a new Docker image locally with the most recent jar.

  • Execute docker-compose up -d to start the database(s) and executable.
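The steps above can be collected into a small script. The following is only a sketch: the names DRY_RUN, EXECUTABLE_DIR, run and deploy are illustrative and not part of the project, and the script assumes it is started from the repository root. By default it only prints the commands, so nothing is built or started by accident.

```shell
#!/bin/sh
# Sketch of the Docker deployment steps for one executable.
# DRY_RUN and EXECUTABLE_DIR are illustrative names, not part of the project.
DRY_RUN="${DRY_RUN:-1}"                                    # 1 = only print the commands
EXECUTABLE_DIR="${EXECUTABLE_DIR:-./executables/crawler}"  # folder of the program to deploy

# Print the command in dry-run mode, otherwise execute it in the current shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy() {
  run cd "$EXECUTABLE_DIR" || return 1
  # Manual step: adjust docker-compose.yml (including the API token) before continuing.
  run ../../gradlew copyToDockerBuildFolder || return 1
  run docker-compose build || return 1
  run docker-compose up -d
}

deploy
```

Setting DRY_RUN=0 executes the commands instead of printing them. Adjusting docker-compose.yml remains a manual step and is only marked by a comment in the sketch.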

Using Command-Line

Execute ./gradlew publishAllPublicationsToLocalRepository to build the jar files locally. You can find them in ./build/repo/.

Crawler

Execute java -jar crawler-{$VERSION}-all.jar -lt "LIME_API_TOKEN" to start the crawler, or run the jar without parameters to see all parameters, including the optional ones.

Processor

Execute java -jar processor-{$VERSION}-all.jar, either as is or with any arbitrary parameter, to see all parameters, including the optional ones.
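The exact sub-folder and version of the jars below ./build/repo/ are not stated here, so a small helper that searches for them can save some typing. This is a minimal sketch: find_jar is a hypothetical helper, not part of the project, and it assumes the jar names follow the crawler-{$VERSION}-all.jar and processor-{$VERSION}-all.jar pattern used above. It only prints the launch command and does not start the program.

```shell
#!/bin/sh
# Hypothetical helper: locate the first "-all" jar of a program below a repository folder.
# Usage: find_jar <program> [repo_dir]   (repo_dir defaults to ./build/repo)
find_jar() {
  find "${2:-./build/repo}" -type f -name "$1-*-all.jar" 2>/dev/null | head -n 1
}

# Print the crawler launch command if a jar was found; nothing is executed here.
jar="$(find_jar crawler)"
if [ -n "$jar" ]; then
  echo "java -jar $jar -lt \"LIME_API_TOKEN\""
else
  echo "No crawler jar found; run the Gradle build first."
fi
```

The same helper works for the processor with find_jar processor. If several versions have been published locally, it returns whichever match find lists first, so removing old builds beforehand avoids surprises.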

Licensing

Copyright 2021 Cyface GmbH

This file is part of the Cyface Crawler.

The Cyface Crawler is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The Cyface Crawler is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with the Cyface Crawler. If not, see http://www.gnu.org/licenses/.