    Repositories list

    • Software stack with latest Scrapy and updated deps
      Dockerfile
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 29, 2024
    • More flexible and featured Frontera scheduler for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 29, 2024
    • Scrapy entrypoint for Scrapinghub job runner
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 29, 2024
    • Python
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 27, 2024
    • Python Social Auth - Application - Django
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 18, 2024
    • Page Object pattern for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 18, 2024
    • Python parser for human-readable dates
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 12, 2024
    • spidermon
      Scrapy extension for monitoring spider execution.
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 11, 2024
    • extruct
      Extract embedded metadata from HTML markup
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 8, 2024
    • Formasaurus tells you the type of an HTML form and its fields using machine learning
      HTML
      Updated Nov 7, 2024
    • Extract price amount and currency symbol from a raw text string
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Nov 6, 2024
    • Parse numbers written in natural language
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Oct 23, 2024
    • web-poet
      Web scraping Page Objects core library
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Oct 16, 2024
    • andi
      Library for annotation-based dependency injection
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Oct 16, 2024
    • A Python binding for CRFsuite
      Python
      MIT License
      Updated Oct 1, 2024
    • streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.
      Python
      Apache License 2.0
      Updated Sep 20, 2024
    • splash
      Lightweight, scriptable browser as a service with an HTTP API
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Aug 2, 2024
    • A Postgres-backed ContentsManager implementation for IPython
      Python
      Apache License 2.0
      Updated Jul 18, 2024
    • Crawl Frontier HCF backend
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Jul 17, 2024
    • shublang
      Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Jul 9, 2024
    • An opinionated fork of the Drone CI system
      Go
      Other
      Updated Jul 7, 2024
    • varanus
      A command-line spider monitoring tool
      Python
      Updated Jul 6, 2024
    • scrapyrt
      HTTP API for Scrapy spiders
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Jun 28, 2024
    • portia
      Visual scraping for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Jun 26, 2024
    • scikit-learn inspired API for CRFsuite
      Python
      Updated Jun 18, 2024
    • Python
      MIT License
      Updated Jun 17, 2024
    • autologin
      A project that attempts to log in to a website automatically, given a single seed
      Python
      Apache License 2.0
      Updated Jun 17, 2024
    • Python wrapper for the Intercom API.
      Python
      Other
      Updated Jun 17, 2024
    • luigi
      Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with built-in Hadoop support.
      Python
      Apache License 2.0
      Updated Jun 7, 2024
    • mrjob
      Run MapReduce jobs on Hadoop or Amazon Web Services
      Python
      Other
      Updated Jun 6, 2024