An AI-powered web scraper built for Syria analysts and observers.
Tracking news and press releases is time-consuming, particularly when sifting through statements and propaganda from the regime and non-state actors. As an analyst and perennial Syria-watcher, I want to build tools that ease this process - to streamline the gathering, translating and curating of data so more time can be spent analyzing trends and sentiments rather than monitoring social media feeds. Enter this project.
Syria Daily Brief is a web scraper designed to gather posts, articles and press announcements from a variety of Arabic-language sources run by governments, non-state actors and independent press. This project is designed to complement the work of Syria-focused analysts, particularly those who deal with large quantities of open-source data.
The project features a backend API written in Flask/Python and a frontend UI built in React/JavaScript/TailwindCSS. It also features a PostgreSQL database with SQLAlchemy as an ORM and Marshmallow for JSON schema validation, as well as other Python libraries such as Beautiful Soup, Pandas and OpenAI to handle various features. While SDB works as a standalone API, you can view the frontend GUI here.
This project is currently in active development - check back soon for its first stable release.
Additional libraries/tools: Marshmallow, SQLAlchemy, BeautifulSoup4
- Collect data from a spectrum of Arabic-language websites:
  - Specify timespan for data collection (last 24 hours, last week, last 6 months, etc.)
  - Gather data from a single source or cast a wide net across all available websites/outlets
  - Expand data collection to dozens of sources
- Manage collected data:
  - Explore collected data through responsive UI
  - Search, tag and filter entries
  - Export data to .csv, .xlsx formats
- Machine translations and summaries:
  - Utilize GPT-3.5/GPT-4 to summarize Arabic datasets
  - Get quick translations via ArgosTranslate
- Use responsive UI to view, edit and manage data:
  - Save scraped data to personalized collections
  - View entries in sortable, editable database
  - Search data and tag entries of interest
  - Personalize data collection operations from frontend
  - Deploy project as offline, cross-platform Electron.js app
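To make the collection and management features above more concrete, here is a minimal sketch of timespan filtering, tag filtering and CSV export using only the Python standard library. It is an illustration under stated assumptions, not the project's actual code: the entry fields (`source`, `title`, `published`, `tags`), the timespan labels and the function names `within_timespan`, `with_tag` and `to_csv` are all hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Hypothetical timespan labels; the project's real option names may differ.
TIMESPANS = {
    "last_24_hours": timedelta(hours=24),
    "last_week": timedelta(weeks=1),
    "last_6_months": timedelta(days=182),  # rough approximation of six months
}


def within_timespan(entries, span, now=None):
    """Keep entries published within the named timespan, counted back from `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - TIMESPANS[span]
    return [entry for entry in entries if entry["published"] >= cutoff]


def with_tag(entries, tag):
    """Keep entries that carry the given tag."""
    return [entry for entry in entries if tag in entry["tags"]]


def to_csv(entries):
    """Serialize entries to CSV text, joining multiple tags with ';'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["source", "title", "published", "tags"],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(
            {
                "source": entry["source"],
                "title": entry["title"],
                "published": entry["published"].isoformat(),
                "tags": ";".join(entry["tags"]),
            }
        )
    return buffer.getvalue()
```

In this sketch, a call such as `to_csv(with_tag(within_timespan(entries, "last_week"), "idlib"))` would chain the three steps. Passing `now` explicitly keeps the filter deterministic, which is convenient for testing.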
Distributed under the MIT License. See LICENSE.txt for more information.
Justin Clark - @JustinClarkJO - jclarksummit AT gmail DOT com
Project Link: https://github.com/jclark1913/syria-daily-brief