This lightweight tool extracts comprehensive information from Bugzilla repositories. It automates the process of querying Bugzilla's REST API, ensuring consistent and thorough data collection across different Bugzilla instances. Alongside the tool, this repository supports the research community by providing a robust dataset with all the information about key projects of Eclipse and Mozilla. The tool is exposed as a Command Line Interface (CLI) that makes extracting issue reports simple and easy to understand.
- Automated Data Extraction: Automatically fetches detailed bug report information, comments, attachments, and historical changes.
- Customizable Queries: Specify repository, classification, product, and component of interest. Filter issues based on status and resolution.
- Ease of Use: Simple commands to run and configure the tool.
- Error Handling: Handles API rate limits and retries failed requests up to 3 times to ensure data consistency.
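
The retry behavior described in the last bullet can be pictured with the minimal sketch below. It is not the tool's actual implementation: the helper name `with_retries`, the exponential backoff delays, and the choice to retry on any exception are assumptions made for illustration. Only the limit of 3 retries comes from the description above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_RETRIES = 3  # from the feature list: failed requests are retried up to 3 times


def with_retries(
    request: Callable[[], T],
    max_retries: int = MAX_RETRIES,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request` once, then retry up to `max_retries` times on failure.

    Hypothetical helper: the backoff schedule (1s, 2s, 4s, ...) is an assumption.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt == max_retries:
                raise
            # Wait longer after each failure to ease pressure on a rate-limited API.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable when max_retries >= 0")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; a real client would likely also narrow the caught exception to network and rate-limit errors.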
The dataset generated by this tool includes:
- Detailed information from Bugzilla repositories for Eclipse and Mozilla projects.
- Historical data from the inception of Bugzilla usage up to November 2024.
- Structured directories for each project, product, and component, with CSV files containing all bug details.
Specifically, the fields provided for each bug are: Issue URL, ID, Alias, Classification, Component, Product, Version, Platform, Op sys, Status, Resolution, Depends on, Dupe of, Blocks, Groups, Flags, Severity, Priority, Deadline, Target Milestone, Creator, Creator Detail, Creation time, Assigned to, Assigned to detail, CC, CC detail, Is CC accessible, Is confirmed, Is open, Is creator accessible, Summary, Description, URL, Whiteboard, Keywords, See also, Last change time, QA contact, History/Activity Log, Comments, Attachments.
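
As a rough illustration of working with one of these CSV files, the sketch below counts reports per status using only the Python standard library. It assumes the CSV header uses the field names exactly as listed above (here, `Status`) and that files are UTF-8 encoded; the function name and any file path you pass in are hypothetical, so adjust them to the actual dataset layout.

```python
import csv
from collections import Counter
from pathlib import Path


def count_by_status(csv_path: Path) -> Counter:
    """Count bug reports per value of the (assumed) "Status" column."""
    # Descriptions and comments can be long, so raise the per-field size limit.
    csv.field_size_limit(10**9)
    with csv_path.open(newline="", encoding="utf-8") as handle:
        # DictReader maps each row to the column names found in the header line.
        return Counter(row["Status"] for row in csv.DictReader(handle))
```

Calling `count_by_status(Path("some_component.csv")).most_common()` would list the statuses from most to least frequent.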
Our dataset contains the bugs reported for 9 popular products/components from Eclipse and the Core component of Mozilla. Below, we show the selected products along with the number of reports obtained for each of them.
| Repository | Product / Component | Number of reports |
|---|---|---|
| Eclipse | Platform | 122,497 |
| Eclipse | JDT | 63,266 |
| Eclipse | CDT | 23,371 |
| Eclipse | BIRT | 23,308 |
| Eclipse | PDE | 17,639 |
| Eclipse | Equinox | 14,559 |
| Eclipse | Mylyn | 13,906 |
| Eclipse | TPTP | 10,579 |
| Eclipse | Papyrus | 13,253 |
| Mozilla | Core | 522,355 |
| Total number of issues | | 823,733 |
The dataset is available on Zenodo:
git clone https://github.com/lNoelia/Issuex
cd Issuex
python -m venv venv
source venv/bin/activate
cp example.env .env
After this step, edit the new .env file and specify the repository and project from which you want to obtain issues.
pip install -r requirements.txt
pip install -e ./
Run the tool and specify the status and resolution of the issues to be extracted.
issuex run
Optionally, you can use --from-date "YYYY-MM-DD" to obtain the issues that were created from that date until today.
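
To make the filters concrete, the sketch below builds the kind of Bugzilla REST search URL that such a run could translate to. It is an illustration, not the tool's actual code: the function `build_search_url` is hypothetical. The `/rest/bug` endpoint and its `product`, `component`, `status`, `resolution`, and `creation_time` parameters are part of Bugzilla's REST API, where `creation_time` matches bugs created at or after the given time, which is the behavior `--from-date` describes.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    product: str,
    component: Optional[str] = None,
    status: Optional[str] = None,
    resolution: Optional[str] = None,
    from_date: Optional[str] = None,  # "YYYY-MM-DD", as passed to --from-date
) -> str:
    """Return a Bugzilla REST search URL for the given filters (hypothetical helper)."""
    params: List[Tuple[str, str]] = [("product", product)]
    if component:
        params.append(("component", component))
    if status:
        params.append(("status", status))
    if resolution:
        params.append(("resolution", resolution))
    if from_date:
        # Bugzilla returns bugs created at this time or later.
        params.append(("creation_time", from_date))
    return f"{base_url.rstrip('/')}/rest/bug?{urlencode(params)}"


url = build_search_url(
    "https://bugzilla.mozilla.org",
    product="Core",
    status="RESOLVED",
    resolution="FIXED",
    from_date="2024-01-01",
)
print(url)
```

Opening the printed URL in a browser is a quick way to check that a filter combination returns the reports you expect before starting a long extraction.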
The default setting will automatically get all the issues from the given repository and the classification, product and component specified in the configuration file.
issuex run:default