Home
Our project was proposed by Dr. Rothermel, head of the Department of Computer Science at North Carolina State University.
As part of his job, Dr. Rothermel needs a way to evaluate the department’s faculty and potential hires. A useful metric for a faculty member’s research performance is the number of citations their papers have received. Dr. Rothermel had been collecting this data manually: he searched Google Scholar for each paper individually, recorded its citation count in an Excel spreadsheet, and then used Excel’s graphing features to compare authors.
The data collection process is tedious; consequently, Dr. Rothermel proposed the creation of a tool to automatically retrieve citation data for papers and to provide a suite of analytical tools for evaluating this data.
Our solution is to create a web application to retrieve and visualize the data. The application accepts lists of papers for authors under the user’s consideration and stores this information in the application’s database. Alternatively, given an author, the system can search Google Scholar for papers. From there, the application scrapes Google Scholar to automatically keep the citation counts for these papers up to date in the background, appending to a history of citation counts. The user can graph author data through the application or alternatively export that data to a CSV file for use in other analytical applications like Microsoft Excel. The user can also create tags and apply these tags to different authors, which allows filtering authors by different categories during analysis.
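The CSV export described above could look roughly like the following. This is a minimal sketch, not the application's actual code: the `CitationRecord` shape, the column names, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class CitationRecord:
    """One observation of a paper's citation count (hypothetical shape)."""
    author: str
    paper_title: str
    recorded_on: date
    citations: int


def export_csv(records):
    """Serialize records to CSV text, sorted by author, paper, then date."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names are assumptions, not the application's real export format.
    writer.writerow(["author", "paper_title", "recorded_on", "citations"])
    ordered = sorted(records, key=lambda r: (r.author, r.paper_title, r.recorded_on))
    for r in ordered:
        writer.writerow([r.author, r.paper_title, r.recorded_on.isoformat(), r.citations])
    return buffer.getvalue()


# Made-up sample history for a single paper.
history = [
    CitationRecord("A. Author", "Example Paper", date(2024, 2, 1), 12),
    CitationRecord("A. Author", "Example Paper", date(2024, 1, 1), 10),
]
csv_text = export_csv(history)
```

Storing one row per observation, rather than one row per paper, keeps the full citation history available to tools like Excel for plotting growth over time.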
In this document, the full set of our application’s requirements is described under the following categories: submitting and validating papers, gathering paper metadata, working with tags, displaying statistics, and exporting data. A database schema and a REST API have been implemented in the backend of the application. The database contains entities for Authors, Papers, Tags, Tasks, Issues, and Citation Records. The REST API includes CRUD routes for all of these objects as well as some miscellaneous routes. Importantly, our system uses a third-party proxying API to scrape Google Scholar and successfully parses citation data from the results. Apache is set up as a reverse proxy, serving both the frontend and the backend. Pages for viewing author details, working with tags, viewing papers, viewing tasks, and creating graphs have been added. All of this is covered by 39 backend unit tests written with pytest (88% coverage) and 14 passing Selenium frontend tests.
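To illustrate the citation-parsing step, here is a minimal sketch that pulls a count out of a "Cited by N" link, the label Google Scholar shows next to a result. It uses a standard-library regular expression instead of BeautifulSoup so that it runs without dependencies. The function name and the sample HTML fragment are assumptions, not the project's real parser or a real Scholar response.

```python
import re

# Matches the "Cited by N" label and captures N.
CITED_BY = re.compile(r"Cited by\s+(\d+)")


def parse_citation_count(html):
    """Return the first 'Cited by N' count found in html, or None if absent."""
    match = CITED_BY.search(html)
    return int(match.group(1)) if match else None


# Hypothetical fragment; real result pages contain far more markup.
sample = '<div><a href="/scholar?cites=123">Cited by 42</a></div>'
count = parse_citation_count(sample)
missing = parse_citation_count("<div>No citation link here</div>")
```

Returning `None` instead of `0` when the label is missing lets the caller distinguish "no citations yet" from "the page could not be parsed", which is the kind of case that could be surfaced as a data collection issue.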
The backend runs a custom task management system to schedule and execute tasks for creating papers and updating citation counts, offloading scraping latency onto times when the user isn’t using the app. We have a page for viewing active tasks, as well as a page for viewing data collection issues that can be resolved by the user or dismissed.
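The scheduling idea can be sketched with a small in-memory priority queue ordered by due time. This is only an illustration under stated assumptions: the `Task` fields, the task names, and the `TaskQueue` class are hypothetical, and the real system stores Tasks in the database.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(order=True)
class Task:
    """A scheduled unit of work, ordered by due time and then insertion order."""
    run_at: datetime
    seq: int
    name: str = field(compare=False)
    paper_id: int = field(compare=False)


class TaskQueue:
    """Hypothetical in-memory queue; not the application's task manager."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal due times

    def schedule(self, name, paper_id, run_at):
        heapq.heappush(self._heap, Task(run_at, next(self._counter), name, paper_id))

    def pop_due(self, now):
        """Remove and return every task whose due time is at or before now."""
        due = []
        while self._heap and self._heap[0].run_at <= now:
            due.append(heapq.heappop(self._heap))
        return due


start = datetime(2024, 1, 1, 2, 0)
queue = TaskQueue()
queue.schedule("update_citations", paper_id=1, run_at=start + timedelta(hours=1))
queue.schedule("create_paper", paper_id=2, run_at=start)
due_now = queue.pop_due(start)
due_later = queue.pop_due(start + timedelta(hours=1))
```

A worker loop would call `pop_due` periodically and run the returned tasks, so slow scraping requests happen in the background rather than while the user waits.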
Finally, accessing our system requires Shibboleth authentication. Authorized Unity IDs can be specified in the environment variables.
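As a rough sketch of how an allow-list might be read from the environment: the variable name `AUTHORIZED_UNITY_IDS`, the comma-separated format, and the case-insensitive comparison are all assumptions, not the project's documented configuration. Shibboleth itself would still handle the login; this only shows the check on the resulting ID.

```python
import os


def load_authorized_ids(environ=os.environ, var_name="AUTHORIZED_UNITY_IDS"):
    """Parse a comma-separated list of IDs from an environment mapping.

    The variable name and format are hypothetical.
    """
    raw = environ.get(var_name, "")
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def is_authorized(unity_id, allowed):
    """Check an authenticated ID against the allow-list, ignoring case."""
    return unity_id.strip().lower() in allowed


# A plain dict stands in for os.environ so the example is self-contained.
allowed = load_authorized_ids({"AUTHORIZED_UNITY_IDS": "jdoe, asmith ,"})
```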
Our sponsor is the Department of Computer Science at NC State. NC State is an R1 institution, the highest classification an institution can receive for research activity. NC State is also a very large university with a substantial number of faculty engaged in research, and the Department of Computer Science is no exception. Dr. Gregg Rothermel is the head of the department. As department head, he is responsible for evaluating the faculty, specifically the performance and overall impact of the department’s research faculty.
Sponsor Contact Info:
Dr. Gregg Rothermel
Computer Science Department Head and Professor
3308 EB II
Phone: 919-513-0348
[email protected]
Senior Design Team 6 GitHub Repo
- FR1 Submitting Papers
- FR2 Validating Papers
- FR3 Gathering Paper Metadata
- FR4 Editing Author Tags
- FR5 Displaying Statistics
- FR6 Exporting Data
- Non-Functional
- Constraints
Iteration Contents
Technologies
- MySQL
- SQLAlchemy
- Docker
- Flask
- BeautifulSoup
- ScraperAPI
- Nginx (for testing)
- Apache
- React
- MaterialUI
- Bootstrap
- ToastUI