GSScrapeNode

A web scraper for Google Scholar utilizing Node.js

Introduction

A research project needed a way to grab the titles of all articles that reference a seed article, but Google does not provide a way to do this. Thus, GSScrape was born with the goal of automatically scraping Google Scholar for titles. As of May 24, 2016, GSScrape is only semi-automatic: a user must navigate to each results page and click a button to extract the titles to a file.

GSScrapeNode is a version of GSScrape implemented as a Node.js-based webserver and a Google Chrome extension. It is built this way because extracting titles from a Google Scholar web page is trivial, but writing the titles to a file is not. The need for file output is what led to the webserver. In a nutshell, GSScrapeNode fetches the titles from a Google Scholar results page and posts them to a local webserver, which then writes the titles to a file.
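The server half of this flow can be sketched as follows. This is a minimal illustration, not the project's server.js: it uses Node's built-in http module rather than express and body-parser so that it runs without dependencies, and the /titles route, the JSON body shape { titles: [...] }, and the helper names appendTitles and createTitleServer are all assumptions made for the example.

```javascript
const fs = require('fs');
const http = require('http');

// Append each non-empty title on its own line; the file is created if missing.
// Returns the number of titles written.
function appendTitles(filePath, titles) {
  const lines = titles
    .map((t) => String(t).trim())
    .filter((t) => t.length > 0);
  if (lines.length > 0) {
    fs.appendFileSync(filePath, lines.join('\n') + '\n');
  }
  return lines.length;
}

// Build (but do not start) a server that accepts POST /titles with a JSON body.
// The route and body shape are hypothetical, chosen only for this sketch.
function createTitleServer(filePath) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/titles') {
      res.writeHead(404);
      res.end();
      return;
    }
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      try {
        const parsed = JSON.parse(body);
        const titles = Array.isArray(parsed.titles) ? parsed.titles : [];
        const count = appendTitles(filePath, titles);
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ written: count }));
      } catch (err) {
        res.writeHead(400);
        res.end('bad request');
      }
    });
  });
}

// To start it on the port mentioned in this README:
// createTitleServer('titles.txt').listen(8080, 'localhost');
```

Keeping the file-writing step in its own function means it can be exercised without a running server, and the append mode matches the behavior described here, where titles from several pages accumulate in one file.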

Installation

1. You'll need to be running Linux with both npm and nodejs installed. Both are available through apt-get. You will also need Google Chrome.
2. Clone this repo to any location. That path is referred to as "scraper_path" in the following steps.
3. In Chrome, load the unpacked extension located in scraper_path/extension.
4. In the scraper_path directory, run the command npm install. This installs the express and body-parser node modules.
5. Next, run the command nodejs server.js.
6. You should see a message indicating that the server is listening on localhost:8080. You may change the listening port as needed.
7. Navigate to a Google Scholar page with results and click the GSScrapeNode extension button. In scraper_path, there will be a new file called titles.txt that holds the titles from that page. You may repeat this on as many pages as needed; all titles are appended to titles.txt, one title per line.
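What happens when the extension button is clicked might look like the sketch below. It is an illustration rather than the code shipped in scraper_path/extension: the h3.gs_rt selector for result titles, the /titles route, and the JSON body shape are assumptions, as are the function names extractTitles and postTitles.

```javascript
// Collect the visible text of every result title on the page.
// `doc` is any object with querySelectorAll, normally the page's `document`.
function extractTitles(doc) {
  // Assumption: each result title is rendered inside <h3 class="gs_rt">.
  return Array.from(doc.querySelectorAll('h3.gs_rt'))
    .map((node) => node.textContent.replace(/\s+/g, ' ').trim())
    .filter((title) => title.length > 0);
}

// Send the titles to the local server as JSON.
// Assumption: the server accepts POST /titles on localhost:8080.
function postTitles(titles, url = 'http://localhost:8080/titles') {
  return fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ titles }),
  });
}

// In a content script, the two would be combined on click:
// postTitles(extractTitles(document));
```

Passing the document in as a parameter keeps extractTitles independent of the browser, so it can be tried against a stub object before loading the extension.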

BenjiFischman/GSScrapeNode

Folders and files

Latest commit

History

Repository files navigation

GSScrapeNode

A web scraper for Google Scholar utilizing Node.js

Introduction

Installation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages