scrapR

R package to extract data from PDF figures

Installation

The easiest way to install the development version of scrapR is to use the devtools package:

# install.packages("devtools")
library(devtools)
install_github("adamkucharski/scrapR")
library(scrapR)

# load dependencies
# install.packages("readr")
# install.packages("grImport")
# install.packages("magrittr")
library(grImport)
library(readr)

Note that the dependency grImport requires the Ghostscript PDF interpreter to be installed. You can check which version you have installed (if any) by running $ gs -v on the command line. If it is missing, you can install it, for example via Homebrew with $ brew install ghostscript.
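The same check can be done from within R. The snippet below is a minimal sketch using base R's Sys.which(); the Windows executable names (gswin64c, gswin32c) are an assumption about a typical Ghostscript install and are not taken from the scrapR documentation.

```r
# Candidate Ghostscript executable names: "gs" on macOS/Linux;
# "gswin64c"/"gswin32c" are assumed names for a typical Windows install.
gs_candidates <- c("gs", "gswin64c", "gswin32c")

# Sys.which() returns the full path for each name, or "" if it is not on the PATH.
gs_found <- Sys.which(gs_candidates)
gs_found <- gs_found[nzchar(gs_found)]

if (length(gs_found) == 0) {
  message("Ghostscript not found on the PATH; install it before using grImport.")
} else {
  message("Ghostscript found at: ", gs_found[[1]])
}
```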

Example

First you need a figure to extract data from. If you want a simple test figure, you can run:

simulate_PDF_data()

This generates a simulated set of lines and writes them to figure1.pdf.

Next, navigate to the directory containing your PDF figure and import the data:

load_PDF_data(file_name="figure1.pdf")

This will output a raw RDS file and a figure ([FIGURENAME].guide.pdf) with the different vector components labelled with numbers.

If the data fail to import, it's probably because the vector graphic has too many surrounding features. In this case, use a vector graphics editor such as Affinity or Illustrator to delete unnecessary surrounding content, making sure to leave the lines with the data you want and at least four tick marks (two on the x-axis, two on the y-axis), which will be used to calibrate the scale.

Once you've run load_PDF_data(), create or edit [FIGURENAME].guide.csv so that the point numbers match the labels in the guide figure for two x-axis tick marks and two y-axis tick marks, give the axis value at each of those ticks, and list the components you want to extract as data:

point	value	axis
5	5	x
10	30	x
13	200	y
16	800	y
2	NA	data
18	NA	data
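If you would rather build the guide file from R than edit it by hand, something like the following may work. It is only a sketch: it assumes the guide is an ordinary comma-separated file with the three columns shown above, and that the file for figure1.pdf is named figure1.guide.csv. The file is written to a temporary directory here; in practice it would sit alongside the PDF.

```r
# Guide table matching the example above: two x-axis ticks, two y-axis ticks,
# and two numbered components to extract as data (value is NA for data rows).
guide <- data.frame(
  point = c(5, 10, 13, 16, 2, 18),
  value = c(5, 30, 200, 800, NA, NA),
  axis  = c("x", "x", "y", "y", "data", "data")
)

# Assumed file name and location; adjust to wherever your PDF lives.
guide_path <- file.path(tempdir(), "figure1.guide.csv")

# write.csv() writes missing values as the string NA by default.
write.csv(guide, guide_path, row.names = FALSE)
```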

Then extract the data using the RDS file and the guide CSV file:

extract_PDF_data(file_name = "figure1.pdf")

The resulting data for the line(s) will be output as [FIGURENAME].csv, with each line identified by its index. extract_PDF_data() also has an option to handle x and/or y axes that are on a logarithmic scale.
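To see why two tick marks per axis are enough, here is a rough illustration of a two-point calibration. This is not scrapR's source code: calibrate_axis() and its arguments are hypothetical names, and the sketch simply assumes that figure coordinates map linearly onto axis values (or onto log10 of the axis values for a logarithmic axis).

```r
# Hypothetical helper: convert raw figure coordinates to axis values using
# two reference ticks with known coordinates and known values.
calibrate_axis <- function(coord, tick_coord, tick_value, log_scale = FALSE) {
  # On a logarithmic axis, positions are linear in log10(value).
  if (log_scale) tick_value <- log10(tick_value)

  # Axis units per unit of figure coordinate, from the two ticks.
  slope <- diff(tick_value) / diff(tick_coord)

  # Linear interpolation (or extrapolation) from the first tick.
  out <- tick_value[1] + slope * (coord - tick_coord[1])

  if (log_scale) 10^out else out
}

# Linear axis: ticks at coordinates 100 and 200 correspond to values 5 and 30.
calibrate_axis(150, tick_coord = c(100, 200), tick_value = c(5, 30))

# Logarithmic axis: the same ticks correspond to values 10 and 1000.
calibrate_axis(150, tick_coord = c(100, 200), tick_value = c(10, 1000), log_scale = TRUE)
```

A point halfway between the ticks comes out as 17.5 on the linear axis and as 100 on the logarithmic one, which is why the log option matters for figures with logarithmic axes.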
