rvest

Overview

rvest helps you scrape information from web pages. It is designed to work with magrittr so that common web scraping tasks are easy to express as pipelines, and it is inspired by libraries like Beautiful Soup.

library(rvest)
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")

rating <- lego_movie %>% 
  html_nodes("strong span") %>%
  html_text() %>%
  as.numeric()
rating
#> [1] 7.8

cast <- lego_movie %>%
  html_nodes("#titleCast .primary_photo img") %>%
  html_attr("alt")
cast
#>  [1] "Will Arnett"     "Elizabeth Banks" "Craig Berry"    
#>  [4] "Alison Brie"     "David Burrows"   "Anthony Daniels"
#>  [7] "Charlie Day"     "Amanda Farinos"  "Keith Ferguson" 
#> [10] "Will Ferrell"    "Will Forte"      "Dave Franco"    
#> [13] "Morgan Freeman"  "Todd Hansen"     "Jonah Hill"

poster <- lego_movie %>%
  html_nodes(".poster img") %>%
  html_attr("src")
poster
#> [1] "https://m.media-amazon.com/images/M/MV5BMTg4MDk1ODExN15BMl5BanBnXkFtZTgwNzIyNjg3MDE@._V1_UX182_CR0,0,182,268_AL_.jpg"

Installation

Install the release version from CRAN:

install.packages("rvest")

Or the development version from GitHub:

# install.packages("devtools")
devtools::install_github("tidyverse/rvest")

Key functions

The most important functions in rvest are:

Create an html document from a url, a file on disk or a string containing html with read_html().
Select parts of a document using CSS selectors: html_nodes(doc, "table td") (or if you’re a glutton for punishment, use XPath selectors with html_nodes(doc, xpath = "//table//td")). If you haven’t heard of selectorgadget, make sure to read vignette("selectorgadget") to learn about it.
Extract components with html_name() (the name of the tag), html_text() (all text inside the tag), html_attr() (contents of a single attribute) and html_attrs() (all attributes).
(You can also use rvest with XML files: parse with read_xml(), which replaces the deprecated xml(), then extract components using xml_node(), xml_attr(), xml_attrs(), xml_text() and xml_name().)
Parse tables into data frames with html_table().
Extract, modify and submit forms with html_form(), set_values() and submit_form().
Detect and repair encoding problems with guess_encoding() and repair_encoding().
Navigate around a website as if you’re in a browser with html_session(), jump_to(), follow_link(), back(), forward(), submit_form() and so on. (This is still a work in progress, so I’d love your feedback.)
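
The first three functions above can be tried without a network connection. The following is a minimal sketch, assuming rvest is installed; the HTML string, the #cast id, the actor class and the /name/... links are made up for illustration and do not come from any real page.

```r
library(rvest)

# Hypothetical HTML, invented for this sketch (not a real page).
page <- read_html(paste0(
  "<html><body><div id='cast'>",
  "<a class='actor' href='/name/1'>Will Arnett</a>",
  "<a class='actor' href='/name/2'>Elizabeth Banks</a>",
  "</div></body></html>"
))

# Select the same nodes twice: once with a CSS selector, once with XPath.
actors_css   <- html_nodes(page, "#cast a.actor")
actors_xpath <- html_nodes(page, xpath = "//div[@id='cast']/a[@class='actor']")

# Extract components from the selected nodes.
actor_names <- html_text(actors_css)          # text inside each tag
actor_links <- html_attr(actors_css, "href")  # a single attribute
tag_names   <- html_name(actors_css)          # the name of each tag
all_attrs   <- html_attrs(actors_css)         # all attributes, one vector per node

actor_names
actor_links
```

Both selectors pick out the same two links here; the CSS form is usually shorter, which is why it is the default argument of html_nodes().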

To see examples of these functions in use, check out the demos.
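
For a quick offline taste of table parsing, here is a small sketch of html_table(), again assuming rvest is installed. The table markup is invented for illustration; only the function names come from the list of key functions.

```r
library(rvest)

# Hypothetical table markup, written inline so no network access is needed.
doc <- read_html(paste0(
  "<table>",
  "<tr><th>title</th><th>year</th></tr>",
  "<tr><td>The Lego Movie</td><td>2014</td></tr>",
  "<tr><td>The Lego Batman Movie</td><td>2017</td></tr>",
  "</table>"
))

# html_table() on a set of nodes returns a list with one entry per table,
# so take the first (and only) element.
films <- html_table(html_nodes(doc, "table"))[[1]]

films
```

The header row is taken from the th cells, so the resulting data frame has the columns title and year.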

Inspirations

Python: RoboBrowser, Beautiful Soup.

Code of Conduct

Please note that the rvest project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
