-
Notifications
You must be signed in to change notification settings - Fork 0
/
ep_npl_sample.Rmd
58 lines (51 loc) · 1.21 KB
/
ep_npl_sample.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
---
title: "BQ Patent Exploration"
output: github_document
---
```{r, echo = FALSE, message = FALSE, warning = FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
warning = FALSE,
message = FALSE,
echo = TRUE
)
```
### Connect to BQ
```{r}
library(tidyverse)
library(DBI)
library(bigrquery)
con <- dbConnect(
bigrquery::bigquery(),
project = "cogent-tangent-279810"
)
```
### Query
Get a sample of 10,000 non-paten literature citation from patents provided by the European Patent Office since 2015.
```{sql, connection = con, output.var="rand_npl"}
SELECT
meta.publication_number,
meta.country_code,
cite.type,
cite.npl_text,
rand() as rand
FROM
`patents-public-data.patents.publications_201912` AS meta,
UNNEST(citation) AS cite
WHERE
meta.country_code in (
"EP"
)
AND meta.publication_date > 20150101
AND cite.npl_text != ''
AND NOT REGEXP_CONTAINS(cite.npl_text, "^See|None")
AND NOT cite.npl_text = "No further relevant documents disclosed"
AND rand() < 0.01
ORDER BY rand LIMIT 10000
```
### Backup
```{r}
rand_npl
write_csv(rand_npl, "data/rand_npl.csv")
```