.bib still failing - consider some other method of reading bibfiles #9

Open
SteveLane opened this issue Mar 23, 2018 · 6 comments

@SteveLane

(Hi @mjwestgate - rather than reopen issue #2 I thought I'd start a new one, reopen if you want)

bibfiles are tricky beasts! I wanted to try out your package for a new project, but can't get the bibfiles in. RefManageR reads them in ok as far as I can tell. Can you use that package (or a different one) to read them in, then parse from there into the format you require?

I tried using the github version, with the following bib entry:

@Article{Grigg2004-tr,
  title = {{An overview of risk-adjusted charts}},
  author = {O Grigg and V Farewell},
  journal = {Journal of the Royal Statistical Society: Series A (Statistics in
             Society)},
  volume = {167},
  number = {3},
  pages = {523--539},
  month = {aug},
  year = {2004},
  url = {http://doi.wiley.com/10.1111/j.1467-985X.2004.0apm2.x},
  issn = {0964-1998, 1467-985X},
  doi = {10.1111/j.1467-985X.2004.0apm2.x},
}

I should note that I called revtools:::read_bib directly, because read_bibliography wouldn't work with a single citation (I assume due to the ris/bib checking).

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C              
 [3] LC_TIME=C                  LC_COLLATE=C              
 [5] LC_MONETARY=C              LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] revtools_0.2.2     RefManageR_0.14.12

loaded via a namespace (and not attached):
 [1] NLP_0.1-10           Rcpp_0.12.11         compiler_3.4.1      
 [4] plyr_1.8.4           bindr_0.1            tools_3.4.1         
 [7] digest_0.6.12        memoise_1.1.0        lubridate_1.6.0     
[10] jsonlite_1.5         tibble_1.3.3         gtable_0.2.0        
[13] viridisLite_0.2.0    pkgconfig_2.0.1      rlang_0.1.1         
[16] bibtex_0.4.2         shiny_1.0.3          parallel_3.4.1      
[19] bindrcpp_0.2         withr_1.0.2          dplyr_0.7.1         
[22] httr_1.2.1           stringr_1.2.0        xml2_1.1.1          
[25] devtools_1.13.2      topicmodels_0.2-6    htmlwidgets_0.8     
[28] shinydashboard_0.6.1 stats4_3.4.1         ade4_1.7-6          
[31] grid_3.4.1           glue_1.1.1           data.table_1.10.4   
[34] R6_2.2.2             plotly_4.7.0         ggplot2_2.2.1       
[37] purrr_0.2.2.2        tidyr_0.6.3          magrittr_1.5        
[40] scales_0.4.1         modeltools_0.2-21    htmltools_0.3.6     
[43] assertthat_0.2.0     xtable_1.8-2         mime_0.5            
[46] colorspace_1.3-2     httpuv_1.3.5         stringi_1.1.5       
[49] lazyeval_0.2.0       munsell_0.4.3        slam_0.1-40         
[52] tm_0.7-1
@mjwestgate
Owner

Thanks Steve. You're right that this is a tough problem. I can get to this in a few days, but in the meantime, revtools::start_review_window also accepts a data.frame, so you could import using a different method and just use revtools for visualisation. The columns you would need to include in your data.frame are:

  • 'label' (a unique ID for each row)
  • 'author' (all authors in a single string, separated by ' and ')
  • 'year' (accepts numeric or character)
  • 'title'
  • 'journal'
  • 'abstract' (if available)

Hope this helps for now - more to follow.
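As a rough illustration of the column layout listed above, here is a minimal sketch that builds such a data.frame by hand from the bib entry quoted earlier in this thread. The object name `refs` is only illustrative, the optional 'abstract' column is left out, and the final call is commented out because I have not checked how start_review_window behaves with a single row.

```r
# Minimal sketch: one row per reference, using the column names listed above.
refs <- data.frame(
  label   = "Grigg2004-tr",                        # unique ID for the row
  author  = "O Grigg and V Farewell",              # single string, ' and ' separated
  year    = 2004,                                  # numeric or character
  title   = "An overview of risk-adjusted charts",
  journal = "Journal of the Royal Statistical Society: Series A (Statistics in Society)",
  stringsAsFactors = FALSE
)

# Check the layout before handing it over: required columns, unique labels.
required <- c("label", "author", "year", "title", "journal")
stopifnot(all(required %in% names(refs)), !anyDuplicated(refs$label))

# Assumed usage, not run here:
# revtools::start_review_window(refs)
```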

@SteveLane
Author

No rush from me, was mainly playing. I think it's a great idea, and I look forward to updates.

Thanks for the tip, I'll give it a go.

@mjwestgate
Owner

Hi Steve - this took me a while to get back to, but I've updated the package so that (1) read_bibliography detects .bib files more reliably, and (2) read_bib actually works for the (fairly basic) cases that I've tried. If you get time to check it out and find more bugs, let me know. I'm going to keep checking this over the next week or so, so I won't close this issue just yet.

@dfalster

Hi Martin, thanks for the great seminar yesterday and the exciting package. I also encountered an error reading in bib files (using the CRAN version), but after seeing this issue I installed the latest version from GitHub and was able to read in a bib file and start an analysis.

However, the read_bibliography function failed on another bib file I tried. This one had some custom sections and text in it. I looked into the failure, and it seems that parsing the file with your regular expressions may have produced some unexpected results. This made me wonder: can you use the results of bibtex::read.bib and work with that? As you may know, the resulting bibentry has fields you can extract, e.g. bib[[1]]$title etc:

> str(bib[[1]])
Class 'bibentry'  hidden list of 1
 $ Bruna-2010:List of 7
  ..$ title  : chr "Scientific Journals Can Advance Tropical Biology and Conservation by Requiring Data Archiving"
  ..$ volume : chr "42"
  ..$ doi    : chr "10.1111/j.1744-7429.2010.00652.x"
  ..$ journal: chr "Biotropica"
  ..$ author :Class 'person'  hidden list of 1
  .. ..$ :List of 5
  .. .. ..$ given  : chr [1:2] "Emilio" "M."
  .. .. ..$ family : chr "Bruna"
  .. .. ..$ role   : NULL
  .. .. ..$ email  : NULL
  .. .. ..$ comment: NULL
  ..$ year   : chr "2010"
  ..$ pages  : chr "399--401"
  ..- attr(*, "bibtype")= chr "Article"
  ..- attr(*, "key")= chr "Bruna-2010"

At least then you could offload the challenge of reading in the bibfile in the first place?
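To make the suggestion concrete, here is a minimal sketch of how a bibentry might be flattened into the data.frame layout mentioned earlier in the thread. The helper name `entry_to_row` is hypothetical, and the entry is built by hand with utils::bibentry so the sketch runs without a .bib file; in practice the object would come from bibtex::read.bib. It relies on the underlying list structure shown in the str() output above, so treat it as an illustration rather than a tested converter.

```r
# Stand-in for the result of bibtex::read.bib(): a bibentry built by hand.
bib <- utils::bibentry(
  bibtype = "Article",
  key     = "Bruna-2010",
  title   = "Scientific Journals Can Advance Tropical Biology and Conservation by Requiring Data Archiving",
  author  = utils::person(given = c("Emilio", "M."), family = "Bruna"),
  journal = "Biotropica",
  year    = "2010",
  volume  = "42",
  pages   = "399--401",
  doi     = "10.1111/j.1744-7429.2010.00652.x"
)

# Hypothetical helper: turn a length-one bibentry into a one-row data.frame.
entry_to_row <- function(entry) {
  raw <- unclass(entry)[[1]]   # the plain list that str() displays
  # Return NA for absent fields so every row has the same columns.
  get_field <- function(name) {
    value <- raw[[name]]
    if (is.null(value)) NA_character_ else as.character(value)
  }
  # Format each person as "Given Family", then join with ' and '.
  authors <- format(raw$author, include = c("given", "family"))
  data.frame(
    label   = attr(raw, "key"),
    author  = paste(authors, collapse = " and "),
    year    = get_field("year"),
    title   = get_field("title"),
    journal = get_field("journal"),
    stringsAsFactors = FALSE
  )
}

# One row per entry, stacked into a single data.frame.
refs <- do.call(rbind, lapply(seq_along(bib), function(i) entry_to_row(bib[i])))
```

This only moves the problem of reading the file to another package; entries that the reader itself cannot parse would still fail before this step.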

@SteveLane
Author

I think there are going to be issues no matter what method is used to read in the bibfiles...

For example, I tried each of read_bibliography, bibtex::read.bib and RefManageR::ReadBib to read in the following bibliography, and none of them could get the 'author' correct:

@MISC{biosec-act-2015,
  title  = "{Biosecurity Act 2015}",
  author = "{Department of Agriculture and Water Resources}",
  month  =  jun,
  year   =  2015,
  url    = "https://www.legislation.gov.au/Details/C2015A00061"
}
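For what it's worth, the double braces in that author field are the usual BibTeX way of marking a corporate name that should not be split into given and family parts. A minimal sketch of one way to respect that when splitting an author string is below; `split_authors` is a hypothetical helper, not something any of the three packages provides, and it only handles a field that is wrapped in a single pair of braces with no nested braces.

```r
# Hypothetical helper: split a BibTeX author field into individual names,
# keeping a fully brace-wrapped value as one literal (corporate) author.
split_authors <- function(author_field) {
  x <- trimws(author_field)
  # A single {...} group with no inner braces: treat as one literal name.
  if (grepl("^\\{[^{}]*\\}$", x)) {
    return(substr(x, 2, nchar(x) - 1))
  }
  # Otherwise split on the BibTeX separator ' and '.
  strsplit(x, "\\s+and\\s+")[[1]]
}

corporate <- split_authors("{Department of Agriculture and Water Resources}")
people    <- split_authors("O Grigg and V Farewell")
```

Without the brace check, the 'and' inside the department name would be read as a separator between two authors.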

@ConnorEsterwood

I'm currently running into this issue as well ... or a variant of it ... my .bib file seems to trigger an error inside one of the functions:

Error in if (any(col_n < 3)) { : missing value where TRUE/FALSE needed

That might be just due to an ugly .bib but I'm not really sure ... gonna try and just switch my data exports to .csv or .ris
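That message is what base R raises when the condition of an if() evaluates to NA. A minimal sketch of how that can happen with any() is below; the values of `col_n` are made up, since what revtools actually computes there is not shown in this thread.

```r
# Hypothetical values: one count is missing, the other is not below 3.
col_n <- c(NA, 5)

# any() returns NA when no element is TRUE and at least one is NA.
flag <- any(col_n < 3)

# `if (flag) ...` would now stop with "missing value where TRUE/FALSE needed".
# Two defensive patterns that decide explicitly how NA should count:
safe_flag <- isTRUE(flag)                     # treat NA as FALSE
dropped   <- any(col_n < 3, na.rm = TRUE)     # ignore NA elements
```

So a plausible (but unconfirmed) reading is that some line of the .bib file did not split into the pieces the parser expected, leaving a missing count.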

4 participants