xml_add_parent produces a segfault in for loop #339

Open
AleKoure opened this issue May 14, 2021 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@AleKoure

AleKoure commented May 14, 2021

While developing a plumber API with xml2, I ran into the following error under a small stress test. I reproduced it with a minimal example on my local machine.

The following code chunk produces a segfault after a few iterations:

library(xml2)

xx <- function() {
  x <- read_xml("<fruits><apple color='red'></apple></fruits>")
  xml_add_parent(x, read_xml("<food></food>"))
  print(as.character(x))
}

for (i in 1:1000) xx()
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n  <fruits>\n    <apple color=\"red\"/>\n  </fruits>\n</food>\n"
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n  <fruits>\n    <apple color=\"red\"/>\n  </fruits>\n</food>\n"
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n  <fruits>\n    <apple color=\"red\"/>\n  </fruits>\n</food>\n"

 *** caught segfault ***
address 0x55ff44000000, cause 'memory not mapped'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=el_GR.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=el_GR.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=el_GR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=el_GR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xml2_1.3.2         plumber_1.1.0.9000

loaded via a namespace (and not attached):
 [1] compiler_4.0.4   magrittr_2.0.1   R6_2.5.0         later_1.2.0     
 [5] promises_1.2.0.1 tools_4.0.4      swagger_3.33.1   Rcpp_1.0.6      
 [9] stringi_1.6.1    jsonlite_1.7.2   webutils_1.1     lifecycle_1.0.0 
[13] rlang_0.4.11    

You can work around it, for example, by using xml_add_child and xml_replace instead of xml_add_parent.
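The workaround mentioned above could look roughly like the sketch below. It is only an illustration: `wrap_with_parent` is a hypothetical helper name, not part of xml2, and it has not been confirmed to avoid the crash in every setup. It relies on `xml_replace()` and `xml_add_child()` copying their `.value` argument (`.copy = TRUE`), so no node is moved out of the temporary document created by `read_xml()`.

```r
library(xml2)

# Hypothetical helper (not an xml2 function): wrap `node` in `new_parent`
# using xml_replace() + xml_add_child() instead of xml_add_parent().
wrap_with_parent <- function(node, new_parent) {
  # Put a copy of `new_parent` where `node` used to be.
  wrapper <- xml_replace(node, new_parent, .copy = TRUE)
  # Re-attach a copy of the original node underneath the new wrapper.
  xml_add_child(wrapper, node, .copy = TRUE)
  invisible(wrapper)
}

doc <- read_xml("<parent><child1>Hello</child1></parent>")
child <- xml_find_first(doc, "child1")
wrap_with_parent(child, read_xml("<new_node>New text</new_node>"))

# Inspect the result
cat(as.character(doc))
```

Copying costs a little extra work per call, which is the trade-off for not sharing nodes between documents.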

@erp31

erp31 commented Jun 2, 2021

Hi, I'm also experiencing R crashing when xml_add_parent is used in combination with other code. As a minimal example, R crashes when the code below is run three times. When I originally found the problem, I was calling xml_add_parent only once in a script with many other function calls. However, I'm afraid I don't know how to create a minimal example for that case.

library(xml2)

# Create XML document
doc <- read_xml("<parent><child1>Hello</child1></parent>")

# Check current elements
children <- xml_children(doc)

new_node <- read_xml('<new_node>New text</new_node>')
xml_add_parent(children, new_node)

# Show that the parent node has been added
doc
#> {xml_document}
#> <parent>
#> [1] <new_node>New text<child1>Hello</child1></new_node>

If I run the above in a loop, it causes R to crash, e.g.:

library(xml2)

for (i in 1:3){  
  # Create XML document
  doc <- read_xml("<parent><child1>Hello</child1></parent>")
  
  # Check current elements
  children <- xml_children(doc)
  #expect_equal(xml_text(children), c("Hello"))
  
  new_node <- read_xml('<new_node>New text</new_node>')
  xml_add_parent(children, new_node)
  
  doc

}

reprex produces this:

This reprex appears to crash R.
See standard output and standard error for more details.

Standard output and error

*** caught segfault ***
  address 0x5610c8000000, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...

OR this:

This reprex appears to crash R.
See standard output and standard error for more details.

Standard output and error

free(): invalid pointer

Thanks to AleKoure for pointing out the workaround and helping me locate which part of my code was crashing my R session.

Created on 2021-06-02 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.4 (2021-02-15)
#>  os       CentOS Linux 8              
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       UTC                         
#>  date     2021-06-02                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  backports     1.2.1   2020-12-09 [2] CRAN (R 4.0.4)
#>  cli           2.4.0   2021-04-05 [2] CRAN (R 4.0.4)
#>  crayon        1.4.1   2021-02-08 [2] CRAN (R 4.0.4)
#>  digest        0.6.27  2020-10-24 [2] CRAN (R 4.0.4)
#>  ellipsis      0.3.1   2020-05-15 [2] CRAN (R 4.0.4)
#>  evaluate      0.14    2019-05-28 [2] CRAN (R 4.0.4)
#>  fansi         0.4.2   2021-01-15 [2] CRAN (R 4.0.4)
#>  fs            1.5.0   2020-07-31 [2] CRAN (R 4.0.4)
#>  glue          1.4.2   2020-08-27 [2] CRAN (R 4.0.4)
#>  highr         0.9     2021-04-16 [2] CRAN (R 4.0.4)
#>  htmltools     0.5.1.1 2021-01-22 [2] CRAN (R 4.0.4)
#>  knitr         1.32    2021-04-14 [2] CRAN (R 4.0.4)
#>  lifecycle     1.0.0   2021-02-15 [2] CRAN (R 4.0.4)
#>  magrittr      2.0.1   2020-11-17 [2] CRAN (R 4.0.4)
#>  pillar        1.6.0   2021-04-13 [2] CRAN (R 4.0.4)
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.0.4)
#>  purrr         0.3.4   2020-04-17 [2] CRAN (R 4.0.4)
#>  reprex        2.0.0   2021-04-02 [2] CRAN (R 4.0.4)
#>  rlang         0.4.10  2020-12-30 [2] CRAN (R 4.0.4)
#>  rmarkdown     2.7     2021-02-19 [2] CRAN (R 4.0.4)
#>  sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 4.0.4)
#>  stringi       1.5.3   2020-09-09 [2] CRAN (R 4.0.4)
#>  stringr       1.4.0   2019-02-10 [2] CRAN (R 4.0.4)
#>  styler        1.4.1   2021-03-30 [2] CRAN (R 4.0.4)
#>  tibble        3.1.1   2021-04-18 [2] CRAN (R 4.0.4)
#>  utf8          1.2.1   2021-03-12 [2] CRAN (R 4.0.4)
#>  vctrs         0.3.7   2021-03-29 [2] CRAN (R 4.0.4)
#>  withr         2.4.2   2021-04-18 [2] CRAN (R 4.0.4)
#>  xfun          0.22    2021-03-11 [2] CRAN (R 4.0.4)
#>  xml2        * 1.3.2   2020-04-23 [2] CRAN (R 4.0.4)
#>  yaml          2.2.1   2020-02-01 [2] CRAN (R 4.0.4)
#> 

@chainsawriot

chainsawriot commented Nov 1, 2021

for (i in 1:500){  
  print(i)
  doc <- xml2::read_xml("<a><b>a</b></a>")
  children <- xml2::xml_children(doc)
  xml2::xml_add_parent(children, xml2::read_xml('<c>d</c>'))
}

On my machine, this loop gets to iteration 60 and then a segfault is triggered.

The trigger is xml_add_parent. xml_add_child and xml_add_sibling won't trigger the segfault.

@hadley added the bug (an unexpected problem or unintended behavior) label Feb 28, 2022
@alexverse

A possible solution: adding .copy = TRUE in xml_replace() makes the function stable across iterations, but I guess this will have some impact on performance.
