cluster_label_prop() crashes R when using fixed labels #1434

Open
ababaian opened this issue Jul 17, 2024 · 6 comments · May be fixed by #1582

Comments

ababaian commented Jul 17, 2024

What happens, and what did you expect instead?

I'm using the LPA algorithm implemented in cluster_label_prop on an undirected graph of moderate size (1042 vertices, 1124 edges) and when fixing the labels on a subset of the vertices, the function crashes R.

To reproduce

Minimal reproducible example file: lpa.g.debug.zip

library(igraph)

# Load the graph, g, and the label data.frame, lab.df
load("lpa.g.debug.RData")

# where lab.df was constructed as follows
lab.df <- data.frame( names  = V(g)$name,
                      component = V(g)$component,
                      label  = factor( vertex_attr( g, 'scientific_name' ) ),
                      int.label  = -1 ,
                      fixed  = !V(g)$type )

lab.df$int.label[ lab.df$fixed ] <- as.numeric( lab.df$label[ lab.df$fixed ] )


# Works
lpa <- cluster_label_prop(g,
                          weights = E(g)$vrank,
                          initial = lab.df$int.label)

# R Crashes
lpa <- cluster_label_prop(g,
                          weights = E(g)$vrank,
                          initial = lab.df$int.label,
                          fixed   = lab.df$fixed)

System information

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Toronto
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] igraph_2.0.3

loaded via a namespace (and not attached):
 [1] plotly_4.10.4      utf8_1.2.4         generics_0.1.3     tidyr_1.3.1        gplots_3.1.3.1     bitops_1.0-7       xml2_1.3.6         KernSmooth_2.23-24 gtools_3.9.5       stringi_1.8.4     
[11] digest_0.6.36      magrittr_2.0.3     caTools_1.18.2     evaluate_0.24.0    grid_4.4.1         pkgload_1.4.0      fastmap_1.2.0      jsonlite_1.8.8     processx_3.8.4     ps_1.7.6          
[21] httr_1.4.7         purrr_1.0.2        fansi_1.0.6        viridisLite_0.4.2  scales_1.3.0       lazyeval_0.2.2     cli_3.6.3          rlang_1.1.4        munsell_0.5.1      reprex_2.1.1      
[31] withr_3.0.0        yaml_2.3.8         tools_4.4.1        dplyr_1.1.4        colorspace_2.1-0   ggplot2_3.5.1      DT_0.33            vctrs_0.6.5        R6_2.5.1           lifecycle_1.0.4   
[41] stringr_1.5.1      fs_1.6.4           htmlwidgets_1.6.4  downloadthis_0.3.3 callr_3.7.6        clipr_0.8.0        pkgconfig_2.0.3    pillar_1.9.0       gtable_0.3.5       data.table_1.15.4 
[51] glue_1.7.0         xfun_0.45          tibble_3.2.1       tidyselect_1.2.1   rstudioapi_0.16.0  knitr_1.47         htmltools_0.5.8.1  rmarkdown_2.27     compiler_4.4.1     roxygen2_7.3.2    

@ababaian (Author)

I've narrowed down the issue: NA values are being passed in the initial numeric vector of cluster_label_prop(). When the labels are not fixed this does not seem to be a problem, but if vertices with an NA label are marked as fixed, R crashes.

# R Crashes (as before)
lpa <- cluster_label_prop(g,
                          weights = E(g)$vrank,
                          initial = lab.df$int.label,
                          fixed   = lab.df$fixed)

# R does not crash
na.label <- is.na(lab.df$int.label)
lab.df$int.label[ na.label ] <- 99

lpa <- cluster_label_prop(g,
                          weights = E(g)$vrank,
                          initial = lab.df$int.label,
                          fixed   = lab.df$fixed)

# R Crashes
# lab.df$fixed[ 1 ] is TRUE
lab.df$int.label[ 1 ] <- NA

lpa <- cluster_label_prop(g,
                          weights = E(g)$vrank,
                          initial = lab.df$int.label,
                          fixed   = lab.df$fixed)

@szhorvat (Member)

@Antonov548 Can you run this using ASAN and see if it's still present in the dev version (to become 2.0.4)? Without ASAN, the absence of a crash does not show that there is no bug (I can't reproduce it on my machine, but that doesn't mean the bug isn't there).

@ababaian (Author)

Is there a good way to get you system log information from the time the crash happens which could help diagnose the problem?

szhorvat commented Jul 18, 2024

I'm not sure, I don't think so.

It's good to note that passing NA values to igraph functions is almost never valid (certainly not here). The exceptions are storing attributes (NA values can be stored) and where a NA scalar has special meaning (e.g. weights=NA).

That said, there should not be a crash.

I believe the major issue here is that the R interface does not do any validation when converting to an integer vector (igraph_vector_int_t). See:

rigraph/src/rinterface_extra.c

Lines 3380 to 3388 in 128182d

igraph_error_t R_SEXP_to_vector_int_copy(SEXP sv, igraph_vector_int_t *v) {
  igraph_integer_t n = Rf_xlength(sv);
  double *svv=REAL(sv);
  IGRAPH_CHECK(igraph_vector_int_init(v, n));
  for (igraph_integer_t i = 0; i<n; i++) {
    VECTOR(*v)[i] = (igraph_integer_t)svv[i];
  }
  return IGRAPH_SUCCESS;
}

This is related to #1140, but for vectors rather than scalars. I noted it with a yellow mark in #840. Note that doing it for vectors may have a noticeable performance impact.
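
For illustration, here is a minimal sketch of what element-wise validation during such a conversion might look like. It is plain C without the R or igraph headers, so the names checked_double_to_int64 and convert_status are hypothetical stand-ins and not anything that exists in rigraph, and int64_t stands in for igraph_integer_t (an assumption about the build). The idea is to reject NaN (R's NA_real_ is a NaN), infinities and out-of-range values before the cast, since casting those to an integer type is undefined behaviour in C.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the real glue code would return igraph_error_t. */
typedef enum {
    CONVERT_OK = 0,
    CONVERT_NOT_FINITE,   /* NaN (including R's NA_real_) or +/-Inf */
    CONVERT_OUT_OF_RANGE  /* finite, but does not fit into int64_t */
} convert_status;

/* Copy n doubles from src into dst as 64-bit integers, checking each
 * element before the cast. On failure, *bad_index holds the position
 * of the first offending element and dst is only partially filled.
 * Fractional values are truncated toward zero, as in the unchecked loop. */
static convert_status checked_double_to_int64(const double *src, size_t n,
                                              int64_t *dst, size_t *bad_index) {
    /* 2^63 is exactly representable as a double; INT64_MAX is not,
     * so the upper bound is tested with a strict inequality. */
    const double upper = 9223372036854775808.0;   /*  2^63 */
    const double lower = -9223372036854775808.0;  /* -2^63 */

    assert(bad_index != NULL);
    assert(n == 0 || (src != NULL && dst != NULL));

    for (size_t i = 0; i < n; i++) {
        double x = src[i];
        if (!isfinite(x)) {
            *bad_index = i;
            return CONVERT_NOT_FINITE;
        }
        if (x < lower || x >= upper) {
            *bad_index = i;
            return CONVERT_OUT_OF_RANGE;
        }
        dst[i] = (int64_t) x;  /* safe: x is finite and in range */
    }
    return CONVERT_OK;
}
```

The two extra comparisons per element are the kind of overhead the performance remark above refers to. A caller could use the reported index to raise an R-level error instead of passing an invalid label on to the C core.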

@krlmlr I would not make this issue block 2.0.4. A proper fix will be very time consuming.

@ababaian (Author)

Agreed, the NA was actually an error on my side upstream of LPA, but having it take down the whole R session was annoying. Throwing an error if there are NA values in int.label seems prudent, but if the crash is not easily reproducible, then such a check may cause other errors on systems where the call currently works.

@szhorvat (Member)

Yes, of course this should be fixed. The problem is that the proper fix is time-consuming, and requires a lot of care, as it involves reviewing some of the fundamental glue code between R and C, and not just this single function. This is why I recommended not blocking the next release on this issue.
