Regression of re-styling of unicode characters #1199

Closed
cicdguy opened this issue May 6, 2024 · 13 comments

Comments

@cicdguy

cicdguy commented May 6, 2024

Hello,

I believe we are seeing a regression of #847 in R 4.4.0 on Linux. macOS does not have this issue, and I haven't tested on Windows.

Steps to reproduce

Start a shell using the rocker/verse:4.4.0 image

docker run -it --rm --platform=linux/amd64 rocker/verse:4.4.0 sh

Observe the OS version

cat /etc/os-release

Install styler from CRAN

R -s -e 'install.packages("styler", repos = "https://cloud.r-project.org", quiet = TRUE, Ncpus = 8)'

Create a simple file containing unicode characters

echo 'a <- "R² μ ≥"' > ex.R

Style the file

R -s -e 'styler::style_file("ex.R")'

Observe the re-styled file

cat ex.R
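The steps above can be collected into one script that also keeps a copy of the input, so any change made by styling is easy to spot. This is only a sketch: the scratch directory and the `ex.orig.R` file name are illustrative and not part of the original report, and the styling step is skipped when R is not on the PATH (styler must already be installed).

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Same test file as in the steps above, plus an untouched copy (hypothetical name).
printf '%s\n' 'a <- "R² μ ≥"' > ex.R
cp ex.R ex.orig.R

# Style the file only if R is available.
if command -v R >/dev/null 2>&1; then
  R -s -e 'styler::style_file("ex.R")'
fi

# diff prints the changed lines and exits non-zero if styling altered the file.
if diff ex.orig.R ex.R; then
  echo "ex.R unchanged"
else
  echo "ex.R was modified by styling"
fi
```

Run inside the container after the install step, the diff should show whether the string literal was rewritten.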

Supplemental Information

Running utils::getParseData(parse(text = 'suit <- "♠"')) in the container gives me:

  line1 col1 line2 col2 id parent       token terminal           text
7     1    1     1   13  7      0        expr    FALSE
1     1    1     1    4  1      3      SYMBOL     TRUE           suit
3     1    1     1    4  3      7        expr    FALSE
2     1    6     1    7  2      7 LEFT_ASSIGN     TRUE             <-
4     1    9     1   13  4      6   STR_CONST     TRUE "\342\231\240"
6     1    9     1   13  6      7        expr    FALSE

But running on my macOS laptop gives me:

  line1 col1 line2 col2 id parent       token terminal text
7     1    1     1   11  7      0        expr    FALSE
1     1    1     1    4  1      3      SYMBOL     TRUE suit
3     1    1     1    4  3      7        expr    FALSE
2     1    6     1    7  2      7 LEFT_ASSIGN     TRUE   <-
4     1    9     1   11  4      6   STR_CONST     TRUE  "♠"
6     1    9     1   11  6      7        expr    FALSE
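
For reference, the `\342\231\240` in the container output is the three UTF-8 bytes of ♠ (U+2660) written as octal escapes. The `col2` value of 13 instead of 11 fits the same reading, since `suit <- "♠"` is 11 characters but 13 bytes. A quick shell check of the byte equivalence, independent of R (a sketch that assumes the snippet itself is saved as UTF-8):

```shell
# The octal escapes from the container output, dumped as hex bytes.
escaped_hex="$(printf '\342\231\240' | od -An -tx1 | tr -d ' \n')"

# The literal character, dumped the same way.
literal_hex="$(printf '%s' '♠' | od -An -tx1 | tr -d ' \n')"

echo "escaped: $escaped_hex"
echo "literal: $literal_hex"
```

Both lines should show the same byte sequence, which suggests the bytes survive parsing and only their interpretation depends on the session's encoding.
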
@lorenzwalthert
Collaborator

Thanks for the good repro. If I am not mistaken, utils::getParseData() is the problem here and styler has nothing to do with it? @IndrajeetPatil maybe you can jump in.

@cicdguy
Author

cicdguy commented May 6, 2024

Yes indeed. This is likely an R-related issue, as we have seen previously.

Seeking advice here: is there some way styler can ignore Unicode characters?

@IndrajeetPatil
Collaborator

I can check this tomorrow on my Ubuntu machine. It is a bit strange, though: if there really were a regression in R >= 4.4, the encoding test should fail, yet it passes on both the release and devel versions: #1200.

@IndrajeetPatil
Collaborator

I can't reproduce this locally on Ubuntu either.

Here is a reprex with session info:

utils::getParseData(parse(text = 'suit <- "♠"'))
#>   line1 col1 line2 col2 id parent       token terminal text
#> 7     1    1     1   11  7      0        expr    FALSE    
#> 1     1    1     1    4  1      3      SYMBOL     TRUE suit
#> 3     1    1     1    4  3      7        expr    FALSE    
#> 2     1    6     1    7  2      7 LEFT_ASSIGN     TRUE   <-
#> 4     1    9     1   11  4      6   STR_CONST     TRUE  "♠"
#> 6     1    9     1   11  6      7        expr    FALSE

Created on 2024-05-06 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US.UTF-8
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2024-05-06
#>  pandoc   3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version     date (UTC) lib source
#>  P cli           3.6.2       2023-12-11 [?] RSPM (R 4.4.0)
#>  P digest        0.6.35      2024-03-11 [?] RSPM (R 4.4.0)
#>  P evaluate      0.23        2023-11-01 [?] RSPM (R 4.4.0)
#>  P fastmap       1.1.1       2023-02-24 [?] RSPM (R 4.4.0)
#>  P fs            1.6.4       2024-04-25 [?] RSPM
#>  P glue          1.7.0       2024-01-09 [?] RSPM (R 4.4.0)
#>    htmltools     0.5.8.1     2024-04-04 [1] RSPM (R 4.4.0)
#>    knitr         1.46        2024-04-06 [1] RSPM (R 4.4.0)
#>  P lifecycle     1.0.4       2023-11-07 [?] RSPM (R 4.4.0)
#>  P magrittr      2.0.3       2022-03-30 [?] RSPM (R 4.4.0)
#>  P purrr         1.0.2       2023-08-10 [?] RSPM (R 4.4.0)
#>  P R.cache       0.16.0      2022-07-21 [?] RSPM (R 4.4.0)
#>  P R.methodsS3   1.8.2       2022-06-13 [?] RSPM (R 4.4.0)
#>  P R.oo          1.26.0      2024-01-24 [?] RSPM (R 4.4.0)
#>  P R.utils       2.12.3      2023-11-18 [?] RSPM (R 4.4.0)
#>  P reprex        2.1.0       2024-01-11 [?] RSPM
#>  P rlang         1.1.3       2024-01-10 [?] RSPM (R 4.4.0)
#>  P rmarkdown     2.26        2024-03-05 [?] RSPM (R 4.4.0)
#>  P rstudioapi    0.16.0      2024-03-24 [?] RSPM
#>  P sessioninfo   1.2.2       2021-12-06 [?] RSPM (R 4.4.0)
#>    styler        1.10.3.9000 2024-05-06 [1] Github (r-lib/styler@4b24ff6)
#>  P vctrs         0.6.5       2023-12-01 [?] RSPM (R 4.4.0)
#>  P withr         3.0.0       2024-01-16 [?] RSPM (R 4.4.0)
#>  P xfun          0.43        2024-03-25 [?] RSPM (R 4.4.0)
#>  P yaml          2.3.8       2023-12-11 [?] RSPM (R 4.4.0)
#>
#>  [1] /home/indra/.cache/R/renv/library/enetpipeline-ebbe6db5/linux-ubuntu-jammy/R-4.4/x86_64-pc-linux-gnu
#>  [2] /home/indra/.cache/R/renv/sandbox/linux-ubuntu-jammy/R-4.4/x86_64-pc-linux-gnu/9a444a72
#>  [3] /usr/lib/R/library
#>
#>  P ── Loaded and on-disk path mismatch.
#>
#> ──────────────────────────────────────────────────────────────────────────────

@cicdguy Can you please post a reprex with session info so we can check what's different between our/GitHub and your machines?

@IndrajeetPatil
Collaborator

Hmm, I can reproduce the output you are seeing in the container:

# echo 'a <- "R² μ ≥"' > ex.R
# R -s -e 'styler::style_file("ex.R")'
Styling  1  files:
 ex.R i 
----------------------------------------
Status	Count	Legend 
v 	0	File unchanged.
i 	1	File changed.
x 	0	Styling threw an error.
----------------------------------------
Please review the changes carefully!
# cat ex.R
a <- "R<U+00B2> <U+03BC> <U+2265>"

Could this be an issue with Rocker's image? Can anyone reproduce this without using this image?
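
To check whether already-styled files were affected, one option is to search them for `<U+XXXX>` escapes like the ones in the output above. A minimal sketch; the function name `find_unicode_escapes` is made up for illustration, and a legitimate string that happens to contain such text would also match:

```shell
# Print every line (with its line number) that contains a <U+XXXX> escape.
# Exits non-zero when no such escape is found.
find_unicode_escapes() {
  grep -nE '<U\+[0-9A-Fa-f]{4,8}>' "$@"
}

# Demo on a scratch file holding the mangled line shown above.
sample="$(mktemp)"
printf '%s\n' 'a <- "R<U+00B2> <U+03BC> <U+2265>"' > "$sample"
find_unicode_escapes "$sample"
rm -f "$sample"
```

In a project, something like `find_unicode_escapes R/*.R` would list candidate lines to review by hand.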

@IndrajeetPatil
Collaborator

@eitsupi Maybe you have some idea as to what might be going on here?

@eitsupi

eitsupi commented May 6, 2024

Perhaps it is a locale issue? See rocker-org/rocker-versioned2#802.
Try setting the environment variable LANG=en_US.UTF-8.
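
Before changing anything, the active character map can be inspected with the POSIX `locale charmap` command. The sketch below wraps that in a small helper; `is_utf8_charmap` is a hypothetical name, and the exact charmap strings reported for non-UTF-8 locales vary by platform:

```shell
# Succeed when the given charmap name denotes UTF-8 (spelling varies by system).
is_utf8_charmap() {
  case "$1" in
    UTF-8|utf-8|UTF8|utf8) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the system for the charmap of the current locale.
charmap="$(locale charmap 2>/dev/null || echo unknown)"

if is_utf8_charmap "$charmap"; then
  echo "UTF-8 locale active ($charmap)"
else
  echo "non-UTF-8 locale ($charmap); consider setting LANG=en_US.UTF-8"
fi
```

For the container from the report, the variable can be passed at start-up with Docker's `-e` flag, for example `docker run -it --rm -e LANG=en_US.UTF-8 --platform=linux/amd64 rocker/verse:4.4.0 sh`.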

@cicdguy
Author

cicdguy commented May 6, 2024

It is indeed a locale issue. Setting LANG=en_US.UTF-8 works like a charm. I guess I'll just set this on the containers going forward.
Thank you all! 🙏🏽

@eitsupi

eitsupi commented May 7, 2024

Sorry for the trouble.
I have triggered a new build, which should fix this.

@IndrajeetPatil
Collaborator

Thanks for the quick reply and fix, @eitsupi. You Rock(er)! 🤘

cicdguy added a commit to insightsengineering/r.pkg.template that referenced this issue May 7, 2024
See r-lib/styler#1199 and
rocker-org/rocker-versioned2#802 for reference.

This adds the `LANG=en_US.UTF-8` environment variable in the event that
it is not set.
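
The "only if it is not set" behaviour described in the commit message can be sketched in plain shell with a default-assignment expansion. This is an illustration of the idea, not the contents of the referenced commit; `ensure_lang` is a hypothetical helper name:

```shell
# Set LANG to en_US.UTF-8 only when it is unset or empty; an existing value wins.
ensure_lang() {
  : "${LANG:=en_US.UTF-8}"
  export LANG
}

ensure_lang
echo "LANG=$LANG"
```

Note that `LC_ALL`, when set, takes precedence over `LANG`, so a container that exports `LC_ALL=C` would still need that variable changed or unset.
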
@lorenzwalthert
Collaborator

Ok, but it’s still a problem in base R, and setting the locale is more of a workaround, no?

@IndrajeetPatil
Collaborator

Ok, but it’s still a problem in base R, and setting the locale is more of a workaround, no?

No, the issue was in the Rocker image, not in base R itself. That is why it could be reproduced neither locally nor on GitHub, only in Docker containers using that image. The image has already been fixed, so this is no longer an issue.

@lorenzwalthert
Collaborator

So parsing a <- "R² μ ≥" is expected to give something meaningful only if LANG is set?
