Special Character in Sheet-Name destroys the Excel-File #518

Open
joshuasami opened this issue Jan 20, 2025 · 1 comment
Comments

@joshuasami

Describe the bug

When using the openxlsx package to modify an Excel workbook, if there is a worksheet with a name that includes an ampersand (&), it results in a corrupted file. The workbook becomes unopenable in Excel due to invalid XML generated by unescaped special characters in the sheet name.

To Reproduce

A minimal reproducible example:

wb <- createWorkbook()
addWorksheet(wb, sheetName = "Test1 & Test2")
saveWorkbook(wb, "test.xlsx", overwrite = TRUE)

wb <- loadWorkbook("test.xlsx")
saveWorkbook(wb, file = "test.xlsx", overwrite = TRUE)

Steps:

  1. Run the R script above.
  2. Attempt to open test.xlsx in Excel.
  3. Excel reports that the file is corrupted or cannot be opened.

Additional context

The issue seems to occur because the ampersand (&) is a special character in XML and must be escaped as &amp; when included in XML attribute values. The openxlsx package seems to insert the sheetName directly into the XML without escaping special characters, leading to malformed XML and a corrupted Excel file.
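
To illustrate the point about attribute values, here is a minimal base-R sketch. `sheet_element` is a hypothetical helper written for this example, not an openxlsx function, and the `sheetId` and `r:id` values are placeholders. It builds a `<sheet>` element with the sheet name as an attribute, once without and once with escaping:

```r
# Hypothetical helper (not part of openxlsx): build a <sheet> element
# with the sheet name placed inside an XML attribute value.
sheet_element <- function(name, escape = TRUE) {
  if (escape) {
    # "&" is replaced first so the entities added next are not re-escaped
    name <- gsub("&", "&amp;", name, fixed = TRUE)
    name <- gsub("<", "&lt;", name, fixed = TRUE)
    name <- gsub('"', "&quot;", name, fixed = TRUE)
  }
  # sheetId and r:id are placeholder values for illustration only
  sprintf('<sheet name="%s" sheetId="1" r:id="rId1"/>', name)
}

raw_xml  <- sheet_element("Test1 & Test2", escape = FALSE)
safe_xml <- sheet_element("Test1 & Test2")

cat(raw_xml, "\n")   # contains a bare "&", which an XML parser rejects
cat(safe_xml, "\n")  # contains "&amp;", which is well-formed
```

An XML parser reads `&amp;` back as `&`, so the sheet name shown to the user is unchanged when the name is escaped exactly once.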

Proposed Solution

I couldn't fully understand the Workbook class, but I experimented a little, and when you add a function that escapes the special characters, the files work again, e.g.:

# Function to escape special XML characters
xmlEscape <- function(txt) {
  txt <- gsub("&", "&amp;", txt, fixed = TRUE)
  txt <- gsub("<", "&lt;", txt, fixed = TRUE) 
  txt <- gsub(">", "&gt;", txt, fixed = TRUE)
  txt <- gsub("'", "&apos;", txt, fixed = TRUE)
  txt <- gsub('"', "&quot;", txt, fixed = TRUE)
  return(txt)
}

# Example modification in addWorksheet function
Workbook$methods(addWorksheet = function(sheetName, ...) {

  # Escape the sheet name before inserting into XML
  sheetName <- xmlEscape(sheetName)

  # ... existing code ...
})

I added the xmlEscape function to the methods addChartSheet, setSheetName, and addWorksheet, and this made it possible again to open existing Excel files with an & in the sheet name. However, when creating a new sheet, the & is then displayed as &amp;. So maybe someone who understands the package better can find a solution for that. :)
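
One possible explanation for the &amp; shown in new sheets is that the name is escaped twice somewhere along the way. This is an assumption and has not been verified against the openxlsx source. Under that assumption, the sketch below shows an escape function that can safely be applied more than once, plus an inverse for display. Both `xml_escape_once` and `xml_unescape` are hypothetical names, not openxlsx functions:

```r
# Hypothetical helper: escape XML special characters, but leave "&" alone
# when it already starts one of the five predefined entities or a numeric
# character reference, so escaping twice does not yield "&amp;amp;".
xml_escape_once <- function(txt) {
  txt <- gsub("&(?!(amp|lt|gt|apos|quot|#[0-9]+|#x[0-9A-Fa-f]+);)",
              "&amp;", txt, perl = TRUE)
  txt <- gsub("<", "&lt;", txt, fixed = TRUE)
  txt <- gsub(">", "&gt;", txt, fixed = TRUE)
  txt <- gsub("'", "&apos;", txt, fixed = TRUE)
  txt <- gsub('"', "&quot;", txt, fixed = TRUE)
  txt
}

# Hypothetical inverse: turn an escaped name back into display text.
xml_unescape <- function(txt) {
  txt <- gsub("&lt;", "<", txt, fixed = TRUE)
  txt <- gsub("&gt;", ">", txt, fixed = TRUE)
  txt <- gsub("&apos;", "'", txt, fixed = TRUE)
  txt <- gsub("&quot;", '"', txt, fixed = TRUE)
  # "&amp;" is handled last so that "&amp;lt;" becomes "&lt;", not "<"
  gsub("&amp;", "&", txt, fixed = TRUE)
}

once  <- xml_escape_once("Test1 & Test2")
twice <- xml_escape_once(once)

identical(once, twice)  # a second pass leaves the string unchanged
xml_unescape(once)      # recovers the original display name
```

The trade-off is that a sheet name whose literal text is `&amp;` would be left as it is and later read back as `&`. Escaping exactly once at the point where the XML is written avoids that ambiguity.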

Best wishes,
Josh

@JanMarvin
Collaborator

Hi @joshuasami, there should be a function to escape XML. If you want to, you can open a PR.
