
Have a look at PdfConverter #357

Open
ylussaud opened this issue Jun 26, 2019 · 5 comments

Comments

@ylussaud
Collaborator

If it works well, it would be nice to generate either a DOCX or a PDF depending on the file extension of the output document.
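A minimal sketch of the extension-based choice suggested above. OutputFormatSelector, OutputFormat and fromFileName are hypothetical names for illustration, not M2Doc API. The sketch only maps the output file name's extension to a format (case-insensitively) and rejects anything else; the actual DOCX or PDF generation is not shown.

```java
import java.util.Locale;

/** Hypothetical helper: picks the output format from the output file name's extension. */
public final class OutputFormatSelector {

    /** Output formats considered in this sketch. */
    public enum OutputFormat { DOCX, PDF }

    private OutputFormatSelector() {
    }

    /**
     * Returns the format matching the extension of the given file name or path.
     * Throws IllegalArgumentException when the extension is missing or unsupported.
     */
    public static OutputFormat fromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // A dot inside a directory name (e.g. "my.dir/report") is not an extension.
        int separator = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        if (dot <= separator || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("No extension in: " + fileName);
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (extension) {
            case "docx":
                return OutputFormat.DOCX;
            case "pdf":
                return OutputFormat.PDF;
            default:
                throw new IllegalArgumentException("Unsupported extension: " + extension);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromFileName("report.docx")); // DOCX
        System.out.println(fromFileName("Report.PDF"));  // PDF
    }
}
```

Failing fast on an unknown extension avoids silently producing a DOCX when the user asked for something else.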

@ylussaud
Collaborator Author

ylussaud commented Oct 2, 2019

It is not part of the POI project and needs new dependencies:

  • groupId: fr.opensagres.xdocreport
  • artifactId: fr.opensagres.poi.xwpf.converter.core
  • version: 2.0.2

@ylussaud
Collaborator Author

The converter uses iText, which is licensed under the LGPL; that could be another problem.

@ylussaud ylussaud added this to the 3.1.0 milestone Dec 13, 2019
@ylussaud ylussaud modified the milestones: 3.1.0, 3.1.1 Jun 29, 2020
@ejuliot
Member

ejuliot commented Dec 14, 2020

POI already has built-in support for DOCX to PDF conversion. Look at https://stackoverflow.com/questions/43363624/converting-docx-into-pdf-in-java (org.apache.poi.xwpf.converter.pdf.PdfConverter)

@ylussaud
Collaborator Author

As stated above, PdfConverter is not part of Apache POI but of fr.opensagres.poi.xwpf.converter.core, which supports Apache POI 4.0.1. M2Doc uses Apache POI 4.1.0 and will move to later versions.

@ylussaud ylussaud modified the milestones: 3.1.1, 3.1.2 Jan 6, 2021
@ylussaud ylussaud modified the milestones: 3.2.0, 4.0.0 Apr 16, 2021
@ylussaud ylussaud modified the milestones: 3.2.2, 3.2.3 Sep 22, 2022
@ylussaud ylussaud modified the milestones: 3.3.0, 3.3.1 May 2, 2023
@ylussaud ylussaud modified the milestones: 3.3.1, 3.3.2 Sep 19, 2023
@ylussaud ylussaud modified the milestones: 3.3.2, 3.3.3 Dec 4, 2023
@ylussaud
Collaborator Author

ylussaud commented Sep 5, 2024

The LGPL licence is not an issue: there is already LGPL code in the Orbit update site. At the moment, both M2Doc and fr.opensagres.poi.xwpf.converter.pdf 2.0.0 depend on POI 5.2.3, so I was able to test the PDF conversion.

There are the following issues:

  • when a table is present, it sometimes throws an NPE:
fr.opensagres.poi.xwpf.converter.core.XWPFConverterException: java.lang.NullPointerException: Cannot invoke "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTblGrid.getGridColList()" because "grid" is null
	at fr.opensagres.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:71)
	at fr.opensagres.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:39)
	at fr.opensagres.poi.xwpf.converter.core.AbstractXWPFConverter.convert(AbstractXWPFConverter.java:42)

To solve this issue, we need to add a CTTblGrid to the created XWPFTable. This implies knowing the width of each column. A width has been added to MCell (see #472), but I'm not sure we will be able to compute a width when importing from HTML. I'm opening issue #525 for this.

java.lang.StackOverflowError
	at java.base/java.lang.StringBuffer.<init>(StringBuffer.java:133)
	at com.lowagie.text.pdf.BidiLine.createArrayOfPdfChunks(Unknown Source)
	at com.lowagie.text.pdf.BidiLine.createArrayOfPdfChunks(Unknown Source)
	at com.lowagie.text.pdf.BidiLine.processLine(Unknown Source)
	at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
	at com.lowagie.text.pdf.ColumnText.goComposite(Unknown Source)
	at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
	at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
	at com.lowagie.text.pdf.PdfPRow.writeCells(Unknown Source)
	at com.lowagie.text.pdf.PdfPTable.writeSelectedRows(Unknown Source)
	at com.lowagie.text.pdf.PdfPTable.writeSelectedRows(Unknown Source)
	at com.lowagie.text.pdf.PdfPTable.writeSelectedRows(Unknown Source)
	at com.lowagie.text.pdf.ColumnText.goComposite(Unknown Source)
	at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
	at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
	at com.lowagie.text.pdf.PdfDocument.addPTable(Unknown Source)
	at com.lowagie.text.pdf.PdfDocument.add(Unknown Source)
	at com.lowagie.text.Document.add(Unknown Source)
	at fr.opensagres.xdocreport.itext.extension.ExtendedDocument.add(ExtendedDocument.java:114)
	at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.flushTable(StylableDocument.java:374)
	at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.pageBreak(StylableDocument.java:141)
	at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.columnBreak(StylableDocument.java:120)
	at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.simulateText(StylableDocument.java:230)
	at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.pageBreak(StylableDocument.java:160)
	at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.columnBreak(StylableDocument.java:120)
  • some differences between the Word document and the PDF document:
    • some bullets from bulleted lists are missing (HTML ul test)
    • ...

Overall, the output PDF is pretty close to the Word document as long as it doesn't use MTable.
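A possible way to get one width per column for the missing CTTblGrid from the NPE above, sketched under explicit assumptions: the total table width is known, widths are in twips (the unit OOXML uses for gridCol), and columns without a width (for instance after an HTML import) share the remaining width evenly. The even split is a heuristic chosen for this sketch, not a decision from this issue, and TableGridWidths and computeGridColumnWidths are hypothetical names. Writing the resulting list into the table's tblGrid as one gridCol per column is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: computes one width per grid column (in twips), keeping the
 * known cell widths and sharing the remaining table width evenly among the columns
 * whose width is unknown.
 */
public final class TableGridWidths {

    private TableGridWidths() {
    }

    /**
     * @param knownWidths one entry per column; null means "width unknown"
     * @param tableWidth  total table width in twips (assumed to be known)
     * @return one width per column, in twips
     */
    public static List<Integer> computeGridColumnWidths(List<Integer> knownWidths, int tableWidth) {
        int knownTotal = 0;
        int unknownCount = 0;
        for (Integer width : knownWidths) {
            if (width == null) {
                unknownCount++;
            } else {
                knownTotal += width;
            }
        }
        // Width left for the columns without an explicit width (never negative).
        int remaining = Math.max(0, tableWidth - knownTotal);
        int share = unknownCount == 0 ? 0 : remaining / unknownCount;
        int leftover = unknownCount == 0 ? 0 : remaining % unknownCount;

        List<Integer> result = new ArrayList<>(knownWidths.size());
        for (Integer width : knownWidths) {
            if (width != null) {
                result.add(width);
            } else {
                // Give the rounding leftover to the first unknown columns, one twip each.
                int extra = leftover > 0 ? 1 : 0;
                leftover -= extra;
                result.add(share + extra);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 9000-twip table, first column known (3000), two columns without a width.
        System.out.println(computeGridColumnWidths(Arrays.asList(3000, null, null), 9000));
        // prints [3000, 3000, 3000]
    }
}
```

Handing the integer-division remainder to the first unknown columns keeps the sum of the grid equal to the table width whenever the known widths fit inside it.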

@ylussaud ylussaud removed this from the 3.3.3 milestone Sep 11, 2024

2 participants