Automatic conversion to LaTeX source #249

Open
berceanu opened this issue Jul 8, 2019 · 17 comments

@berceanu

berceanu commented Jul 8, 2019

For some research fields, like math or theoretical physics, one must submit the LaTeX source of the manuscript for publication in, e.g., APS journals. It would therefore be very convenient if Manubot could also have a LaTeX + BibTeX conversion mode for such cases. IIRC, pandoc supports conversion to TeX?

@agitter
Member

agitter commented Jul 8, 2019

This concept makes sense for submitting to journals that prefer LaTeX over DOCX for submissions. It should be feasible because you are correct that pandoc supports conversion to tex. We could add a BUILD_TEX option to the build script that generates the tex output when requested, similar to the optional DOCX output.

manubot/manubot#68 discussed some earlier attempts to generate tex. We would need to work on a stable way to get LaTeX working in the continuous integration environment or use a Docker environment for this step.

In the short term, we could also work on an example pandoc command to guide users who want to do this outside of the build script as a final step before journal submission.
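As a starting point for such a manual export, a minimal sketch might look like the following. It assumes pandoc 2.11 or later (for the built-in --citeproc flag) and that the processed manuscript and its CSL JSON references live at output/manuscript.md and output/references.json. Those paths, the TEX_PATH variable, and the export_tex function name are illustrative assumptions, not part of any existing build script.

```shell
#!/bin/sh
# Hypothetical helper: convert a processed Markdown manuscript to standalone LaTeX.
# The default paths below are assumptions; override them via the environment.
INPUT_PATH="${INPUT_PATH:-output/manuscript.md}"
BIBLIOGRAPHY_PATH="${BIBLIOGRAPHY_PATH:-output/references.json}"
TEX_PATH="${TEX_PATH:-output/manuscript.tex}"

export_tex() {
  # --standalone emits a complete document (preamble included), not a fragment.
  # --citeproc (pandoc >= 2.11) renders citations and embeds the reference list.
  pandoc \
    --from=markdown \
    --to=latex \
    --standalone \
    --citeproc \
    --bibliography="$BIBLIOGRAPHY_PATH" \
    --output="$TEX_PATH" \
    "$INPUT_PATH"
}

# Only run when explicitly requested, mirroring the optional DOCX output.
if [ "${BUILD_TEX:-false}" = "true" ]; then
  echo "Exporting LaTeX manuscript"
  export_tex
fi
```

Running the script with BUILD_TEX=true in the environment would trigger the export; without it, the script only defines the helper.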

@berceanu
Author

berceanu commented Jul 8, 2019

Perhaps Docker is the way to go; I see it is also used in the VS Code LaTeX extension.

@dhimmel
Member

dhimmel commented Jul 8, 2019

There are two steps here, I believe:

  1. Convert the Markdown manuscript to a .tex file, for example by using pandoc --to=latex. Should we also output a .bib file with the reference metadata, or should this be included in the .tex?

  2. Render the .tex file as a PDF. This is where using a Docker image probably makes sense.

It sounds like 1 is what is necessary for submission to journals, although 2 would be nice so the LaTeX compiler could detect errors and you could view the output PDF to make sure everything converted properly.

@slochower do you have an implementation of either of these steps? How much will users need to customize these steps? How does customization work / at what stage... isn't there some way to apply a template/style for a specific journal?

@dhimmel
Member

dhimmel commented Jul 8, 2019

The big benefits of adding LaTeX support that I see are:

  1. journal submission via .tex.

  2. another route to create PDFs, using existing infrastructure for branded PDFs. This could help Manubot become the primary document generation system for journals which require stylized PDFs.

  3. enabling latexdiff to track changes between manuscript versions.
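To illustrate the third benefit, here is a rough sketch of marking up the changes between two exported versions with latexdiff. The file names and the make_diff function are hypothetical; only the `latexdiff OLD NEW > DIFF` invocation is standard latexdiff usage.

```shell
#!/bin/sh
# Hypothetical helper: write a change-tracked .tex from two manuscript versions.
# Usage: make_diff OLD.tex NEW.tex DIFF.tex
make_diff() {
  if [ "$#" -ne 3 ]; then
    echo "usage: make_diff OLD.tex NEW.tex DIFF.tex" >&2
    return 2
  fi
  # latexdiff prints the marked-up document to stdout.
  latexdiff "$1" "$2" > "$3"
}

# Example (assumed file layout):
# make_diff v1/manuscript.tex v2/manuscript.tex output/diff.tex
```

The resulting diff file is itself LaTeX, so it still has to be compiled to see the highlighted additions and deletions.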

@berceanu
Author

berceanu commented Jul 8, 2019

One can also consult, e.g., the arXiv and PRL guides for LaTeX submission.

@slochower
Collaborator

> @slochower do you have an implementation of either of these steps? How much will users need to customize these steps? How does customization work / at what stage... isn't there some way to apply a template/style for a specific journal?

Point 1 is (relatively) easy, as you say. We can just use pandoc. I used something like this (with a custom template file):

# Optional LaTeX export, enabled by setting BUILD_LATEX=true
if [ "$BUILD_LATEX" = "true" ]; then
  echo "Exporting LATEX manuscript"
  # Quote the path variables so paths containing spaces are not split
  pandoc \
    --from=markdown \
    --to=latex \
    --filter=pandoc-fignos \
    --filter=pandoc-eqnos \
    --filter=pandoc-tablenos \
    --filter=pandoc-img-glob \
    --bibliography="$BIBLIOGRAPHY_PATH" \
    --csl="$CSL_PATH" \
    --template=build/assets/nih4.tex \
    --metadata link-citations=true \
    --number-sections \
    --resource-path=.:content \
    --standalone \
    --output=output/manuscript.tex \
    "$INPUT_PATH"
fi

IIRC, --resource-path was necessary so that the image path embedded in manuscript.tex matched the image location in our folder structure.

Point 2, also as you point out, is a little more tricky. I implemented it this way:

# Optional PDF build through LaTeX, enabled by setting BUILD_PDF_VIA_LATEX=true
if [ "$BUILD_PDF_VIA_LATEX" = "true" ]; then
  echo "Exporting LATEX (PDF) manuscript"
  FONT="Helvetica"
  COLORLINKS="true"
  # Quote the path variables so paths containing spaces are not split
  pandoc \
    --from=markdown \
    --filter=pandoc-eqnos \
    --filter=pandoc-tablenos \
    --filter=pandoc-img-glob \
    --filter=pandoc-chemfig \
    --filter=pandoc-fignos \
    --lua-filter=build/latex-color.lua \
    --bibliography="$BIBLIOGRAPHY_PATH" \
    --csl="$CSL_PATH" \
    --template=build/assets/nih4.tex \
    --metadata link-citations=true \
    --resource-path=.:content:../content \
    --pdf-engine=xelatex \
    --variable mainfont="${FONT}" \
    --variable sansfont="${FONT}" \
    --variable colorlinks="${COLORLINKS}" \
    --output=output/manuscript.pdf \
    "$INPUT_PATH"
fi

But I did not have this running via CI (only locally). Here I used pandoc-img-glob to move the images to a temporary directory with the tex for compilation and changed the --resource-path accordingly. Getting something like this to work would probably require Docker or waiting a long time for an apt-get install texlive (or similar) to run on Travis. I used xelatex because I wanted the grant application to be in Helvetica, FWIW.

Regarding latexdiff: I implemented a quick-and-dirty solution that might be useful in the future. You can see it here.

@dhimmel dhimmel transferred this issue from manubot/manubot Jul 8, 2019
@agitter
Member

agitter commented Jul 8, 2019

Another benefit of supporting LaTeX could be enabling Manubot-based writing of documents that have precise formatting requirements, like grant applications or university dissertations. I haven't tested this so it's unclear to me how much the pandoc tex template helps with that or whether the final formatting steps would have to be manual after the content is finalized. Perhaps this is the same idea as "branded PDFs" above.

@slochower
Collaborator

slochower commented Jul 8, 2019

@agitter agreed, although I found the pandoc template system cumbersome. See this existing list: https://github.com/jgm/pandoc/wiki/User-contributed-templates. There are many $if$-$endif$ blocks.

@dhimmel
Member

dhimmel commented Jul 30, 2019

@slochower for the BUILD_PDF_VIA_LATEX step, would it be possible to take the output .tex file from the earlier pandoc --to=latex command and pass it directly to the LaTeX compiler? Is there any benefit to running pandoc twice? I was envisioning that once we had a .tex and .bib file, we would no longer need to use pandoc.

@slochower
Collaborator

I think using the output of pandoc --to=latex should work, modulo the figure paths. I think when I did this earlier, I specified the figures in the current path, e.g., [Caption.](figure.png) in the Markdown (as usual). If you convert this to LaTeX (without having pandoc make the PDF for you), I think you'll need to either symlink the figures to the .tex directory or vice versa.
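The symlink workaround described above could be sketched as follows. This is a minimal example under stated assumptions: the figures are taken to live in content/images and the exported file in output/latex/manuscript.tex, and the link_figures helper is hypothetical rather than something that exists in the build script.

```shell
#!/bin/sh
# Hypothetical helper: expose figure files next to the exported .tex file
# so that relative \includegraphics paths resolve without rerunning pandoc.
# Usage: link_figures FIGURE_DIR TEX_DIR
link_figures() {
  figure_dir="$1"
  tex_dir="$2"
  mkdir -p "$tex_dir"
  for figure in "$figure_dir"/*; do
    # Skip the unexpanded glob when the figure directory is empty or missing.
    [ -e "$figure" ] || continue
    # Link by absolute path so the symlink stays valid from inside tex_dir.
    ln -sf "$(cd "$figure_dir" && pwd)/$(basename "$figure")" "$tex_dir/"
  done
}

# Example (assumed layout): link the figures, then compile in place.
# link_figures content/images output/latex
# (cd output/latex && xelatex -interaction=nonstopmode manuscript.tex)
```

Symlinking keeps a single copy of each figure. Copying the files instead would make the output directory self-contained, which may matter when uploading it to a journal.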

@dhimmel
Member

dhimmel commented Aug 5, 2019

Here's a useful resource on different ways to install LaTeX on Travis CI. It mentions tectonic, which seems to be a more user-friendly xelatex (although I don't have a good understanding of how all the LaTeX infrastructure fits together).

@dhimmel
Member

dhimmel commented Aug 5, 2019

I've got the pandoc --to=pdf --pdf-engine=xelatex workflow to produce a PDF. However, I'd like to see what Pandoc does to generate the LaTeX, so that we can potentially replicate it in an output/latex directory that could contain a standalone LaTeX source. Pandoc creates a temporary directory as part of the LaTeX processing, but there is no built-in way to retain that directory (see jgm/pandoc#2288).

Setting --pdf-engine-opt=-output-directory=output/latex did write some files including the pdf to output/latex/input.*, but then Pandoc erred with Error producing PDF.. Possible related discussion at jgm/pandoc#4721.

@habi

habi commented Oct 27, 2020

> Should we also output a .bib file with the reference metadata, or should this be included in the .tex?

Usually, from what I've seen, submission systems that can ingest .tex files require the bibliography to be included in the file, so I'd vote for the latter.

@habi

habi commented Nov 3, 2020

I'd also really like to have a LaTeX file as output, so I don't have to fiddle with pandoc myself.
I'm happy to do some more manual work with it (e.g. applying a template myself), but having a .tex file with bibliography would speed up the process greatly.

If manubot generates a .tex file somewhere, then https://github.com/xu-cheng/latex-action might be of help, which I've used to compile such a .tex to a PDF with GitHub Actions.

dhimmel added a commit to dhimmel/manubot-rootstock that referenced this issue Nov 3, 2020
@dhimmel
Member

dhimmel commented Nov 3, 2020

> having a .tex file with bibliography would speed up the process greatly.

@habi I propose the simplest possible LaTeX export in #384. In #256, I tried to get the LaTeX to compile and render as PDF, which proved challenging. But perhaps having a .tex file will help you to a sufficient extent. So please check out #384 and let us know whether it works for your application.

dhimmel added a commit that referenced this issue May 29, 2021
trangdata added a commit to greenelab/iscb-diversity-manuscript that referenced this issue Aug 5, 2021
* setup.bash: interactive script to guide setup

merges manubot/rootstock#417
closes manubot/rootstock#401

* Add "gh repo create" to SETUP.md

merges manubot/rootstock#419
closes manubot/rootstock#418

Co-authored-by: Daniel Himmelstein
Co-authored-by: Anthony Gitter

* BUILD_LATEX for basic LaTeX manuscript

merges manubot/rootstock#384
refs manubot/rootstock#249

* Pandoc 2.14: update HTML plugins, CSL style, citekey syntax

merges manubot/rootstock#427

Co-authored-by: Daniel Himmelstein
Co-authored-by: Anthony Gitter

Co-authored-by: nfry321
Co-authored-by: Tiago Lubiana
Co-authored-by: Daniel Himmelstein
Co-authored-by: Anthony Gitter
Co-authored-by: Vincent Rubinetti
@agitter
Member

agitter commented Sep 12, 2021

I'm sharing some notes from our Manubot manuscript that we exported to LaTeX for a conference submission. There are more details at greenelab/covid19-review#943.

We customized a LaTeX template for the conference style. Like @slochower, we also found the templating system cumbersome. The Manubot metadata didn't perfectly fit the template expectations so we had a bidirectional process of modifying the template and the metadata. Getting the authors to show up correctly was the trickiest part. We created a new metadata.yaml file using a Python script, in part because we were already modifying metadata.yaml programmatically because this conference submission was one piece of a larger project.

It's very helpful that newer versions of pandoc can convert the CSL JSON file Manubot produces into a .bib file. I typically submit a .bib file to a conference or journal instead of embedding references in a .tex file. We added this conversion step to the build script. We used a regex to strip out the note fields from the .bib file. We also used custom pandoc settings in a new yaml file, including cite-method: natbib for the references.
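A rough sketch of such a conversion step is shown below. It is not the script used for that submission: the helper names and file paths are hypothetical, the csljson-to-bibtex conversion assumes pandoc 2.11 or later, and the note-stripping pattern assumes each BibTeX field sits on its own line.

```shell
#!/bin/sh
# Hypothetical helpers; names and file paths are assumptions for illustration.

# Convert a CSL JSON reference list to BibTeX (needs pandoc >= 2.11).
# Usage: csl_to_bib INPUT.json OUTPUT.bib
csl_to_bib() {
  pandoc --from=csljson --to=bibtex --standalone --output="$2" "$1"
}

# Drop note fields from a .bib file read on stdin.
# Assumes one field per line, e.g. "  note = {...},".
strip_notes() {
  sed -E '/^[[:space:]]*note[[:space:]]*=/d'
}

# Example (assumed paths):
# csl_to_bib output/references.json output/references.raw.bib
# strip_notes < output/references.raw.bib > output/references.bib
```

With cite-method: natbib, pandoc emits natbib citation commands instead of formatted citations, so the exported .tex relies on the .bib file at compile time.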

We didn't try to build a PDF with continuous integration. We used Manubot to get 95% of the way to submission automatically and then fine-tuned LaTeX issues in Overleaf before submitting.

@stain

stain commented Oct 27, 2021

Here's an alternative way, if possibly more buggy:

If you use the markdown package for lualatex (which is easily enabled in Overleaf), you can operate in "dual" mode by having an alternative document.tex that uses \markdownInput{../content/10.introduction.md} etc.

There will be some fights over things like figures and internal references.

See https://github.com/stain/ro-crate-paper/blob/master/latex/ro-crate.tex and workarounds for manubot in https://github.com/stain/ro-crate-paper/blob/master/build/build.sh#L14

This allowed us to edit the manuscript in Overleaf, while also having Manubot render it using the Overleaf-GitHub sync.

You have been warned: this approach will let you conform to the journal style, but it will also come with lots of new caveats.

ploegieku added a commit to ploegieku/2023-functional-homology-paper that referenced this issue Aug 6, 2024