The One with the Thoughts of Frans

Pandoc Markdown Over Straight LaTeX

I familiarized myself with LaTeX because I like HTML better than word processors. In fact, I disprefer word processors. LibreOffice Writer can do a fairly decent job of WYSIWYM (What You See Is What You Mean), but in many ways I like it less than HTML. So why don’t I just use HTML, you ask? Quite simply, HTML isn’t necessarily the best option for print.

Prince does a great job generating printable PDFs, but although writing straight HTML is easy enough and has many benefits, I only really prefer it to run-of-the-mill word processors. Besides, I wanted to benefit from BibTeX reference management, which tends to come along with LaTeX.

Clearly then, LaTeX has some nice features. Unfortunately, it shares many of HTML’s flaws and adds some others: \emph{} is at best marginally easier to type than <em></em>, but I find it somewhat harder to read. Besides which, converting LaTeX to other formats like HTML can be a pain.

On the good side, LaTeX and HTML also share many features. Both depend on plain-text files, which is great because you can open them on any system, and because you can use versioning software. Binary blobs and compressed zip files are also more prone to data loss in case of damage. The great thing about versioning software isn’t necessarily that you can go back to a former version, but the knowledge that you can go back. Normally I’m always busy commenting out text or putting it at the bottom, but when it’s versioned I feel much more free about just deleting it. Maybe I’ll put some of it back in later, but it lets the machine take the work off of my hands. I know, Writer, Word, et cetera can do this too, but did I mention I prefer plain text anyway?

Where LaTeX really shines is its reference management, math support without having to use incomprehensible gibberish like MathML or some odd equation editor, and its typographical prowess. On top of the shared features with HTML, those features are why I looked into LaTeX in the first place. So how can I get those features without being bothered by the downsides of HTML and LaTeX? As it turns out, the answer is Pandoc’s variant of Markdown.

In practice, I rarely need more than what Pandoc’s Markdown can give me. It’s HTML-focused, which I like because I know HTML, but you can insert math (La)TeX-style between $ characters. It also comes with its own citation reference system, which it changes to BibLaTeX citations upon conversion to LaTeX. As these things go, I wasn’t the first with this idea.
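To give an impression of what such a source file looks like, here is a minimal sketch that writes a small sample document. The file name example-essay and the citation key doe2012 are made up for illustration; the key would have to exist in the accompanying .bib file.

```shell
#!/bin/bash
# Write a tiny sample document in Pandoc's Markdown.
# "example-essay" and the citation key "doe2012" are hypothetical.
BASENAME=example-essay

# The quoted 'EOF' keeps the shell from expanding $ inside the Markdown.
cat > "$BASENAME.md" <<'EOF'
# A Short Heading

Some *emphasized* text, which is easier on the eyes than `\emph{}` or `<em>`.

Inline math goes between dollar signs: $e^{i\pi} + 1 = 0$.

A claim that needs a source [@doe2012, p. 33].
EOF
```

The bracketed `[@doe2012, p. 33]` is Pandoc's citation syntax. When converting with `--biblatex`, it should come out as an `\autocite` command with the page number as its optional argument.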

Of course it won’t do to repeat myself on the command line constantly, so I wrote a little conversion helper script:

#!/bin/bash
#generate-pdf.sh

BASENAME=your-text-file-without-extension
# I compiled an updated version of Pandoc locally.
PANDOC_LOCAL=~/.cabal/bin/pandoc

# Quote the path so the test still works if it ever contains spaces.
if [ -x "$PANDOC_LOCAL" ]; then
   PANDOC=$PANDOC_LOCAL
else
   PANDOC=pandoc
fi

# Output to HTML5.
# Note: these flags match Pandoc 1.x; Pandoc 2.0 replaced --smart with the +smart extension.
"$PANDOC" \
"$BASENAME.md" \
--to=html5 \
--mathml \
--self-contained \
--smart \
--csl modern-language-association-with-url.csl \
--bibliography "$BASENAME-bibliography.bib" \
-o "$BASENAME.html"

# Output to $BASENAME-body.tex
# $BASENAME.tex has this file as input
"$PANDOC" \
"$BASENAME.md" \
--smart \
--biblatex \
--bibliography "$BASENAME-bibliography.bib" \
-o "$BASENAME-body.tex"

# Pandoc likes to output p.~ or pp.~ in its \autocite, but I just want the numbers.
# The dot is escaped so that it only matches a literal period; pp? covers both p. and pp.
sed -i -E 's/\\autocite\[pp?\.~/\\autocite[/g' "$BASENAME-body.tex"
# It would probably suffice to just do this but I don't want any nasty surprises:
#sed -i -E 's/pp?\.~//g' "$BASENAME-body.tex"

# If ever bored, consider adding something to change \autocite[1-2] into \autocite[1--2]

# Generate the PDF.
lualatex $BASENAME
biber $BASENAME
lualatex $BASENAME
lualatex $BASENAME

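# Remove these files after the work is done.
# -f keeps rm quiet when a file (such as the .toc) was never generated.
rm -f \
"$BASENAME.aux" \
"$BASENAME.bbl" \
"$BASENAME.blg" \
"$BASENAME.bcf" \
"$BASENAME.run.xml" \
"$BASENAME.toc"
# Uncomment to remove the intermediate body file as well:
#rm -f "$BASENAME-body.tex"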
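
The to-do note in the script about page ranges could be handled with one more sed pass. The following is only a sketch, not something the script above actually runs. The function name fix_page_ranges is made up, and it assumes the simple form \autocite[1-2] with nothing but digits and a single hyphen between the brackets.

```shell
#!/bin/bash
# Hypothetical helper: turn \autocite[1-2] into \autocite[1--2].
# Reads LaTeX on stdin and writes the result to stdout.
fix_page_ranges() {
    # Group 1: \autocite[ plus the first number; group 2: second number plus ].
    # Only a single hyphen between two numbers matches, so ranges that
    # already use -- are left alone.
    sed -E 's/(\\autocite\[[0-9]+)-([0-9]+])/\1--\2/g'
}

# Try it on a sample line (the citation key is made up).
printf '%s\n' '\autocite[33-35]{doe2012}' | fix_page_ranges
```

To apply it to the generated file, the same expression could be passed to `sed -i -E` on `$BASENAME-body.tex`, right after the lines that strip p.~ and pp.~. Citations with a prenote, such as \autocite[see][33-35], are not covered by this pattern.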

Something that may not be immediately obvious from the script is that I’ve also got a $BASENAME.tex file. This contains all of my relevant settings, but instead of the main content it contains \input{basename-body.tex}. There are some prerequisites for working with Pandoc-generated LaTeX, for instance:

%for pandoc table output (needs ctable for 1.9; longtable for 1.10)
\usepackage{longtable}
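
For illustration, a bare-bones wrapper file along these lines could be generated as follows. This is a sketch under assumptions rather than the actual settings file: the document class, the fontspec line, and the name example-essay are all placeholders, and Pandoc's output may need further packages depending on its version and on what the text contains (hyperref for links, for example).

```shell
#!/bin/bash
# Generate a minimal, hypothetical wrapper for the Pandoc-generated body.
BASENAME=example-essay

# Unquoted EOF: ${BASENAME} is expanded, while the LaTeX backslashes pass
# through untouched because none of them precedes $, ` or another backslash.
cat > "$BASENAME.tex" <<EOF
\documentclass{article}
\usepackage{fontspec}   % font handling under LuaLaTeX
\usepackage{longtable}  % for Pandoc's table output
\usepackage[backend=biber]{biblatex}
\addbibresource{${BASENAME}-bibliography.bib}

\begin{document}
\input{${BASENAME}-body.tex}
\printbibliography
\end{document}
EOF
```

The biblatex package with the biber backend matches the biber call in the script and provides the \autocite command that Pandoc emits with --biblatex.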

I haven’t yet made up my mind on what to do about splitting up chapters in different files, but it hasn’t bothered me yet.

There you have it. That’s my way of keeping things simple while still profiting from LaTeX typesetting.

1 Comment

  1. An impressive use of Pandoc Markdown can be found on http://rhythmus.be/md2indd/.

    March 18, 2017 @ 11:44
    Frans
