
tips@manuel:~$

How to convert almost any document from the Linux console

I was looking for an online tool to convert some Markdown (.md) files to Microsoft Word format (.docx), and I found something even better. Pandoc is an open source application that, as its name suggests (“pan” meaning “all, every”), can convert between almost every document format imaginable. It is very simple to use. We install it from our Linux distribution's repositories; for example, on Arch Linux:

# pacman -S pandoc

The basic syntax is:

$ pandoc -o <output file> <input file>

In my opinion the arguments would make more sense in the reverse order, but it does the trick. In my case, the command was:

$ pandoc -o my_document.docx my_document.md
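When there are several files to convert, a small shell function saves retyping the command. This is a minimal sketch, not part of Pandoc: the function name `md_to_docx` and the `DRY_RUN` variable are hypothetical, and it assumes every input file name ends in `.md`.

```shell
#!/bin/sh
# Hypothetical helper (not part of pandoc): convert each Markdown file
# passed as an argument to a .docx file with the same base name.
# With DRY_RUN=1 it only prints the commands instead of running them.
md_to_docx() {
  for src in "$@"; do
    dest="${src%.md}.docx"   # my_document.md -> my_document.docx
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "pandoc -o $dest $src"
    else
      pandoc -o "$dest" "$src" || return 1   # stop at the first failure
    fi
  done
}
```

`DRY_RUN=1 md_to_docx *.md` previews the commands for every Markdown file in the current directory, and `md_to_docx *.md` actually runs them.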

Pandoc automatically detects the formats from the file extensions, but if we need to specify them explicitly, we can do it like this:

$ pandoc -t <output format> -f <input format> -o <output file> <input file>

Again, using my case as the example:

$ pandoc -t docx -f markdown -o my_document.docx my_document.md
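To illustrate how extension-based detection maps file names to the `-t`/`-f` values, here is a simplified, hypothetical sketch. It is not Pandoc's actual implementation: Pandoc's own lookup table is larger and it has its own fallback rules, whereas this sketch just reports `unknown`, the case where you would pass `-f`/`-t` yourself. The names `guess_format` and `explicit_cmd` are made up for the example.

```shell
#!/bin/sh
# Simplified, hypothetical sketch of extension-based format guessing.
# Only a few common extensions are covered.
guess_format() {
  case "$1" in
    *.md|*.markdown) echo markdown ;;
    *.docx)          echo docx ;;
    *.odt)           echo odt ;;
    *.html|*.htm)    echo html ;;
    *.tex)           echo latex ;;
    *.rst)           echo rst ;;
    *.org)           echo org ;;
    *.epub)          echo epub ;;
    *)               echo unknown; return 1 ;;  # specify -f/-t by hand
  esac
}

# Print the explicit command for: explicit_cmd <output file> <input file>
explicit_cmd() {
  echo "pandoc -t $(guess_format "$1") -f $(guess_format "$2") -o $1 $2"
}
```

`explicit_cmd my_document.docx my_document.md` prints `pandoc -t docx -f markdown -o my_document.docx my_document.md`, the same command written by hand above.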

The conversion was almost instantaneous. I expected the formatting of the result to be all jumbled, but no, it was perfect (though Markdown is hard to misread).

Pandoc even has an online tool, though it is mostly for testing purposes and offers only a small sample of the features available in the installable tool. For instance, it has no option to convert from Markdown to Microsoft Word, even though the actual tool handled that conversion perfectly.

This is the full list of input and output formats supported by version 2.10.1 of Pandoc at the time of writing this post:

Input formats

  • commonmark (CommonMark Markdown)
  • commonmark_x (CommonMark Markdown with extensions)
  • creole (Creole 1.0)
  • csv (CSV table)
  • docbook (DocBook)
  • docx (Word docx)
  • dokuwiki (DokuWiki markup)
  • epub (EPUB)
  • fb2 (FictionBook2 e-book)
  • gfm (GitHub-Flavored Markdown), or the deprecated and less accurate markdown_github; use markdown_github only if you need extensions not supported in gfm.
  • haddock (Haddock markup)
  • html (HTML)
  • ipynb (Jupyter notebook)
  • jats (JATS XML)
  • jira (Jira / Confluence wiki markup)
  • json (JSON version of native AST)
  • latex (LaTeX)
  • markdown (Pandoc’s Markdown)
  • markdown_mmd (MultiMarkdown)
  • markdown_phpextra (PHP Markdown Extra)
  • markdown_strict (original unextended Markdown)
  • mediawiki (MediaWiki markup)
  • man (roff man)
  • muse (Muse)
  • native (native Haskell)
  • odt (ODT)
  • opml (OPML)
  • org (Emacs Org mode)
  • rst (reStructuredText)
  • t2t (txt2tags)
  • textile (Textile)
  • tikiwiki (TikiWiki markup)
  • twiki (TWiki markup)
  • vimwiki (Vimwiki)

Output formats

  • asciidoc (AsciiDoc) or asciidoctor (AsciiDoctor)
  • beamer (LaTeX beamer slide show)
  • commonmark (CommonMark Markdown)
  • commonmark_x (CommonMark Markdown with extensions)
  • context (ConTeXt)
  • docbook or docbook4 (DocBook 4)
  • docbook5 (DocBook 5)
  • docx (Word docx)
  • dokuwiki (DokuWiki markup)
  • epub or epub3 (EPUB v3 book)
  • epub2 (EPUB v2)
  • fb2 (FictionBook2 e-book)
  • gfm (GitHub-Flavored Markdown), or the deprecated and less accurate markdown_github; use markdown_github only if you need extensions not supported in gfm.
  • haddock (Haddock markup)
  • html or html5 (HTML, i.e. HTML5/XHTML polyglot markup)
  • html4 (XHTML 1.0 Transitional)
  • icml (InDesign ICML)
  • ipynb (Jupyter notebook)
  • jats_archiving (JATS XML, Archiving and Interchange Tag Set)
  • jats_articleauthoring (JATS XML, Article Authoring Tag Set)
  • jats_publishing (JATS XML, Journal Publishing Tag Set)
  • jats (alias for jats_archiving)
  • jira (Jira / Confluence wiki markup)
  • json (JSON version of native AST)
  • latex (LaTeX)
  • man (roff man)
  • markdown (Pandoc’s Markdown)
  • markdown_mmd (MultiMarkdown)
  • markdown_phpextra (PHP Markdown Extra)
  • markdown_strict (original unextended Markdown)
  • mediawiki (MediaWiki markup)
  • ms (roff ms)
  • muse (Muse)
  • native (native Haskell)
  • odt (OpenOffice text document)
  • opml (OPML)
  • opendocument (OpenDocument)
  • org (Emacs Org mode)
  • pdf (PDF)
  • plain (plain text)
  • pptx (PowerPoint slide show)
  • rst (reStructuredText)
  • rtf (Rich Text Format)
  • texinfo (GNU Texinfo)
  • textile (Textile)
  • slideous (Slideous HTML and JavaScript slide show)
  • slidy (Slidy HTML and JavaScript slide show)
  • dzslides (DZSlides HTML5 + JavaScript slide show)
  • revealjs (reveal.js HTML5 + JavaScript slide show)
  • s5 (S5 HTML and JavaScript slide show)
  • tei (Simple TEI)
  • xwiki (XWiki markup)
  • zimwiki (ZimWiki markup)
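Since this list changes between versions, it is worth asking the installed Pandoc directly: `pandoc --list-input-formats` and `pandoc --list-output-formats` print one supported format name per line. The wrapper below is only a convenience sketch; the function names `list_formats` and `has_output_format` and the `PANDOC` variable are hypothetical.

```shell
#!/bin/sh
# PANDOC lets you point at a specific pandoc binary (defaults to "pandoc").
PANDOC="${PANDOC:-pandoc}"

# Print both lists reported by the installed pandoc.
list_formats() {
  echo "Input formats:"
  "$PANDOC" --list-input-formats
  echo "Output formats:"
  "$PANDOC" --list-output-formats
}

# Succeed (exit status 0) if the given name is a supported output format.
has_output_format() {
  "$PANDOC" --list-output-formats | grep -qx "$1"
}
```

For example, `has_output_format docx && echo "docx output available"`. Keep in mind that pdf output is produced through an external engine (LaTeX by default), so it can still fail if that engine is not installed.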

Via: Joe Leech