doesn't that make you wonder?

John MacFarlane jgm at berkeley.edu
Mon Oct 17 19:47:15 EDT 2011


+++ Emmanuel Bégué [Oct 17 11 11:06 ]:


> [But is it really true that there are inconsistent results for what
> may be called "Core Markdown" (as described in the original Gruber
> posts)?]


Yes, many. Gruber's informal syntax description underspecifies the syntax,
and different implementations have resolved the ambiguities differently. For
example, the official syntax description says that asterisks and underscores
indicate emphasis, but it doesn't give clear rules about nesting.
Consider, for example, this input:

*test **test* test**

discount and sundown render it as:

<em>test **test</em> test**

PHP markdown, peg-markdown/multimarkdown, pandoc, and lunamark render it as:

*test <strong>test* test</strong>

maruku renders it as:

<em>test</em><em>test</em> test**

Markdown.pl is not a good guide, since it gives invalid HTML (the em and
strong tags overlap instead of nesting) on this input:

<em>test <strong>test</em> test</strong>

The other renderings could all be defended on the basis of the syntax
description. It simply isn't explicit enough.
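To illustrate how an overlapping result like the one above can arise, here is a minimal Python sketch of a converter that handles emphasis with two sequential regex substitutions, strong first and then em. The function name and the simplified patterns are assumptions made for illustration; this is not Markdown.pl's actual code, only a rough approximation of that general style of processing.

```python
import re

def two_pass_emphasis(text):
    """Toy emphasis converter: one regex pass for strong, then one for em.

    Each pass is unaware of the tags produced by the other, so spans
    that overlap in the input produce overlapping tags in the output.
    """
    # Pass 1: **text** or __text__ becomes <strong>text</strong>
    text = re.sub(r'(\*\*|__)(?=\S)(.+?)(?<=\S)\1', r'<strong>\2</strong>', text)
    # Pass 2: *text* or _text_ becomes <em>text</em>
    text = re.sub(r'(\*|_)(?=\S)(.+?)(?<=\S)\1', r'<em>\2</em>', text)
    return text

print(two_pass_emphasis('*test **test* test**'))
# <em>test <strong>test</em> test</strong>
```

A converter that builds a tree of inline elements, rather than rewriting the string in passes, cannot emit overlapping tags, but it still has to choose which of the competing delimiters to honor, and the syntax description does not say which.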

It's easy to come up with other cases where different implementations
vary, e.g.

[hi `there](/url)`

The markdown syntax description doesn't say whether [ or ` takes
precedence. Discount, sundown, and lunamark let the [ take precedence,
while PHP markdown, peg-markdown, multimarkdown, and pandoc let the ` take
precedence. Both decisions are defensible, I think. Markdown.pl lets the `
take precedence, but nothing in the syntax description says this is the way
to go, and as noted above, Markdown.pl cannot be counted on for reliable
guidance.
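The two precedence choices can be sketched with a toy inline renderer in Python. Everything here (the `render` and `_apply` names, the `code_first` flag, the simplified patterns, the lack of HTML escaping) is an assumption for illustration and does not reproduce any of the implementations named above. Whichever construct is matched first becomes a finished segment that the second pattern is not allowed to look inside.

```python
import re

LINK = re.compile(r'\[([^\]]*)\]\(([^)]*)\)')   # [text](url), no nesting
CODE = re.compile(r'`([^`]+)`')                 # `code`, single backticks only

def _apply(segments, pattern, fmt):
    """Run pattern over the raw segments only; finished segments pass through."""
    out = []
    for done, s in segments:
        if done:
            out.append((True, s))
            continue
        pos = 0
        for m in pattern.finditer(s):
            out.append((False, s[pos:m.start()]))
            out.append((True, fmt(m)))
            pos = m.end()
        out.append((False, s[pos:]))
    return out

def render(text, code_first):
    """Render links and code spans, giving precedence to one or the other."""
    link = (LINK, lambda m: '<a href="%s">%s</a>' % (m.group(2), m.group(1)))
    code = (CODE, lambda m: '<code>%s</code>' % m.group(1))
    order = [code, link] if code_first else [link, code]
    segments = [(False, text)]
    for pattern, fmt in order:
        segments = _apply(segments, pattern, fmt)
    return ''.join(s for _, s in segments)

sample = '[hi `there](/url)`'
print(render(sample, code_first=True))    # [hi <code>there](/url)</code>
print(render(sample, code_first=False))   # <a href="/url">hi `there</a>`
```

Both outputs are well-formed; they simply reflect different answers to a question the syntax description leaves open.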

These cases are unlikely to occur in real documents, but other differences
come up quite frequently. For example, some implementations (e.g. pandoc,
lunamark, peg-markdown, multimarkdown) require sublists to be indented four
spaces, while others (e.g. discount, sundown, Markdown.pl, PHP markdown) do
not. I have argued before on this list that the "four space rule" is
implicit in the markdown syntax specification. But it's not quite explicit,
hence the trouble. So you may find that what is a nested list on one markdown
implementation is a single-level list on another.
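The effect of the indentation threshold can be shown with a deliberately simplified Python sketch. The `item_depths` function and its `sublist_indent` parameter are hypothetical, and real implementations decide nesting with more involved rules than integer division of the leading spaces; the point is only that the same input yields different list structures under different thresholds.

```python
def item_depths(markdown, sublist_indent):
    """Return the nesting depth of each bullet item in the input.

    Simplifying assumption: an item's depth is its number of leading
    spaces divided by sublist_indent, rounded down.
    """
    depths = []
    for line in markdown.splitlines():
        stripped = line.lstrip(' ')
        if stripped.startswith(('* ', '- ', '+ ')):
            indent = len(line) - len(stripped)
            depths.append(indent // sublist_indent)
    return depths

doc = "* fruit\n  * apple\n  * pear\n* vegetables"

print(item_depths(doc, sublist_indent=4))   # [0, 0, 0, 0]  single-level list
print(item_depths(doc, sublist_indent=2))   # [0, 1, 1, 0]  nested list
```

With a two-space indent, the stricter threshold flattens the list while the lenient one nests it.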

I made the case long ago that there should be a formal grammar for
markdown, but the idea was never popular on this list. I wrote
peg-markdown and lunamark to show that a formal specification was
possible. Not everyone will agree with the way these grammars resolve
ambiguities in the spec. And some may think that instead of a formal
grammar, we need an official parsing algorithm (such as HTML 5 has).
But until there is some kind of agreed-upon formal specification,
implementations will inevitably diverge, even on "core" syntax.

John


