Universal syntax for Markdown

Fletcher T. Penney fletcher at fletcherpenney.net
Mon Aug 15 20:53:24 EDT 2011

I've trimmed John's original message to respond to particular points.

On Aug 13, 2011, at 6:00 PM, John MacFarlane wrote:

> 1. Many people have mentioned "rule #1" - markdown should look natural and
> readable just by itself. I strongly agree. In my own tinkerings, I've also
> insisted on another principle, which Fletcher also articulated, but which I
> think would not be accepted by everyone on this list:
>
> Format-independence: Markdown is not just for writing HTML.
>
> I've seen people on this list say, "why do you need extension X, when you
> can just include raw HTML?" To which I reply: "Because I want to be
> able to convert my document to LaTeX, where the raw HTML won't do much good."
> It's true that John Gruber presented markdown primarily as a readable shortcut
> to HTML. But I don't see why we should keep thinking about it this way, when
> tools like pandoc and multimarkdown can easily convert markdown to a variety
> of formats. Indeed, one of the main reasons I write in markdown whenever I
> can is that I'm not tied to a single output format. I can have a canonical
> document that can be converted reliably to just about any text format.

This was the original reason I created MultiMarkdown --- it was too obvious a next step *not* to create a means of converting Markdown to other output formats. By definition, MMD will never give up this functionality, even if giving it up were a requirement of a "standard."

> 2. We really need to clarify the rules for indented lists. As I've
> argued before on this list, the markdown documentation at least strongly
> implies that sublists need to be indented by four spaces, but many
> implementations (including Markdown.pl) don't insist on this.

MMD 3.0 was derived from peg-markdown --- I have been very happy with the decisions made for its grammar and rule definitions thus far. In my mind, there may still be a few ambiguities, but I am willing to side with peg-markdown on almost all of them.

> 3. I think most people agree that changing from ordered to bulleted
> list markers should start a new list (discussed earlier on this mailing
> list).
>
> 4. I think the opening number of an ordered list should be significant.

I could agree with the opening number being significant, but I think that ignoring other numbers in the list is a feature, not a bug.
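To make the behavior described above concrete, here is a minimal Python sketch, not taken from any existing Markdown implementation. The first marker in the list sets the starting number and every later marker is ignored in favor of simple counting. The `renumber` function and the `ITEM` pattern are hypothetical names introduced only for this illustration, and the sketch handles single-line items only.

```python
import re

# Matches a simple ordered-list item such as "3. some text".
ITEM = re.compile(r"^(\d+)\.\s+(.*)$")

def renumber(lines):
    """Return (number, text) pairs for an ordered list.

    The first marker decides where numbering starts; the markers on
    all later items are ignored and replaced by a running count.
    """
    items = []
    start = None
    for line in lines:
        match = ITEM.match(line)
        if match is None:
            continue  # not a list item; a real parser would handle continuation lines
        if start is None:
            start = int(match.group(1))  # the opening number is significant
        items.append((start + len(items), match.group(2)))
    return items
```

With this approach, a list written with the markers `3.`, `3.`, `1.` would be numbered 3, 4, 5, so authors can insert or reorder items without renumbering by hand.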

> 5. My own preference would be to require a blank line before a heading
> or blockquote, to avoid unexpected results.
>
> 5. Tables -- here there's a significant divergence between pandoc, PHP
> markdown extra, and multimarkdown. A limitation of pandoc's tables is
> that they require a monospaced font, since they rely on column
> alignment. The advantage is that they look exactly like tables. In
> addition, they allow table cells that contain whole paragraphs, and
> even arbitrary block-level content -- whereas, if I understand the
> documentation correctly, PHP markdown extra only allows simple tables
> with one-line cells. The philosophical differences here may be too
> deep for convergence.

I think table convergence is going to be tricky, and highly dependent on which principles are held to be most important:

* flexibility
* ease of creation for a person
* ease of creation for a computer
* ease of reading in monospace font
* ease of reading in variable width font
* ability to survive being opened and saved in a text editor that is not based on monospaced fonts/assumptions

> 6. Metadata -- multimarkdown's system is simple, flexible, and
> readable. One reservation I have about it is that it is
> English-centric -- nobody wants to write 'Title' at the beginning
> of a Swedish document -- but that could be solved by localization.
> It also seems a bit pedantic to have to say 'Title' if that's all you have.
> Pandoc's system is convenient and doesn't use English keywords, but it's not
> flexible enough, and I've been thinking about alternatives.

Very few of the metadata keys are "hard-coded". Some of those that are hard-coded are no more language dependent than HTML (e.g. Title). Certainly it could be localized as desired by any user. The downside of that, however, would be reduced compatibility.
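As an illustration of what user-level localization might look like, here is a minimal Python sketch. It is not MMD's implementation. It assumes a metadata block of `Key: value` lines that ends at the first blank line, and it maps localized keys onto canonical English ones through an alias table. The `ALIASES` table, its entries, and the `parse_metadata` function are hypothetical, and multi-line values are not handled.

```python
# Hypothetical alias table: localized key -> canonical key.
ALIASES = {"titel": "title", "datum": "date"}

def parse_metadata(text, aliases=ALIASES):
    """Split a document into (metadata dict, body text).

    Keys are lower-cased, stripped of whitespace, and then passed
    through the alias table so localized keys map to canonical ones.
    """
    meta = {}
    lines = text.split("\n")
    consumed = 0
    for line in lines:
        key, sep, value = line.partition(":")
        if not line.strip() or not sep:
            break  # a blank line or a line without a colon ends the metadata block
        norm = "".join(key.lower().split())
        meta[aliases.get(norm, norm)] = value.strip()
        consumed += 1
    body = lines[consumed:]
    if body and not body[0].strip():
        body = body[1:]  # drop the blank separator line
    return meta, "\n".join(body)
```

The compatibility cost mentioned above shows up here directly: a document that uses `Titel` is only understood by tools that share the same alias table.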

> 7. Image/link attributes -- the difficulty here is respecting
> format-independence. Saying that an image is 200px is not going
> to be helpful if you're targeting both HTML and LaTeX.

Actually, MMD does a pretty good job of handling many image attributes across all supported output formats.
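One way a converter could carry a pixel width across output formats is sketched below in Python. This is not a description of how MMD actually does it. The sketch assumes a fixed density of 96 pixels per inch to turn pixels into a physical length for LaTeX, and the function names and the `PX_PER_INCH` constant are hypothetical. It also skips escaping of special characters in the file name and alt text.

```python
PX_PER_INCH = 96  # assumption: a fixed pixel density used for the conversion

def image_html(src, alt, width_px):
    """Emit an HTML image tag with the width in pixels."""
    return f'<img src="{src}" alt="{alt}" width="{width_px}" />'

def image_latex(src, width_px):
    """Emit a LaTeX \\includegraphics command with the width in inches."""
    width_in = width_px / PX_PER_INCH
    return f"\\includegraphics[width={width_in:.2f}in]{{{src}}}"
```

The same source attribute then yields a pixel width in HTML and a physical width in LaTeX, at the price of committing to one assumed pixel density.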

> 8. Citations -- I think multimarkdown's citation system is a step
> in the right direction, but too unambitious to make part of a standard.
> We put a lot of thought into a good markdown citation format on
> pandoc-discuss, and came up with this:
> http://johnmacfarlane.net/pandoc/README#citations
> This gives you automatic bibliographies and citations, with configurable
> styles -- you can even move between footnote styles and parenthesized
> inline references -- and still looks pretty natural.

There's definitely a high barrier here. For my purposes, I either wanted something as flexible and powerful as BibTeX, or I wanted something easy. I stuck with supporting BibTeX when power is needed, and with what is basically a "free text" format otherwise, which gives the user limitless flexibility at the cost of having to manually specify the appearance and formatting of each entry.

Perhaps something like Citeproc could be a good option...

> 9. Definition lists -- Pandoc is pretty similar to PHP Markdown Extra,
> but only supports one term per definition. HTML definition lists support
> multiple terms, but this doesn't make sense in many other output
> formats, and I don't think it's necessary.

Perhaps, but I've had many users who needed this flexibility in definition lists.

> 10. Nesting/precedence -- this is probably less of a concern in
> practice, but there seems to be no standard for parsing nested
> inline elements. For example, consider the input
> '[hi `there] friend`](/url)'. Markdown.pl parses this as a link,
> and discount doesn't. I don't see anything in the Markdown syntax
> description that resolves the ambiguities here. Similarly for
> nested emph and strong -- Michel Fortin's MDTest suite contains
> some opinionated tests for these, but I'm not sure what the principle
> behind them is.

I agree that a consistent definition would be helpful, and I think something like your PEG could be a good basis for this.


Fletcher T. Penney
fletcher at fletcherpenney.net
