Markdown validity Re: Agreeing on "Historical Markdown"

Waylan Limberg waylan.limberg at icloud.com
Sat Jul 12 15:31:05 EDT 2014


> On Jul 12, 2014, at 2:52 PM, Michel Fortin <michel.fortin at michelf.ca> wrote:
> [snip]
> When you have a question like this, just try it in Babelmark 2:
> http://johnmacfarlane.net/babelmark2/?normalize=1&text=%3Cdiv%3E

Yes, that's what we all do. And to answer your other question, notice that only two of the implementations on Babelmark 2 failed. Remember, most of these implementations were written to run on web servers. We can't have our web servers crashing just because a user submitted invalid Markdown. What a parser doesn't understand, it just passes through. What it misunderstands, it garbles, but it is specifically designed to never choke.

As Michel alluded to, most parsers are simply a series of regular expression substitutions which are run in a predetermined order. If a regex never matches a part of the text, then that part passes through untouched. Yes, that means the HTML is parsed by regex (which we all know is a bad idea), but it is not really parsed in the way that browsers parse HTML. The regex just finds anything surrounded by angle brackets and ignores it. With the exception of the limited block-level stuff, we don't even care whether there are matching opening and closing tags. Yes, that can result in improperly nested output, but that is the author's fault and the parser should not bring the whole server down for that. The author can (should?) preview in a browser and fix it before publishing.
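
To make the "ordered substitutions" idea concrete, here is a minimal sketch in Python. It is not how Markdown.pl or any particular port is written: the three rules, the RULES name, and the convert function are hypothetical and heavily simplified. In this sketch raw HTML survives simply because no rule matches it, which is cruder than what real implementations do.

```python
import re

# Hypothetical, heavily simplified rule set. Real implementations have many
# more steps (escaping, block-level handling, links, and so on).
RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),  # must run before emphasis
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
    (re.compile(r"`(.+?)`"), r"<code>\1</code>"),
]

def convert(text, rules=RULES):
    """Apply each substitution in order; text no rule matches is left untouched."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

# An unclosed tag is not an error: nothing matches it, so it passes through.
print(convert("<div> some **bold** text"))

# Running the same rules in the wrong order garbles the output, but the
# function still returns a string rather than raising.
print(convert("**b**", rules=list(reversed(RULES))))
```

The second call shows why the order is predetermined: with the emphasis rule applied first, the double asterisks are misread and the result is garbled rather than rejected.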

However, I should point out that while the above describes most parsers (most are more or less direct ports of Markdown.pl, which works this way), a few use other methods under the hood. For example, some generate a parse tree which is then fed into a renderer (I believe Pandoc works like that, which allows it to output many more formats than just HTML), but they are the rare exception.
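
For contrast, the parse-tree approach can be sketched as follows. This is only an illustration of the general idea and makes no claim about Pandoc's internals: the Text and Strong node types, the parse function, and both renderers are hypothetical, and only bold text is recognized. Neither renderer escapes special characters.

```python
import re
from dataclasses import dataclass

@dataclass
class Text:
    value: str

@dataclass
class Strong:
    value: str

STRONG = re.compile(r"\*\*(.+?)\*\*")

def parse(text):
    """Turn one line into a flat list of Text and Strong nodes."""
    nodes, pos = [], 0
    for match in STRONG.finditer(text):
        if match.start() > pos:
            nodes.append(Text(text[pos:match.start()]))
        nodes.append(Strong(match.group(1)))
        pos = match.end()
    if pos < len(text):
        nodes.append(Text(text[pos:]))
    return nodes

def render_html(nodes):
    """Render the node list as HTML."""
    return "".join(
        f"<strong>{n.value}</strong>" if isinstance(n, Strong) else n.value
        for n in nodes
    )

def render_latex(nodes):
    """Render the same node list as LaTeX."""
    return "".join(
        r"\textbf{%s}" % n.value if isinstance(n, Strong) else n.value
        for n in nodes
    )

tree = parse("a **b** c")
print(render_html(tree))
print(render_latex(tree))
```

Because the input is parsed once into a tree, adding another output format only means writing another renderer over the same nodes.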

Waylan

