Markdown validity Re: Agreeing on "Historical Markdown"

Sat Jul 12 19:48:49 EDT 2014

> On Jul 12, 2014, at 6:23 PM, Sean Leonard <dev+ietf at seantek.com> wrote:
> 
> On 7/12/2014 12:31 PM, Waylan Limberg wrote:
>>> On Jul 12, 2014, at 2:52 PM, Michel Fortin <michel.fortin at michelf.ca> wrote:
>>> [snip]
>>> When you have a question like this, just try it in Babelmark 2:
>>> http://johnmacfarlane.net/babelmark2/?normalize=1&text=%3Cdiv%3E
>> Yes, that's what we all do. And to answer your other question, notice that only two of the implementations on Babelmark 2 failed. Remember, most of these implementations were written to be run on web servers. We can't have our web servers crashing just because a user submitted invalid markdown. What a parser doesn't understand just passes through. What it misunderstands gets garbled, but it is specifically designed to never choke.
>> 
>> As Michel alluded to, most parsers are simply a series of regular expression substitutions which are run in a predetermined order. If a regex never matches a part of the text, then that part passes through untouched. Yes, that means the HTML is parsed by regex -- which we all know is a bad idea -- but it is not really parsed in the way that browsers parse HTML. The regex just finds anything surrounded by angle brackets and ignores it. With the exception of the limited block level stuff, we don't even care if there are opening and/or closing tags. Yes, that can result in improperly nested stuff, but that is the author's fault and the parser should not bring the whole server down for that. The author can (should?) preview in a browser and fix it before publishing.
>> 
>> However, I should point out that while the above describes most parsers (as most are more or less direct ports of markdown.pl, which works this way), there are a few that use other methods under the hood. For example, a few generate a parse tree which is then fed into a renderer (I believe Pandoc works like that, which allows it to output many more formats than just HTML), but they are the rare exception.
> 
> I see.
> 
> Here is a real-world example of what I was citing:
> http://johnmacfarlane.net/babelmark2/?text=Hello+I+am+some+*text*.%0A%3Cdiv%3EHello+%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%2F%22%3Ethat+is+nice%3C%2Fa%3E+chance+%26+circumstance%26hellip%3B%0A%0AThe+end.
> 
> Truly, it looks like there is great diversity in Markdown-land.
> 
> Ok, so any standard mentioning Historical Markdown cannot say that any particular behavior is normative when it comes to HTML validity. Some check for HTML (island) validity and behave differently; others don't. The end...I guess.

Yes, but select "normalize" (which normalizes insignificant white space in the output), and the number of variations decreases. Unfortunately, there is absolutely no standardization in how the various implementations handle white space (I don't think I've seen two that match exactly in every corner case). Either way, hit the "preview" button (top right of the output) to see how the browser renders each result: all but a couple render exactly the same in the browser.
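
To make the white space point concrete, here is a rough sketch in Python. It is not Babelmark 2's actual normalization code, which I have not reproduced; normalize_ws and the three sample outputs (impl_a, impl_b, impl_c) are made up for illustration. It only shows why outputs that differ as raw strings can collapse to a single variant once insignificant white space is ignored.

```python
import re

def normalize_ws(html: str) -> str:
    """Hypothetical normalizer: collapse white space that usually does not
    affect rendering. Not safe for <pre> blocks or between inline elements."""
    html = html.strip()
    html = re.sub(r"\s+", " ", html)     # any run of white space -> one space
    html = re.sub(r">\s+<", "><", html)  # drop white space between adjacent tags
    return html

# Invented outputs from three imaginary implementations for the same input.
outputs = {
    "impl_a": "<p>Hello <em>text</em>.</p>\n\n<div>\n",
    "impl_b": "<p>Hello <em>text</em>.</p>\n<div>",
    "impl_c": "<p>Hello <em>text</em>.</p><div>\n",
}

raw_variants = set(outputs.values())
normalized_variants = {normalize_ws(o) for o in outputs.values()}

# Three distinct raw strings, one distinct string after normalization.
print(len(raw_variants), len(normalized_variants))
```

The second substitution is deliberately aggressive; a real comparison tool would need to be more careful about where white space is actually significant.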

And that is what makes markdown so great. You don't need to know or understand HTML to write it if you are using markdown. And if you have only an elementary knowledge of HTML, you can break into HTML on those few occasions when markdown won't do what you need.
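
The pass-through behavior described in the quoted text above is what makes breaking into HTML work. As a toy sketch in Python (not markdown.pl or any real implementation; SUBSTITUTIONS and render_inline are names invented here, and only three inline rules are covered), an ordered series of regex substitutions leaves anything it never matches, including raw and even unclosed HTML, exactly as it was:

```python
import re

# Ordered (pattern, replacement) pairs. Order matters: the strong rule must
# run before the emphasis rule, or "**x**" would be consumed as emphasis.
SUBSTITUTIONS = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
    (re.compile(r"`(.+?)`"), r"<code>\1</code>"),
]

def render_inline(text: str) -> str:
    """Apply each substitution in turn. Text that no pattern matches is
    returned unchanged, so the function never rejects its input."""
    for pattern, replacement in SUBSTITUTIONS:
        text = pattern.sub(replacement, text)
    return text

source = 'Some *text* and <div>an <a href="http://www.example.com/">unclosed</a> block'
print(render_inline(source))
```

The unclosed <div> comes out untouched: no rule looks at it, so nothing complains about it. Whether the result is valid HTML is left to the author and the browser.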

Waylan