Backtick Hickup

Michel Fortin michel.fortin at michelf.com
Mon Aug 13 11:20:49 EDT 2007


On 2007-08-12 at 23:23, Allan Odgaard wrote:


> I would have expected it to see first two back-ticks, then scan
> forward until another two back-ticks are seen (since the open-token
> defines the close-token) and thus give this output:
>
> <p>Backtick: <code>\</code>`</p>
>
> I know most Markdown parsers do not follow conventional parser
> wisdom, but IMO this is also the interpretation that suits an
> incremental tokenizer/parser best compared to your interpretation,
> which requires a look-ahead to potentially the end of the document,
> each time one or more back-ticks are seen.


The look-ahead is until the end of the paragraph, not the end of the
document; at least it is in PHP Markdown and Markdown.pl (I haven't
tested others), which first break the content into paragraphs, then
apply span-level rules inside them.
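
To make the paragraph-bounded look-ahead concrete, here is a minimal
sketch in Python. It is not the actual code of PHP Markdown or
Markdown.pl; the names split_paragraphs and find_code_span are
hypothetical, and the rule that a closer must be a run of exactly as
many back-ticks as the opener is a simplification of what those
implementations do.

```python
import re

TICK_RUN = re.compile(r"`+")

def split_paragraphs(text):
    """Split a document into paragraphs on one or more blank lines."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p]

def find_code_span(paragraph, start=0):
    """Return (span_start, span_end, content) for the first code span
    found at or after `start`, or None if no span can be closed.

    The search never leaves `paragraph`, so the look-ahead is bounded
    by the paragraph rather than by the document.
    """
    opener = TICK_RUN.search(paragraph, start)
    while opener:
        ticks = opener.group(0)
        # The closer is a run of the same length that is not part of
        # a longer run of back-ticks.
        closer_re = re.compile(r"(?<!`)" + re.escape(ticks) + r"(?!`)")
        closer = closer_re.search(paragraph, opener.end())
        if closer:
            content = paragraph[opener.end():closer.start()].strip()
            return opener.start(), closer.end(), content
        # No closer in this paragraph: treat the run as literal text
        # and try the next run of back-ticks.
        opener = TICK_RUN.search(paragraph, opener.end())
    return None
```

Calling find_code_span on each element of split_paragraphs(text) means
an unmatched opener only costs a scan to the end of its own paragraph.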

There are a lot of look-aheads in Markdown: emphasis won't be applied
if asterisks or underscores can't be matched in pairs; links won't be
links if there's no suitable parenthesis after the closing bracket;
Setext-style headers need the line of hyphens or equal signs
following their content; the parsing mode for a list item depends on
whether or not it contains a blank line; etc.
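
As an illustration of one of these look-aheads, the Setext case can
be sketched as follows. This is only a simplified sketch under my own
assumptions (the function name classify_lines and the label strings
are made up, and real implementations handle more edge cases): a line
cannot be labeled as a header until the line after it has been read.

```python
import re

# A Setext underline: a line made only of "=" or only of "-".
UNDERLINE = re.compile(r"^(=+|-+)[ \t]*$")

def classify_lines(lines):
    """Label each line 'h1', 'h2', 'underline' or 'text'.

    The label of a line depends on the line that follows it, so the
    classifier has to peek one line ahead before deciding.
    """
    labels = []
    i = 0
    while i < len(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        match = UNDERLINE.match(next_line)
        if lines[i].strip() and match:
            labels.append("h1" if match.group(1).startswith("=") else "h2")
            labels.append("underline")
            i += 2  # the underline has been consumed as well
        else:
            labels.append("text")
            i += 1
    return labels
```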

There's no way to do truly incremental parsing of Markdown... well,
you could in a way, but you'd have to mutate many parts of the output
document while parsing (like HTML parsers do in browsers), or
delay the output of ambiguous parts until the end of the document; all
this surely defeats the purpose of an incremental parser. The worst
"look-ahead" (or most complex "mutations") would be for reference-
style links which can have their definitions absolutely anywhere in
the document. Interestingly, that's probably one of the most
appreciated features of Markdown.
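
A rough two-pass sketch in Python shows why reference-style links
force a look at the whole document. The regular expressions and the
function name resolve_reference_links are assumptions for
illustration only: they skip link titles, escaping and other details
a real implementation must handle.

```python
import re

# "[label]: url" on a line of its own, indented by at most 3 spaces.
DEFINITION = re.compile(
    r"^[ ]{0,3}\[([^\]]+)\]:[ \t]*(\S+)[ \t]*$", re.MULTILINE
)
# "[text][label]" or "[text][]" (the latter reuses text as the label).
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")

def resolve_reference_links(text):
    """Replace reference-style links with HTML anchors."""
    # Pass 1: collect every definition, wherever it appears.
    urls = {m.group(1).lower(): m.group(2) for m in DEFINITION.finditer(text)}
    body = DEFINITION.sub("", text)

    # Pass 2: only now can each reference be resolved.
    def replace(match):
        label = (match.group(2) or match.group(1)).lower()
        url = urls.get(label)
        if url is None:
            return match.group(0)  # no definition: leave the text as-is
        return '<a href="%s">%s</a>' % (url, match.group(1))

    return REFERENCE.sub(replace, body)
```

Nothing can be emitted for "[text][label]" until pass 1 has seen the
last line of the document, since the definition may sit there.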


Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/



