evolving the spec (was: forking Markdown.pl?)

John Fraser john at attacklab.net
Mon Mar 3 14:37:14 EST 2008

On Mar 3, 2008, at 7:30 AM, Michel Fortin wrote:

> Allan Odgaard wrote:
>> 4. A regexp which is pushed onto a stack when entering the context of
>> this rule, and popped again when leaving this rule.
>>
>> The fourth item here is really the interesting part, because it is
>> what made Markdown nesting work (99% of the time) despite this being
>> 100% rule-driven.


> I'm not sure what the regular expression in 4 does, besides being
> pushed and popped from the stack (perhaps it's the end-of-block
> expression), but overall it looks pretty good, and is pretty similar
> to how I'm currently approaching the problem. There are a couple of
> subtleties I'm not sure these rules can catch, though.

I assume Allan let the grammar refer back to this stack as if it were
an ordinary rule, so you could use the stack to collect levels of
indentation. It's like a limited kind of parameterization. I'd been
planning to use recursive transformation to handle nesting, since it
makes memoization easier and ought to be a little more readable. But
I'll try Allan's idea if mine gets hairy.
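
To make "recursive transformation" concrete, here is a minimal sketch in Python. It is an illustration only, not the prototype described in this message: the function name render_blocks and the choice to handle only blockquotes and paragraphs are assumptions made for the example. Each level of quoting is stripped once and the remainder is fed back through the same function, so nesting falls out of the recursion rather than from an explicit stack.

```python
import re

def render_blocks(text):
    """Toy block transformer: handles only blockquotes and paragraphs."""
    out = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n{2,}", text.strip("\n")):
        lines = block.split("\n")
        if all(line.startswith(">") for line in lines):
            # Strip one level of quoting, then recurse on what is left.
            inner = "\n".join(re.sub(r"^> ?", "", line) for line in lines)
            out.append("<blockquote>" + render_blocks(inner) + "</blockquote>")
        else:
            out.append("<p>" + " ".join(lines) + "</p>")
    return "".join(out)

print(render_blocks("> outer\n>\n> > inner"))
```

Because the function's result depends only on its input string, caching results per input is straightforward, which is one way to read the remark about memoization above.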

I like the direction you're both going, and I'm hoping we can come up
with a definition that doesn't use any English at all. Admittedly,
that'll be a lot easier for a version that does change some behavior
at the edges -- like ditching Markdown's "undocumented precedence
rules" (<http://six.pairlist.net/pipermail/markdown-discuss/2007-August/000746.html>).

I'm going to build my own little prototype to experiment with this
stuff (<http://six.pairlist.net/pipermail/markdown-discuss/2008-February/001042.html>).
My goal is to come up with a formal grammar that doubles as a
(slow) reference implementation. You'll feed a grammar and an input
file into a generic text-munging tool, which will spit out either the
transformed output or an AST. The tool will be small, easy to port,
and completely general -- you could use it to implement html2txt or
smartypants or an HTML sanitizer, for example. That's the plan,
anyway; we'll see how the first iteration turns out.
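
A rough sketch of the "grammar in, output or AST out" shape, in Python. Everything here is hypothetical: the grammar format (a list of name, pattern, template triples), the names parse and render, and the two smartypants-like rules are invented for illustration, and a real grammar interpreter would need far more than flat regex alternation. The point is only that the tool is generic and the grammar is data.

```python
import re

# Hypothetical grammar format: (node name, pattern, output template).
SMARTY_GRAMMAR = [
    ("ellipsis", r"\.\.\.", "&#8230;"),
    ("dash", r"--", "&#8212;"),
]

def parse(grammar, text):
    """Return a flat AST: a list of (name, matched_text) nodes."""
    combined = re.compile("|".join(f"(?P<{n}>{p})" for n, p, _ in grammar))
    ast, pos = [], 0
    for m in combined.finditer(text):
        if m.start() > pos:
            # Anything no rule matched becomes a plain "text" node.
            ast.append(("text", text[pos:m.start()]))
        ast.append((m.lastgroup, m.group()))
        pos = m.end()
    if pos < len(text):
        ast.append(("text", text[pos:]))
    return ast

def render(grammar, ast):
    """Replace each named node with its template; pass text nodes through."""
    templates = {n: t for n, _, t in grammar}
    return "".join(templates.get(name, value) for name, value in ast)

ast = parse(SMARTY_GRAMMAR, "wait... ok -- yes")
print(ast)
print(render(SMARTY_GRAMMAR, ast))
```

Swapping in a different grammar list changes the behavior without touching parse or render, which is the property the plan above is after.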

> The way I see it, rules need to be parametrized so the above can be
> changed without having to define 2^(number of syntax elements)
> rules, such as EmphasisWithinLink, LinkWithinEmphasis,
> CodeSpanWithinLinkWithinEmphasis, and so on.

Since I'm doing something packrat-ish, I'm hoping I can use lookahead
to keep the rules from exploding.
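
One way to picture this, sketched in Python under stated assumptions: the toy grammar below is invented for the example and is not Markdown's real inline syntax, and parse_inline and its helpers are hypothetical names. A single Inline rule is reused inside both emphasis and links; the only context-specific part is a negative lookahead for the enclosing construct's closing delimiter, so no EmphasisWithinLink-style rules are needed. Each rule is memoized per input position, which is the packrat part.

```python
from functools import lru_cache

def parse_inline(text):
    """Packrat-style parse of a toy inline grammar (hypothetical):

        Inline   <- Emphasis / Link / .
        Emphasis <- '*' (!'*' Inline)+ '*'
        Link     <- '[' (!']' Inline)+ ']' '(' (!')' .)* ')'
    """

    def until(stop, pos):
        # (!stop Inline)+ : the lookahead is the only context-specific part.
        nodes = []
        while pos < len(text) and not text.startswith(stop, pos):
            node, pos = inline(pos)
            nodes.append(node)
        return nodes, pos

    @lru_cache(maxsize=None)  # memoize each rule's result per position
    def emphasis(pos):
        if not text.startswith("*", pos):
            return None
        nodes, end = until("*", pos + 1)
        if nodes and text.startswith("*", end):
            return ("em", tuple(nodes)), end + 1
        return None

    @lru_cache(maxsize=None)
    def link(pos):
        if not text.startswith("[", pos):
            return None
        nodes, end = until("]", pos + 1)
        if not nodes or not text.startswith("](", end):
            return None
        close = text.find(")", end + 2)
        if close == -1:
            return None
        return ("link", tuple(nodes), text[end + 2:close]), close + 1

    @lru_cache(maxsize=None)
    def inline(pos):
        # Ordered choice: fall back to a single literal character.
        return emphasis(pos) or link(pos) or (("char", text[pos]), pos + 1)

    nodes, pos = [], 0
    while pos < len(text):
        node, pos = inline(pos)
        nodes.append(node)
    return nodes

print(parse_inline("*a[b](u)*"))
print(parse_inline("[*b*](u)"))
```

Both nestings (link inside emphasis, emphasis inside link) parse with the same three rules. Adding a code-span rule would mean one more rule and one more alternative in Inline, not a new rule per combination.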

John Fraser
