evolving the spec (was: forking Markdown.pl?)
John Fraser
john at attacklab.net
Mon Mar 3 14:37:14 EST 2008
On Mar 3, 2008, at 7:30 AM, Michel Fortin wrote:
> Allan Odgaard wrote:
>> 4. A regexp which is pushed onto a stack when entering the context of
>> this rule, and popped again when leaving this rule.
>>
>> The fourth item here is really the interesting part, because it is
>> what made Markdown nesting work (99% of the time) despite this being
>> 100% rule-driven.
>
> I'm not sure what the regular expression in 4 does, besides being
> pushed and popped from the stack (perhaps it's the end of block
> expression), but overall it looks pretty good, and is pretty similar
> to how I'm currently approaching the problem. There are a couple of
> subtleties I'm not sure these rules can catch, though.
I assume Allan let the grammar refer back to this stack as if it were
an ordinary rule, so you could use the stack to collect levels of
indentation. It's like a limited kind of parameterization. I'd been
planning to use recursive transformation to handle nesting, since it
makes memoization easier and ought to be a little more readable. But
I'll try Allan's idea if mine gets hairy.
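
A minimal sketch of how such a stack might work, in Python. This is a guess at the mechanism, not Allan's actual implementation: the `ContextStack` class, its method names, and the two prefix patterns are all made up for illustration. Each enclosing block pushes the regexp that its continuation lines must start with, and a line belongs to the current nesting only if every pattern on the stack matches in order.

```python
import re


class ContextStack:
    """Hypothetical stack of line-prefix patterns, one per enclosing block."""

    def __init__(self):
        self._patterns = []

    def push(self, pattern):
        # Called when a rule opens a new nesting level.
        self._patterns.append(re.compile(pattern))

    def pop(self):
        # Called when that rule is left again.
        return self._patterns.pop()

    def strip_prefixes(self, line):
        """Return the line minus all enclosing prefixes, or None if the
        line does not continue the current nesting."""
        pos = 0
        for pattern in self._patterns:
            match = pattern.match(line, pos)
            if match is None:
                return None
            pos = match.end()
        return line[pos:]


stack = ContextStack()
stack.push(r"> ?")   # assumed prefix for a blockquote
stack.push(r" {4}")  # assumed prefix for a list item inside it

nested = stack.strip_prefixes("> " + " " * 4 + "item text")
rejected = stack.strip_prefixes("> back in the quote")

stack.pop()          # leaving the list item
quoted = stack.strip_prefixes("> back in the quote")
```

Here `nested` is the bare item text, `rejected` is `None` because the line lacks the list indentation, and after the pop the same line is accepted as plain blockquote content. The stack acts as the "limited kind of parameterization": the rules themselves never mention how deep they are.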
I like the direction you're both going, and I'm hoping we can come up
with a definition that doesn't use any English at all. Admittedly,
that'll be a lot easier for a version that does change some behavior
at the edges -- like ditching Markdown's undocumented "precedence
rules" (<http://six.pairlist.net/pipermail/markdown-discuss/2007-August/000746.html>).
I'm going to build my own little prototype to experiment with this
stuff (<http://six.pairlist.net/pipermail/markdown-discuss/2008-February/001042.html>).
My goal is to come up with a formal grammar that doubles as a
(slow) reference implementation. You'll feed a grammar and an input
file into a generic text-munging tool, which will spit out either the
transformed output or an AST. The tool will be small, easy to port,
and completely general -- you could use it to implement html2txt or
smartypants or an HTML sanitizer, for example. That's the plan,
anyway; we'll see how the first iteration turns out.
> The way I see it, rules need to be parametrized so the above can be
> changed without having to define 2^(number of syntax elements)
> rules, such as EmphasisWithinLink, LinkWithinEmphasis,
> CodeSpanWithinLinkWithinEmphasis, and so on.
Since I'm doing something packrat-ish, I'm hoping I can use lookahead
to keep the rules from exploding.
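
To make "packrat-ish with lookahead" concrete, here is a rough Python sketch under stated assumptions. Nothing in it comes from the prototype itself: `Packrat`, `lit`, `seq`, `many1`, `not_ahead`, and the toy `emphasis` rule are all hypothetical names. For brevity it memoizes only rules invoked through `apply`; a real packrat parser would memoize every rule at every position.

```python
class Packrat:
    """Input text plus a memo table keyed by (rule name, position)."""

    def __init__(self, text):
        self.text = text
        self.memo = {}
        self.evaluations = 0  # how many times a rule body actually ran

    def apply(self, name, rule, pos):
        key = (name, pos)
        if key not in self.memo:
            self.evaluations += 1
            self.memo[key] = rule(self, pos)
        return self.memo[key]


# Every rule takes (parser, pos) and returns the end position or None.
def lit(s):
    return lambda p, pos: pos + len(s) if p.text.startswith(s, pos) else None


def any_char(p, pos):
    return pos + 1 if pos < len(p.text) else None


def not_ahead(rule):
    """Negative lookahead: succeed without consuming iff rule fails."""
    return lambda p, pos: pos if rule(p, pos) is None else None


def seq(*rules):
    def wrapped(p, pos):
        for rule in rules:
            pos = rule(p, pos)
            if pos is None:
                return None
        return pos
    return wrapped


def many1(rule):
    def wrapped(p, pos):
        end = rule(p, pos)
        if end is None:
            return None
        while True:
            nxt = rule(p, end)
            if nxt is None or nxt == end:
                return end
            end = nxt
    return wrapped


# "Plain" text is any character that does not start an emphasis marker.
plain = seq(not_ahead(lit("*")), any_char)
emphasis = seq(lit("*"), many1(plain), lit("*"))

parser = Packrat("*hi* there")
first = parser.apply("emphasis", emphasis, 0)
again = parser.apply("emphasis", emphasis, 0)  # answered from the memo
miss = parser.apply("emphasis", emphasis, 5)
```

The lookahead in `plain` is what stands in for a family of special-case rules: rather than writing a separate "text inside emphasis" rule per context, one rule says what must not come next. The memo table keeps the backtracking this invites from being re-run at the same position.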
John Fraser