evolving the spec (was: forking Markdown.pl?)
michel.fortin at michelf.com
Mon Mar 3 07:30:01 EST 2008
Allan Odgaard wrote:
> Though without changing a lot of edge-case behavior, I find it hard
> to see Markdown using such rule-based implementation, so personally
> I am favoring a new Markdown-inspired language.
For my part, I'm currently trying to specify parsing rules for Markdown
Extra, and to make the specification usable to parse Markdown too. The
idea is to preserve the way it is working now, but to handle edge
cases in a consistent and predictable manner. What I want to achieve
is interoperability between implementations for the current Markdown
and Markdown Extra languages, not creating a new look-alike language.
> The problem so far has been that the formal syntax normally used to
> define grammars does not support Markdown’s notion of embedding, but
> as mentioned here http://six.pairlist.net/pipermail/markdown-discuss/2008-February/001002.html
> I have had some success with a rule-based implementation that uses
> a stack for aggregating rules that needs to be applied to the
> current line before it is handed to the regular parser -- this
> allows a specification without code and which is unambiguous to edge-
> cases since the rules are exhaustive, unlike a document written in
I'd like to point out one thing: you can always write in English what
you can express with a formal grammar; if you write things correctly, they'll
be precise and unambiguous. This has the disadvantage of being more
verbose, but the advantage that you don't need to learn a new
"language", which is the grammar.
That said, I'm currently looking at how to specify Markdown formally.
Whether to use a grammar or English is to be decided later. I'm
looking at the general form of a rule, and I find the post you linked
above gives pretty good insight into what I need. Each rule in your
lost rule-based implementation had this (quoting):
> 1. A regexp that makes the parser enter the context the rule
> represents (e.g. block quote, list, raw, etc.).
> 2. A list of which rules are allowed in the context of this rule.
> 3. A regexp for leaving the context of this rule.
> 4. A regexp which is pushed onto a stack when entering the context of
> this rule, and popped again when leaving this rule.
> The fourth item here is really the interesting part, because it is
> what made Markdown nesting work (99% of the time) despite this being
> 100% rule-driven.
I'm not sure what the regular expression in 4 does, besides being
pushed and popped from the stack (perhaps it's the end-of-block
expression), but overall it looks pretty good, and is pretty similar
to how I'm currently approaching the problem. There are a couple of
subtleties I'm not sure these rules can catch, though.
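To make the four-part rule concrete, here is a minimal Python sketch.
It rests on one assumed reading of item 4: that the pushed regexp is a
line prefix which is stripped from each line before the line reaches
the regular parser. The names Rule, line_prefix and strip_prefixes, and
the two prefix patterns, are hypothetical and not taken from the lost
implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    enter: re.Pattern        # 1. regexp that enters the context
    allowed: tuple           # 2. names of rules allowed in this context
    leave: re.Pattern        # 3. regexp that leaves the context
    line_prefix: re.Pattern  # 4. regexp pushed on entry, popped on exit


def strip_prefixes(line, stack):
    """Apply each stacked prefix regexp, outermost first.

    Returns the remainder of the line, or None if some prefix does not
    match (the line does not continue the innermost context).
    """
    for prefix in stack:
        m = prefix.match(line)
        if m is None:
            return None
        line = line[m.end():]
    return line


# Assumed patterns for illustration only.
blockquote = Rule(
    name="blockquote",
    enter=re.compile(r" {0,3}>"),
    allowed=("blockquote", "list", "paragraph"),
    leave=re.compile(r"^\s*$"),
    line_prefix=re.compile(r" {0,3}> ?"),
)
list_item = Rule(
    name="list",
    enter=re.compile(r" {0,3}[*+-] "),
    allowed=("list", "paragraph"),
    leave=re.compile(r"^\s*$"),
    line_prefix=re.compile(r" {4}|\t"),
)

# A list nested in a block quote: both prefixes are on the stack.
stack = [blockquote.line_prefix, list_item.line_prefix]
```

With this stack, strip_prefixes(">     item text", stack) peels off
the block quote marker and then the list indentation, leaving only the
text that the inner rules would see.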
In my idea, you'd have parametrized rules. For instance, the list of
allowed rules (2) should change depending on the context: you
shouldn't have a link within a link, but you can have emphasis in your
link; therefore, the emphasis rule when within a link shouldn't have a
link rule in its list of sub-rules (2). You also need a way for the
regular expression in 3 to be variable depending on what you caught in
1 (to match the same number of backticks in a code span for instance;
to catch a matching closing HTML tag, etc.).
The way I see it, rules need to be parametrized so the above can be
changed without having to define 2^(number of syntax elements) rules,
such as EmphasisWithinLink, LinkWithinEmphasis,
CodeSpanWithinLinkWithinEmphasis, and so on.
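One way to picture the parametrization, as a rough Python sketch: each
rule receives the set of rules excluded by its ancestors and computes
its own list of sub-rules from it, so no combined rule ever has to be
written out. The rule names and the NO_SELF_NESTING set are
assumptions made for illustration.

```python
SPAN_RULES = frozenset({"emphasis", "link", "code_span"})

# Hypothetical: constructs that may not contain themselves at any depth.
NO_SELF_NESTING = frozenset({"link"})


def allowed_inside(rule, excluded=frozenset()):
    """Return (sub-rules allowed inside `rule`, exclusions to pass down).

    `excluded` is the parameter: the rules already forbidden by the
    enclosing contexts.
    """
    if rule == "code_span":
        # Code spans hold literal text only.
        return frozenset(), excluded
    if rule in NO_SELF_NESTING:
        excluded = excluded | {rule}
    return SPAN_RULES - excluded, excluded


# Emphasis at top level may contain a link...
top_level_emphasis, _ = allowed_inside("emphasis")

# ...but emphasis inside a link inherits the link's exclusion.
link_subrules, passed_down = allowed_inside("link")
emphasis_in_link, _ = allowed_inside("emphasis", passed_down)
```

Here the same emphasis rule yields different sub-rule lists depending
on the parameter it is given, which is what replaces
EmphasisWithinLink and its siblings.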
michel.fortin at michelf.com