Markdown Extra Spec: Parsing Section
John MacFarlane
jgm at berkeley.edu
Mon May 12 18:14:32 EDT 2008
+++ Michel Fortin [May 11 08 22:26 ]:
> On 2008-05-11 at 20:55, Jacob Rus wrote:
>
>> You should write it in something closer to a BNF-like format. The
>> current version is about 10x more verbose than necessary, and it makes
>> reading the spec considerably more difficult.
>
> The reason I'm doing it like this is that I doubt everything will be
> expressible in a BNF format.
You can come pretty close with a PEG grammar:
http://github.com/jgm/peg-markdown/tree/master/markdown_parser.leg#L236
I have implemented the basic Markdown syntax plus the footnote syntax
from PHP Markdown Extra, and so far I've found only two things that
can't be cleanly expressed using a PEG:

1. Indented block contexts like lists and blockquotes. Here I use a
   multi-pass approach. The first pass takes, say, a list item

       1. my list item
          - with
          - nested list

   and returns a listitem with "raw" contents

       my list item
       - with
       - nested list

   which are piped through the markdown parser again.

2. Inline code. PEG can't express "a row of backticks, followed
   by a string of characters not containing an equally long row of
   backticks, followed by an equally long row of backticks."
   It can express, for particular values of N, "a row of N backticks,
   followed by a string of characters not containing a row of N
   backticks, followed by a row of N backticks." So if you have
   a fixed limit on the number of backticks that can start a stretch
   of inline code, you're okay. peg-markdown sets this limit at 5,
   which should be enough for most purposes. But one could set it
   higher without much of a performance penalty.

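
The two workarounds above can be sketched roughly as follows. This is a
minimal Python illustration, not peg-markdown's actual code: the names
`LIST_MARKER`, `raw_list_item_contents`, `ticks`, `code_span_n`, and
`code_span` are hypothetical, and the marker pattern and indentation
handling are simplifying assumptions. The first function does only the
first pass over a list item (a real parser would feed the result back into
the block parser). The rest mimics a PEG ordered choice with one
alternative per fixed backtick count N.

```python
import re

# Hypothetical marker pattern: "1." or a bullet, then spaces or tabs.
LIST_MARKER = re.compile(r"(?:\d+\.|[-*+])[ \t]+")


def raw_list_item_contents(item):
    """First pass: strip the marker and the matching indentation."""
    lines = item.split("\n")
    marker = LIST_MARKER.match(lines[0])
    if marker is None:
        raise ValueError("not a list item")
    width = marker.end()
    raw = [lines[0][width:]]
    for line in lines[1:]:
        # Drop at most `width` leading spaces from each continuation line.
        raw.append(line[:width].lstrip(" ") + line[width:])
    return "\n".join(raw)


def ticks(text, pos, n):
    """Match n backticks at pos that are not followed by another backtick.

    Callers must pass a pos that is the start of a backtick run.
    Returns the index just past the run, or None.
    """
    end = pos + n
    if text[pos:end] == "`" * n and text[end:end + 1] != "`":
        return end
    return None


def code_span_n(text, pos, n):
    """One alternative: TicksN, content without a TicksN row, TicksN."""
    start = ticks(text, pos, n)
    if start is None:
        return None
    i = start
    while i < len(text):
        end = ticks(text, i, n)
        if end is not None:
            return text[start:i], end
        if text[i] == "`":
            # A backtick run of some other length: consume it whole.
            while i < len(text) and text[i] == "`":
                i += 1
        else:
            i += 1
    return None


def code_span(text, pos=0, max_ticks=5):
    """Ordered choice over N = 1..max_ticks, one alternative per N."""
    for n in range(1, max_ticks + 1):
        result = code_span_n(text, pos, n)
        if result is not None:
            return result
    return None
```

With the default limit of 5, `code_span` returns the span contents and the
end index for openers of one to five backticks, and returns `None` for a
span opened by six backticks. Passing `max_ticks=6` adds one more
alternative, which is the sense in which the limit can be raised cheaply.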
The PEG representation is concise, precise, and readable.
But the big advantage is that it can be converted automatically into a
fast parser. This means that you can be sure that your markdown program
really does implement the formal specification. An informal English
specification won't give you that.
John