Backtick Hickup
Eric Astor
eastor1 at swarthmore.edu
Mon Aug 27 17:02:34 EDT 2007
Michel Fortin wrote:
> As to how to parse it with an incremental parser, I assume you could do
> that:
>
> text: this
> mark: **
> text: is
> mark: `
> (switch tokenizer into "raw" mode until it sees a backtick)
> text: raw** text
> mark: `
> (take last text token, remove backtick marks, and make a code span)
> (switch back tokenizer into "span" mode)
> end reached in span
>
> The hard part comes when no matching backtick is found (assuming
> non-paired backticks do not constitute code). Here's what I suggest for
> the same case with no ending backtick:
>
> text: this
> mark: **
> text: is
> mark: `
> (switch tokenizer into "raw" mode until it sees a backtick)
> text: raw** text
> end reached in raw
> (reparse last text token in "span" mode)
> text: raw
> mark: **
> (take tokens between the two ** marks and put them in emphasis,
> the two marks are removed)
> text: text
> end
>
> Note that in this case backtracking is limited to the last token, which
> is itself limited in length by the current block (paragraph, list item,
> ...). I have no idea how that could fit any formal grammar language
> however.
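
A rough Python sketch of the scheme quoted above, for a single block of text. The names `tokenize_span` and `render` and the token kinds "text", "mark" and "code" are made up for illustration; they come from no existing Markdown implementation. Only `**` and backticks are handled. `str.find` stands in for the "raw" mode scan, and when it finds no closing backtick the position is rewound so the remainder is reparsed in "span" mode.

```python
import re
from html import escape

# Plain text in "span" mode: a run without * or `, or a lone *.
PLAIN = re.compile(r"[^*`]+|\*")

def tokenize_span(text):
    """Tokenize one block; backticks switch to "raw" mode until a match."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] == "`":
            # "raw" mode: scan ahead for the closing backtick.
            end = text.find("`", pos + 1)
            if end != -1:
                tokens.append(("code", text[pos + 1:end]))
                pos = end + 1
            else:
                # End reached in raw mode: keep the backtick as literal
                # text and rewind, so the rest is reparsed in span mode.
                tokens.append(("text", "`"))
                pos += 1
        elif text.startswith("**", pos):
            tokens.append(("mark", "**"))
            pos += 2
        else:
            match = PLAIN.match(text, pos)
            tokens.append(("text", match.group()))
            pos = match.end()
    return tokens

def render(text):
    """Pair up ** marks in order; an unpaired mark stays literal."""
    tokens = tokenize_span(text)
    marks = [i for i, (kind, _) in enumerate(tokens) if kind == "mark"]
    marks = marks[: len(marks) // 2 * 2]      # drop an odd trailing mark
    openers, closers = set(marks[0::2]), set(marks[1::2])
    out = []
    for i, (kind, value) in enumerate(tokens):
        if i in openers:
            out.append("<strong>")
        elif i in closers:
            out.append("</strong>")
        elif kind == "code":
            out.append("<code>" + escape(value) + "</code>")
        else:
            out.append(escape(value))
    return "".join(out)

print(render("this **is `raw** text`"))
print(render("this **is `raw** text"))
```

As in the description, the rewind never goes back further than the opening backtick, so the extra work is bounded by the length of the current block.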
Well - has anyone else looked into ANTLR 3.0 at all? The LL(*) grammar
language it uses (an EBNF variant) supports full backtracking and
arbitrary lookahead, as far ahead as necessary. It's fairly
well-optimized, as I understand it, borrowing the memoization idea from
packrat parsing so that the same section of text isn't parsed repeatedly.
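
To make the backtracking-plus-memoization idea concrete, here is a small hand-written Python sketch. It is not ANTLR and not generated by it; the function `parse_inline` and its toy grammar (code spans and `**` emphasis only) are assumptions made for illustration. Each rule tries to match at a position and returns nothing on failure, so the caller can fall back to the next alternative, and `functools.lru_cache` remembers each rule's result per position so a retried stretch of text is not parsed twice.

```python
from functools import lru_cache
from html import escape

def parse_inline(text):
    """Backtracking parser for a toy inline grammar, memoized per position."""

    @lru_cache(maxsize=None)
    def code(pos):
        # code := "`" (anything but "`")* "`"
        if not text.startswith("`", pos):
            return None
        end = text.find("`", pos + 1)
        if end == -1:
            return None                      # no closing backtick: rule fails
        return "<code>" + escape(text[pos + 1:end]) + "</code>", end + 1

    @lru_cache(maxsize=None)
    def strong(pos):
        # strong := "**" (code / char)+ "**"
        if not text.startswith("**", pos):
            return None
        parts, cur = [], pos + 2
        while cur < len(text) and not text.startswith("**", cur):
            html, cur = code(cur) or (escape(text[cur]), cur + 1)
            parts.append(html)
        if parts and text.startswith("**", cur):
            return "<strong>" + "".join(parts) + "</strong>", cur + 2
        return None                          # no closing **: caller backtracks

    out, pos = [], 0
    while pos < len(text):
        # Ordered choice: code span, then emphasis, then a literal character.
        html, pos = code(pos) or strong(pos) or (escape(text[pos]), pos + 1)
        out.append(html)
    return "".join(out)

print(parse_inline("this **is `raw** text`"))
print(parse_inline("this **is `raw** text"))
```

In the first call the emphasis rule fails after already having parsed the code span, and when the top-level loop reaches the same backtick again the cached result is reused instead of being recomputed.
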
I suspect Markdown might be formally specifiable in ANTLR v3, and I'd
bet that even if it's not, it's very close. If it is, getting Markdown
parsers into various languages would just be a matter of helping develop
new ANTLR v3 language targets (its code-generation backends).
- Eric Astor