Backtick Hickup

Eric Astor eastor1 at swarthmore.edu
Mon Aug 27 17:02:34 EDT 2007


Michel Fortin wrote:

> As to how to parse it with an incremental parser, I assume you could do
> that:
>
> text: this
> mark: **
> text: is
> mark: `
> (switch tokenizer into "raw" mode until it sees a backtick)
> text: raw** text
> mark: `
> (take last text token, remove backtick marks, and make a code span)
> (switch back tokenizer into "span" mode)
> end reached in span
>
> The hard part comes when no matching backtick is found (assuming
> non-paired backticks do not constitute code). Here's what I suggest for
> the same case with no ending backtick:
>
> text: this
> mark: **
> text: is
> mark: `
> (switch tokenizer into "raw" mode until it sees a backtick)
> text: raw** text
> end reached in raw
> (reparse last text token in "span" mode)
> text: raw
> mark: **
> (take tokens between the two ** marks and put them in emphasis,
> the two marks are removed)
> text: text
> end
>
> Note that in this case backtracking is limited to the last token, which
> is itself limited in length by the current block (paragraph, list item,
> ...). I have no idea how that could fit any formal grammar language
> however.
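
The two traces quoted above can be sketched in Python roughly as follows. This is only an illustration under assumptions, not Michel's or any Markdown implementation's actual code: the function name `tokenize_span`, the `("text" | "mark" | "code", value)` token shape, and the choice to keep an unmatched backtick as literal text are all hypothetical. It also uses a forward search for the closing backtick in place of an explicit "raw"-mode pass followed by a reparse; the effect is the same, and the extra work stays bounded by the current block.

```python
import re

# Span-mode tokens: a ** mark, a backtick, a run of ordinary text, or a lone *.
SPAN_TOKEN = re.compile(r"\*\*|`|[^*`]+|\*")

def tokenize_span(block):
    """Tokenize one block (paragraph, list item, ...) in "span" mode.

    A backtick switches to "raw" mode only if a closing backtick exists
    later in the same block; otherwise the text after it is tokenized in
    span mode, which is the "reparse" step of the second trace.
    """
    tokens = []
    pos = 0
    while pos < len(block):
        match = SPAN_TOKEN.match(block, pos)
        tok = match.group()
        if tok == "`":
            close = block.find("`", match.end())
            if close != -1:
                # Matching backtick found: the raw text becomes a code span.
                tokens.append(("code", block[match.end():close]))
                pos = close + 1
                continue
            # Assumption: an unmatched backtick is kept as literal text.
            tokens.append(("text", "`"))
            pos = match.end()
            continue
        kind = "mark" if tok == "**" else "text"
        tokens.append((kind, tok))
        pos = match.end()
    return tokens
```

With a closing backtick, `tokenize_span("this **is `raw** text`")` yields a single code token for `raw** text`; without one, the same text is split at the second `**` mark, which a later pass could pair with the first to form emphasis.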


Well, has anyone else looked into ANTLR 3.0 at all? The LL(*) grammar
language it uses (an EBNF variant) supports full backtracking and
arbitrary lookahead, as far ahead as necessary. It's fairly well-optimized,
as I understand it, borrowing the memoization idea from packrat parsing
to avoid re-parsing the same stretch of text repeatedly.
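
To make the packrat idea concrete, here is a minimal Python sketch of a backtracking parser that memoizes one rule by input position. It is not ANTLR's implementation or generated code; the toy grammar (a `**...**` span that may contain code spans) and the names `make_parser`, `code_span`, `strong`, and `calls` are assumptions made up for illustration.

```python
from functools import lru_cache

def make_parser(text):
    """Build a toy backtracking parser over `text` with one memoized rule."""
    calls = {"code_span": 0}  # counts real (non-cached) evaluations

    @lru_cache(maxsize=None)
    def code_span(pos):
        """Match `...` starting at pos; return the end position or None."""
        calls["code_span"] += 1
        if pos < len(text) and text[pos] == "`":
            close = text.find("`", pos + 1)
            if close != -1:
                return close + 1
        return None

    def strong(pos):
        """Match '**' ... '**' at pos, skipping over code spans inside it."""
        if not text.startswith("**", pos):
            return None
        p = pos + 2
        while p < len(text):
            if text.startswith("**", p):
                return p + 2
            end = code_span(p)
            p = end if end is not None else p + 1
        return None  # no closing mark: the caller backtracks to pos

    def parse():
        out, pos = [], 0
        while pos < len(text):
            end = strong(pos)
            if end is not None:
                out.append(("strong", text[pos + 2:end - 2]))
                pos = end
                continue
            # After a failed strong(), these lookups mostly hit the cache.
            end = code_span(pos)
            if end is not None:
                out.append(("code", text[pos + 1:end - 1]))
                pos = end
                continue
            out.append(("char", text[pos]))
            pos += 1
        return out

    return parse, calls
```

On an input such as `**a `b` c`, the failed attempt at `strong` scans to the end of the text; when the parser backtracks and walks the same positions again, the `code_span` results come from the cache, so that rule is evaluated at most once per position.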

I suspect Markdown might be formally specifiable in ANTLR v3, and I'd
bet that even if it's not, it's very close. If it is, getting Markdown
parsers into various languages would just be a matter of helping develop
new ANTLR v3 language targets (code-generation backends).

- Eric Astor

