Re: Formal Grammar — some thoughts

Allan Odgaard 29mtuz102 at sneakemail.com
Sun Jul 30 21:29:50 EDT 2006


On 30/7/2006, at 22:34, Michel Fortin wrote:


> [...] I'd like to point out that in my view John's implementation
> is already doing tokenization in some form [...]


Well, this here [1] is what people generally refer to when speaking
of tokenizing input.


> [...] For example, let's create a link with a new "tokenized" way
> from this:
>
> __some text [with a link__ oh!](somewhere)
>
> [...] See? No invalid nesting anymore!


Now try the same on these two lines of text:

This `is raw [text`](#)

This is a [`link](#) and more text`

If you choose to replace links with an md5 first, then the result of
converting the first line will be wrong, whereas if you choose to
convert raw first, the second line will be wrong.
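
To make the ordering problem concrete, here is a rough Python sketch of the multi-pass placeholder approach. The two patterns are deliberately simplified and the names (`convert`, `_stash`, `render_code`, `render_link`) are made up for illustration; this is not the actual Markdown.pl code, and it skips HTML escaping. It also assumes the intended reading is that whichever construct opens first on the line should win.

```python
import hashlib
import re

# Deliberately simplified patterns (assumptions, not Markdown.pl's real ones).
CODE = re.compile(r"`([^`]+)`")
LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def render_code(m):
    return "<code>%s</code>" % m.group(1)


def render_link(m):
    return '<a href="%s">%s</a>' % (m.group(2), m.group(1))


def _stash(pattern, render, text, store):
    """Replace each match with an md5 key and remember its rendered HTML."""
    def repl(m):
        key = hashlib.md5(m.group(0).encode("utf-8")).hexdigest()
        store[key] = render(m)
        return key
    return pattern.sub(repl, text)


def convert(text, links_first):
    """Run the two passes in the chosen order, then restore the stashed HTML."""
    store = {}
    passes = [(LINK, render_link), (CODE, render_code)]
    if not links_first:
        passes.reverse()
    for pattern, render in passes:
        text = _stash(pattern, render, text, store)
    # Restore later passes first, since their HTML may contain earlier keys.
    for key, html in reversed(list(store.items())):
        text = text.replace(key, html)
    return text


LINE1 = "This `is raw [text`](#)"
LINE2 = "This is a [`link](#) and more text`"

print(convert(LINE1, links_first=True))   # This `is raw <a href="#">text`</a>
print(convert(LINE1, links_first=False))  # This <code>is raw [text</code>](#)
print(convert(LINE2, links_first=True))   # This is a <a href="#">`link</a> and more text`
print(convert(LINE2, links_first=False))  # This is a [<code>link](#) and more text</code>
```

Under that assumption, links-first breaks the first line and raw-first breaks the second, so no fixed pass order gets both right.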

This is easy to handle with a real parser; actually, even a regexp
can do it. There is little need for the multi-pass content
obfuscation paradigm currently being used ;)
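
As a rough illustration of the regexp route, a single alternation scanned left to right lets whichever construct starts first consume its text. The pattern and the name `convert_single_pass` are made up for this sketch, the syntax is heavily simplified (no escaping, no nested markup inside link text), and "first opener wins" is my assumption about the desired result.

```python
import re

# One pattern, two alternatives: a raw span or an inline link.
# re.sub scans left to right, so the construct that opens first wins.
SPAN = re.compile(
    r"`(?P<code>[^`]+)`"
    r"|\[(?P<text>[^\]]+)\]\((?P<href>[^)]+)\)"
)


def convert_single_pass(text):
    def render(m):
        if m.group("code") is not None:
            return "<code>%s</code>" % m.group("code")
        return '<a href="%s">%s</a>' % (m.group("href"), m.group("text"))
    return SPAN.sub(render, text)


print(convert_single_pass("This `is raw [text`](#)"))
# This <code>is raw [text</code>](#)
print(convert_single_pass("This is a [`link](#) and more text`"))
# This is a <a href="#">`link</a> and more text`
```

Both lines come out consistently in one pass, with no placeholders to stash and restore.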


> [...] This is far from having a formal grammar, but it shows that a
> lot more could be done by reusing the current approach.


Well, yes, a lot more can be done. But I think the energy would be
better spent trying to move toward a more formal grammar and more
standard parsing mechanisms. This is quite a challenge, and it can’t
be done without revising some parts of the syntax. OTOH, the
problematic parts (e.g. nested block elements) are often not handled
consistently (or properly) by the current implementation, so I’d
think it would be possible to tweak them a bit.


[1] http://en.wikipedia.org/wiki/Lexer


