Re: Formal Grammar — some thoughts
Allan Odgaard
29mtuz102 at sneakemail.com
Sat Jul 29 18:02:05 EDT 2006
On 29/7/2006, at 23:22, Eric Astor wrote:
>> 1. interpreting tokens as literal text when end token is missing,
>> example: `this is __not starting bold`.
>
> This is actually simple to deal with in most formal grammars - since
> formal grammars are recursive, you simply define bold (for example) as:
> bold := ('__' SPAN '__') | ('**' SPAN '**')
Well, yes, you can put that in your formal grammar, but the generated
parser will have a problem. Parsers generally tokenize the text and
then go through it token-by-token selecting which rule to pick.
So this parser will only see the `__` token (not what follows) and
will then pick the bold rule. If we have defined SPAN as not
containing any `\n`, then when it reaches end-of-line it will give
the error that it sees `\n` but expected `__`.
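
To make the failure concrete, here is a minimal sketch in Python. It is not any real parser generator's output; the token set and the names `tokenize`, `parse_line` and `ParseError` are made up for illustration. It tokenizes first and then commits to the bold rule as soon as it sees `__`:

```python
import re

# Hypothetical token set: '__', '_', newline, and runs of any other text.
TOKEN_RE = re.compile(r"__|_|\n|[^_\n]+")


class ParseError(Exception):
    pass


def tokenize(text):
    return TOKEN_RE.findall(text)


def parse_line(tokens):
    """Walk the tokens one at a time; '__' always selects the bold rule."""
    out = []
    i = 0
    while i < len(tokens) and tokens[i] != "\n":
        if tokens[i] == "__":
            i += 1
            span = []
            # SPAN may not contain '\n', so stop there as well.
            while i < len(tokens) and tokens[i] not in ("__", "\n"):
                span.append(tokens[i])
                i += 1
            if i == len(tokens) or tokens[i] != "__":
                found = "end of input" if i == len(tokens) else repr(tokens[i])
                raise ParseError(f"saw {found} but expected '__'")
            out.append(("bold", "".join(span)))
            i += 1
        else:
            out.append(("text", tokens[i]))
            i += 1
    return out
```

Feeding it `this is __not starting bold` followed by a newline raises `ParseError`, because the parser has already committed to bold by the time it reaches the `\n`.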
Given a sufficiently large look-ahead (in parser terms, i.e. looking
at the next n tokens) and defining some dummy rules to deal with
isolated `__` it could possibly be pulled off, but it could likely
still be fooled.
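
A rough sketch of that idea, again in Python with made-up names (`parse_line_with_lookahead` is hypothetical): before committing to bold, scan ahead on the same line for a closing `__`, and fall back to a dummy "literal text" rule if there is none. Note that this uses unbounded look-ahead within the line, which is more than a typical generated parser gives you, and it only handles this one case:

```python
import re

# Same hypothetical token set as before.
TOKEN_RE = re.compile(r"__|_|\n|[^_\n]+")


def parse_line_with_lookahead(text):
    tokens = TOKEN_RE.findall(text)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "__":
            # Look ahead for a closing '__' before the end of the line.
            j = i + 1
            while j < len(tokens) and tokens[j] not in ("__", "\n"):
                j += 1
            if j < len(tokens) and tokens[j] == "__":
                out.append(("bold", "".join(tokens[i + 1:j])))
                i = j + 1
                continue
            # Dummy rule: an isolated '__' is just literal text.
        out.append(("text", tok))
        i += 1
    return out
```

With this, `this is __not starting bold` comes out as plain text tokens, while `a __b__` still yields a bold span.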
A slightly related problem is the ambiguity when seeing `___` in the
text. That will be tokenized as the two tokens `__` and `_`, i.e.
first start bold, then italic. But the entire line could be:
`___bold and__ only italic_`.
I.e. in this particular case it should have been tokenized as `_` and
`__`.
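
A small Python sketch of the two tokenizations (the regex and the helper `nests_properly` are my own illustration, not part of any Markdown implementation). A longest-match tokenizer produces the first split, and a simple stack check shows that only the second split gives properly nested spans:

```python
import re

# Longest match first: '__' is tried before '_'.
GREEDY = re.compile(r"__|_|[^_]+")


def tokenize_greedy(text):
    return GREEDY.findall(text)


def nests_properly(tokens):
    """True if every delimiter closes the most recently opened span."""
    stack = []
    for tok in tokens:
        if tok in ("_", "__"):
            if stack and stack[-1] == tok:
                stack.pop()          # closes the innermost open span
            elif tok in stack:
                return False         # would close an outer span first
            else:
                stack.append(tok)    # opens a new span
    return not stack


line = "___bold and__ only italic_"
greedy = tokenize_greedy(line)
# ['__', '_', 'bold and', '__', ' only italic', '_']
intended = ["_", "__", "bold and", "__", " only italic", "_"]

print(nests_properly(greedy))    # False
print(nests_properly(intended))  # True
```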
A workaround would be to use `*` for either bold or italic. I.e.
the strict parser would disallow three consecutive `*` or `_` if and
only if bold has a longer span than italic.