Re: Formal Grammar — some thoughts
    Allan Odgaard 
    29mtuz102 at sneakemail.com
       
    Sat Jul 29 18:02:05 EDT 2006
    
    
  
On 29/7/2006, at 23:22, Eric Astor wrote:
>> 1. interpreting tokens as literal text when end token is missing,
>> example: `this is __not starting bold`.
>
> This is actually simple to deal with in most formal grammars - since
> formal grammars are recursive, you simply define bold (for example) as:
> bold := ('__' SPAN '__') | ('**' SPAN '**')

Well, yes, you can put that in your formal grammar, but the generated
parser will have a problem. Parsers generally tokenize the text and
then go through it token by token, selecting a rule based on the
current token.

So this parser will only see the `__` token (not what follows) and
will then pick the bold rule. If we have defined SPAN as not
containing any `\n`, then when it reaches end-of-line it will give
the error that it saw `\n` but expected `__`.
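
To make the failure concrete, here is a minimal Python sketch of such a
parser. Everything in it (the token regex, `tokenize`, `parse_line`,
`ParseError`) is made up for illustration and only handles the `__`
form of bold; it is not what any particular parser generator emits.

```python
import re

# Hypothetical token set: `__`, newline, a run of plain text, or a lone `_`.
TOKEN = re.compile(r"__|\n|[^_\n]+|_")


class ParseError(Exception):
    pass


def tokenize(text):
    return TOKEN.findall(text)


def parse_line(tokens):
    """Parse one line, choosing the rule from the current token only."""
    out, i = [], 0
    while i < len(tokens) and tokens[i] != "\n":
        if tokens[i] == "__":
            # Commit to the bold rule as soon as `__` is seen,
            # without checking whether a closing `__` follows.
            i += 1
            span = []
            while i < len(tokens) and tokens[i] not in ("__", "\n"):
                span.append(tokens[i])
                i += 1
            if i == len(tokens) or tokens[i] != "__":
                found = tokens[i] if i < len(tokens) else "end of input"
                raise ParseError(f"saw {found!r} but expected '__'")
            i += 1  # consume the closing `__`
            out.append(("bold", "".join(span)))
        else:
            out.append(("text", tokens[i]))
            i += 1
    return out
```

With this sketch, `parse_line(tokenize("this is __bold__ text\n"))`
returns the three expected pieces, while
`parse_line(tokenize("this is __not starting bold\n"))` raises
`ParseError` at the newline instead of treating `__` as literal text.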

Given a sufficiently large look-ahead (in parser terms, i.e. looking
at the next n tokens) and some dummy rules to deal with isolated `__`,
it could possibly be pulled off, but it could likely still be fooled.
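
As a rough sketch of that idea, the hand-written Python below scans
ahead on the current line for a closing `__` before committing to
bold, and otherwise falls back to a "dummy rule" that keeps `__` as
literal text. The function name and token regex are assumptions for
illustration, and the scan is unbounded rather than a fixed n tokens,
which a generated parser would usually not give you.

```python
import re

# Hypothetical token set: `__`, newline, a run of plain text, or a lone `_`.
TOKEN = re.compile(r"__|\n|[^_\n]+|_")


def parse_with_lookahead(text):
    tokens = TOKEN.findall(text)
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "__":
            # Look ahead for a closing `__` before the end of the line.
            j = i + 1
            while j < len(tokens) and tokens[j] not in ("__", "\n"):
                j += 1
            if j < len(tokens) and tokens[j] == "__":
                out.append(("bold", "".join(tokens[i + 1:j])))
                i = j + 1
                continue
            # Dummy rule: no closer on this line, so `__` stays literal.
        out.append(("text", tok))
        i += 1
    return out
```

Here `parse_with_lookahead("this is __not starting bold\n")` yields
only text pieces, with the `__` preserved as-is. It does nothing about
mixed `_` and `__` delimiters, so it can still be fooled.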

A slightly related problem is the ambiguity when seeing `___` in the
text. That will be tokenized as the two tokens `__` and `_`, i.e.
first start bold, then italic. But the entire line could be:
`___bold and__ only italic_`.

I.e. in this particular case it should have been tokenized as `_` and
`__`.
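
A small Python sketch of the tokenization problem, assuming a
longest-match tokenizer (the regex and both function names are
hypothetical). `nests_properly` simply checks that each delimiter
closes the most recently opened span.

```python
import re

# Longest match first: `__` is tried before `_`.
GREEDY = re.compile(r"__|_|[^_]+")


def greedy_tokens(text):
    return GREEDY.findall(text)


def nests_properly(tokens):
    """True if every `_` / `__` closes the innermost open span."""
    stack = []
    for tok in tokens:
        if tok not in ("_", "__"):
            continue
        if stack and stack[-1] == tok:
            stack.pop()            # closes the innermost span
        elif tok in stack:
            return False           # would close an outer span first
        else:
            stack.append(tok)      # opens a new span
    return not stack


line = "___bold and__ only italic_"

greedy = greedy_tokens(line)
# ['__', '_', 'bold and', '__', ' only italic', '_']

wanted = ["_", "__", "bold and", "__", " only italic", "_"]

print(nests_properly(greedy))  # False
print(nests_properly(wanted))  # True
```

The greedy split opens bold first and then italic, so the `__` after
`and` tries to close bold while italic is still open. Only the `_`,
`__` split nests correctly for this line.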

A workaround would be to use `*` for either bold or italic. I.e.
the strict parser would disallow three consecutive `*` or `_` if and
only if bold has a longer span than italic.
    
    