Backtick Hickup
    Michel Fortin 
    michel.fortin at michelf.com
       
    Tue Aug 14 10:45:57 EDT 2007
    
    
  
On 2007-08-13 at 21:27, Allan Odgaard wrote:
> On Aug 13, 2007, at 10:20 AM, Michel Fortin wrote:
>
>> On 2007-08-12 at 23:23, Allan Odgaard wrote:
>>
>>> I would have expected it to see first two back-ticks, then scan  
>>> forward until another two back-ticks are seen (since the open- 
>>> token defines the close-token) and thus give this output:
>>>
>>>     <p>Backtick: <code>\</code>`</p>
>>> [...]
>> [snip]
>
> Regardless of how much look-ahead most parsers currently use, do  
> you disagree with my interpretation?
I disagree.
> If so, can you provide a more formal definition of how you believe the  
> spec should be read?
A code span starts with a certain number of consecutive backticks  
and ends with the same number of backticks, and no more. This  
means that if you need to get a one-backtick code span you can write  
it this way: `` ` ``; and to get a five-backtick code span you can  
write this: ` ````` `. A space as the first or last character of the  
code span gets ignored.
Note that this way you don't need 11 backticks around a code span  
containing a run of 10 backticks somewhere in it. Your interpretation  
of the syntax would require that:
     (mine)   ` `````````` `
     (yours)  ``````````` `````````` ```````````
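
Under the reading above, the delimiter only has to differ in length from every run of backticks inside the span. Here is a minimal sketch in Python of how one could pick such a delimiter. The function name `wrap_code_span` is hypothetical and not taken from any Markdown implementation, and padding with a space only next to a backtick is an assumption:

```python
import re


def wrap_code_span(content: str) -> str:
    """Wrap `content` in the shortest backtick run that cannot be
    confused with a run of backticks inside the content itself."""
    # Lengths of every run of consecutive backticks in the content.
    run_lengths = {len(run) for run in re.findall(r"`+", content)}

    # Shortest delimiter length that no inner run has.
    n = 1
    while n in run_lengths:
        n += 1
    delimiter = "`" * n

    # Assumption: pad with one space where the content touches the
    # delimiter with a backtick; that space is ignored when parsing.
    if content.startswith("`"):
        content = " " + content
    if content.endswith("`"):
        content = content + " "
    return delimiter + content + delimiter


print(wrap_code_span("`"))
print(wrap_code_span("`" * 10))
```

For a run of 10 backticks this gives a one-backtick delimiter, as in the "(mine)" line above.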
> Basically I read it as code-spans can be matched using this regexp:  
> (`+) ?.*? ?\1
That's mostly it. For reference, this is the regex from PHP Markdown:
     {
         (?<!\\)     # Character before opening ` can't be a backslash
         (`+)        # $1 = Opening run of `
         (.+?)       # $2 = The code block
         (?<!`)
         \1          # Matching closer
         (?!`)
     }xs
It looks pretty much the same as yours, except there is a one- 
character look-ahead and another one-character look-behind around the  
closing run of backticks to ensure the marker is indeed the same  
length as the opening one. Also, leading and trailing spaces are taken  
care of in the callback instead.
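
For experimenting, here is a rough Python port of that regex. The pattern follows the one quoted above; the callback is only an assumption about what the real one does (it strips spaces and tabs at both ends and escapes HTML), and the names `CODE_SPAN` and `render_code_spans` are made up for this sketch:

```python
import html
import re

CODE_SPAN = re.compile(
    r"""
    (?<!\\)     # character before the opening ` can't be a backslash
    (`+)        # group 1 = opening run of `
    (.+?)       # group 2 = content of the code span (at least one char)
    (?<!`)      # the closing run must not be preceded by a backtick
    \1          # matching closer, same length as the opener
    (?!`)       # the closing run must not be followed by a backtick
    """,
    re.VERBOSE | re.DOTALL,
)


def _to_code(match: "re.Match[str]") -> str:
    # Assumption: the callback trims leading/trailing spaces and tabs
    # and escapes HTML-special characters.
    content = match.group(2).strip(" \t")
    return "<code>" + html.escape(content, quote=False) + "</code>"


def render_code_spans(text: str) -> str:
    """Replace every code span in `text` with a <code> element."""
    return CODE_SPAN.sub(_to_code, text)


print(render_code_spans("A backtick: `` ` ``"))
print(render_code_spans("Ten of them: ` `````````` `"))
```

With this sketch, two backticks on their own are left untouched, and a span whose only character is a space comes out as an empty `<code></code>` element.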
(There's also a check for a backslash at the start, although I just  
realised that this needs work as it doesn't give a correct result for  
an escaped literal backslash like this: \\`code`.)
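
The backslash problem can be reproduced with a small Python sketch. `ORIGINAL` mirrors the pattern quoted above; `PAIRED` is one possible adjustment that consumes backslashes in pairs before the opening run. It is a hypothetical variant for illustration, not what PHP Markdown does:

```python
import re

# Same logic as the quoted pattern: any backslash right before the
# opening backtick blocks the match, even an escaped one.
ORIGINAL = re.compile(r"(?<!\\)(`+)(.+?)(?<!`)\1(?!`)", re.DOTALL)

# Hypothetical variant: group 1 swallows complete pairs of backslashes
# (each pair is one escaped literal backslash), so only an odd number
# of backslashes escapes the backtick. The opening run is now group 2.
PAIRED = re.compile(r"(?<!\\)((?:\\\\)*)(`+)(.+?)(?<!`)\2(?!`)", re.DOTALL)

escaped_backslash = r"\\`code`"   # literal backslash, then a code span
escaped_backtick = r"\`code`"     # escaped backtick, no code span

print(ORIGINAL.search(escaped_backslash))        # no code span found
print(PAIRED.search(escaped_backslash).group(3)) # content of the span
print(PAIRED.search(escaped_backtick))           # still no code span
```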
> Although in practice we may 1) require at least one character  
> inside the code-span (so `` on its own is not a zero-character code  
> span)
Indeed, there needs to be at least one character inside a code span  
otherwise you wouldn't be able to differentiate the opening run of  
backticks from the closing one. If the sole character is a space  
character, it will get stripped so you can still make empty code spans.
> and 2) we may want to limit them to “markdown paragraphs” which are  
> roughly defined as ending when there are two consecutive newlines,  
> making the pattern: (`+) ?(.|\n(?!\n))+? ?\1
Well, that rule will work in the general case, although if your  
paragraph is inside a blockquote it may become trickier:
     > Paragraph `code?
     >
     > Paragraph `end of code ?
This should result in no code span, although there is technically no  
completely blank line between the two. In your expression, I'd try  
replacing the pattern matching the blank line with something that can  
vary depending on the context; it's the only way it can scale to  
nested block elements, I think.
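
One way to make the blank-line part vary with the context is to build the expression from a parameter. The following Python sketch assumes that a "paragraph break" can be described by a regex for whatever follows a newline; the names `code_span_regex`, `TOP_LEVEL` and `IN_BLOCKQUOTE` are hypothetical, and the two break patterns are simplifications:

```python
import re


def code_span_regex(paragraph_break: str) -> re.Pattern:
    """Build a code-span regex whose content cannot cross a paragraph
    break. `paragraph_break` is regex source matching what follows a
    newline when that newline ends the paragraph in the current context.
    """
    return re.compile(
        r"(?<!\\)(`+)"                                    # opening run
        r"((?:[^\n]|\n(?!" + paragraph_break + r"))+?)"   # content
        r"(?<!`)\1(?!`)"                                  # same-length closer
    )


# Simplified break patterns (assumptions for this sketch).
TOP_LEVEL = r"[ \t]*\n"             # newline followed by a blank line
IN_BLOCKQUOTE = r"[ \t]*>[ \t]*\n"  # newline followed by a lone ">"

quoted = "> Paragraph `code?\n>\n> Paragraph `end of code ?\n"

print(code_span_regex(TOP_LEVEL).search(quoted) is not None)
print(code_span_regex(IN_BLOCKQUOTE).search(quoted) is None)
```

With the top-level rule the two backticks in the blockquote example pair up across the ">" line; with the blockquote rule they do not, while a span that wraps onto the next quoted line of the same paragraph still matches.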
Also note that code spans are allowed in headers and list items  
(including those with span-level content) which have different block- 
ending rules:
     Header `code?
     -------------
     ### Header `end of code? ###
     Paragraph.
Those not-separated-by-a-blank-line headers are not really  
documented, and John has said he's considering getting rid of them,  
which probably explains why they aren't. Perhaps they're not worth  
supporting (and this example is certainly ugly), but currently, using  
Markdown.pl and PHP Markdown, this will parse as you'd read it: two  
headers, one paragraph, no code span.
As for list items, I think this should constitute two list items, as  
current Markdown.pl and PHP Markdown do:
     *   List item `code?
     *   List item `end of code?
There's nothing explicit in the code about that, but I still think it  
makes sense. The logic being that while glancing at the document it  
looks like two list items, so it should really be two list items.
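
The "two list items, no code span" result falls out naturally if block structure is resolved before span-level parsing. A minimal Python sketch, assuming a flat list whose markers sit at the start of a line (`list_items` is a hypothetical helper and does not handle nested lists):

```python
import re

CODE_SPAN = re.compile(r"(?<!\\)(`+)(.+?)(?<!`)\1(?!`)", re.DOTALL)
ITEM_START = re.compile(r"^[*+-][ \t]+", re.MULTILINE)


def list_items(text: str) -> list:
    """Split a flat list into the text of each item, marker removed."""
    starts = [m.start() for m in ITEM_START.finditer(text)]
    bounds = starts + [len(text)]
    return [
        ITEM_START.sub("", text[begin:end], count=1).rstrip("\n")
        for begin, end in zip(bounds, bounds[1:])
    ]


text = "*   List item `code?\n*   List item `end of code?\n"

# Applied to the raw text, the two backticks pair up across the items.
print(CODE_SPAN.search(text) is not None)

# Applied per item, neither item contains a complete code span.
for item in list_items(text):
    print(repr(item), CODE_SPAN.search(item))
```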
Going a little further, this one is trickier:
     *   List item `code?
         *   List item `end of code?
Markdown.pl gives completely bogus output while trying to create a  
sublist. PHP Markdown creates the sublist fine and no code span. I'm  
really not sure here whether creating a sublist or a code span is the  
best output. That said, it's certainly the very edge of an edge case.  
If we're to define a formal syntax, let's not start there.
Disclaimer: All this is only *my* interpretation of Markdown. If John  
Gruber decides otherwise, then I'll follow his interpretation instead.
Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/
    
    