Backtick Hickup
Michel Fortin
michel.fortin at michelf.com
Tue Aug 14 10:45:57 EDT 2007
On 2007-08-13, at 21:27, Allan Odgaard wrote:
> On Aug 13, 2007, at 10:20 AM, Michel Fortin wrote:
>
>> On 2007-08-12, at 23:23, Allan Odgaard wrote:
>>
>>> I would have expected it to see first two back-ticks, then scan
>>> forward until another two back-ticks are seen (since the open-
>>> token defines the close-token) and thus give this output:
>>>
>>> <p>Backtick: <code>\</code>`</p>
>>> [...]
>> [snip]
>
> Regardless of how much look-ahead most parsers currently use, do
> you disagree with my interpretation?
I disagree.
> If so, can you provide a more formal definition of how you believe the
> spec should be read?
A code span starts with a certain number of consecutive backticks and
ends with the same number of backticks, and no more. This means that
if you need a code span containing a single backtick you can write it
this way: `` ` ``; and to get a code span containing five backticks you
can write this: ` ````` `. A space as the first or last character of
the code span gets ignored.
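To make that rule concrete, here is a minimal Python sketch of a scanner that follows it. The name `scan_code_spans` is made up for illustration, and the sketch deliberately ignores backslash escapes and block boundaries; it only shows that the closing run has to match the opening run exactly.

```python
def scan_code_spans(text: str) -> list[str]:
    """Return the contents of the code spans found in `text`.

    A span opens with a run of N backticks and closes at the next run of
    exactly N backticks; runs of any other length inside it are content.
    One leading and one trailing space are dropped from the content.
    Escapes and block boundaries are deliberately ignored.
    """
    spans = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "`":
            i += 1
            continue
        # Measure the opening run.
        start = i
        while i < n and text[i] == "`":
            i += 1
        width = i - start
        # Scan forward for a closing run of exactly the same width.
        j, close = i, -1
        while j < n:
            if text[j] != "`":
                j += 1
                continue
            run_start = j
            while j < n and text[j] == "`":
                j += 1
            if j - run_start == width:
                close = run_start
                break
        if close == -1:
            continue  # no closer: the opening run stays literal text
        content = text[i:close]
        if content.startswith(" "):
            content = content[1:]
        if content.endswith(" "):
            content = content[:-1]
        spans.append(content)
        i = j  # resume after the closing run
    return spans


print(scan_code_spans("`` ` ``"))         # ['`']
print(scan_code_spans("` ````` `"))       # ['`````']
print(scan_code_spans("` `````````` `"))  # ['``````````']
```

Because any backtick that directly follows the opening run is part of that run, a matched span always has at least one character of content, and a lone space inside it (` ` `) comes out as an empty span.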
Note that this way you don't need 11 backticks around a code span
containing a run of 10 backticks somewhere in it. Your interpretation
of the syntax would require that:
(mine) ` `````````` `
(yours) ``````````` `````````` ```````````
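The difference shows up on the writer's side too. Below is a small Python sketch that picks the backtick fence for a given piece of code under each reading; `fence_exact`, `fence_longer` and `wrap` are hypothetical names. With an exact-length closer, any run length that doesn't already occur in the code will do; if the closer may be the first N backticks of a longer run, the fence has to be longer than every run in the code.

```python
import re


def fence_exact(code: str) -> str:
    """Shortest fence when the closer must be a run of exactly the same
    length: any length that doesn't occur as a whole run in `code`."""
    run_lengths = {len(run) for run in re.findall(r"`+", code)}
    width = 1
    while width in run_lengths:
        width += 1
    return "`" * width


def fence_longer(code: str) -> str:
    """Shortest fence when the closer may be the first N backticks of a
    longer run: it must be longer than every run in `code`."""
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    return "`" * (longest + 1)


def wrap(code: str, fence: str) -> str:
    # Pad with spaces when the code starts or ends with a backtick, so its
    # edge doesn't merge with the fence (the spaces are stripped again).
    if code.startswith("`") or code.endswith("`"):
        code = f" {code} "
    return f"{fence}{code}{fence}"


ten = "`" * 10
print(wrap(ten, fence_exact(ten)))   # exact-length reading: 1-backtick fence
print(wrap(ten, fence_longer(ten)))  # longer-run reading: 11-backtick fence
```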
> Basically I read it as code-spans can be matched using this regexp:
> (`+) ?.*? ?\1
That's mostly it. For reference, this is the regex from PHP Markdown:
    {
        (?<!\\)     # Character before opening ` can't be a backslash
        (`+)        # $1 = Opening run of `
        (.+?)       # $2 = The code block
        (?<!`)      # Closer can't be preceded by a `...
        \1          # Matching closer
        (?!`)       # ...nor followed by one
    }xs
It looks pretty much the same as yours, except there is a one-
character look-ahead and another one-character look-behind around the
closing run of backticks to ensure the marker is indeed the same
length as the opening one. Also, leading and trailing spaces are taken
care of in the callback instead.
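For anyone who wants to try the two patterns side by side, here is a rough Python transcription (the PCRE `x` and `s` modifiers become `re.X` and `re.S`; a capture group is added to the simpler pattern so its content can be inspected). The sample inputs are made up to show where the extra look-behind and look-ahead matter.

```python
import re

# The pattern suggested earlier in the thread, with a group added
# around the content so we can look at what it captures.
SIMPLE = re.compile(r"(`+) ?(.*?) ?\1")

# The PHP Markdown pattern quoted above, transcribed to Python.
PHP_MARKDOWN = re.compile(r"""
    (?<!\\)     # character before opening ` can't be a backslash
    (`+)        # $1 = opening run of `
    (.+?)       # $2 = the code span content
    (?<!`)      # closer can't be preceded by a backtick...
    \1          # ...must repeat the opening run...
    (?!`)       # ...and can't be followed by a backtick
""", re.X | re.S)

for sample in ["`a``b`", "``"]:
    simple = SIMPLE.search(sample)
    php = PHP_MARKDOWN.search(sample)
    print(repr(sample),
          simple and repr(simple.group(2)),
          php and repr(php.group(2)))
```

On `` `a``b` `` the simpler pattern closes on the first backtick of the double run and captures `a`, while the PHP Markdown one skips that run and captures ``a``b``. On a bare ` `` ` the simpler pattern accepts an empty span, which `(.+?)` rules out.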
(There's also a check for a backslash at the start, although I just
realised that this needs work as it doesn't give a correct result for
an escaped literal backslash like this: \\`code`.)
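One possible way around that, sketched in Python below, is to drop the fixed-width look-behind and count the backslashes right before the opening run instead: an even count means they escape each other and the backtick is live. This is only a sketch of one fix, not what PHP Markdown does today; `is_escaped` and `code_spans_with_escapes` are hypothetical helpers, and trimming of spaces is left to the caller as in the callback approach.

```python
import re

# The PHP Markdown pattern without its (?<!\\) test, which is replaced by
# the backslash-parity check in code_spans_with_escapes() below.
CODE_SPAN = re.compile(r"""
    (`+)        # opening run of `
    (.+?)       # span content
    (?<!`)      # closer can't be preceded by a backtick...
    \1          # ...must repeat the opening run...
    (?!`)       # ...and can't be followed by a backtick
""", re.X | re.S)


def is_escaped(text: str, index: int) -> bool:
    """True if an odd number of backslashes immediately precede text[index]."""
    count = 0
    i = index - 1
    while i >= 0 and text[i] == "\\":
        count += 1
        i -= 1
    return count % 2 == 1


def code_spans_with_escapes(text: str) -> list[str]:
    spans = []
    pos = 0
    while (match := CODE_SPAN.search(text, pos)) is not None:
        if is_escaped(text, match.start()):
            # The first backtick is a literal; try again just after it.
            pos = match.start() + 1
            continue
        spans.append(match.group(2))  # untrimmed, as the regex captures it
        pos = match.end()
    return spans


print(code_spans_with_escapes(r"\\`code`"))     # ['code']
print(code_spans_with_escapes(r"\`not code`"))  # []
```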
> Although in practice we may 1) require at least one character
> inside the code-span (so `` on its own is not a zero-character code
> span)
Indeed, there needs to be at least one character inside a code span;
otherwise you wouldn't be able to tell the opening run of backticks
from the closing one. If the sole character is a space character, it
will get stripped, so you can still make empty code spans.
> and 2) we may want to limit them to “markdown paragraphs” which are
> roughly defined as ending when there are two consecutive newlines,
> making the pattern: (`+) ?(.|\n(?!\n))+? ?\1
Well, that rule will work in the general case, although if your
paragraph is inside a blockquote it may become trickier:
    > Paragraph `code?
    >
    > Paragraph `end of code ?
This should result in no code span, although there is technically no
completely blank line between the two. In your expression, I'd try
replacing the pattern matching the blank line with something that can
vary depending on the context; it's the only way it can scale to
nested block elements, I think.
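Here is a rough Python sketch of that idea: your paragraph-limited pattern, with the blank-line test built from a per-context line prefix. The `line_prefix` parameter and the `code_span_pattern` name are assumptions for illustration; a real parser would derive the prefix from whatever block it is currently inside.

```python
import re


def code_span_pattern(line_prefix: str = "") -> re.Pattern:
    """Build the paragraph-limited code-span regex for one block context.

    `line_prefix` is a regex for the marker that starts each line of the
    enclosing block: "" at top level, ">" inside a blockquote.  A newline
    may appear in the span only if the next line is not blank once that
    prefix is taken into account.
    """
    blank_line = rf"{line_prefix}[ \t]*(?:\n|$)"
    return re.compile(rf"(`+) ?((?:.|\n(?!{blank_line}))+?) ?\1")


quoted = "> Paragraph `code?\n>\n> Paragraph `end of code ?"

# With the top-level notion of a blank line, ">" hides the paragraph break.
print(code_span_pattern().search(quoted) is not None)  # True
# Telling the pattern it's inside a blockquote stops the span at the break.
print(code_span_pattern(">").search(quoted))           # None
```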
Also note that code spans are allowed in headers and list items
(including those with span-level content), which have different
block-ending rules:
    Header `code?
    -------------
    ### Header `end of code? ###
    Paragraph.
Headers that aren't separated by a blank line like this are not really
documented, and John has said he's considering getting rid of them,
which probably explains why they aren't. Perhaps they're not worth
supporting (and this example is certainly ugly), but currently, using
Markdown.pl and PHP Markdown, this will parse as you'd read it: two
headers, one paragraph, no code span.
As for list items, I think this should constitute two list items, as
current Markdown.pl and PHP Markdown do:
    * List item `code?
    * List item `end of code?
There's nothing explicit in the code about that, but I still think it
makes sense. The logic is that, at a glance, the document looks like
two list items, so it should really be two list items.
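Both the header and the list-item examples come down to splitting the text into blocks before looking for code spans. A deliberately crude Python sketch of that ordering follows; `split_blocks` and its four rules (blank line, setext underline, ATX header, list marker) are a simplification assumed for illustration, nowhere near the full block grammar of Markdown.pl or PHP Markdown.

```python
import re

SETEXT_UNDERLINE = re.compile(r"^(?:=+|-+)[ \t]*$")
ATX_HEADER = re.compile(r"^#{1,6}")
LIST_MARKER = re.compile(r"^[*+-][ \t]+")
# Code-span pattern in the PHP Markdown style (exact-length closer).
CODE_SPAN = re.compile(r"(`+)(.+?)(?<!`)\1(?!`)", re.S)


def split_blocks(text: str) -> list[str]:
    """Crudely split `text` into span-level blocks."""
    blocks: list[str] = []
    current: list[str] = []

    def flush() -> None:
        if current:
            blocks.append("\n".join(current))
            current.clear()

    for line in text.split("\n"):
        if not line.strip():
            flush()                       # blank line ends a paragraph
        elif SETEXT_UNDERLINE.match(line) and current:
            header = current.pop()        # only the line above is the header
            flush()
            blocks.append(header)         # the underline itself is dropped
        elif ATX_HEADER.match(line):
            flush()
            blocks.append(line)           # an ATX header is one line
        elif LIST_MARKER.match(line):
            flush()                       # a list marker starts a new item
            current.append(line)
        else:
            current.append(line)
    flush()
    return blocks


def code_spans_by_block(text: str) -> list[str]:
    return [m.group(2) for block in split_blocks(text)
            for m in CODE_SPAN.finditer(block)]


headers = "Header `code?\n-------------\n### Header `end of code? ###\nParagraph."
items = "* List item `code?\n* List item `end of code?"
for doc in (headers, items):
    # Searching the raw text finds a span; searching block by block doesn't.
    print(CODE_SPAN.search(doc) is not None, code_spans_by_block(doc))
```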
Going a little further, this one is trickier:
    * List item `code?
        * List item `end of code?
Markdown.pl gives completely bogus output while trying to create a
sublist. PHP Markdown creates the sublist fine and no code span. I'm
really not sure here whether creating a sublist or a code span is the
best output. That said, it's certainly the very edge of an edge case.
If we're to define a formal syntax, let's not start there.
Disclaimer: All this is only *my* interpretation of Markdown. If John
Gruber decides otherwise, then I'll follow his interpretation instead.
Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/