Bug: INS/DEL in block context (1.04b)

John Gruber gruber at fedora.net
Thu Apr 29 19:48:30 EDT 2004


Jay Allen <markdown at openwire.com> wrote on 04/28/04 at 5:14p:

> 2) Hack the Markdown source code on line 289 (v1.04b) from:
> 
>          my $block_tag_re = qr/p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script/;
> 
> to
>          my $block_tag_re = qr/p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|ins|del/;
> 
> #2 may also have some unintended consequences since it seems that it 
> will always treat an ins or del as a block-level element.

I think I see a way around this. Right now, Markdown performs two
searches for block-level HTML tags. The first tries to be a little
smart, and looks for nested instances of the outermost tag. Thus
it'll match the following as a single block:

    <div>
        <div>
        blah
        </div>
    </div>

The second search is very naive, and just matches from `<tag>` to
`</tag>` for any block-level tag. Thus, if the above nested div
hadn't already been escaped by the first match, the second match
would incorrectly match from the first `<div>` to the first
`</div>`, which would leave the closing `</div>` just hanging there.
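
Here is a small Python sketch (Markdown.pl itself is Perl) that makes the difference between the two searches concrete, using the nested `<div>` example. Both regular expressions are illustrative assumptions, not the exact patterns in Markdown.pl. The naive one stops at the first closing tag. The smarter one assumes that both the opening and closing tags must start a line, so the indented inner `</div>` can't end the match.

```python
import re

# The nested <div> example from above, with the inner tags indented.
text = (
    "<div>\n"
    "    <div>\n"
    "    blah\n"
    "    </div>\n"
    "</div>\n"
)

# Naive search (like the second pass): from an opening block tag to the
# *first* matching closing tag, wherever it appears.
naive = re.compile(r"<(div)>.*?</\1>", re.DOTALL)

# Smarter search (assumed shape of the first pass): the opening tag must
# start a line, and the closing tag must also start a line, so an
# indented inner </div> cannot end the match.
smart = re.compile(r"^<(div)\b(?:.*\n)*?</\1>[ \t]*(?=\n+|\Z)", re.MULTILINE)

naive_match = naive.search(text)
smart_match = smart.search(text)

# The naive match stops at the inner </div>, stranding the outer one.
print(repr(text[naive_match.end():]))   # '\n</div>\n'
# The smarter match consumes the whole nested block.
print(repr(text[smart_match.end():]))   # '\n'
```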

Right now, both of these matches use the same pattern to identify
block-level tags -- the `$block_tag_re` variable mentioned above.

I'm pretty sure that we can Do The Right Thing most of the time with
`<ins>` and `<del>` if we use two different patterns for the two matches. 
For the first match, we'll use:

    qr/p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|ins|del/;

For the second match, we'll use:

    qr/p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script/;
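
A rough sketch of the two-pattern idea, again in Python rather than Perl. The names `BLOCK_TAGS_A`, `BLOCK_TAGS_B`, `first_pass` and `second_pass` are hypothetical. The shapes of the two searches are assumptions modeled on the description above: the first pass accepts `ins` and `del`, but only when the tags start a line, while the second, naive pass never considers them at all.

```python
import re

# The two tag lists: the first pass also accepts ins/del,
# the second (naive) pass does not.
BLOCK_TAGS_A = r"p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|ins|del"
BLOCK_TAGS_B = r"p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script"

# First pass (assumed shape): the opening tag starts a line, and the
# matching closing tag must also start a line.
first_pass = re.compile(
    rf"^<({BLOCK_TAGS_A})\b(?:.*\n)*?</\1>[ \t]*(?=\n+|\Z)", re.MULTILINE
)
# Second pass (assumed shape): naive, from <tag> to the first </tag>.
second_pass = re.compile(rf"<({BLOCK_TAGS_B})\b.*?</\1>", re.DOTALL)

blocky = "<ins>\nBlocky.\n</ins>\n"
spanny = "<ins>Spanny.</ins>\n"

print(bool(first_pass.search(blocky)))   # True: treated as a block
print(bool(first_pass.search(spanny)))   # False: </ins> is not at line start
print(bool(second_pass.search(spanny)))  # False: ins is not in list B
```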


This does not solve the problem completely. But it solves it about
as well as Markdown deals with other inline HTML blocks. The problem
is simply that Markdown.pl is a very simplistic HTML parser. Someday
I'd like to fix this, but not now.

With what I've just implemented, Markdown will turn the following input:

    <ins>
    Blocky.
    </ins>
    
    <ins>Spanny.</ins>

into:

    <ins>
    Blocky.
    </ins>
    
    <p><ins>Spanny.</ins></p>

I.e., if you put the ins (or del) tags on lines by themselves,
starting at the left margin, then Markdown.pl will treat them as
block-level tags. Otherwise, they're treated as span-level tags.
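
The left-margin rule can be sketched end to end with a toy converter in Python. `to_html` and `BLOCK_RE` are hypothetical names, and the converter handles only raw HTML blocks and plain paragraphs. It is not how Markdown.pl is structured internally; it does just enough to reproduce the example output above.

```python
import re

BLOCK_TAGS_A = r"p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|ins|del"

# Hypothetical rule: a paragraph counts as a raw HTML block only if it
# opens with a block tag at the left margin and closes with the matching
# tag at the start of a line.
BLOCK_RE = re.compile(rf"\A<({BLOCK_TAGS_A})\b(?:.*\n)*?</\1>[ \t]*\Z")


def to_html(text: str) -> str:
    """Toy converter: pass HTML blocks through, wrap everything else in <p>."""
    out = []
    for para in re.split(r"\n{2,}", text.strip("\n")):
        if BLOCK_RE.match(para):
            out.append(para)               # block-level: leave untouched
        else:
            out.append(f"<p>{para}</p>")   # span-level: wrap in a paragraph
    return "\n\n".join(out) + "\n"


source = "<ins>\nBlocky.\n</ins>\n\n<ins>Spanny.</ins>\n"
print(to_html(source))
```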

* * *

This lack of intelligence in Markdown.pl's HTML parser is something I'd
like to document. The spec for Markdown, the formatting language,
should specify that a truly compliant implementation will Do The
Right Thing in more circumstances than Markdown.pl, my
implementation, currently handles.

* * *

I've been really, really busy with other things so far this month,
and I've gotten out of the swing of things with Markdown. I'm going to package
up my current development build as a beta, but I'm only going to
publish it here on this list. Stand by.

-J.G.

