[ANN] Markdown 1.0.2b1

John Gruber gruber at fedora.net
Mon Feb 28 14:02:36 EST 2005


Markdown 1.0.2b1 is available for download and testing:

  <http://daringfireball.net/projects/downloads/Markdown_1.0.2b1.zip>

So far, 1.0.2 is very narrowly focused on just two bugs. The first
is rather simple, and has apparently been around all along. If you
had pairs of backticks inside an HTML tag attribute, like this:

    <span attr="text with `backticks` here"> foo </span>

those backticks would end up being turned into `<code>` spans.
Wrong, of course. As far as I'm aware, though, no one has ever
encountered this bug in the field, probably because there's rarely a
practical reason to put backticks in a tag attribute. Either way,
it's an easy fix.

The second bug is that backslash-escaped backticks were not being
treated as raw backtick characters, but instead were still
triggering code spans. For example:

    This was a problem:  \`escaped backticks\`

That should have produced:

    <p>This was a problem:  `escaped backticks`</p>

but instead was producing:

    <p>This was a problem:  \<code>escaped backticks\</code></p>


This bug crept into 1.0.1 when I changed the rules for code spans
and blocks, so that their contents were always treated literally. In
Markdown 1.0 and prior, some backslash escape sequences were
processed inside code spans and blocks. This was a royal pain in the
ass if you needed to include code samples that contained
backslashes, which is actually somewhat common.

There's a new archive of MarkdownTest which contains tests for these
cases:

  <http://daringfireball.net/projects/downloads/MarkdownTest_1.0.2.zip>

The test script is unchanged; only the tests are updated.



Explanation of Changes, for Those Who Care
------------------------------------------

In the old code, I had a routine called _EscapeSpecialChars(). The
gist of this routine is that it separates everything into tags and
non-tags. For example, given this:

    <span attr="*asterisks*">  *italics*  </span>

"<span attr="*asterisks*">" is a tag.

"  *italics*  " is a non-tag.

"</span>" is a tag.

For tags, we protect special characters that have meaning in
Markdown, such as * and _. For non-tags, we process backslash escape
sequences, such as "\*", which is what you use if you want a literal
asterisk in non-tag text.

The way _EscapeSpecialChars() used to work is that we'd look at each
of these individual tokens; for each that was a tag, we'd protect
its special characters, and for each that was a non-tag, we'd
process backslash escape sequences.
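
As a rough illustration of that tag/non-tag split, here is a minimal
sketch in Python (not the actual Perl from Markdown.pl). The names
`tokenize`, `escape_special_chars_old`, and the placeholder strings are
hypothetical, and the tag regex is a simplification that only handles
tags without a `>` inside an attribute value.

```python
import re

# Simplified tag matcher (assumption): anything from "<" to the next ">".
TAG_RE = re.compile(r"(<[^>]*>)")

# Hypothetical placeholders: opaque strings that later passes
# will not mistake for Markdown syntax.
PLACEHOLDERS = {"*": "\x00STAR\x00", "_": "\x00UNDER\x00"}


def tokenize(text):
    """Split text into ("tag", s) and ("text", s) tokens."""
    tokens = []
    # With one capturing group, re.split puts the tags at odd indices.
    for i, piece in enumerate(TAG_RE.split(text)):
        if piece:
            tokens.append(("tag" if i % 2 else "text", piece))
    return tokens


def escape_special_chars_old(text):
    """Sketch of the old single-pass behavior described above."""
    out = []
    for kind, piece in tokenize(text):
        for ch, placeholder in PLACEHOLDERS.items():
            if kind == "tag":
                # Inside a tag: protect every special character.
                piece = piece.replace(ch, placeholder)
            else:
                # Outside a tag: only backslash-escaped ones become literal.
                piece = piece.replace("\\" + ch, placeholder)
        out.append(piece)
    return "".join(out)


tokens = tokenize('<span attr="*asterisks*">  *italics*  </span>')
# tokens[0] is the opening tag, tokens[1] the text, tokens[2] the closing tag
```

In this sketch the asterisks in the attribute are replaced by
placeholders, while `*italics*` in the non-tag text is left alone for
the later emphasis pass.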

Originally, in Markdown 1.0 and prior, we ran _EscapeSpecialChars()
before we ran _DoCodeSpans(). We did this so we could escape
backticks so they could be used literally. In Markdown 1.0.1, we
started running _DoCodeSpans() first, so that backslashes within
code spans were treated literally instead of as backslash escape
sequences.

This is where the bug came in, because now we weren't able to
backslash-escape the backticks themselves.
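
The failure can be reproduced with a small Python sketch. The pattern
below is an assumption about the general shape of a code span matcher
(a run of backticks, some content, the same run again) and the name
`do_code_spans_old` is hypothetical; the point is only that nothing
checks what comes before the opening backtick.

```python
import re

# Assumed pre-fix pattern: no check for a backslash before the opening `.
OLD_CODE_SPAN_RE = re.compile(r"(`+)(.+?)(?<!`)\1(?!`)", re.S)


def do_code_spans_old(text):
    """Wrap every backtick-delimited run in <code>, escaped or not."""
    return OLD_CODE_SPAN_RE.sub(
        lambda m: "<code>" + m.group(2) + "</code>", text
    )


buggy = do_code_spans_old(r"This was a problem:  \`escaped backticks\`")
# buggy == r"This was a problem:  \<code>escaped backticks\</code>"
```

Because the code span pass runs before any backslash processing, the
backslashes are still sitting in the text when the backticks are
matched, which gives the stray `\<code>` shown earlier.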

We still need to do code spans before we do backslash escapes, so as
to avoid changing backslashes in code examples, but we need to do
_EscapeSpecialChars() before we run _DoCodeSpans(), because we need
to protect backticks in tag attributes beforehand.

So:

1a. Added something to _EscapeSpecialChars() to also protect
    literal backslashes (\) within tag attributes

1b. Added something to _EscapeSpecialChars() to protect literal backticks (`)

1c. Renamed _EscapeSpecialChars() to _EscapeSpecialCharsWithinTagAttributes()
    because that's all that it's doing now.

2.  Moved the backslash processing code to its own step, after
    _EscapeSpecialCharsWithinTagAttributes() and _DoCodeSpans() have run

3.  Changed the pattern for code spans to make sure the opening `
    isn't preceded by a backslash

    (Edge case: if you really need a literal '\' followed
    immediately by a <code> tag with no whitespace in between,
    you'll need to use HTML tags instead of Markdown.)
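
The reordered steps above can be sketched in Python as follows. This is
an illustration under stated assumptions, not the Perl in Markdown.pl:
the function names, the placeholder strings, and the simplified tag
regex are all hypothetical, and only the backtick escape is handled in
the backslash step.

```python
import re

# Hypothetical placeholders for characters that must survive untouched.
BACKTICK = "\x00BT\x00"
BACKSLASH = "\x00BS\x00"

# Simplified tag matcher (assumption): "<" up to the next ">".
TAG_RE = re.compile(r"(<[^>]*>)")

# Step 3: the opening backtick must not be preceded by a backslash.
CODE_SPAN_RE = re.compile(r"(?<!\\)(`+)(.+?)(?<!`)\1(?!`)", re.S)


def escape_within_tag_attributes(text):
    """Steps 1a/1b: protect backslashes and backticks inside tags."""
    pieces = TAG_RE.split(text)
    for i in range(1, len(pieces), 2):  # odd indices are the tags
        pieces[i] = pieces[i].replace("\\", BACKSLASH).replace("`", BACKTICK)
    return "".join(pieces)


def do_code_spans(text):
    """Turn code spans into <code>, keeping their contents literal."""
    def repl(m):
        code = m.group(2).strip()
        code = code.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
        # Hide backslashes and backticks from the later escape pass.
        code = code.replace("\\", BACKSLASH).replace("`", BACKTICK)
        return "<code>" + code + "</code>"
    return CODE_SPAN_RE.sub(repl, text)


def process_backslash_escapes(text):
    """Step 2: runs after the two passes above (backtick only, for brevity)."""
    return text.replace("\\`", BACKTICK)


def render_span(text):
    text = escape_within_tag_attributes(text)
    text = do_code_spans(text)
    text = process_backslash_escapes(text)
    # Restore the protected characters at the very end.
    return text.replace(BACKTICK, "`").replace(BACKSLASH, "\\")


fixed = render_span(r"This was a problem:  \`escaped backticks\`")
# fixed == "This was a problem:  `escaped backticks`"
```

With this ordering, backticks inside a tag attribute pass through
unchanged, and backslashes inside a real code span stay literal. The
edge case noted above also shows up here: a backslash directly before
an opening backtick always reads as an escape, so no code span starts
at that backtick.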


