Code blocks and backslashes

John Gruber gruber at fedora.net
Sun Dec 12 23:40:05 EST 2004


Mark Lawrence <lawrence at unified-eng.com> wrote on 11/01/04 at 11:20am:

> I've run across a problem with code blocks not being literal when the
> code contains certain backslash sequences. I had a code block that
> contained the line
> 
>     \\l'3.5i'
> 
> (it's part of a Perl here-document that's writing out troff code). When
> Markdown created the HTML, the double-backslash was converted to single.
> 
> Tracing the code, I find that the _EncodeBackslashEscapes subroutine is
> the culprit. It's called by _EscapeSpecialChars, which, in turn, is
> called by Markdown just before _RunBlockGamut. _EncodeBackslashEscapes
> makes 13 transformations of backslash sequences, including \\.
> 
> This means that code blocks that include any of those 13 character
> sequences have to be altered, adding extra backslashes. I think this
> contradicts the basic idea of having the HTML version and the text
> version look the same. It also increases the likelihood of errors in
> code blocks and prevents the straightforward cutting and pasting of
> working code.

This really bothered me, so I decided to change it for the next
release of Markdown. (This is why I didn't release 1.0.1 last week,
as I had previously indicated I would.)

I've changed this by moving the call to `_EscapeSpecialChars()` from
the top-level `Markdown()` sub into `_RunSpanGamut()`. The old way
meant that backslash escapes were processed very early, before any
other constructs such as lists, paragraphs, or code blocks were
parsed. They weren't processed inside inline HTML tags, but they
were processed everywhere else, including inside Markdown code spans
and code blocks.
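
To make the difference in ordering concrete, here is a toy model in
Python. It is not Markdown.pl's Perl and the real implementation
differs in detail: the names `process_escapes` and `convert`, the
two-kind line parser, and the `ESCAPABLE` character set are all
assumptions made for illustration.

```python
import re

# Assumed set of backslash-escapable characters (illustrative only,
# not copied from Markdown.pl).
ESCAPABLE = "\\`*_{}[]()#+-.!"
ESCAPE_RE = re.compile(r"\\([" + re.escape(ESCAPABLE) + r"])")


def process_escapes(text):
    """Replace each backslash escape with the character it stands for."""
    return ESCAPE_RE.sub(lambda m: m.group(1), text)


def convert(text, early_escapes):
    """Toy converter returning (kind, content) pairs for non-blank lines.

    early_escapes=True models the old order: escapes are handled across
    the whole document before any blocks are parsed.
    early_escapes=False models the new order: escapes are handled only
    while processing ordinary text, never inside code blocks.
    """
    if early_escapes:
        text = process_escapes(text)
    result = []
    for line in text.splitlines():
        if line.startswith("    "):
            # Code block line: strip the indent, keep the rest as-is.
            result.append(("code", line[4:]))
        elif line.strip():
            if not early_escapes:
                line = process_escapes(line)
            result.append(("text", line))
    return result


# Mark's troff line, indented as a code block.
source = "Some troff:\n\n    \\\\l'3.5i'\n"
old = convert(source, early_escapes=True)
new = convert(source, early_escapes=False)
```

In this model, `old` holds the code line with a single backslash (the
behavior Mark reported), while `new` keeps both backslashes. Ordinary
text is unescaped either way.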

The downside to the old way was that code spans and blocks weren't
really WYSIWYG (or what passes for "WYSIWYG" in terms of plain text).
To understand what was going on, you had to know about the 13 special
backslash-escapable character sequences in Markdown (a list which is
now up to 15, by the way), which in turn pretty much meant you had to
read Markdown's syntax documentation to the very end.

In practice, I don't think this caused problems frequently. But in
those times when you wanted a code block (or span) which contained
one of those sequences, it was really annoying. For example, Mark's
troff example above. Or, an example I've run into personally, when
you want to insert a code sample that includes regular expressions.
Something like

    \*

is common in regex patterns, for any instance when you want to match
a real asterisk. In Markdown, you had to remember to write it as:

    \\*

which was confusing and hard to remember. And even if you did
understand it, and remembered to do it, it prevented you from simply
pasting the code snippet into the article. You had to go through and
escape all the backslashes.

There were a couple of reasons why I did it this way originally. A
big one was that you could use "\`" inside a code span to get a
literal backtick. However, this isn't necessary, since you can also
use multiple backticks as delimiters:

    `` `this is backtick-quoted` ``

turns into:

    <code>`this is backtick-quoted`</code>

So we don't *have* to support this:

    `\`this is backtick-quoted\``
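
The multiple-backtick delimiter rule can be sketched in a few lines
of Python. This is an illustration rather than Markdown.pl's code:
the name `code_spans` and the exact regular expression are
assumptions, chosen so that the closing delimiter must be a run of
backticks the same length as the opening one.

```python
import html
import re

# (`+)      opening run of one or more backticks
# (.+?)     span content, as short as possible
# (?<!`)\1  closing run equal to the opening run, not preceded by a backtick
# (?!`)     and not followed by one either
CODE_SPAN_RE = re.compile(r"(`+)(.+?)(?<!`)\1(?!`)", re.DOTALL)


def code_spans(text):
    """Convert backtick-delimited spans to <code> elements.

    Backslashes inside a span are left untouched, so the content is
    literal apart from HTML-encoding of &, < and >.
    """
    def to_code(match):
        body = match.group(2).strip()  # drop padding next to the delimiters
        return "<code>" + html.escape(body, quote=False) + "</code>"

    return CODE_SPAN_RE.sub(to_code, text)


example = code_spans("`` `this is backtick-quoted` ``")
```

Here `example` comes out as the `<code>` element shown above, with
the inner backticks intact and no backslashes required.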

It was also the case that you used to be able to create link
definitions and indent them as much as you want. Thus you could
write this segment:

---

This is a paragraph with a [link] [1].

    [1]: http://example.com/  "This link definition is indented 4 spaces"

---

And the link would work. I changed that earlier during the 1.0.1
beta cycle, such that link definitions are now required to fall
within 3 spaces of the left margin. The link definition in the
preceding example will now be turned into a code block.
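
As a rough Python sketch of the new indentation rule (the regular
expression and the name `classify` are assumptions for illustration;
the pattern handles only single-line definitions with double-quoted
titles, and it looks at each line in isolation, ignoring the blank
lines and list context a real parser would consider):

```python
import re

LINK_DEF_RE = re.compile(
    r'^ {0,3}'            # at most 3 spaces from the left margin
    r'\[([^\]]+)\]:'      # [id]:
    r'[ \t]*(\S+)'        # the URL
    r'(?:[ \t]+"(.*)")?'  # optional "Title"
    r'[ \t]*$'
)


def classify(line):
    """Say how a single line would be treated under the new rule."""
    if LINK_DEF_RE.match(line):
        return "link definition"
    if line.startswith("    ") or line.startswith("\t"):
        return "code block"
    return "paragraph text"


indented = '    [1]: http://example.com/  "This link definition is indented 4 spaces"'
flush = '[1]: http://example.com/  "Title"'
```

With these inputs, `classify(indented)` reports a code block and
`classify(flush)` reports a link definition.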

With the old rules, however, I had to allow for backslash escapes
within code blocks so that I could write the Markdown syntax
documentation in Markdown itself. Within the documentation, example
link definitions looked like this:

    \[1]: http://example.com/  "Title"

But the leading backslashes didn't appear in the output -- they were
there only to keep the examples from being treated as actual, rather
than example, link definitions.

That state of affairs was just terrible, and really
un-Markdown-like. Thinking about it this week, I'm surprised I
didn't get more complaints about this "feature". Code blocks and
spans should be completely literal, and now, starting with the next
release, they are.

This means, however, that any old code blocks and spans that relied
on backslash escapes under previous versions of Markdown are now
broken. My sincere
apologies to anyone who has to go back and fix old content to work
with the new rules. But there's no doubt in my mind that this is the
right thing for Markdown to do.

I don't think this is a common situation, because it only affects
code blocks and spans, and only when they contain one of the 13
specific backslash-escape sequences, most of which aren't likely to
occur in the wild.

But common or not, it's a significant change in that it definitely
breaks the old syntax rules. I've written a few test cases and this
change doesn't seem to have any other unwanted side effects, but
everyone who tries this version ought to double-check anything they
write (or have written in the past) involving backslashes.
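
One way to do that double-checking is to scan old documents for
indented code lines that the previous rules would have altered. The
Python sketch below is a heuristic and not part of Markdown: the
function name, the escape set, and the "indented by four spaces or a
tab" test for code lines are all assumptions, and it ignores code
spans as well as indented lines that are really list continuations.

```python
import re

# Assumed set of backslash-escapable characters (illustrative only).
ESCAPABLE = "\\`*_{}[]()#+-.!"
ESCAPE_RE = re.compile(r"\\([" + re.escape(ESCAPABLE) + r"])")


def find_affected_code_lines(text):
    """Return (line_number, old_line, suggested_line) for each indented
    line that the old escape processing would have changed.

    suggested_line is what the old rules displayed, which is what the
    source should contain literally under the new rules.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not (line.startswith("    ") or line.startswith("\t")):
            continue  # not an indented line, so not treated as code here
        rewritten = ESCAPE_RE.sub(lambda m: m.group(1), line)
        if rewritten != line:
            hits.append((number, line, rewritten))
    return hits


old_document = "Regex:\n\n    s/\\\\*//g\n    plain code\n"
report = find_affected_code_lines(old_document)
```

Each entry in `report` is a candidate for manual review; the
suggestion should be checked by eye before replacing anything.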

(1.0.1b8 will be posted shortly, with this change...)

-J.G.


