Interesting Issue

John Gruber gruber at fedora.net
Fri Dec 10 00:21:06 EST 2004


Lou Quillio <public at quillio.com> wrote on 12/09/04 at 10:16am:

> But it can't read your mind without mis-reading mine, ya know?  So
> it has to have rules, and a user has to know them.  Easy, intuitive
> rules, but rules in fact.  Endless cleverness seems to chase "no
> rules" or a synergy of all possible rules.  Not possible.

Right. That's my main concern here.

There's no question that Markdown could be at least a little more
clever in terms of figuring out when `*` and `_` are being used
literally, rather than to denote emphasis, even when they aren't
backslash-escaped.

The problem is that such cleverness will be hard to express in terms
of simple syntax rules. And the more clever we try to get, the
harder it will be to explain and remember in terms of writing
syntax. Essentially, the syntax rules would have to be a
plain-English description of the algorithmic cleverness.

The benefit to the current syntax rules is that even though they're
not the least bit clever, they are quite simple. If you write this:

    ... just change $my_var to something different than $your_var.

and then when you publish it you see that there's something wrong --
that you're getting an italicized run from 'var' through '$your',
and the underscores in the variable names are missing -- I think
it's pretty easy to figure out what went wrong.
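
A minimal sketch of that failure, in Python rather than Perl. It
assumes a single simplified regex that treats `*` and `_` the same
way. The names `EMPHASIS` and `naive_emphasis` are made up for
illustration, and this is not Markdown.pl's actual code:

```python
import re

# Assumption: one simplified rule for both delimiters. A run that starts
# and ends with the same character (* or _), with non-space text just
# inside each delimiter, becomes emphasis.
EMPHASIS = re.compile(r"(\*|_)(?=\S)(.+?)(?<=\S)\1", re.DOTALL)

def naive_emphasis(text: str) -> str:
    """Wrap every matched run in <em> tags, dropping the delimiters."""
    return EMPHASIS.sub(r"<em>\2</em>", text)

line = "just change $my_var to something different than $your_var."
print(naive_emphasis(line))
# just change $my<em>var to something different than $your</em>var.
```

The two underscores pair up across the sentence, which is exactly the
italicized run from 'var' through '$your' described above.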

You slap your head and say, "Oh, right, underscores are used for
emphasis in Markdown, I need to backslash-escape them or use code
spans around the variable names."

I've done this myself, usually with paths or file names with
underscores. The root problem is, as I mentioned in a message
earlier in the thread, that when *writing*, it's natural to use *'s
and _'s as both emphasis _and_ as literal punctuation characters,
just like I've done in this sentence. It's not that Markdown's rule
is hard to remember or understand, it's just that it's not natural
in the flow of writing.

If we do something clever, that's the problem we're trying to solve
-- the root problem of people using these characters as both
emphasis delimiters and as punctuation without even thinking about
it.

But no matter how clever we are, this is not a problem that can be
solved completely, short of software that actually parses and
*comprehends* written language. Because that's why we, as humans,
can tell the difference between the underscores in
"un_fucking_believable" and "my_function_name" without even thinking
about syntax rules. It's because we comprehend.

Needless to say, Markdown.pl will not be taking the verbal portion
of the SAT anytime soon.

Thus, because we're not going to solve this problem completely,
cleverness might make the overall situation *worse*, because when it
does fail, it might be extremely confusing and difficult to figure
out *why*. For one thing, I'm sure most people never even read the
syntax documentation. Part of the appeal of Markdown is that you can
just look at examples of it and see how to write it yourself.

That said, this idea from Rad Geek is intriguing:

Rad Geek <technophilia at radgeek.com> wrote on 12/09/04 at 4:47am:

> One possibility would be to sacrifice a bit of symmetry and implement  
> separate rules for emphasis by asterisk and emphasis by underscore. Right  
> now we have the same rules for all four of these cases:
> 
> 1.   Asterisk word/phrase emphasis: `You *misheard* me.`
> 2.   Underscore word/phrase emphasis: `You _misheard_ me.`
> 3.   Asterisk emphasis within word: `You *mis*heard me.`
> 4.   Underscore emphasis within word: `You _mis_heard me.`

It's intriguing because the rules aren't *too* complicated, and
because it solves the problem in the two most common places where
it occurs: variable and file names with mid-word underscores.

The problem is especially bad with file/path names with underscores,
because semantically, I don't think code tags are quite appropriate.
That means you've got to use backslashes, which are ugly and which
are really hard to remember to do when you're in the flow of writing.

This might be the right balance, even though it does introduce an
inconsistency between `*` and `_`.
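
Here is a rough sketch of what separate rules could look like, again
in Python. It is only one possible reading of the idea: the assumption
is that asterisks keep working everywhere, while an underscore counts
as an emphasis delimiter only when it is not glued to a word character
on its outer side. The names `STAR`, `UNDERSCORE`, and
`split_rule_emphasis` are hypothetical:

```python
import re

# Asterisks: unchanged, so they may open or close emphasis mid-word.
STAR = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*", re.DOTALL)

# Underscores (assumed rule): the opening _ must not follow a word
# character and the closing _ must not be followed by one, so mid-word
# underscores are left as literal text.
UNDERSCORE = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)", re.DOTALL)

def split_rule_emphasis(text: str) -> str:
    """Apply the asterisk rule, then the stricter underscore rule."""
    text = STAR.sub(r"<em>\1</em>", text)
    return UNDERSCORE.sub(r"<em>\1</em>", text)

for sample in [
    "You *misheard* me.",           # case 1: emphasized
    "You _misheard_ me.",           # case 2: emphasized
    "You *mis*heard me.",           # case 3: emphasized
    "You _mis_heard me.",           # case 4: left alone
    "change $my_var to $your_var",  # variable names: left alone
]:
    print(split_rule_emphasis(sample))
```

Under this sketch the variable names and a path like
`path/to/my_file_name.txt` come through untouched, at the cost of
case 4 no longer producing emphasis.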

-J.G.


More information about the Markdown-discuss mailing list