Conversion of special characters to entities

James Bennett ubernostrum at gmail.com
Wed Mar 9 17:16:50 EST 2005


On Wed, 9 Mar 2005 19:43:54 +0100, Lasar Liepins <lasar at liepins.net> wrote:
> So is this something that would make sense to be built into MarkDown, or
> is there a good reason to not do it?

While it's true that escaping to numeric entities goes a long way
toward preventing character-encoding problems (escaping to named
entities can cause some problems with XHTML, though), I'd say there
are several good reasons to stay away from it:

1. For characters such as vowels with umlauts there's little chance
of a character-encoding mismatch among the common single-byte Western
character sets: these characters have the same byte values in
ISO-8859-1 and Windows-1252, and the same code points in Unicode
(though UTF-8 encodes each of them as two bytes).

2. For characters which are not in ISO-8859-1, it is extremely likely
that the document's charset will be UTF-8, which should never require
entities for anything other than markup characters such as & and <.

3. Keeping these characters "as-is" saves significant disk space and
bandwidth for non-English texts which frequently use accented
characters or characters from other alphabets.

4. Ensuring that the document's character set matches what will be
claimed in its metadata should be the document author's
responsibility, not Markdown's.
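
To put a rough number on point 3, here is a minimal Python sketch. The
sample string and the helper name numeric_entities are assumptions made
up for illustration; this is not how Markdown or Textile is implemented.
It relies only on Python's built-in "xmlcharrefreplace" error handler,
which replaces unencodable characters with decimal numeric entities.

```python
# Compare the byte cost of keeping characters as-is versus escaping
# them to numeric entities. The sample text is an arbitrary assumption.
text = "Füße über Öl"


def numeric_entities(s: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity."""
    return s.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = numeric_entities(text)

# Size in bytes of the same text under each approach.
sizes = {
    "iso-8859-1, as-is": len(text.encode("iso-8859-1")),
    "utf-8, as-is": len(text.encode("utf-8")),
    "numeric entities": len(escaped.encode("ascii")),
}

print(escaped)
for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

For this sample, the escaped form takes 32 bytes against 12 in
ISO-8859-1 and 16 in UTF-8; the gap grows with the share of non-ASCII
characters in the text.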

And on a personal note, leaving "special" characters as-is was one of
several reasons why I switched to Markdown in the first place
(previously I'd been using Textile, which escapes to numeric
entities); if this did become a "feature" of Markdown I'd appreciate a
way to turn it off.

-- 
"May the forces of evil become confused on the way to your house."
  -- George Carlin

