text/markdown effort in IETF (invite)

Michel Fortin michel.fortin at michelf.ca
Thu Jul 10 07:53:25 EDT 2014

On 10 Jul 2014, at 1:04, John MacFarlane <jgm at berkeley.edu> wrote:

> +++ Michel Fortin [Jul 09 14 18:07 ]:
>> Fun fact: PHP Markdown is mostly encoding agnostic. It understands UTF-8 sequences but any byte that is not a valid UTF-8 sequence is treated as a character in itself. It's only relevant when converting tabs into spaces however, and only if you have non-ASCII characters before the tab.
> Small amendment: There are at least two places where the difference
> between utf-8 and latin1 matters:  tab expansion (as you note) and
> reference links, since these are stipulated to be case insensitive.
> (Case conversion is sensitive to the encoding.)

Like Markdown.pl, PHP Markdown just treats non-ASCII characters in a case-sensitive way, so in my case it doesn't matter.

Also, if you want to compare characters in a case-insensitive manner, the most correct way to do it is to use the Unicode Collation Algorithm, not case conversion to lower or uppercase, because some characters can't round-trip (see [german ß]). Then you'll notice that unfortunately Unicode collation is locale dependent (because equivalent characters aren't the same in all locales, see the [turkish ı]). And then you'll realize there's no correct way to do it universally.

 [GERMAN SS]: https://en.wikipedia.org/wiki/ß
 [TURKISH I]: https://en.wikipedia.org/wiki/Turkish_dotted_and_dotless_I

On Babelmark I see that cheapskate understands the first link above -- good job! -- and no one understands the second one.
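
As a rough illustration of the round-trip and matching problem described above, here is a minimal Python 3 sketch. It is not how PHP Markdown, Markdown.pl, or cheapskate work internally; `labels_match` and the three strategies are hypothetical names chosen for this example, and the mappings used are Python's built-in, locale-independent ones.

```python
# Hypothetical helper: two labels "match" if they are equal after the
# same normalization function has been applied to both.
def labels_match(label, definition, normalize):
    return normalize(label) == normalize(definition)

# The two link labels and reference definitions used in this message.
pairs = [("german ß", "GERMAN SS"), ("turkish ı", "TURKISH I")]

# Three candidate normalizations, all from Python's built-in str type.
strategies = {"lower": str.lower, "upper": str.upper, "casefold": str.casefold}

results = {
    (name, label): labels_match(label, definition, normalize)
    for name, normalize in strategies.items()
    for label, definition in pairs
}

# Upper-casing and then lower-casing "ß" does not give back "ß".
round_trip = "ß".upper().lower()

for (name, label), matched in results.items():
    print(f"{name:8} {label!r}: {matched}")
print("round trip of 'ß':", repr(round_trip))
```

With these built-in mappings, lowercasing matches neither label, case folding matches only the ß one, and uppercasing happens to match both. None of them is locale-aware, so a Turkish reader expecting "i" and "I" to be different letters would not be served by any of the three.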


Michel Fortin
michel.fortin at michelf.ca

More information about the Markdown-Discuss mailing list