text/markdown effort in IETF (invite)

Thu Jul 10 07:53:25 EDT 2014

On 10 Jul 2014, at 1:04, John MacFarlane <jgm at berkeley.edu> wrote:

> +++ Michel Fortin [Jul 09 14 18:07 ]:
> 
>> Fun fact: PHP Markdown is mostly encoding agnostic. It understands UTF-8 sequences but any byte that is not a valid UTF-8 sequence is treated as a character in itself. It's only relevant when converting tabs into spaces however, and only if you have non-ASCII characters before the tab.
> 
> Small amendment: There are at least two places where the difference
> between utf-8 and latin1 matters:  tab expansion (as you note) and
> reference links, since these are stipulated to be case insensitive.
> (Case conversion is sensitive to the encoding.)
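
To make the tab-expansion point concrete, here is a minimal Python sketch. It is not PHP Markdown's actual code; the function names and the 4-column tab width are assumptions for illustration. It expands tabs once counting characters and once counting bytes, which is roughly what a byte-oriented implementation would do with UTF-8 input:

```python
def expand_tabs_by_chars(line: str, tab_width: int = 4) -> str:
    """Expand tabs, counting one column per Unicode code point."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - (col % tab_width)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


def expand_tabs_by_bytes(line: str, tab_width: int = 4) -> str:
    """Expand tabs, counting one column per UTF-8 byte (hypothetical byte-oriented behaviour)."""
    out = bytearray()
    col = 0
    for b in line.encode("utf-8"):
        if b == 0x09:  # tab
            pad = tab_width - (col % tab_width)
            out.extend(b" " * pad)
            col += pad
        else:
            out.append(b)
            col += 1
    return out.decode("utf-8")


# ASCII before the tab: both strategies agree.
print(repr(expand_tabs_by_chars("a\tx")), repr(expand_tabs_by_bytes("a\tx")))
# Non-ASCII before the tab: "é" is two bytes in UTF-8, so the padding differs.
print(repr(expand_tabs_by_chars("é\tx")), repr(expand_tabs_by_bytes("é\tx")))
```

The two results differ only when a multi-byte character precedes the tab on the same line.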

Like Markdown.pl, PHP Markdown just treats non-ASCII characters in a case-sensitive way, so in my case it doesn't matter.

Also, if you want to compare characters in a case-insensitive manner, the most correct way to do it is to use the Unicode Collation Algorithm, not case conversion to lower or uppercase, because some characters can't round-trip (see [german ß]). Then you'll notice that, unfortunately, Unicode collation is locale dependent (because equivalent characters aren't the same in all locales, see the [turkish ı]). And then you'll realize there's no correct way to do it universally.

 [GERMAN SS]: https://en.wikipedia.org/wiki/ß
 [TURKISH I]: https://en.wikipedia.org/wiki/Turkish_dotted_and_dotless_I

On Babelmark I see that cheapskate 0.1.0.1 understands the first link above -- good job! -- and no one understands the second one.

http://johnmacfarlane.net/babelmark2/?normalize=1&text=Also%2C+if+you+want+to+compare+characters+in+a+case-sensitive+manner%2C+the+most+correct+way+to+do+it+is+to+use+the+Unicode+Collation+Algorithm+--+not+case+conversion+to+lower+or+uppercase+--+because+some+characters+can't+round-trip+(see+%5Bgerman+ß%5D).+Then+you'll+notice+that+unfortunately+Unicode+collation+is+locale+dependent+(because+equivalent+characters+aren't+the+same+in+all+locales%2C+see+the+%5Bturkish+ı%5D).+And+then+you'll+realize+there's+not+really+a+correct+way+to+do+it.%0A%0A+%5BGERMAN+SS%5D%3A+https%3A%2F%2Fen.wikipedia.org%2Fwiki%2Fß%0A+%5BTURKISH+I%5D%3A+https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FTurkish_dotted_and_dotless_I%0A
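
One plausible reading of that result, sketched in Python: an implementation that normalizes link labels with Unicode full case folding would match the first label but not the second, while plain lowercasing matches neither. I have not checked how cheapskate actually does it; `normalize_label` below is a hypothetical helper, not any implementation's real function.

```python
def normalize_label(label: str) -> str:
    """Hypothetical label normalization: collapse whitespace, then apply
    Unicode full case folding (locale-independent)."""
    return " ".join(label.split()).casefold()


# Case conversion does not round-trip for ß: it uppercases to "SS",
# which lowercases to "ss", not back to "ß".
print("ß".upper(), "SS".lower())

# Plain lowercasing fails to match the first pair of labels...
print("german ß".lower() == "GERMAN SS".lower())
# ...but full case folding maps both to "german ss".
print(normalize_label("german ß") == normalize_label("GERMAN SS"))

# Default case folding maps "I" to dotted "i" and leaves dotless "ı" alone,
# so the second pair does not match without Turkic-specific rules.
print(normalize_label("turkish ı") == normalize_label("TURKISH I"))
```

Matching the second pair would need locale-specific (Turkic) folding, which would in turn break the expected "I"/"i" equivalence for everyone else.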

-- 
Michel Fortin
michel.fortin at michelf.ca
http://michelf.ca