text/markdown effort in IETF (invite)

Sean Leonard dev+ietf at seantek.com
Thu Jul 10 05:00:32 EDT 2014


On 7/9/2014 10:04 PM, John MacFarlane wrote:
> +++ Michel Fortin [Jul 09 14 18:07 ]:
>
>> Fun fact: PHP Markdown is mostly encoding agnostic. It understands 
>> UTF-8 sequences but any byte that is not a valid UTF-8 sequence is 
>> treated as a character in itself. It's only relevant when converting 
>> tabs into spaces however, and only if you have non-ASCII characters 
>> before the tab.
>
> Small amendment: There are at least two places where the difference
> between utf-8 and latin1 matters:  tab expansion (as you note) and
> reference links, since these are stipulated to be case insensitive.
> (Case conversion is sensitive to the encoding.)
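
The two cases John lists can be sketched in a few lines of Python. This is only an illustration of the effect, not PHP Markdown's actual code; the helper names `expand_line` and `labels_match` and the 4-column tab width are assumptions made up for the example.

```python
def expand_line(raw: bytes, encoding: str, tab_width: int = 4) -> str:
    """Decode raw bytes with the given encoding, then expand tabs.

    Columns are counted per decoded character, so the number of
    spaces a tab becomes depends on how the bytes were decoded.
    """
    return raw.decode(encoding).expandtabs(tab_width)


def labels_match(a: bytes, b: bytes, encoding: str) -> bool:
    """Compare two reference-link labels case-insensitively.

    Lowercasing happens after decoding, so the result depends on
    which characters the decoder thinks the bytes represent.
    """
    return a.decode(encoding).lower() == b.decode(encoding).lower()


# UTF-8 bytes for "e-acute, tab, x": the accented letter takes two bytes.
line = b"\xc3\xa9\tx"
as_utf8 = expand_line(line, "utf-8")      # one character before the tab
as_latin1 = expand_line(line, "latin-1")  # two characters before the tab

# UTF-8 bytes for "[Ecole]" with an upper- and lower-case e-acute.
upper = b"[\xc3\x89cole]"
lower = b"[\xc3\xa9cole]"
same_in_utf8 = labels_match(upper, lower, "utf-8")
same_in_latin1 = labels_match(upper, lower, "latin-1")
```

Read as UTF-8, the tab expands to three spaces and the two labels match; read as Latin-1, the same bytes give two spaces and the labels no longer match.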

I haven't tried it yet, but I suspect PHP Markdown is mostly encoding 
agnostic only for encodings that preserve the US-ASCII range. Try 
feeding it an EBCDIC-encoded file. The 0x20-0x3F codes in EBCDIC are not 
even printable characters! :)
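
A quick way to see the EBCDIC point, sketched in Python. It assumes Python's `cp037` codec (one common EBCDIC code page) as a stand-in for "EBCDIC" in general; other EBCDIC code pages differ in detail, and the helper name `has_ascii_hash` is made up for the example.

```python
import unicodedata

text = "# Title"
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # assumption: cp037 as the EBCDIC variant


def has_ascii_hash(data: bytes) -> bool:
    """Return True if the byte a byte-oriented parser treats as '#' (0x23) occurs."""
    return 0x23 in data


# Decode every byte in 0x20-0x3F as cp037 and collect the Unicode
# general categories of the resulting characters.
ebcdic_20_3f_categories = {
    unicodedata.category(ch)
    for ch in bytes(range(0x20, 0x40)).decode("cp037")
}

print(ascii_bytes.hex(" "))
print(ebcdic_bytes.hex(" "))
print(has_ascii_hash(ebcdic_bytes))
print(ebcdic_20_3f_categories)
```

In cp037 the `#` is byte 0x7B and the space is 0x40, so a parser looking for 0x23 and 0x20 would not find a heading marker at all, and the whole 0x20-0x3F range decodes to control characters (category `Cc`).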

And speaking of UTF-8, fun fact: there is a UTF-EBCDIC encoding that 
represents the whole Unicode repertoire in EBCDIC. See 
<http://en.wikipedia.org/wiki/UTF-EBCDIC> and UTR #16.

-Sean




More information about the Markdown-Discuss mailing list