text/markdown effort in IETF (invite)
Sean Leonard
dev+ietf at seantek.com
Thu Jul 10 05:00:32 EDT 2014
On 7/9/2014 10:04 PM, John MacFarlane wrote:
> +++ Michel Fortin [Jul 09 14 18:07 ]:
>
>> Fun fact: PHP Markdown is mostly encoding agnostic. It understands
>> UTF-8 sequences but any byte that is not a valid UTF-8 sequence is
>> treated as a character in itself. It's only relevant when converting
>> tabs into spaces however, and only if you have non-ASCII characters
>> before the tab.
>
> Small amendment: There are at least two places where the difference
> between utf-8 and latin1 matters: tab expansion (as you note) and
> reference links, since these are stipulated to be case insensitive.
> (Case conversion is sensitive to the encoding.)
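Both cases above can be reproduced in a few lines. This is a minimal Python sketch, not PHP Markdown's actual code: the helper `expand_tabs` and the 4-column tab width are assumptions made for illustration. It decodes the same bytes once as UTF-8 and once as latin1, then compares tab padding and case-insensitive label matching.

```python
def expand_tabs(line: str, tab_width: int = 4) -> str:
    """Replace each tab with spaces up to the next tab stop.

    Hypothetical helper: counts one column per element of `line`.
    """
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - (col % tab_width)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


# "é", a tab, then "x", stored as UTF-8 bytes: b'\xc3\xa9\tx'
raw = "\u00e9\tx".encode("utf-8")

# Read as UTF-8, "é" occupies one column; read as latin1, it occupies two.
utf8_pad = expand_tabs(raw.decode("utf-8")).count(" ")
latin1_pad = expand_tabs(raw.decode("latin-1")).count(" ")

# Reference labels "É" and "é", both stored as UTF-8 bytes.
upper = "\u00c9".encode("utf-8")
lower = "\u00e9".encode("utf-8")

# Case folding pairs them up only if the bytes are interpreted as UTF-8.
match_utf8 = upper.decode("utf-8").lower() == lower.decode("utf-8").lower()
match_latin1 = upper.decode("latin-1").lower() == lower.decode("latin-1").lower()
```

With these inputs the tab is padded with three spaces under UTF-8 and two under latin1, and the two labels match only under the UTF-8 interpretation.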
I haven't tried it yet, but I suspect PHP Markdown is encoding agnostic
only for encodings that preserve the US-ASCII range. Try feeding it an
EBCDIC-encoded file. The 0x20-0x3F codes in EBCDIC are control codes,
not even printable characters! :)
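A quick way to see this without a mainframe is a rough Python sketch. It assumes Python's built-in `cp037` codec (one common EBCDIC code page) as a stand-in for "EBCDIC" in general; the sample line is made up.

```python
import unicodedata

# A made-up Markdown line, encoded with the cp037 EBCDIC code page.
markdown_line = "# Title with *emphasis*"
ebcdic = markdown_line.encode("cp037")

# Every printable ASCII character lands at 0x40 or above; space itself is 0x40.
lowest_byte = min(ebcdic)

# Bytes 0x20-0x3F, read as cp037, are all control characters (category "Cc").
decoded = bytes(range(0x20, 0x40)).decode("cp037")
all_controls = all(unicodedata.category(ch) == "Cc" for ch in decoded)

# A parser scanning for the ASCII bytes of '#' (0x23) or '*' (0x2A) finds neither.
has_ascii_hash = b"#" in ebcdic
has_ascii_star = b"*" in ebcdic
```

So a byte-oriented parser that looks for ASCII punctuation would see no Markdown syntax at all in such a file.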
And speaking of UTF-8, fun fact: there is a UTF-EBCDIC encoding that
represents the whole Unicode repertoire in EBCDIC. See
<http://en.wikipedia.org/wiki/UTF-EBCDIC> and UTR #16.
-Sean