Detab should be multi-byte aware?
gruber at fedora.net
Mon Oct 9 18:19:38 EDT 2006
Allan Odgaard <29mtuz102 at sneakemail.com> wrote on 10/9/06 at
> This raises two questions:
> 1. Should Markdown convert tabs to spaces in pre-formatted text?
> 2. If yes, should Markdown be aware of multi-byte characters?
> I’d say yes to #1 -- Markdown converts to (X)HTML, which
> does not define the tab size, and a good rule of thumb is to
> always convert to spaces before publishing on the net.
For #1, that's exactly why it does it.
> As for #2, Markdown doesn’t know the encoding of the source
> document, so that would mean it can’t really be aware of
> things such as UTF-8 multi-byte sequences. OTOH, if it changes my
> pre-formatted text, I would like to have it do the right thing.
If Markdown.pl ever gains explicit support for text encodings, the
rules will be simple: UTF-8 in, UTF-8 out, no exceptions.
This would break the way some people are using it, I'm sure. I
don't really have much sympathy for people who are clinging to
other encodings, though.
I don't think the rules for the syntax (as opposed to the
implementation) need to mention it, though, at least not yet.
I say "yet" because from the get-go I've always considered using
non-ASCII punctuation characters for certain features.
I don't think there's any reason that someone couldn't write a
UTF-8 savvy Markdown implementation using the 1.0 syntax, though.
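To make question #2 concrete, here is a minimal Python sketch of the difference. It is not Markdown.pl's code; `detab`, `detab_bytes`, and `TAB_WIDTH` are hypothetical names, and a tab width of 4 is assumed. `detab` works on decoded text and counts characters, the way a "UTF-8 in, UTF-8 out" implementation could after decoding its input. `detab_bytes` counts bytes, so each byte of a multi-byte UTF-8 sequence is treated as its own column and the padding comes out short.

```python
TAB_WIDTH = 4  # assumed tab stop width (hypothetical constant)


def detab(text: str, tab_width: int = TAB_WIDTH) -> str:
    """Expand tabs by counting characters (code points), so a multi-byte
    UTF-8 character such as 'é' occupies one column."""
    out = []
    col = 0
    for ch in text:
        if ch == "\t":
            pad = tab_width - (col % tab_width)
            out.append(" " * pad)
            col += pad
        elif ch == "\n":
            out.append(ch)
            col = 0  # tab stops restart on each line
        else:
            out.append(ch)
            col += 1
    return "".join(out)


def detab_bytes(data: bytes, tab_width: int = TAB_WIDTH) -> bytes:
    """Expand tabs by counting bytes: not multi-byte aware, so every byte
    of a UTF-8 sequence is wrongly counted as a separate column."""
    out = bytearray()
    col = 0
    for b in data:  # iterating over bytes yields ints
        if b == 0x09:  # tab
            pad = tab_width - (col % tab_width)
            out += b" " * pad
            col += pad
        elif b == 0x0A:  # newline
            out.append(b)
            col = 0
        else:
            out.append(b)
            col += 1
    return bytes(out)


if __name__ == "__main__":
    line = "é\tx"
    print(repr(detab(line)))  # 'é   x': x lands on the tab stop at column 4
    print(repr(detab_bytes(line.encode("utf-8")).decode("utf-8")))  # 'é  x': one space short
```

Python's built-in `str.expandtabs` behaves like `detab` here. Counting code points is itself an approximation: combining marks and East Asian wide characters do not occupy exactly one display column each.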