Detab should be multi-byte aware?
    John Gruber
    gruber at fedora.net
    Mon Oct  9 18:19:38 EDT 2006

Allan Odgaard <29mtuz102 at sneakemail.com> wrote on 10/9/06 at 11:02 PM:

> This raises two questions:
>
>  1. Should Markdown convert tabs to spaces in pre-formatted text?
>  2. If yes, should Markdown be aware of multi-byte characters?
>
> I’d say yes to #1 -- Markdown converts to (X)HTML, which does not
> define the tab size, and a good rule of thumb is to always convert
> to spaces before publishing on the net.

For #1, that's exactly why it does it.

> As for #2, Markdown doesn’t know the encoding of the source
> document, so it can’t really be aware of things such as UTF-8
> multi-byte sequences. OTOH, if it changes my pre-formatted text,
> I would like it to do the right thing.

If Markdown.pl ever gains explicit support for text encodings, the
rules will be simple: UTF-8 in, UTF-8 out, no exceptions.

This would break the way some people are using it, I'm sure. I
don't really have much sympathy for people who are clinging to
other encodings, though.

I don't think the rules for the syntax (as opposed to the
implementation) need to mention it, though, at least not yet.

I say "yet" because from the get-go I've always considered using
non-ASCII punctuation characters for certain features.

I don't think there's any reason that someone couldn't write a
UTF-8 savvy Markdown implementation using the 1.0 syntax, though.
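
To make the byte-versus-character issue concrete, here is a minimal sketch
of what a UTF-8-aware detab might look like. It is written in Python rather
than Perl, and it is not Markdown.pl's code: the names `detab`, `detab_utf8`,
and `detab_bytes` are hypothetical, and the tab width of 4 is an assumption.
The byte-counting variant is included only for contrast.

```python
TAB_WIDTH = 4  # assumed tab stop; not specified anywhere in this thread


def detab(line: str, tab_width: int = TAB_WIDTH) -> str:
    """Expand tabs to spaces, measuring columns in characters (code points).

    Limitation: code points are not display columns, so combining marks
    and wide CJK characters can still misalign in a terminal or editor.
    """
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - col % tab_width  # distance to next tab stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


def detab_utf8(raw: bytes, tab_width: int = TAB_WIDTH) -> bytes:
    """UTF-8 in, UTF-8 out: decode, detab each line, re-encode."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on other encodings
    lines = text.split("\n")
    return "\n".join(detab(l, tab_width) for l in lines).encode("utf-8")


def detab_bytes(raw: bytes, tab_width: int = TAB_WIDTH) -> bytes:
    """Encoding-unaware variant: every byte counts as one column."""
    out = bytearray()
    col = 0
    for b in raw:
        if b == 0x09:  # tab
            pad = tab_width - col % tab_width
            out += b" " * pad
            col += pad
        elif b == 0x0A:  # newline resets the column counter
            out.append(b)
            col = 0
        else:
            out.append(b)
            col += 1
    return bytes(out)
```

With the input `"é\tx"` encoded as UTF-8, `detab_utf8` pads with three
spaces (the `é` occupies one column), while `detab_bytes` pads with only
two, because it counts the two bytes of `é` as two columns.
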
-J.G.
    
    