UTF-8 BOM

Michel Fortin michel.fortin at michelf.com
Sat Oct 27 11:53:33 EDT 2007


On 2007-10-27 at 9:23, Allan Odgaard wrote:


> I’d say no -- on the contrary, if the user adds a BOM to his UTF-8
> file he should be told that this is a bad idea.


Well, the issue then is that the user isn't told when his file
contains a BOM, and the BOM makes the Markdown processor fail silently
(if the first line is a header, or anything other than a paragraph)
for no apparent reason. A knowledgeable person will find the problem,
but many people know next to nothing about encodings, and even less
about BOMs.


> Fortunately none of the text editors on my system even has this
> option ;)


I hear Windows users aren't as fortunate. The famous Notepad, for
instance, adds the BOM to UTF-8 documents and offers no way to turn
that off. It even adds it if you open and then save a BOM-less UTF-8
document. I don't think Markdown should fail silently on the first
line for files edited with Notepad. So what should it do?

Perhaps you're right that we should ignore this issue for now. The
fact that I hadn't heard about this problem until now seems to
confirm how rare BOMs are in UTF-8 text files. But on the other hand,
it's so easy to fix:

$text = preg_replace('{^\xEF\xBB\xBF}', '', $text);
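
To make the failure mode concrete, here is a minimal sketch of that fix
wrapped in a function. The name strip_utf8_bom and the sample text are
hypothetical, for illustration only, and are not part of any Markdown
implementation. The '{^#}' pattern is only a stand-in for a header rule
anchored at the start of the text.

```php
<?php
// Hypothetical helper (illustration only): remove one leading UTF-8 BOM.
function strip_utf8_bom($text)
{
    // EF BB BF is U+FEFF encoded as UTF-8; ^ anchors at the start of the text.
    return preg_replace('{^\xEF\xBB\xBF}', '', $text);
}

// Sample document as Notepad might save it: BOM first, then a header.
$with_bom = "\xEF\xBB\xBF# Title\n\nBody text.\n";
$clean    = strip_utf8_bom($with_bom);

// A start-anchored pattern does not see the header behind the BOM...
$matches_before = preg_match('{^#}', $with_bom); // 0
// ...but does once the three BOM bytes are gone.
$matches_after  = preg_match('{^#}', $clean);    // 1
```

Text that has no BOM passes through unchanged, so running the
replacement unconditionally on every input should be harmless.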


Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



