UTF-8 BOM

Michel Fortin michel.fortin at michelf.com
Sat Oct 27 08:55:20 EDT 2007


On 2007-10-26 at 21:43, Bruce Phillips wrote:


> I've encountered what seems to be a bug in PHP Markdown v1.0.1k.
> Given a UTF-8 file with a BOM, the first line is not parsed.


Ok, I'll admit PHP Markdown does something wrong with the UTF-8 BOM,
but in my tests the first line *is* parsed: the BOM is seen as
regular text and appears in the first paragraph. That's not, of
course, what the output should be, but it doesn't really match what
you describe either, which is puzzling. Perhaps you're expecting a
header, or a list item, and you're equating a paragraph with "not parsed".

Now, the interesting question is: what should PHP Markdown (or any
Markdown implementation for that matter) do with the UTF-8 BOM? Here
are three options:

1. Remove it?
2. Keep it at the start of the text?
3. Ignore it (as it does now)?

Option 3 seems logical to me if I consider the argument to
the Markdown function to be UTF-8 text, which is not always the same
thing as a UTF-8 file (which may include the BOM).
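
To make the text/file distinction concrete, here is a minimal sketch of how the BOM shows up in a string read from a file. The constant `UTF8_BOM` and the helper `has_utf8_bom` are hypothetical names used for illustration only; they are not part of PHP Markdown, and the type declarations assume a current PHP version.

```php
<?php
// The UTF-8 BOM is the character U+FEFF, encoded as the three bytes EF BB BF.
// UTF8_BOM is a hypothetical name, not something PHP Markdown defines.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true when $text begins with a UTF-8 BOM.
function has_utf8_bom(string $text): bool
{
    // strncmp compares raw bytes, so no multibyte extension is needed.
    return strncmp($text, UTF8_BOM, 3) === 0;
}

// Bytes as they might come from a file saved with a BOM.
var_dump(has_utf8_bom(UTF8_BOM . "# Title\n"));

// The same content as plain UTF-8 text, without a BOM.
var_dump(has_utf8_bom("# Title\n"));
```

`file_get_contents` returns the file's bytes unchanged, so a BOM at the start of the file is still at the start of the string.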

Then again, the Markdown function is often used to parse files --
such as in `$text = Markdown(file_get_contents("file.txt"));` -- and
forcing everyone to add special logic for the UTF-8 BOM seems a
little silly. So if we want this example to work, only options 1 and 2
remain.

Between options 1 and 2, surely option 1 (dropping the BOM) is the
better one. Otherwise the BOM would end up in the middle of the page
whenever the output is concatenated with a template HTML document.

So I'll probably go for option 1. This way it's easy to just read a
file, run it through Markdown, and then put the result into a
template. I don't expect this change to break anything, since the
current behaviour is already broken and this is the first time I've
heard of the issue.
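
As an illustration of option 1, here is a minimal sketch of removing a leading BOM before the text is converted. The helper name `strip_utf8_bom` is hypothetical, and this is not necessarily how PHP Markdown will end up doing it; it only shows the idea, in current PHP syntax.

```php
<?php
// Hypothetical helper: drop a single leading UTF-8 BOM (bytes EF BB BF)
// if the string starts with one; otherwise return the string unchanged.
function strip_utf8_bom(string $text): string
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        // The cast guards against older PHP versions, where substr()
        // returns false when nothing is left after the BOM.
        return (string) substr($text, 3);
    }
    return $text;
}

// A BOM followed by "Hi": only the bytes of "Hi" remain.
echo bin2hex(strip_utf8_bom("\xEF\xBB\xBFHi")), "\n"; // prints 4869
```

Until something like this is built in, a caller could apply it by hand: `$text = Markdown(strip_utf8_bom(file_get_contents("file.txt")));`.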

UTF-8 BOM handling sounds like a good thing to add to MDTest too.



> This isn't a major issue for me (I actually included the BOM by
> mistake), but I haven't heard about it on this list before, so I'm
> sharing it now.


Yes, thank you very much for reporting that problem. It's a very
interesting question, even if it isn't a pressing issue.


Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



