Markdown doesn't always generate XHTML

Ulf Ochsenfahrt ulf at ofahrt.de
Sat Mar 15 06:49:30 EDT 2008


Waylan Limberg wrote:

> Regarding the security issues, I understand your concerns, but there
> are some situations where all document authors are trusted
> (authenticated) users and have a legitimate need for that feature. We
> can't cut them off for everyone else. However, I know that
> Python-Markdown has an option to not allow any html in a document
> (this "safe_mode" can be set to either replace with a customizable
> message, remove completely, or escape the html). Of course, to stay in
> line with the Markdown standard, it is off by default, but very easy
> to turn on in your code. Other implementations may offer a similar
> option.


Yes, there are situations where all document authors are trusted
(authentication isn't trust, though), but the fact remains that this
makes Markdown completely unusable for anything else, such as input from
untrusted users. And worse, people are not made aware of this. I only
encountered it by coincidence, because one of my users entered what
looked like HTML tags into the forum.

In summary:
Markdown wasn't designed to handle this situation. Some implementations
provide a 'safe mode' which aims to filter out raw HTML either before or
after Markdown conversion.
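As a rough illustration of filtering *before* conversion, here is a minimal sketch in Python that escapes every `&`, `<` and `>` in the Markdown source, so any tag the user typed reaches the converter as plain text. `escape_raw_html` is a hypothetical helper, not part of any Markdown implementation; it uses only the standard library's `html.escape`.

```python
import html


def escape_raw_html(source: str) -> str:
    """Hypothetical pre-conversion filter: make every tag in the Markdown
    source inert by escaping &, < and > before the converter sees it."""
    # quote=False: quotes cannot open markup in text content, so they are
    # left alone; only &, < and > are replaced (& first, then < and >).
    return html.escape(source, quote=False)


# The Nanoki test case from this thread, run through the filter.
payload = '<script <!--\nalert("Hello world!")\n</script <>'
print(escape_raw_html(payload))
```

With this filter the payload becomes `&lt;script &lt;!--` ... `&lt;/script &lt;&gt;`, which a browser displays as text. The cost is that authors lose all inline HTML and entities, and Markdown's angle-bracket autolinks (`<http://...>`) would likely stop working; filtering after conversion keeps those features but requires parsing the generated HTML reliably.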


Markdownj (Java, which I've been using) doesn't provide such an option.

Markdown.pl doesn't provide such an option.

Nanoki tries to, but fails (see the related mail by Michel Fortin) on:
<script <!--
alert("Hello world!")
</script <>

PHP Markdown has something similar, which (as far as I can tell) has to
be enabled in the source. It fails when no_markup=true and
no_entities=false on:
<script>alert('hallo');</script>

Python-Markdown has such an option, and it appears to work for simple
tests. Looking at the code, Python-Markdown apparently builds an XML
document tree and serializes it, which ensures that the generated code
is always well-formed XML (a very good design choice, if I may say so).
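A minimal sketch of why building a tree and serializing it helps, using Python's standard `xml.etree.ElementTree` rather than Python-Markdown's actual internals (which this does not attempt to reproduce). Anything stored as element text is escaped by the serializer, so stray `<` characters cannot produce broken markup. `paragraph_to_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def paragraph_to_xml(text: str) -> str:
    """Wrap user text in a <p> element and serialize it (hypothetical helper)."""
    # User text only ever becomes element text, never markup, so the
    # serializer escapes &, < and > on the way out.
    p = ET.Element("p")
    p.text = text
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(p, encoding="unicode")


print(paragraph_to_xml("<script>alert('hallo');</script>"))
# <p>&lt;script&gt;alert('hallo');&lt;/script&gt;</p>
```

Note that this guarantees well-formed output only; raw HTML that a converter deliberately passes through (for example with safe mode off) is a separate question.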

I haven't tried Pandoc, which John MacFarlane also mentioned.


OK, thanks everyone for your input. There remains the potential issue
of some browsers automatically executing JavaScript when it is used as
the source URL of an image, and possibly also of javascript: URLs in
links. Other than that, I think I'll move forward using the
Python-Markdown implementation via Jython, or start porting it to Java.
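One common way to address the javascript: URL concern is an allowlist check on every link `href` and image `src` after conversion. The sketch below is a minimal Python version under stated assumptions: `is_safe_url` and `ALLOWED_SCHEMES` are hypothetical names, the allowed schemes are an illustrative choice, and the input is assumed to be the attribute value *after* HTML entities have been decoded. It rejects anything not listed instead of blacklisting `javascript:`, since blacklists tend to miss variants like mixed case or embedded tabs.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of schemes we are willing to emit in href/src.
# Relative URLs (no scheme at all) are also accepted below.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# Browsers ignore tabs/newlines anywhere in a URL, and C0 control
# characters or spaces at either end, so remove those before checking.
_C0_OR_SPACE = "".join(chr(i) for i in range(0x21))


def is_safe_url(url: str) -> bool:
    """Return True if url is relative or uses an allowlisted scheme.

    Assumes HTML entities in the attribute value were already decoded.
    """
    cleaned = url.replace("\t", "").replace("\n", "").replace("\r", "")
    cleaned = cleaned.strip(_C0_OR_SPACE)
    try:
        scheme = urlsplit(cleaned).scheme  # urlsplit lowercases the scheme
    except ValueError:
        return False  # unparseable (e.g. broken IPv6 host): reject it
    return scheme == "" or scheme in ALLOWED_SCHEMES
```

For example, `is_safe_url("java\tscript:alert(1)")` is `False` while `is_safe_url("/images/a.png")` is `True`. A rejected URL could then be dropped or replaced with a harmless value such as `#`.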

Again, thanks everyone for their quick replies.

Cheers,

-- Ulf


More information about the Markdown-Discuss mailing list