Adding a "Safe" option?

Jason Clark jason at jclark.org
Mon May 3 23:51:31 EDT 2004


On May 1, 2004, at 12:15 AM, Jason Clark wrote:

> If we accept that the correct way to handle user-HTML (non-code stuff) 
> is to encode it instead of stripping it, then this becomes much 
> easier.  Right now, the meat of code-handling is _EncodeCode, which 
> takes a chunk of text (could be a block or span), converts `< > &` to 
> `&lt; &gt; &amp;` respectively, and then escapes Markdown special 
> characters like `* _ [ ]` etc.  How about when in Comment mode, we do 
> the first part (convert `< > &`) before we do anything else, and then 
> we skip that part in _EncodeCode.  This way, *all* html tags, whether 
> in a code span/block or not, get encoded, and code-span/block html 
> doesn't get double-encoded.
>
> I think I'll experiment with this implementation, probably tomorrow.  
> I'm only about 85% convinced I want random user HTML encoded instead 
> of discarded, but it's a real start.
>

I gave this a shot and it works very nicely.  I hacked the changes into 
the Beta 4 code, going for speed over style.  Basically I just added an 
optional param to Markdown() that when true enables "no-user-html" mode 
(for lack of another name).  If the option is set, the following code 
from _EncodeCode:

         # Encode all ampersands; HTML entities are not
         # entities within a Markdown code span.
         s/&/&amp;/g;

         # Do the angle bracket song and dance:
         s! <  !&lt;!gx;
         s! >  !&gt;!gx;

is run during Markdown() prior to _HashHTMLBlocks(), and is not run 
during _EncodeCode.  So far it's working wonderfully.  HTML inside 
`code spans` or code blocks is encoded as it always was, and HTML 
everywhere else is now encoded as well.  HTML generated by Markdown 
itself is unaffected.  All other features appear to be working as 
normal.  Best of both worlds for my needs (rendering user comments).
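
To make the ordering concrete, here is a minimal sketch of the idea in Python rather than the Perl of Markdown.pl.  It is not the actual patch: the function names, the `no_user_html` flag, and the toy code-span regex are all assumptions made up for illustration, and only code spans are handled.  The point it shows is that the `< > &` encoding runs exactly once, either up front (comment mode) or inside the code handler (normal mode).

```python
import re


def encode_html_chars(text):
    """Encode & < > as entities. Ampersands go first so the
    entities produced for < and > are not themselves re-encoded."""
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


def encode_code(text, no_user_html=False):
    """Hypothetical stand-in for _EncodeCode.

    In normal mode it encodes & < > itself. In no-user-html mode the
    whole document was already encoded, so that step is skipped to
    avoid double-encoding. (The real routine also escapes Markdown
    special characters; that part is omitted here.)
    """
    if not no_user_html:
        text = encode_html_chars(text)
    return text


def render(text, no_user_html=False):
    """Toy renderer that only understands `code spans`."""
    if no_user_html:
        # Comment mode: encode everything before any other processing.
        text = encode_html_chars(text)

    def code_span(match):
        return "<code>" + encode_code(match.group(1), no_user_html) + "</code>"

    return re.sub(r"`([^`]+)`", code_span, text)


sample = "<b>hi</b> and `<i>x</i>`"
print(render(sample))                     # user HTML passes through
print(render(sample, no_user_html=True))  # user HTML is encoded
```

With the flag off, the `<b>` tag survives and only the code span is encoded; with it on, both are encoded once and the `<code>` tags emitted by the renderer are left alone.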

Jason Clark <jason at jclark.org>
http://jclark.org/weblog/


