Inline styles mystery

Michel Fortin michel.fortin at michelf.com
Thu Dec 9 20:48:37 EST 2004


On Dec 9, 2004, at 14:20, Lou Quillio wrote:

> I've stumbled onto some curious Markdown behavior today, while
> trying to figure why Drupal's MarkSmarty module is educating quotes
> around explicit-HTML attributes (it applies SmartyPants after
> Markdown, I'm using Michel's current php modules, and MarkSmarty
> applies them as-is).  Anyhow ...

"Current" as in 1.0.1b7 I presume, not 1.0?

Ok, I have absolutely no explanation for the "medium" thing or the
lost trailing slash, and nothing I tried managed to replicate your
problem.

But I found a bug present since 1.0.1b2 that would lead to an incorrect  
behaviour with this. In fact, it's a pretty big bug and I am glad to  
find it *now*, before the official 1.0.1 release. Maybe somehow it is  
related to your problem, I don't know.

The new regex in `_TokenizeHTML` was pretty bad... In fact it could
only match tags that:

1.	Have no empty attribute like this one: `alt=""`.
2.	Have whitespace after the name of the tag, unlike this: `<h1>`.
3.	Have a tag name consisting of only one letter!... like `<a href="test">`.
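
To make the three limitations concrete, here is a minimal Python sketch. `FLAWED_TAG` is a hypothetical pattern written only to reproduce the symptoms listed above; it is not the regex that shipped in PHP Markdown. `LOOSER_TAG` and `tokenize` are likewise illustrative names, not the real `_TokenizeHTML`.

```python
import re

# Hypothetical pattern with the three limitations described above:
# one-letter tag name, mandatory whitespace after it, no empty attributes.
FLAWED_TAG = re.compile(r'<\w\s+(?:\w+="[^"]+"\s*)*/?>')

# A more permissive illustrative pattern: multi-letter names, optional
# attributes, empty attribute values, closing and self-closing tags.
LOOSER_TAG = re.compile(r'</?\w+(?:\s+[\w-]+="[^"]*")*\s*/?>')


def tokenize(text, tag_re):
    """Split text into ("tag", s) and ("text", s) tokens using tag_re."""
    tokens, pos = [], 0
    for m in tag_re.finditer(text):
        if m.start() > pos:
            tokens.append(("text", text[pos:m.start()]))
        tokens.append(("tag", m.group()))
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


sample = '<img src="x.jpg" alt="" /> "hi"'
print(tokenize(sample, FLAWED_TAG))  # whole string comes back as one "text" token
print(tokenize(sample, LOOSER_TAG))  # the img tag is isolated as a "tag" token
```

With the flawed pattern the whole `img` tag is classified as text, so any later step that only rewrites text tokens (quote education, for instance) would also rewrite the quotes inside the tag.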

As you can see, it was so bad I wonder how it made it in there. The only
explanation I have is that I tested it with tags like `<a href="test">`
and used an old version of the regex... because I know the first
version of it I made had this problem too.

By the way, something interesting to note is that PHP Markdown and PHP  
SmartyPants both share the same `_TokenizeHTML` function. By that I  
mean that the function is defined only once when the files are parsed -- by
the first file included. So if there is a bug in Markdown's version of
the function, it can "propagate" to PHP SmartyPants too. This is  
exactly what happens on the dingus. If I use the "both" filter I get  
this "img" tag:

	<img
	src=&#8221;http://www.michelf.com/img/photo/michel-fortin-arbre.jpg&#8221;
	alt=&#8221;&#8221; width=&#8221;155&#8221; height=&#8221;155&#8221; style=&#8221;	float:right; clear:right;
	border:none; padding:1em;&#8221; />

Markdown is included first by the dingus so its version of
`_TokenizeHTML` is used by both, even if the version in PHP SmartyPants  
does not have this bug. If I select PHP SmartyPants alone, the problem  
does not occur.
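
The "first file included wins" effect can be modelled in a few lines. This is a Python analogy, not the PHP code: `define_once` stands in for whatever guard the two PHP files use around their definition, and `markdown_file`, `smartypants_file` and `include_all` are hypothetical names for the two files and the include step.

```python
def define_once(registry, name, func):
    """Register func under name only if nothing is registered yet."""
    registry.setdefault(name, func)


def markdown_file(registry):
    # Stand-in for PHP Markdown's copy of the function (the buggy one here).
    define_once(registry, "_TokenizeHTML", lambda text: "Markdown's copy")


def smartypants_file(registry):
    # Stand-in for PHP SmartyPants's copy of the function (the correct one).
    define_once(registry, "_TokenizeHTML", lambda text: "SmartyPants's copy")


def include_all(*files):
    """Load each file in order into a fresh registry, like successive includes."""
    registry = {}
    for load in files:
        load(registry)
    return registry


both = include_all(markdown_file, smartypants_file)
alone = include_all(smartypants_file)
print(both["_TokenizeHTML"](""))   # Markdown's copy
print(alone["_TokenizeHTML"](""))  # SmartyPants's copy
```

Loaded together, both filters end up calling Markdown's copy; loaded alone, SmartyPants uses its own, which matches what the dingus shows.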

By the way, I am planning an update to PHP SmartyPants at the same time PHP
Markdown 1.0.1 is released so that the change to `_TokenizeHTML` happens
in both.

In the meantime, expect 1.0.1b7.1 soon.

> The image tag is inline HTML.  Shouldn't it be literally untouched
> by Markdown (hands-off my trailing slash)? And I was very surprised to  
> see it reaching into my style attribute.

I am too. Are you sure it's really Markdown that does that? Does this
happen on the dingus? I can't replicate that.

> Also, shouldn't the
> final output be identical to the HTML Source output?  Or is there
> something I'm not getting?

I don't see any difference between the HTML Source output and the real HTML
source of the page. I'm not sure what I can do about this, but they
should be the same.



Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/

