PHP Markdown 1.0.1b2

Tim Pritlove tim at ccc.de
Sat Nov 27 21:35:50 EST 2004


On 27.11.2004, at 23:57, Ian Gregory wrote:

> On Sat, Nov 27, 2004 at 08:54:04PM -0200, Tim Pritlove wrote:
>>
>> On 27.11.2004, at 20:27, Michel Fortin wrote:
>>
>>> +	Fixed problem for links defined with urls that include parens, 
>>> e.g.:
>>>
>>> 		[1]:
>>> 		http://sources.wikipedia.org/wiki/Middle_East_Policy_(Chomsky)
>>>
>>> 	"Chomsky" was being erroneously treated as the URL's title.
>>
>> erroneously?
>>
>> As far as I recall, parentheses are not valid characters for URLs.
>
> As I am the one who reported the bug and gave that URL as an example
> I feel the need to comment. I did think that parens might not be
> allowed in a URL, but I was probably either very busy or very lazy
> and made the assumption that the wikipedia would be coded to not
> allow invalid URLs to be entered.

It doesn't, as it isn't the website where you enter the URL, but your 
browser. If you want to access the page named "Foo (Bar)", the URL the 
browser sends to Wikipedia is:

    http://en.wikipedia.org/wiki/Foo%20%28Bar%29

This gets immediately redirected by the Wikipedia website, replacing 
the space with an underscore (by an internal convention, an underscore 
can be used instead of a space):

    http://en.wikipedia.org/wiki/Foo_%28Bar%29

"Modern" browsers like Safari translate this URL into a readable form 
in the Location Bar, so that what you actually read is:

    http://en.wikipedia.org/wiki/Foo_(Bar)

Safari also does this with UTF-8 characters, so that pages containing 
non-ASCII (and non-URL) characters still read nicely in the location 
bar:

	http://de.wikipedia.org/wiki/Bär

This is actually:

	http://de.wikipedia.org/wiki/B%C3%A4r

But in either case, the characters are not actually valid.
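
To illustrate how the two forms relate, here is a minimal Python sketch 
using the standard library's `urllib.parse.unquote`. It only shows how 
the percent-encoded bytes map back to the readable text; it is not a 
claim about how Safari or any other browser implements its location bar.

```python
from urllib.parse import unquote

# Percent-encoded forms, as sent over the wire.
encoded_urls = [
    "http://en.wikipedia.org/wiki/Foo_%28Bar%29",
    "http://de.wikipedia.org/wiki/B%C3%A4r",
]

# unquote() decodes %XX escapes, interpreting the bytes as UTF-8 by default.
readable_urls = [unquote(url) for url in encoded_urls]

for encoded, readable in zip(encoded_urls, readable_urls):
    print(encoded, "->", readable)
```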

However, I don't want to be too strict about this, as Markdown is meant 
to make things _easier_ for us. We obviously expect to be able to type a 
lot of characters as if they were "valid", without caring about what 
the real encoded form is. So I would suggest simply treating whitespace 
as the only real non-URL character, i.e. the one that marks the end of 
a URL.
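
The whitespace rule suggested above could be sketched roughly as 
follows, in Python. The function name `split_definition` and the regular 
expression are assumptions made for illustration; this is not the actual 
Markdown.pl or PHP Markdown code, and it does not validate the title.

```python
import re

# [id]: URL optional-title
# The URL is everything up to the first whitespace after the colon.
DEFINITION = re.compile(r'\s*\[([^\]]+)\]:\s*(\S+)\s*(.*)$')

def split_definition(line):
    """Split a link definition into (id, url, title-or-None)."""
    match = DEFINITION.match(line)
    if match is None:
        return None
    link_id, url, rest = match.groups()
    # Anything after the whitespace is left for title parsing.
    return link_id, url, rest.strip() or None
```

With this rule, `(Chomsky)` stays part of the URL because no whitespace 
separates it from the rest, while a title written after a space is still 
split off.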

So the approach by PHP Markdown actually gets my thumbs up as long as 
it correctly replaces the parentheses with %28 and %29. But the next 
question is: what about all the other characters, especially non-ASCII 
UTF-8 characters? I would expect Markdown to handle this the way Safari 
and Firefox deal with it (by replacing the UTF-8 characters with 
%-encoded UTF-8 bytes).
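
As a rough illustration of that expectation, here is a minimal Python 
sketch that percent-encodes the path of a URL. The helper name 
`encode_url` is hypothetical, only the path component is touched, and 
this is an assumption about the desired behaviour, not a description of 
what Markdown or PHP Markdown actually does.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_url(url):
    """Percent-encode parentheses, spaces and non-ASCII characters in the path."""
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so that escapes which are
    # already present are not encoded a second time.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))

print(encode_url("http://en.wikipedia.org/wiki/Foo_(Bar)"))
print(encode_url("http://de.wikipedia.org/wiki/Bär"))
```

Keeping `%` in the safe set is a design choice: it lets already-encoded 
URLs pass through unchanged, at the cost of not escaping a literal 
percent sign.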

So: how does this work in the original Markdown Perl code? Would you 
agree with my view, or am I saying something completely dumb?

Greetings
Tim


PS: By the way, thanks for releasing the PHP code! We want to integrate 
it into our PHP-based system as soon as it's ready (if it's got the 
right license, of course).
------
Tim Pritlove, Discordian Evangelist
<mailto:tim at ccc.de> <http://tim.geekheim.de/> <skype://timpritlove>
<aim:timpritlove> <jabber:tim at jabber.berlin.ccc.de>
Project Blinkenlights <http://www.blinkenlights.de/>
------
"Sure it corrupts your files, but look how fast it is!"


