PHP Markdown 1.0.1b2
Tim Pritlove
tim at ccc.de
Sat Nov 27 21:35:50 EST 2004
On 27.11.2004, at 23:57, Ian Gregory wrote:
> On Sat, Nov 27, 2004 at 08:54:04PM -0200, Tim Pritlove wrote:
>>
>> On 27.11.2004, at 20:27, Michel Fortin wrote:
>>
>>> + Fixed problem for links defined with urls that include parens,
>>> e.g.:
>>>
>>> [1]:
>>> http://sources.wikipedia.org/wiki/Middle_East_Policy_(Chomsky)
>>>
>>> "Chomsky" was being erroneously treated as the URL's title.
>>
>> erroneously?
>>
>> As far as I recall, parentheses are not valid characters for URLs.
>
> As I am the one who reported the bug and gave that URL as an example
> I feel the need to comment. I did think that parens might not be
> allowed in a URL, but I was probably either very busy or very lazy
> and made the assumption that the wikipedia would be coded to not
> allow invalid URLs to be entered.
It doesn't, as it isn't the website where you enter the URL, but your
browser. If you want to access the page named "Foo (Bar)" the URL the
browser sends to Wikipedia is:
http://en.wikipedia.org/wiki/Foo%20%28Bar%29
This gets immediately redirected by the Wikipedia website, which replaces
the space with an underscore (by an internal convention, an underscore
can be used instead of a space):
http://en.wikipedia.org/wiki/Foo_%28Bar%29
"Modern" browsers like Safari translate this URL into a readable form
in the location bar, so that what you actually see is:
http://en.wikipedia.org/wiki/Foo_(Bar)
Safari does this also with UTF-8 characters so that pages containing
non-ASCII (and non-URL) characters still read nicely in the location
bar:
http://de.wikipedia.org/wiki/Bär
This is actually:
http://de.wikipedia.org/wiki/B%C3%A4r
But in either case, the characters are not actually valid.
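
To make the two forms concrete, here is a minimal sketch in Python (chosen only for illustration; it is not how Safari or Wikipedia implement this). It converts between the readable page name and the percent-encoded form with the standard library's `urllib.parse.quote` and `unquote`. The helper names `to_wire` and `to_display` are made up for this example.

```python
from urllib.parse import quote, unquote

def to_wire(page_name: str) -> str:
    """Percent-encode a page name as UTF-8 bytes (the form sent in the request)."""
    # safe="" leaves only letters, digits and "_.-~" unescaped, so spaces,
    # parentheses and non-ASCII characters are all percent-encoded.
    return quote(page_name, safe="")

def to_display(encoded: str) -> str:
    """Decode percent-escapes back into the readable form a location bar might show."""
    return unquote(encoded)

print(to_wire("Foo (Bar)"))          # Foo%20%28Bar%29
print(to_wire("Bär"))                # B%C3%A4r
print(to_display("B%C3%A4r"))        # Bär
print(to_display("Foo_%28Bar%29"))   # Foo_(Bar)
```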
However, I don't want to be too strict about this, as Markdown is meant to
make things _easier_ for us. We obviously expect to be able to enter a
lot of characters as if they were "valid", without having to care about
what the real encoded form is. So I would suggest simply treating
whitespace as the only real non-URL character, i.e. the one that marks
the end of a URL.
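
A rough sketch of that rule, in Python for illustration only: the URL in a link definition is whatever runs up to the first whitespace, and a title is recognized only when it follows whitespace. The pattern and the name `parse_link_definition` are assumptions made for this example, not the actual Markdown or PHP Markdown code.

```python
import re

# [label]: URL "optional title"
# The URL is any run of non-whitespace characters; a title (in double
# quotes or parentheses) counts only if whitespace separates it from the URL.
LINK_DEF = re.compile(r'^\[([^\]]+)\]:\s*(\S+)(?:\s+["(]([^")]*)[")])?\s*$')

def parse_link_definition(text: str):
    """Return (label, url, title) or None; title is None when absent."""
    match = LINK_DEF.match(text)
    if match is None:
        return None
    return match.groups()

# Parentheses glued to the URL stay part of the URL:
print(parse_link_definition(
    "[1]: http://sources.wikipedia.org/wiki/Middle_East_Policy_(Chomsky)"))
# A title is picked up only after whitespace:
print(parse_link_definition('[2]: http://example.com/ "Example"'))
```

With this rule, "(Chomsky)" can never be mistaken for a title, because no whitespace separates it from the rest of the URL.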
So the approach by PHP Markdown actually gets my thumbs up as long as
it correctly replaces the parentheses with %28 and %29. But the next
question is: what about all the other characters, especially non-ASCII
UTF-8 characters? I would expect Markdown to handle this the way Safari
and Firefox are dealing with it (by replacing the UTF-8 characters with
%-encoded UTF-8-bytes).
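
The kind of normalization described here could look roughly like the following Python sketch. The function name `normalize_url` and the exact set of characters left untouched are assumptions for illustration; this is not what Markdown, Safari or Firefox actually do internally.

```python
from urllib.parse import quote

# Characters left alone: URL delimiters plus "%", so escapes that are
# already present (e.g. %20) are not encoded a second time.
# Parentheses are deliberately NOT in this set, so they become %28 and %29.
KEEP = ":/?#[]@!$&'*+,;=%"

def normalize_url(url: str) -> str:
    """Percent-encode parentheses and non-ASCII characters (as UTF-8 bytes)."""
    return quote(url, safe=KEEP)

print(normalize_url("http://sources.wikipedia.org/wiki/Middle_East_Policy_(Chomsky)"))
print(normalize_url("http://de.wikipedia.org/wiki/Bär"))
```

This sketch ignores two edge cases: a stray "%" that is not part of an escape is passed through unchanged, and non-ASCII host names would need different handling than percent-encoding.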
So: how does this work in the original Markdown Perl code? Would you
agree with my view, or am I saying something completely dumb?
Greetings
Tim
PS: By the way, thanks for releasing the PHP code! We want to integrate
it into our PHP-based system as soon as it's ready (if it's got the right
license, of course).
------
Tim Pritlove, Discordian Evangelist
<mailto:tim at ccc.de> <http://tim.geekheim.de/> <skype://timpritlove>
<aim:timpritlove> <jabber:tim at jabber.berlin.ccc.de>
Project Blinkenlights <http://www.blinkenlights.de/>
------
"Sure it corrupts your files, but look how fast it is!"