Backslash escapes
Jacob Rus
jrus at hcs.harvard.edu
Sun Jan 7 20:19:29 EST 2007
Andrea Censi wrote:
>>> b) \<newline> represents a linebreak
>> I can't see why this would be better than what we have now. In fact I
>> think it's worse as it'll clutter the text version of the document
>> unnecessarily; the current double-space syntax means that the
>> Markdown-formatted text looks fine by itself, something which is a
>> core goal for Markdown.
>
> The problem I find with the current syntax is that I cannot *see*
> whether there is a line break.
Get a text editor which allows you to color that line break ;)
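
If editor highlighting isn't an option, a small script can make the breaks visible instead. This is only a sketch: `mark_hard_breaks` and its `marker` parameter are names made up for illustration, and it assumes a hard break is any non-blank line ending in two or more spaces.

```python
import re

# Assumption: a hard line break is a line with some content followed by
# two or more trailing spaces (the current double-space syntax).
HARD_BREAK = re.compile(r"\S {2,}$")

def mark_hard_breaks(text: str, marker: str = "<br>") -> str:
    """Append a visible marker to every line that ends in a hard break."""
    out = []
    for line in text.split("\n"):
        if HARD_BREAK.search(line):
            line += marker
        out.append(line)
    return "\n".join(out)

print(mark_hard_breaks("first line  \nsecond line\nthird line "))
```

Lines with a single trailing space, or with only whitespace, are left alone.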
>>> 2) Inside "quoted values", you MUST escape `"`
>>> 3) Inside 'quoted values', you MUST escape `'`
>> But what happens if you don't? If you want to go deep in the corner-
>> cases of the syntax I think it'd be more useful to explain what
>> parsers have to do when they encounter that rather than tell the
>> author what not to write.
>
> At one point, you have to decide what is legal and what is not in a
> language. And, if it's not legal, then the behaviour is
> implementation-dependent.
No, that's a bad way to go about it. The edge-case behavior should be
clearly defined, and not left up to implementations.
> Just like HTML: it's very clear what is a legal HTML document.
> However, even though browsers do their best to sanitize illegal
> documents, their behaviour in that case isn't specified by the spec.
Yes, and look at all the problems that has caused for web authors aiming
for cross-browser compatible sites.
>>> I would tend to drop the special case
>>>> [text](url "title"with"quotes")
>>> as it is ambiguous.
>> Drop it and replace it with what output? I agree that it has some
>> ambiguities, but it's not that bad really, especially when parsing
>> with regular expressions.
>
> My personal point is that, to support that kind of syntax, I had to
> write a function that is the only ugly one in my shiny new
> recursive-descent parser.
>
> Also - but I reckon that it is a sort of philosophical matter - it's
> really really evil to design a language which contains ambiguities.
> This is one case when the implementation (regexp-based system) heavily
> influenced the syntax.
You'll have to explain the ambiguity here a little bit. I'm not really
clear on what the syntax allows, as I don't ever use separate link
titles, so maybe someone can fill that in as well?
> Anyway, to the goal of reaching a compromise, here's the revised
> proposal for escaping:
>
> =======
>
> 1. No escaping in code spans/blocks.
>
> 2. Everywhere else, **all** PUNCTUATION characters **can** be escaped,
> and **must** be escaped when they could trigger links, tables, etc.
> (punctuation=[^a-zA-Z0-9\s\n])
>
> 3. As a rule, quotes **must** be escaped inside quoted values:
>
> * Inside `"quoted values"`, you **must** escape `"`.
> * Inside `'quoted values'`, you **must** escape `'`.
Yes, this all sounds reasonable to me. The tricky part is that number 2
isn't always completely cut and dried, especially not given the
heuristic regexp replacement method of the current Markdown.pl. I
suppose that's what you're aiming to fix here, though.
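
To make rules 1 and 2 concrete, here is a minimal sketch of the unescaping step. The names `TOKEN` and `unescape` are invented for this example, code spans are assumed to be delimited by single backticks only, and code blocks are not handled. The punctuation class is the one from the proposal (`\n` is already covered by `\s`).

```python
import re

# One left-to-right scan: either a backslash followed by a punctuation
# character (rule 2), or a single-backtick code span (rule 1).
# Assumption: code spans use single backticks and do not nest.
TOKEN = re.compile(r"(\\[^a-zA-Z0-9\s])|(`[^`]*`)")

def unescape(text: str) -> str:
    """Drop the backslash from escaped punctuation outside code spans."""
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            return m.group(1)[1]   # keep only the escaped character
        return m.group(2)          # code span: returned untouched
    return TOKEN.sub(repl, text)

print(unescape(r"\*not emphasis\* and `\*code\*`"))
print(unescape(r'"a \"quoted\" value"'))
```

Because the scan tries the escape alternative first at each position, an escaped backtick is consumed as an escape and never opens a code span. A backslash before a letter or digit is left as it is.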
Incidentally, is anyone interested at all in discussions on any of the
following:
1. Footnotes
2. Tables
3. A more formalized extension mechanism
The first two of those have lengthy archived discussions which could use
someone summarizing them for the rest of us. I plan on taking that up
at some point in the nearish future if no one else will. The last would
be really nice, for adding things like TeX-formatted math, or
LilyPond-formatted music, or alternate table syntaxes, or whatever else,
for people running into Markdown's limitations and not wanting to just
use raw HTML. I think that curly braces are still available for such a use.
-Jacob