Backslash escapes (was: Revised 2005 proposal for meta-data)
Andrea Censi
andrea at censi.org
Fri Jan 5 18:40:51 EST 2007
On 1/4/07, Michel Fortin <michel.fortin at michelf.com> wrote:
> Le 2007-01-01 à 15:25, Andrea Censi a écrit :
> >> [1] Even further, you could allow non-punctuation to be escaped.
> >
> > In a sense, this is the most consistent way of escaping.
After implementing it, and playing around, I changed my mind about
escaping [a-zA-Z]. It's useless and just confusing.
> > b) \<newline> represents a linebreak
>
> I can't see why this would be better than what we have now. In fact I
> think it's worse as it'll clutter the text version of the document
> unnecessarily; the current double-space syntax means that the
> Markdown-formatted text looks fine by itself, something which is a
> core goal for Markdown.
The problem I find with the current syntax is that I cannot *see*
whether the line break is there.
> > 2) Inside "quoted values", you MUST escape `"`
> > 3) Inside 'quoted values', you MUST escape `'`
>
> But what happens if you don't? If you want to go deep in the corner-
> cases of the syntax I think it'd be more useful to explain what
> parsers have to do when they encounter that rather than tell the
> author what not to write.
At one point, you have to decide what is legal and what is not in a
language. And, if it's not legal, then the behaviour is
implementation-dependent.
Just like HTML: it's very clear what is a legal HTML document.
However, even though browsers do their best to sanitize illegal
documents, their behaviour in that case isn't specified by the spec.
> > I would tend to drop the special case
> >> [text](url "title"with"quotes")
> > as it is ambiguous.
>
> Drop it and replace it with what output? I agree that it has some
> ambiguities, but it's not that bad really, especially when parsing
> with regular expressions.
My personal point is that, to support that kind of syntax, I had to
write a function that is the only ugly one in my shiny new
recursive-descent parser.
Also - but I reckon that it is a sort of philosophical matter - it's
really really evil to design a language which contains ambiguities.
This is one case where the implementation (regexp-based system) heavily
influenced the syntax.
> > The first pass of processing the document simply becomes:
> >
> > until eof
....
> > end
>
> Something that sounds odd to me is that you're doing this as the
> first pass of the whole document, yet you don't take into account
> HTML blocks, code blocks and inline HTML tags, but you've thought of
> code spans. It'll have to get much more complicated than that if you
> want to handle escapes as a first pass.
Actually, it worked ok in my first implementation. The trick is to
re-expand the escapes in code blocks or HTML code.
> Why do you want to process escapes first anyway?
Assume the input string is
" `code` - \`not code\` - ``code with \` slash-tick `` "
The first pass I did was to replace "\`" with a code outside of the
input range. Let `?` represent that code. The string becomes:
" `code` - ?not code? - ``code with ? slash-tick `` "
Now extract the code spans (marked CB):
CB("code"), "- ?not code? - ", CB("code with ? slash-tick")
Then undo the escapes: in plain strings ? becomes `, in code spans ? becomes \`:
CB("code"), "- `not code` - ", CB("code with \` slash-tick")
I did the same for code blocks and HTML.
It worked, but I don't use this method anymore.
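
The first-pass trick described above can be sketched in Python roughly as
follows. This is only an illustration, not the code from my implementation:
the placeholder code point, the regular expression for code spans, and the
function name are all assumptions made for the example, and it handles only
the "\`" escape.

```python
import re

# Hypothetical placeholder: a private-use code point assumed to be absent
# from the input (the "code outside of the input range").
PLACEHOLDER = "\ue000"

# A code span: a run of backticks, the shortest possible content, and then
# the same run of backticks again.
CODE_SPAN = re.compile(r"(`+)(.+?)(?<!`)\1(?!`)", re.DOTALL)


def split_with_escaped_backticks(text):
    """Return a list of ("text" | "code", string) pieces."""
    # Pass 1: hide every escaped backtick behind the placeholder.
    masked = text.replace("\\`", PLACEHOLDER)
    pieces = []
    pos = 0
    # Pass 2: extract code spans; the placeholder cannot open or close one.
    for m in CODE_SPAN.finditer(masked):
        if m.start() > pos:
            # In plain text the placeholder becomes a literal backtick.
            plain = masked[pos:m.start()].replace(PLACEHOLDER, "`")
            pieces.append(("text", plain))
        # In code the placeholder is re-expanded to the original "\`".
        code = m.group(2).strip().replace(PLACEHOLDER, "\\`")
        pieces.append(("code", code))
        pos = m.end()
    if pos < len(masked):
        pieces.append(("text", masked[pos:].replace(PLACEHOLDER, "`")))
    return pieces
```

Calling it on the example string above yields the two code spans with the
slash-tick preserved, and the middle piece with literal backticks.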
Anyway, with the goal of reaching a compromise, here's the revised
proposal for escaping:
=======
1. No escaping in code spans/blocks.
2. Everywhere else, **all** PUNCTUATION characters **can** be escaped,
and **must** be escaped when they could trigger links, tables, etc.
(punctuation=[^a-zA-Z0-9\s\n])
3. As a rule, quotes **must** be escaped inside quoted values:
* Inside `"quoted values"`, you **must** escape `"`.
* Inside `'quoted values'`, you **must** escape `'`.
* Other examples:
`"bah 'bah' bah"` = `"bah \'bah\' bah"` = `'bah \'bah\' bah'`
`'bah "bah" bah'` = `'bah \"bah\" bah'` = `"bah \"bah\" bah"`
4. There is an exception for backward compatibility, in link/image titles:
[text](url "title"with"quotes")
The exception is not valid for attribute lists and in other
contexts, where you have to use the canonical syntax.
========
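
To make rules 2 and 3 concrete, here is a minimal Python sketch of reading
one quoted value. It is an assumption-laden illustration rather than a
reference implementation: the names `is_punct` and `parse_quoted` are made
up for the example, and raising an error on malformed input is just one
possible choice, since the proposal leaves that case
implementation-dependent.

```python
import re

# Rule 2's punctuation class, as written in the proposal:
# anything that is not an ASCII letter, a digit, or whitespace.
PUNCT = re.compile(r"[^a-zA-Z0-9\s]")


def is_punct(ch):
    return PUNCT.fullmatch(ch) is not None


def parse_quoted(s, pos=0):
    """Parse a quoted value starting at s[pos].

    Returns (value, index just past the closing quote).
    """
    quote = s[pos]
    if quote not in "\"'":
        raise ValueError("expected a quote character")
    out = []
    i = pos + 1
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s) and is_punct(s[i + 1]):
            # Rule 2: any punctuation character can be escaped.
            out.append(s[i + 1])
            i += 2
        elif c == quote:
            # Rule 3: an unescaped delimiter always ends the value.
            return "".join(out), i + 1
        else:
            out.append(c)
            i += 1
    raise ValueError("unterminated quoted value")
```

With this reading, the three spellings in each "Other examples" line parse
to the same value, and a backslash before a letter is kept as-is.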
As for point 4, my implementation tries its best to parse it, but
warns the user that it's bad manners.
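
For illustration, one way such a lenient title parser could look in Python
is sketched below. This is not my actual code: the function name
`parse_link_title` is hypothetical, and the sketch assumes the caller has
already isolated the text between the URL and the closing parenthesis. It
only handles escaped quotes, not the other escapes of rule 2.

```python
import warnings


def parse_link_title(raw):
    """Parse the title part of [text](url "title").

    `raw` is assumed to be the text between the URL and the closing ")".
    """
    raw = raw.strip()
    if len(raw) < 2 or raw[0] not in "\"'" or raw[-1] != raw[0]:
        raise ValueError("title must be wrapped in matching quotes")
    quote, body = raw[0], raw[1:-1]
    out = []
    lenient = False
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\" and i + 1 < len(body) and body[i + 1] == quote:
            # Canonical form: the quote is escaped.
            out.append(quote)
            i += 2
        else:
            if c == quote:
                # Backward-compatible form: accept it, but remember to warn.
                lenient = True
            out.append(c)
            i += 1
    if lenient:
        warnings.warn("unescaped quote inside link title; escape it with a backslash")
    return "".join(out)
```

Both `"title"with"quotes"` and `"title\"with\"quotes"` give the same title
here; only the first one triggers the warning.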
--
Andrea Censi
"Life is too important to be taken seriously" (Oscar Wilde)
Web: http://www.dis.uniroma1.it/~censi