Backslash escapes (was: Revised 2005 proposal for meta-data)

Michel Fortin michel.fortin at michelf.com
Thu Jan 4 10:33:01 EST 2007


On 2007-01-01 at 15:25, Andrea Censi wrote:


>> [1] Even further, you could allow non-punctuation to be escaped.
>
> In a sense, this is the most consistent way of escaping.


Why so? Either you have a fixed list of what you can escape and what
you cannot, or you can escape everything. Both cases seem pretty
consistent to me.

As an aside, this reminds me of a small integration problem when
using Markdown with SmartyPants: in many cases you have to
double-escape a literal character so that SmartyPants ignores it
too:

\\\\\ two backslashes
\\\`` two backticks
\\-- two hyphens
\\... three dots

Maybe Markdown should convert escaped characters to their numerical
entity equivalent (like SmartyPants does) so that the escape
propagates to SmartyPants too without requiring two more backslashes.
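
A minimal sketch of that idea in Python, for illustration only. It is
not taken from Markdown.pl or SmartyPants: the function name
`escapes_to_entities` and the exact set of escapable characters are
assumptions.

```python
import re

# Characters assumed to be escapable for this sketch; the actual list
# used by a given Markdown implementation may differ.
ESCAPABLE = r"\\`*_{}\[\]()#+\-.!"

_ESCAPE_RE = re.compile("\\\\([" + ESCAPABLE + "])")

def escapes_to_entities(text):
    """Replace each backslash escape with a numeric character entity."""
    return _ESCAPE_RE.sub(lambda m: "&#%d;" % ord(m.group(1)), text)
```

With this, `\-\-` would come out as `&#45;&#45;`, which a later
SmartyPants pass should no longer recognize as two hyphens, so a
single backslash per character would be enough.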



> If the rule holds for every context in the document, then the
> algorithm for interpreting the document is very very simple:
> 1 - first pass: substitute every escaped sequence with placeholders
> representing the literal
> 2 - do regexp/parsing ignoring the escapes
> 3 - substitute the placeholders


Do you expect that rule to hold for code blocks, code spans, HTML
blocks and the content of attributes inside inline HTML tags? There's
a reason you don't need escapes in these modes: it makes editing
easier, as you can copy and paste pieces of code or HTML markup
without having to bother about the content at all.
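
To illustrate the concern, here is a rough Python sketch of the three
quoted steps applied blindly to the whole text. All the names
(`protect_escapes`, `render_code_spans`, `naive_convert`) are
hypothetical, and step 2 is reduced to a stand-in that only handles
single-backtick code spans.

```python
import re

def protect_escapes(text):
    """Step 1: swap every backslash escape for a numbered placeholder."""
    literals = []

    def stash(match):
        literals.append(match.group(1))
        return "\x00%d\x00" % (len(literals) - 1)

    return re.sub(r"\\(.)", stash, text, flags=re.DOTALL), literals

def render_code_spans(text):
    """Stand-in for step 2: only converts single-backtick code spans."""
    return re.sub(r"`([^`]+)`", r"<code>\1</code>", text)

def restore_escapes(text, literals):
    """Step 3: put the literal characters back in place of placeholders."""
    return re.sub(r"\x00(\d+)\x00", lambda m: literals[int(m.group(1))], text)

def naive_convert(text):
    protected, literals = protect_escapes(text)
    return restore_escapes(render_code_spans(protected), literals)
```

Under these assumptions, the input ``Use `\n` here`` comes out as
`Use <code>n</code> here`: the backslash inside the code span has been
eaten, so pasted code would have to be escaped by hand.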



> The summary of all this discussion is:
> 1) Everywhere all characters can be escaped (except in code blocks)
> a) "\ " represents a non-breaking space


Personally, I strongly disagree with overloading the escape mechanism
to make it replace the following character with something else. I
don't think Markdown needs a special syntax for non-breaking spaces.


> b) \<newline> represents a linebreak


I can't see why this would be better than what we have now. In fact I
think it's worse as it'll clutter the text version of the document
unnecessarily; the current double-space syntax means that the
Markdown-formatted text looks fine by itself, something which is a
core goal for Markdown.


> 2) Inside "quoted values", you MUST escape `"`
> 3) Inside 'quoted values', you MUST escape `'`


But what happens if you don't? If you want to dig into the corner
cases of the syntax, I think it'd be more useful to explain what
parsers have to do when they encounter an unescaped quote than to
tell the author what not to write.



> I would tend to drop the special case
>> [text](url "title"with"quotes")
> as it is ambiguous.


Drop it and replace it with what output? I agree that it has some
ambiguities, but it's not that bad really, especially when parsing
with regular expressions. I do have a worry about this, and it has to
do with extending what can be expressed inside the parentheses, but
that's not a big worry for me and I don't find the current syntax all
that ambiguous.



> The first pass of processing the document simply becomes:
>
> until eof
>   c = getc
>   if c == '\'
>     push literal(getc)
>   else if c == backtick `
>     count the number of backticks
>     possibly, eat one space
>     treat as literals everything until closing backticks
>   else
>     push literal(getc)
>   end
> end


Something that sounds odd to me is that you're doing this as the
first pass of the whole document, yet you don't take into account
HTML blocks, code blocks and inline HTML tags, but you've thought of
code spans. It'll have to get much more complicated than that if you
want to handle escapes as a first pass.
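
To make the objection concrete, here is a rough Python transcription
of the quoted loop. It is a sketch only: `first_pass`, `literals` and
the token names are made up for illustration, the "eat one space" step
is left out, the search for the closing backticks is simplified, and
the last branch is read as pushing the current character as ordinary
text.

```python
def first_pass(text):
    """Split text into ('lit', s) and ('raw', s) tokens, per the quoted loop."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c == "\\" and i + 1 < n:
            # Backslash escape: the next character becomes a literal.
            tokens.append(("lit", text[i + 1]))
            i += 2
        elif c == "`":
            # Count the opening backticks, then look for the same run.
            run = 1
            while i + run < n and text[i + run] == "`":
                run += 1
            fence = "`" * run
            close = text.find(fence, i + run)
            if close == -1:
                tokens.append(("raw", fence))  # unmatched backticks
                i += run
            else:
                tokens.append(("lit", text[i + run:close]))
                i = close + run
        else:
            tokens.append(("raw", c))
            i += 1
    return tokens

def literals(text):
    """Return only the pieces the first pass marked as literal."""
    return [s for kind, s in first_pass(text) if kind == "lit"]
```

With this sketch, a code span such as `` `a\b` `` keeps its backslash,
but `<a title="C:\temp">` yields the literal `t`: the loop knows
nothing about inline HTML, so the backslash inside the attribute is
consumed as an escape. Code blocks and HTML blocks would need the same
kind of special-casing.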

Why do you want to process escapes first anyway?


Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/



