on the philosophical aspects of a specification

James Grimmelmann james at grimmelmann.net
Fri Mar 7 07:58:20 EST 2008


On Mar 7, 2008, at 2:45 AM, Yuri Takhteyev wrote:


>>> *hello **dear* boy**
>>
>> That's a very good question. Here's a counterquestion: what does
>> a human reader see in that text?
>
> When I try to look at this with my normal-person eye, what I see here
> is incorrect markup, which I then want to leave as is before moving on.
> When I look at it with my formalistic left-parsing eye, I see
> "<em>hello **dear</em> boy**". When I look at it with my reg-exps in
> a loop eye, I see "*hello <strong>dear* boy</strong>". Either one of
> those is ok with me. Let's just pick one. Everything else is from
> the devil, I say. Please, let's keep it simple.
>
> So, the user will type in something like this and get "<em>hello
> **dear</em> boy**". Not much of a tragedy. They will say, oh, silly
> me, must have screwed something up. (They did!) Then they'll go and
> fix it. I am all for flexibility, but not to the point of trying to
> divine the meaning of ambiguous or ill-formed markup.


I strongly agree. (Or is that emphatically agree?)

In this context, if the output is
<em>hello **dear</em> boy**
or
*hello <strong>dear* boy</strong>
and the user never notices the problem, the output is still readable.
If the user does notice that something's wrong, it's easy to look at
the output and realize roughly what happened. ("Some of the *s must not
have matched up correctly, because they made it through into the
output.") Either of these solutions is a healthy response by the
parser.
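For comparison, here is a minimal Python sketch of the "reg-exps in a loop" reading, which substitutes the longer `**` pattern before `*`. The patterns and the `regex_loop` name are illustrative assumptions, not anyone's actual implementation. The em pattern is deliberately barred from spanning `<` or `>` so it cannot straddle a `<strong>` tag already inserted, and the sketch scans left to right rather than modelling a right-to-left search.

```python
import re

# Illustrative patterns (assumptions): the longer "**" pattern runs first.
STRONG = re.compile(r"\*\*(.+?)\*\*")
# The em body may not contain '*', '<' or '>', so an em span can never
# straddle a tag that the strong pass has already inserted.
EM = re.compile(r"\*([^*<>]+)\*")


def regex_loop(text: str) -> str:
    """Apply the strong substitution, then the em substitution."""
    text = STRONG.sub(r"<strong>\1</strong>", text)
    return EM.sub(r"<em>\1</em>", text)


print(regex_loop("*hello **dear* boy**"))  # prints: *hello <strong>dear* boy</strong>
```

This reproduces the second output above: the leftover `*`s pass straight through as literal text.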

I would prefer "formalistic left-parsing" to "reg-exps in a loop"
because (a) working from left to right is closer to the informal
intuitive model that most users are likely to have as to how the text
transformations work, and (b) if we actually specify a grammar, the
more we stick to "formalistic left-parsing," the cleaner it will be.
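A minimal sketch of the "formalistic left-parsing" reading, assuming a single left-to-right pass with a stack of open delimiters. A delimiter closes the nearest open delimiter of the same kind; any openers left above it stay as literal asterisks, and so do openers that are never closed. The `left_parse` name and this closing rule are illustrative assumptions rather than a proposed spec, and HTML escaping is ignored.

```python
import re

# Tag for each delimiter (assumed mapping); the regex tries "**" before "*".
TAGS = {"**": "strong", "*": "em"}
TOKEN = re.compile(r"\*\*|\*")


def left_parse(text: str) -> str:
    """Resolve * and ** emphasis in one left-to-right pass."""
    out: list[str] = []                 # output pieces; openers start as literal text
    stack: list[tuple[str, int]] = []   # (delimiter, index of its piece in out)
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(text[pos:m.start()])  # plain text before the delimiter
        delim = m.group()
        # Find the nearest still-open delimiter of the same kind.
        match = next((i for i in range(len(stack) - 1, -1, -1)
                      if stack[i][0] == delim), None)
        if match is None:
            stack.append((delim, len(out)))  # tentatively open it
            out.append(delim)
        else:
            _, idx = stack[match]
            del stack[match:]   # close it; openers above it stay literal
            tag = TAGS[delim]
            out[idx] = f"<{tag}>"
            out.append(f"</{tag}>")
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)


print(left_parse("*hello **dear* boy**"))  # prints: <em>hello **dear</em> boy**
```

Because tags are emitted only for pairs that close in stack order, the result cannot contain overlapping tags.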


> I don't think it really matters what we output for cases like this. I
> think any rule would be ok, as long as it satisfies the following
> criteria:
>
> 1. It's _simple_
> 2. It always produces valid XHTML (unless input has HTML tags)
> 3. It should produce appropriate HTML for "normal" markdown.
>


I agree, though I might have said it ought to be **simple**.
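Criterion 2 can at least be spot-checked mechanically. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `is_well_formed` helper is hypothetical, and it assumes the output is a fragment that can be wrapped in a single `<p>` element. It checks well-formedness only, not validity against the XHTML DTD, and a bare `&` or `<` in the text would also be reported as a failure.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a wrapper <p>."""
    try:
        ET.fromstring(f"<p>{fragment}</p>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<em>hello **dear</em> boy**"))              # True
print(is_well_formed("<em>hello <strong>dear</em> boy</strong>"))  # False
```

Both readings discussed above pass; only an output with overlapping tags fails.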

James


> My reg-exp eye says: "strong" before "em" (longer pattern first),
> starting from the right for each. I am pretty sure this rule
> satisfies 1, 2, and 3.
>
> Let's stop this nonsense and get back to defining a spec for the
> _normal_ markdown.
>
> - yuri
>
> --
> http://sputnik.freewisdom.org/





More information about the Markdown-Discuss mailing list