on the philosophical aspects of a specification

Aristotle Pagaltzis pagaltzis at gmx.de
Fri Mar 7 05:17:24 EST 2008

Hi Yuri, Waylan and Seumas,

* Yuri Takhteyev <qaramazov at gmail.com> [2008-03-07 08:50]:

> > > *hello **dear* boy**
> >
> > That's a very good question. Here's a counterquestion: what
> > does a human reader see in that text?
>
> When I try to look at this with my normal-person eye, what I
> see here is incorrect markup

Sorry, but if you see “markup” (much less “incorrect markup”)
you’re not looking at it with a normal-person eye. :-)

> So, the user will type in something like this and get
> "<em>hello **dear</em> boy**". Not much of a tragedy. They will
> say, oh, silly me, must have screwed something up. (They did!)
> Then they'll go and fix it. I am all for flexibility, but not
> to the point of trying to divine the meaning of ambiguous or
> ill-formed markup.

Only a small minority will do that. Most people most of the time
don’t care enough about that particular piece of text to actually
fix any small nits in it, any more than they’ll care to fix all
of their small spelling and grammar mistakes. (Less, actually.)
That has certainly been my experience on wikis and weblogs that
use shorthand markups like Markdown.

Hence my bias in favour of trying to divine *some* meaning from
anything that looks like markup somehow, as long as there is any
chance at all that the result won’t be actively contrarian to the
user’s wishes.

> I think any rule would be ok, as long as it satisfies the
> following criteria:
>
> 1. It's _simple_

I think the full rule needn’t be simple; as long as edge cases
don’t produce actively contrarian results, it’s OK not to mention
how they’re resolved in the interest of simplification.

> My reg-exp eye says: "strong" before "em" (longer pattern
> first), starting from the right for each. I am pretty sure this
> rule satisfies 1, 2, and 3.

So the spec is going to make assumptions about the method of

> Let's stop this non-sense and get back to defining a spec for
> the _normal_ markdown.

What’s the point? We already have one; John Gruber wrote it.
Interoperability problems crop up in the edge cases, not the
unproblematic stuff. That is what’s important to specify.

* Waylan Limberg <waylan at gmail.com> [2008-03-06 17:00]:

> Aristotle Pagaltzis wrote:
> > a human reader see in that text? Based on the visual
> > appearance I think I would make it translate to this:
> >
> > <em>hello <strong>dear</strong> boy</em>
>
> Ah, so you're assuming the parser should automatically close
> unclosed tags much as a browser in quirks mode does.

No, a browser in quirks mode would not interpret such a
construction in the way I proposed at all.

> Sure, you and I understand how that works, but should we expect
> authors who are unfamiliar with html to get that?

I was precisely trying **not** to think about it in terms of
HTML. What I proposed was, as I explicitly said, purely based on
visual appearance:

       ,-----,------ “dear” is enclosed in emphasis markers
       v     v
*hello **dear* boy**
^                  ^
`------------------`----- the whole phrase is also enclosed in
                          emphasis markers

And that’s why I proposed the translation in question.

It’s funny that everybody assumed I was thinking in terms of some
kind of error-correcting parser. I didn’t even consider how this
would be implemented while I was mulling over what to do with
that fragment.
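
For illustration only, here is one way the visual reading in the
diagram could be sketched in Python. Nothing in it was proposed in
this thread: the function name `visual_nesting`, the rule that a run
of asterisks hugging the following text opens and one hugging the
preceding text closes, the mapping of any run longer than one
asterisk to `strong`, and the closing of leftover tags at the end are
all assumptions made for the example.

```python
import re

def visual_nesting(text: str) -> str:
    """Pair asterisk runs by nesting order; a closer's length is ignored."""
    out = []
    stack = []  # tags currently open, innermost last
    pos = 0
    for m in re.finditer(r"\*+", text):
        out.append(text[pos:m.start()])
        pos = m.end()
        # Treat the string boundaries as whitespace (assumption).
        before = text[m.start() - 1] if m.start() > 0 else " "
        after = text[m.end()] if m.end() < len(text) else " "
        if not after.isspace():
            # The run hugs the following text: treat it as an opener.
            tag = "em" if len(m.group()) == 1 else "strong"
            stack.append(tag)
            out.append(f"<{tag}>")
        elif not before.isspace() and stack:
            # The run hugs the preceding text: close the innermost tag,
            # whatever the length of the run.
            out.append(f"</{stack.pop()}>")
        else:
            # Free-standing asterisks stay literal.
            out.append(m.group())
    out.append(text[pos:])
    # Assumption: close anything left open so the output is well-formed.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)

print(visual_nesting("*hello **dear* boy**"))
# <em>hello <strong>dear</strong> boy</em>
```

Because a closer just closes whatever was opened last, the output is
always properly nested, and correctly nested input such as
`*hello **dear** boy*` comes out the same way.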

> > <em>hello <strong>dear</em> boy</strong>
>
> Yeah, we could give them output that displays as they expect
> and fix it under the hood by doing:
>
> > <em>hello <strong>dear</strong></em><strong> boy</strong>
> >
> But, the output **I** would expect is one of:
>
> <em>hello </em><em>dear</em> boy**
>
> <em>hello **dear</em> boy**
>
> *hello <strong>dear* boy</strong>

I could live with that one:

<em>hello **dear</em> boy**

That makes the least non-sense of them all. But it would mean
that a lot of people will leave visible slop in their
Markdown-formatted text because they won’t care enough to fix it.

Remember that even with the output I proposed, it’s not hard for
someone to get *any* of these other interpretations, as long as they
actually care enough to explicitly write their markup so as to
produce that.

> In my mind I keep going back and forth between the three and
> can never decide which the author intended. Finally, I cringe
> as I realize they probably intended what Seumas suggested.

Well, clearly, they meant to nest a few kinds of emphasis.

They probably wrote it that way for one of two reasons. Either
they were thinking in terms of markup and wanted overlapping
regions, in which case they can still get that with a bit more
effort.

Or they wrote it hastily without caring a whole lot about the
output, in which case my proposal would at least keep markup slop
out of the output.

> To me, that is an important factor that seems to be ignored by
> some here. Sometimes, IMO, the best thing to do is to pass the
> markup through as literal text and give the author a clue that
> his formatting is unclear!

Again, that works if the author cares. If they don’t, it means
the output will be ugly.

* Seumas Mac Uilleachan <seumas at idirect.ca> [2008-03-06 19:50]:

> implies an error-checking mechanism built into markdown to
> catch such cases.

No it doesn’t. In fact I’m pretty sure I can implement the
translation I proposed purely as a regex substitution, at least
in Perl.
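
As a rough sketch of what such a substitution might look like (in
Python rather than Perl, and not the implementation alluded to
above), the following handles only the one shape under discussion,
`*a **b* c**`. The pattern, the name `MISNESTED` and the function
`fix_misnested` are made up for the example.

```python
import re

# Hypothetical pattern for one shape only: *a **b* c**
# The (?!\*) lookaheads make sure each single asterisk really is a
# single one and not the first half of a double.
MISNESTED = re.compile(r"\*(?!\*)([^*]+)\*\*([^*]+)\*(?!\*)([^*]+)\*\*")

def fix_misnested(text: str) -> str:
    # Outer pair becomes <em>, the enclosed middle part becomes <strong>.
    return MISNESTED.sub(r"<em>\1<strong>\2</strong>\3</em>", text)

print(fix_misnested("*hello **dear* boy**"))
# <em>hello <strong>dear</strong> boy</em>
```

Correctly nested input such as `*hello **dear** boy*` does not match
the pattern and is left for the ordinary emphasis rules to handle.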

> Maybe what is needed is some kind of syntax checker to run the
> source through to point out to users where there are errors
> and/or confusing markup.

Wasn’t the original point that Markdown should have no concept of
an invalid document? I know Michel said that was his goal for
Markdown Extra, and it’s pretty much the argument with which
“Bowerbird” started this thread.

Aristotle Pagaltzis // <http://plasmasturm.org/>
