[ANN] vfmd

Roopesh Chander roop at forwardbias.in
Wed Oct 2 11:49:40 EDT 2013


### Handling tabs everywhere is a feature

If I think about it, handling tabs everywhere is actually more correct.
Consider the example you mentioned:
http://johnmacfarlane.net/babelmark2/?normalize=1&text=%09%09a%0A%09.%09a%0A%09..%09a%0A%09...%09a%0A%09....%09a

As you said, most implementations convert the "\t" in a code block to a
bunch of spaces, but that's not what the user would expect, right? And
there's nothing in the user documentation that would explain this either.

From the user's point of view, this would be unexpected, arbitrary
behaviour (which is exacerbated by the fact that most browsers use a
tabstop of 8 columns).
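
To make the difference concrete, here is a minimal Python sketch of the two behaviours on one line of a code block. It assumes a preprocessor that expands tabs to a tabstop of 4 and a code block introduced by a single leading tab; the function names are hypothetical and not taken from any implementation.

```python
TABSTOP = 4  # assumption: the tab stop a preprocessing parser expands to

def code_line_with_preprocessing(line):
    """Expand tabs to spaces first, then drop the 4-space code indent."""
    expanded = line.expandtabs(TABSTOP)
    return expanded[4:]

def code_line_keeping_tabs(line):
    """Drop only the leading tab that marks the code block; keep the rest."""
    return line[1:]

# Both helpers assume the line starts with a tab.
sample = "\t \ta"  # tab, space, tab, "a"
print(repr(code_line_with_preprocessing(sample)))  # '    a'
print(repr(code_line_keeping_tabs(sample)))        # ' \ta'
```

In the first case the tab the user typed inside the code block has silently become spaces; in the second it survives into the output.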

### Handling tabs everywhere is easy

Also, though it initially appeared that it's hard to handle tabs
everywhere instead of dealing with them in a preprocessor, I now think it's
not that difficult. Among block-level elements, tabs are significant only
for (1) lists and (2) code blocks.

Code blocks are easy: Instead of 4 spaces at the start of the line, we
should look for either 4 spaces or 0-3 spaces + 1 tab.
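
The rule above can be sketched as a single pattern. This is only an illustration; `CODE_INDENT` and `strip_code_indent` are hypothetical names.

```python
import re

# Either 4 spaces, or 0-3 spaces followed by 1 tab, at the start of the line.
CODE_INDENT = re.compile(r"^(?: {4}| {0,3}\t)")

def strip_code_indent(line):
    """Return the line minus its code-block indent, or None if it has none."""
    m = CODE_INDENT.match(line)
    return line[m.end():] if m else None

print(repr(strip_code_indent("    x")))    # 'x'
print(repr(strip_code_indent("  \tx")))    # 'x'
print(repr(strip_code_indent("\t\ta")))    # '\ta'
print(repr(strip_code_indent("   x")))     # None
```

Only the indent itself is consumed, so any tab after it stays a tab in the code block's content.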

Lists can be handled as I explained earlier (but actually, the behaviour of
that algo looks about the same as the modulo-4 algo for all inputs - I'm
not sure yet - still working on verifying this).
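
As a rough illustration of the first step of that list algorithm (the one quoted further down in this mail), here is a Python sketch. It covers only the "look for t tab characters with optional interspersed spaces" step; the modulo-4 fallback is not shown. The function names and the choice of `*`, `+` and `-` as bullet characters are assumptions made for the example.

```python
import re

def tabs_after_bullet(first_line):
    """Count tab characters in the whitespace that follows the bullet char."""
    m = re.match(r"^ *[*+-]([ \t]+)", first_line)
    return m.group(1).count("\t") if m else 0

def strip_tab_indent(line, t):
    """Step 1 only: consume t tabs, allowing spaces before and between them."""
    if t == 0:
        return None  # the algorithm goes straight to the modulo-4 method
    m = re.match(r"^ *(?:\t *){%d}" % t, line)
    return line[m.end():] if m else None

t = tabs_after_bullet("*\tlist item para 1")
print(t)                                                # 1
print(repr(strip_tab_indent("\tlist item para 2", t)))  # 'list item para 2'
print(repr(strip_tab_indent("    list item para 2", t)))  # None
```

A `None` result is where a real parser would fall back to the modulo-4 method.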

For other block-level elements, the pattern matching should accept tabs
as well as spaces, that's it.

For span-level elements, tabs are just like any other whitespace character,
so it's very straightforward there.

So it appears to me that handling tabs as tabs wouldn't increase the
complexity by many orders of magnitude. Maybe there is something else I'm
missing here?


Unless the complexity increases by several orders of magnitude, I would
think that it's better to have a parser that gives a more correct
interpretation, even if it's at the expense of a little higher complexity
of programming.

roop.


On Tue, Oct 1, 2013 at 8:11 PM, Michel Fortin <michel.fortin at michelf.ca> wrote:


> On 1 Oct 2013, at 8:15, Roopesh Chander <roop at forwardbias.in> wrote:
>
> > I think I was being biased towards the expandtab-way of text editing, in
> > my previous mails on this topic. Thinking from the point of view of a
> > keep-tabs-as-tabs text editor, another solution becomes possible, which I
> > describe below:
> >
> > This is how the user documentation would look:
> > --- snip ---
> > The list bullet char should be followed by one or more spaces or a tab
> > character.
> >
> > For multi-paragraph list items, the subsequent paragraphs should be
> > indented to vertically align with the first paragraph of the list item,
> > using either spaces or a tab. If you are using tabs for indentation, you
> > should use the same number of tab characters to indent all paragraphs of
> > the list item, including the first paragraph.
> > --- snip ---
> >
> > The parser would operate as follows:
> > --- snip ---
> > - Let t be the number of tab characters occurring after the bullet
> >   character of a list item
> > - If t > 0, then
> >   - First, look for t tab characters (with optional interspersed spaces)
> >     as indentation for subsequent paragraphs
> >   - If previous step didn't match, look for tabs+spaces as indentation
> >     using the modulo-4 method
> > - If t = 0, then
> >   - Look for tabs+spaces as indentation using the modulo-4 method
> > --- snip ---
> >
> > Advantage:
> > - If the user doesn't mix space-indentation and tab-indentation, it would
> >   work for almost all cases (see below for the not-working case),
> >   irrespective of the tabstop setting he uses
> > - If the user uses space-indentation in some parts of the list and
> >   tab-indentation in other parts of the list, things are only as bad as
> >   the modulo-4 method
> >
> > Disadvantage:
> > - Even if the user doesn't mix space-indentation and tab-indentation, it
> >   doesn't work for the case where the (number of spaces before the bullet
> >   char) = (tabstop - 1). For example, with tabstop=4 (_ being tab and .
> >   being space):
> >
> > ...*____list item para 1
> >
> > ________list item para 2
> >
> > will be parsed unintuitively. This is why the user documentation above
> > says: "If you are using tabs for indentation, you should use the same
> > number of tab characters to indent all paragraphs of the list item,
> > including the first paragraph."
> >
> > What are your thoughts on this option?
>
> Seems more complicated than just replacing tabs with spaces in the input.
> More complicated because now you have to handle tabs everywhere in the
> parser. This means more things can go wrong: there are many more edge cases
> to deal with, and that will require a bigger test suite. And more
> complicated logic means more code to maintain/debug and potentially a
> slower parser. So much for the inconveniences; on the plus side, we can
> deal with arbitrary tab-stop lengths in some cases.
>
> I think it's a good idea, but I'm not convinced it is worth the trouble.
>
> --
> Michel Fortin
> michel.fortin at michelf.ca
> http://michelf.ca


More information about the Markdown-Discuss mailing list