[ANN] vfmd

Michel Fortin michel.fortin at michelf.ca
Thu Oct 3 14:06:43 EDT 2013


On 3 Oct 2013, at 11:38, Roopesh Chander <roop at forwardbias.in> wrote:


>> Anything that is more complex has ripple effects through the test suite
>> (which has more code paths to check)
>
> I totally agree that tests need to be added (I think that would hold true
> even without this new tab-handling thingie). Any change in behaviour might
> involve an update to the tests. I don't think that can be considered a
> reason for not making parsers behave more correctly.
>
>> and to the edge cases we need to think about when factoring the spec.
>
> Let me try updating the vfmd spec for handling tabs during parsing (i.e.
> without handling them in preprocessing) in a separate branch/fork in
> GitHub. Then let's see if the updated spec has any problems or unconsidered
> cases.


Well, what I meant is that it's more maintenance work for everyone (spec writer and all implementers).



>> One you probably didn't think of:
>> <http://johnmacfarlane.net/babelmark2/?normalize=1&text=%3Eblockquote%0A%3E%0A%3E+still+blockquote%0A%3E%0A%3E%09blockquote+or+code+block%3F%0A%3E%0A%3E+%09blockquote+or+code+block%3F>
>
> Methinks, from a user's point of view, both the "blockquote or code block"
> blocks would seem like code-blocks, so I would say that should be the
> correct interpretation. (I realize that many implementations interpret it
> differently, but we should be more concerned with how it *should* be
> interpreted, rather than how it *is* being interpreted by implementations
> now).


Really? We should be more concerned with how it *should* be interpreted than with how it *is* implemented?

I'll just open a parenthesis here. You know what made the HTML5 parsing algorithm a success? It's quite simple, actually. It formalized all the clunky patchwork that browsers were doing and created a parser algorithm that everyone could use. That meant that parsing of the `<title>` element is idiotically special-cased, and so is `<script>`, and so is `<plaintext>`, etc. Why? Because browser vendors could not start from a clean slate: their browsers needed to be able to parse the billions of HTML documents on the web reliably, irrespective of how "well-formed" they were. The failure rate had to be tremendously small.

The web has shown that a majority of people don't code with specs in mind; they just test that it works with their implementation. That's very true for Markdown too. There are probably millions of Markdown documents out there relying on various quirks of current implementations, and there's no way to even estimate how many. If you want mainstream implementations to use your spec, it needs to incur minimal changes to how those mainstream implementations behave; otherwise adoption becomes a risky move for those implementers who do not have the luxury of starting from a clean slate.

So a change in the treatment of tabs, while it might seem innocuous at first glance, is the kind of change that has the potential to break existing documents in various ways that are hard to predict, even for an expert reviewing a document in text form (all whitespace looks the same, after all). Whether your algorithm is "the correct one" or not, I'd argue that the only possible correct algorithm is one that is already mainstream. That is, unless you wish to start from a clean slate and create another derivative of Markdown which will have to fight for its own place.

I know it's boring to define things as they are when they could be "better". But the more boring the spec is and the more it conforms to how mainstream implementations parse Markdown, the better its chances of being adopted by mainstream implementations, at least by implementers who think like me.



>> The more the spec deviates from what the parsers are actually doing, the more
>> difficult it'll be to adopt for implementers for two reasons: implementation work
>> and the potential to break our users' documents.
>
> Let's consider each of your reasons one by one.
>
> ### Reason 1: Implementation work
>
> vfmd can entice developers to adopt it on two orthogonal, sometimes
> conflicting factors:
> (a) It's easy to adopt it
> (b) It gives the best possible interpretation for any input
>
> vfmd anyway has a different parsing architecture from most current
> implementations (per my knowledge), so (a) wouldn't stand. Just satisfying
> (a) wouldn't be very persuasive either. If it wants a chance at being
> implemented, it's got to aim for (b), even if that can be a little
> detrimental to (a). It should be easy to adopt, but not at the cost of
> correctness.


But are you sure about (b)? I'm not convinced it is so much better. Replacing tabs with spaces before parsing means that we interpret things the same way as a 4-spaces-per-tab-stop editor will display them, always. Even if you have an invisible stray tab somewhere, anywhere, if it looks right in your editor it'll work. On the other hand, your algorithm assumes that tabs are always intentional; it will break if somewhere there's a stray tab that was not meant to be there. It's not that clear-cut to me which is better. Each is based on different assumptions and will fail in different circumstances. Changing behaviour is more likely to fail with existing documents, however.
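
To make the preprocessing approach concrete, here is a minimal Python sketch of tab expansion to 4-column tab stops. The function name `expand_tabs` and the sample lines are only illustrative, not taken from any particular implementation. It shows that two lines which look identical in a 4-column editor (one with a clean tab, one with stray spaces before the tab) come out identical after expansion:

```python
def expand_tabs(line: str, tab_width: int = 4) -> str:
    """Replace each tab with spaces up to the next tab stop."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            # Pad to the next multiple of tab_width, as an editor would.
            pad = tab_width - (col % tab_width)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


clean = "\tcode"    # a single intentional tab
stray = "  \tcode"  # two stray spaces, then a tab

# Both lines display the same way, and both expand to the same text.
print(repr(expand_tabs(clean)))
print(repr(expand_tabs(stray)))
```

For a single line this should match what Python's built-in `str.expandtabs(4)` does; the loop is spelled out only to make the tab-stop arithmetic visible.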



> ### Reason 2: Breaking existing documents
>
> Are you talking about list handling or for other parts of the syntax? For
> lists, for users using tabstop=4, the behaviour is the same, as we saw
> earlier.


It's not always the same, even for lists. Putting your list inside a blockquote changes things because it adds a two-column indentation. Some extension features I mentioned before may also be affected, making things harder to adopt for implementations that have to support extensions (pretty much all of them actually).
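
A small Python sketch of the blockquote case, using one of the lines from the Babelmark test above. The helper `strip_quote_marker` is a hypothetical simplification (a `>` plus at most one space), not how any specific parser does it. The point is only that the order of the two steps changes the resulting indentation:

```python
def strip_quote_marker(line: str) -> str:
    """Drop a leading '>' and at most one following space (simplified)."""
    if line.startswith("> "):
        return line[2:]
    if line.startswith(">"):
        return line[1:]
    return line


def indent_width(line: str) -> int:
    """Count leading spaces."""
    return len(line) - len(line.lstrip(" "))


line = "> \tblockquote or code block?"

# Expand tabs on the whole line first: the tab starts at column 2,
# so it only fills two columns before the next tab stop.
expand_first = strip_quote_marker(line.expandtabs(4))

# Strip the quote marker first: the tab now starts at column 0
# and fills a full four columns.
strip_first = strip_quote_marker(line).expandtabs(4)

print(indent_width(expand_first), indent_width(strip_first))
```

With a four-space threshold for code blocks, the first ordering leaves a plain paragraph inside the blockquote and the second yields a code block, so the same bytes can be read two ways depending on where tab handling happens.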



> For tabs within code blocks, the behaviour would be different, but
> I would be surprised if users *relied* on tabs within code blocks turning
> into spaces in the output.


Actually, *I* rely on tabs being converted to spaces within code blocks in many of my documents. It happens a lot that I have tabs in the code I copy-paste, and since browsers don't all show tabs in a consistent way inside a `<pre>`, it's much better if they get converted to spaces. But I didn't think tabs inside code blocks were in question here; are they?


--
Michel Fortin
michel.fortin at michelf.ca
http://michelf.ca



More information about the Markdown-Discuss mailing list