text/markdown effort in IETF (invite)

Thu Jul 10 11:49:54 EDT 2014

On Jul 09, 2014, at 11:49 AM, Sean Leonard <dev+ietf at seantek.com> wrote:

Hi markdown-discuss Folks:

I am working on a Markdown effort in the Internet Engineering Task 
Force, to standardize on "text/markdown" as the Internet media type for 
all variations of Markdown content. You can read my draft here: 
<http://tools.ietf.org/html/draft-seantek-text-markdown-media-type-00>.

My response below is lengthy, but it covers a number of different points, including some raised later in the discussion by others.

Sean, have you reached out to Mr. Gruber specifically? I mention this because in the past I CCed him directly on a response I sent to this list, which prompted him to respond (admittedly, that happened some years ago). I suspect he might be amenable to the general idea, though. A search of the list archives turned up a previous discussion [1] where he indicated a willingness to put in some work to obtain a MIME type for markdown. Of course, that was back when he was still actively involved. Your mileage may vary.

[1]: http://article.gmane.org/gmane.text.markdown.general/1179

In any event, I have some thoughts about your proposal. I like it for the most part, but I have a few comments on some specifics:

Why do we need a MIME type?
----------------------------------------

First of all, when is this necessary? In other words, when is plain markdown being sent around on its own such that it needs a MIME type? In my experience, REST APIs (for example) use JSON or XML, which may contain some Markdown text among other data. That other data may identify the text as "markdown", but the MIME type for the file is JSON or XML (or at least the appropriate MIME type for that file type). Or are you proposing that everyone standardize on a way to identify the markdown text within JSON and XML documents as Markdown text? What am I missing here?
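To make the question concrete, here is a minimal sketch of the situation described above. The payload and its `format` key are hypothetical (neither JSON nor the draft defines such a key). The HTTP response carrying it would be served as `application/json`, and the only thing marking `body` as Markdown is data inside the document itself.

```python
import json

# Hypothetical API response. The "format" key is an illustrative
# convention, not something defined by JSON or by the draft.
payload = '''
{
  "title": "Release notes",
  "body": {"format": "markdown", "text": "# v1.0\\n\\n*Fixed* the parser."},
  "summary": {"format": "plain", "text": "Bug fixes."}
}
'''


def markdown_fields(doc: dict) -> dict:
    """Return the text of every top-level field tagged as Markdown."""
    return {
        key: value["text"]
        for key, value in doc.items()
        if isinstance(value, dict) and value.get("format") == "markdown"
    }


fields = markdown_fields(json.loads(payload))
print(fields)  # {'body': '# v1.0\n\n*Fixed* the parser.'}
```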

Encodings
--------------

To shed a different light on the encoding issue, consider Python-Markdown (disclosure: I'm the primary developer). Just as in Python 3 (where all strings are Unicode), Python-Markdown only works with Unicode. You pass Unicode text in, and you get Unicode text out. It is up to the user of the library to concern themselves with decoding a file from, or encoding it to, a specific encoding. As Python provides the libraries to do that, it is not a big problem -- although for those used to working with byte strings it may be a little jarring (I'm seeing that reaction from people who are experimenting with Apple's new Swift language, whose strings are also Unicode-only).

The point is, the Python-Markdown implementation has no use for the encoding (except for the included wrapping command-line script). Of course, the user (the user of the library) will care about that and will need some way to identify the encoding before decoding the input and passing it on to the Python-Markdown library. So yes, encoding is very much a real, needed piece of metadata.
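A minimal sketch of that decoding step, assuming the encoding arrives as a `charset` parameter on a `text/markdown` Content-Type value (the usual parameter for `text/*` types) and falling back to UTF-8 when none is given (an assumption, not something the draft is quoted as saying here). It uses only Python's standard `email.message.Message` to parse the header value; the resulting `str` is what would then be handed to a Unicode-only library such as Python-Markdown (`markdown.markdown(text)`).

```python
from email.message import Message


def decode_markdown(raw: bytes, content_type: str) -> str:
    """Decode raw bytes using the charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/markdown":
        raise ValueError(f"not Markdown: {msg.get_content_type()}")
    # Assumption: fall back to UTF-8 when no charset parameter is present.
    charset = msg.get_content_charset("utf-8")
    return raw.decode(charset)


raw = "Café *menu*".encode("latin-1")
text = decode_markdown(raw, 'text/markdown; charset="ISO-8859-1"')
print(text)  # Café *menu*
```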

However, if the markdown text is included in a JSON file (see my previous point above), then wouldn't the encoding be defined for the JSON file, not the markdown text specifically? The JSON parsing library would just spit out a Unicode string, in which case, why do we need this?
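A small sketch of that point using Python's standard `json` module: the same JSON document sent as UTF-8 and as UTF-16 yields the identical `str` for the Markdown field, because the decoding happens at the JSON layer. This relies on `json.loads()` accepting `bytes` and detecting UTF-8/16/32 itself (Python 3.6 and later); the `format` key is again hypothetical.

```python
import json

doc = {"format": "markdown", "text": "Déjà vu, *again*."}
serialized = json.dumps(doc, ensure_ascii=False)

# The same JSON document, transmitted in two different encodings.
as_utf8 = serialized.encode("utf-8")
as_utf16 = serialized.encode("utf-16")  # includes a byte-order mark

# json.loads() detects the encoding of bytes input on its own,
# so the Markdown field comes out as the same str either way.
a = json.loads(as_utf8)["text"]
b = json.loads(as_utf16)["text"]
print(a == b, type(a).__name__)  # True str
```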

Flavors
---------

To me, "flavors" seems like a disaster waiting to happen. Sean, I realize you have specifically stated a lack of understanding here, so let's go back in time. The following may not be a complete (or correctly ordered) history of Markdown, but it provides enough (I hope) to make a point.

Way back when, the "flavor" of markdown you used depended almost entirely on which language (Perl, PHP, Python...) you were using to code your project (blog, wiki, CMS, etc.). If you were using PHP, then your flavor was PHP Markdown... There was only one implementation per language and they (mostly?) agreed with each other. In those days "flavor" was completely pointless. I suspect a number of us resistant to the "flavors" part of your proposal are from that period in Markdown's history.

Of course, then Ruby came along. I don't remember which library was which, but when the first library came out, it was not very good (lots of bugs and slow). Then a second library came out which also wasn't very good, but in different ways (except for the slow part). Some people wrote their markdown documents with the bugs of the first implementation in mind, while others wrote their documents with the second in mind. Then a few projects started offering users the option to pick which Ruby implementation of Markdown to use for each individual document -- and "flavors" were born. Then other people started making ports of those projects to other languages and the "flavors" followed, even though the other languages didn't really have any choices. As a reminder, GitHub came out of that Ruby culture, which might explain why GitHub Flavored Markdown existed in the first place (interesting side note: Gruber appears to like GFM [2] -- or at least the original release; it has grown to include more features since then).

[2]: http://daringfireball.net/linked/2009/10/23/github-flavored-markdown

Then someone wrote a PEG grammar for Markdown. Once the hard work was done, a few people ported that grammar to other languages. And then a few people wrote C implementations (one of which used a PEG grammar, IIRC). Then people wrote wrappers around the C libraries for any number of scripting languages (Perl, PHP, Python, Ruby...), and now there are a multitude of choices regardless of which language your project is coded in. Some time ago I started a [list] of implementations. It is incomplete because it only includes the ones I am aware of; I'm sure there are others.

[list]: https://github.com/markdown/markdown.github.com/wiki/Implementations

But for those of us who remember the pre-Ruby days, there is only "one true implementation" per language and all the rest is just a bunch of noise (Okay, perhaps I exaggerate a bit -- just trying to make a point). For us, "flavors" means something else entirely. Even before all this Ruby and C mess, we had Multimarkdown and PHP Markdown Extra, both more or less extending the same basic markdown syntax. Of course, those extensions are not identical, but given that each was implemented in a different language, it didn't matter. The "flavor" depended on which language your project was implemented in and that was it.

Of course, many of the extensions created in Multimarkdown and PHP Markdown Extra were then ported to implementations in other languages. Consider Python-Markdown, for instance. Python-Markdown provides an extension API so that any user of the library can write an extension which modifies the syntax in any way they wish, to the point that it may not be Markdown any more. And a number of extensions ship with the Python-Markdown library [3]. Of those 17 extensions (at current count), 7 also come under the umbrella of an 8th -- Extra. In other words, each individual feature of PHP Markdown Extra was implemented as its own extension, and once we had all of them, a wrapping extension (called "extra") was created as a shortcut. Some users use "extra", but others only use "footnotes" (for example). Any number of "flavors" are possible with the various combinations of extensions that ship with just this one library. And many of those extensions also accept user-defined configuration settings which alter that extension's behavior (see footnotes [4] for an example). Then there is a fairly extensive (and always changing) list of third-party extensions [5]. I don't imagine that there is any sensible way to define all those possibilities in a way that is also understandable by other markdown implementations.

[3]: https://pythonhosted.org/Markdown/extensions/index.html
[4]: https://pythonhosted.org/Markdown/extensions/footnotes.html
[5]: https://github.com/waylan/Python-Markdown/wiki/Third-Party-Extensions
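To put a rough number on "any number of flavors", here is a back-of-the-envelope sketch. It models the bundled set as 16 independent extensions plus one wrapper (Extra) that switches on 7 of them, counts only on/off choices, and ignores configuration settings and third-party extensions entirely. Because Extra only bundles extensions already in the list, the 2^17 = 131,072 on/off combinations collapse to 2^16 = 65,536 distinct feature sets. The bitmask model is an illustration, not how Python-Markdown represents extensions.

```python
# Hypothetical model: 16 independent extensions plus one wrapper ("extra")
# that switches on 7 of them. Each enabled-feature set is stored as a bitmask.
N_ATOMIC = 16
EXTRA_MASK = (1 << 7) - 1  # which 7 doesn't matter; use the lowest bits


def distinct_feature_sets(n_atomic: int, wrapper_mask: int) -> int:
    """Count distinct enabled-feature sets over every on/off choice."""
    atomic_bits = (1 << n_atomic) - 1
    seen = set()
    # One extra bit in `choice` records whether the wrapper was selected.
    for choice in range(1 << (n_atomic + 1)):
        enabled = choice & atomic_bits
        if choice >> n_atomic & 1:
            enabled |= wrapper_mask
        seen.add(enabled)
    return len(seen)


count = distinct_feature_sets(N_ATOMIC, EXTRA_MASK)
print(count)  # 65536
```

Each configurable extension multiplies that figure further, which is the practical obstacle to naming "flavors" in a media type parameter.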

The great thing about Markdown is that any (decent) parser will simply pass over markup it doesn't understand. The text will just get passed through as (mostly) plain text. Given that one of the guiding principles behind Markdown is that it is human readable, if a particular implementation does not support a certain extension, the reader of the output could still understand the intended meaning and formatting (or at least "view source", as others have mentioned). Of course, this depends on a number of factors (overridden tokens, HTML's whitespace collapsing, etc.). There are certainly many cases where that does not hold true. But overall, I don't see that as a large concern.
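A deliberately tiny, hypothetical renderer illustrates the pass-through behavior. It understands only `*emphasis*` (no real implementation is this simple), so the footnote reference `[^1]`, which it doesn't recognize, survives in the output as readable text.

```python
import html
import re

# The only syntax this toy renderer knows: single-asterisk emphasis.
EMPHASIS = re.compile(r"\*([^*\n]+)\*")


def toy_render(text: str) -> str:
    """Escape HTML-special characters, then convert *emphasis* only."""
    escaped = html.escape(text, quote=False)
    return "<p>" + EMPHASIS.sub(r"<em>\1</em>", escaped) + "</p>"


out = toy_render("Markdown is *readable*[^1] even unparsed.")
print(out)  # <p>Markdown is <em>readable</em>[^1] even unparsed.</p>
```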

So, the point (finally) is that "flavors" seem like an impossible-to-get-right part of your proposal and really won't matter in the real world. For example, if you send me some markdown text with a flavor of "markdown.pl", but I'm using Awk as my programming language, then I'm not going to use markdown.pl anyway. Or, if you send me a flavor of "extra", Awk doesn't have an implementation that supports "extra" (AFAIK), so that is useless to me as well. On the other hand, if I'm using Python, I can account for "extra" easily, or for "markdown.pl" (just turn off smart_emphasis [6]). But "multimarkdown" is a different matter (I'm not exactly sure which features are supported by Multimarkdown or whether Python-Markdown's extensions implement them in the same way). And then there's "gfm" and "pandoc" and ... so many variations to account for. I think I'll just ignore this flavor stuff and use the implementation of *my* choice, which may or may not support the flavor sent my way.

[6]: https://pythonhosted.org/Markdown/reference.html#smart_emphasis
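A minimal sketch of what ignoring most flavors might look like on the receiving end: read a flavor parameter from a Content-Type value, map only the flavors that can be reproduced locally, and fall back to the usual settings for everything else. The parameter name `flavor` and the `LOCAL_SETTINGS` table are hypothetical; the dictionary keys mirror Python-Markdown 2.x keyword arguments (`extensions`, `smart_emphasis` [6]), but nothing here calls the library.

```python
from email.message import Message

# Hypothetical mapping from a "flavor" parameter to local settings.
LOCAL_SETTINGS = {
    "markdown.pl": {"extensions": [], "smart_emphasis": False},
    "extra": {"extensions": ["extra"]},
}
DEFAULT = {"extensions": []}


def settings_for(content_type: str) -> dict:
    """Pick local renderer settings for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    flavor = msg.get_param("flavor")  # None when the parameter is absent
    # Unknown flavors (gfm, pandoc, ...) get the usual defaults.
    return dict(LOCAL_SETTINGS.get(flavor, DEFAULT))


print(settings_for("text/markdown; flavor=markdown.pl"))
print(settings_for("text/markdown; flavor=pandoc"))
```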

I hope that helps.

Waylan Limberg