Metadata syntax (was Universal syntax for Markdown)

David Chambers david.chambers.05 at gmail.com
Wed Aug 17 21:17:19 EDT 2011


It is true that certain metadata (author and date, to provide two examples)
are used far more frequently than return addresses or URIs for graphical
signatures. That said, it would be foolish to try to anticipate every way in
which metadata might be used, and I see little value in attempting it.

If Markdown is to process metadata, the syntax should support arbitrary
key–value pairs.

For example:

author: Jesper Nøhr
date: 17 August 2011
tags: lol, omg, lulz

Formatted differently:

author: Jesper Nøhr
date: 17 August 2011
tag: lol
tag: omg
tag: lulz

If — again, if — Markdown is to be charged with parsing metadata, my opinion
is that its role should be limited to returning a dictionary-like metadata
object (in addition to the HTML string generated from the remainder of the
document's contents).

For the first example:

{"date": "17 August 2011", "tags": "lol, omg, lulz", "author": "Jesper Nøhr"}

For the second example:

{"date": "17 August 2011", "author": "Jesper Nøhr", "tag": ["lol", "omg", "lulz"]}

In my opinion, Markdown should *not* be responsible for any of the
following:

- splitting lists (note that "lol, omg, lulz" is a string in the first
example)
- converting date strings into date objects
- any other manipulation of values

In other words, every value should be either a string, or an ordered,
list-like object containing two or more strings (in the case of a repeated
key).
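
The rules above can be sketched in Python roughly as follows. This is only an
illustration, not a proposal for any existing Markdown implementation: the
function name parse_metadata is hypothetical, the input is assumed to be the
metadata lines only (one "key: value" pair per line), and lines without a
colon are simply skipped.

```python
def parse_metadata(lines):
    """Collect "key: value" lines into a dictionary.

    A key seen once maps to a string. A repeated key maps to a list of
    strings in document order. Values are never split or converted.
    """
    metadata = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key-value line; ignored in this sketch
        key, value = key.strip(), value.strip()
        if key not in metadata:
            metadata[key] = value                    # first occurrence: plain string
        elif isinstance(metadata[key], list):
            metadata[key].append(value)              # third or later occurrence
        else:
            metadata[key] = [metadata[key], value]   # second occurrence: promote to list
    return metadata


first = parse_metadata([
    "author: Jesper Nøhr",
    "date: 17 August 2011",
    "tags: lol, omg, lulz",
])
second = parse_metadata([
    "author: Jesper Nøhr",
    "date: 17 August 2011",
    "tag: lol",
    "tag: omg",
    "tag: lulz",
])
```

Note that first["tags"] stays the single string "lol, omg, lulz", while
second["tag"] becomes a list of three strings.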

In addition to converting strings into appropriate objects, applications
making use of Markdown's metadata feature would also be responsible for
handling the fact that the value for a particular key may be a string for
one document and a list of strings for another.
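
As a rough illustration of what that application-side work might look like,
here is a minimal Python sketch. All three helper names (as_list, split_tags,
parse_date) are hypothetical, and the date format is an assumption that only
fits strings such as "17 August 2011"; an application would choose whatever
formats it wants to accept.

```python
from datetime import datetime


def as_list(value):
    """Return a list whether the parser produced a string or a list."""
    return list(value) if isinstance(value, list) else [value]


def split_tags(value):
    """Split comma-separated tags; accepts a string or a list of strings."""
    return [
        tag.strip()
        for item in as_list(value)
        for tag in item.split(",")
        if tag.strip()
    ]


def parse_date(value):
    """Convert a date string into a date object.

    Assumes the "day month-name year" form; %B matches month names in the
    current locale (English in the default C locale).
    """
    return datetime.strptime(value, "%d %B %Y").date()
```

With these helpers, split_tags gives the same list for "lol, omg, lulz" and
for ["lol", "omg", "lulz"], so the rest of the application need not care
which form the document used.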

Fletcher touched on another question that should be discussed: should
multiline values be accommodated and if so, how?

I think it'd be great to support multiline strings. I imagine the formatting
looking something like this:

author:
Jesper Nøhr
date:
17 August 2011
lol:
Irony keffiyeh pitchfork, mustache letterpress tofu cred twee scenester
thundercats gluten-free yr chambray sartorial stumptown. Homo cosby sweater
gentrify banh mi letterpress, vinyl beard hoodie terry richardson. Art party
whatever banksy, readymade skateboard you probably haven't heard of them
tumblr tattooed PBR letterpress photo booth carles vegan organic.
omg:
VHS carles photo booth food truck synth craft beer, wes anderson tofu banksy
fanny pack stumptown.

This strikes me as being in the spirit of Markdown, as it's how one might
structure this content if one were to produce it on a typewriter.
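
One possible reading of that layout, sketched in Python. The rule used here is
an assumption, since the format above is not specified precisely: a line that
consists of a single word followed by a colon starts a new key, and every
non-blank line after it belongs to that key's value until the next key line.
The name parse_multiline_metadata is hypothetical, and repeated keys are not
handled (a later occurrence replaces an earlier one).

```python
import re

# Assumed rule: a key line is one word followed by a colon and nothing else.
KEY_LINE = re.compile(r"^(\w+):$")


def parse_multiline_metadata(text):
    """Map each key to its value lines, joined with single spaces."""
    collected = {}
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        match = KEY_LINE.match(line)
        if match:
            key = match.group(1)
            collected[key] = []            # start a new value
        elif key is not None and line:
            collected[key].append(line)    # continuation of the current value
    return {k: " ".join(v) for k, v in collected.items()}


sample = """author:
Jesper Nøhr
date:
17 August 2011
omg:
VHS carles photo booth food truck synth craft beer,
fanny pack stumptown.
"""
result = parse_multiline_metadata(sample)
```

A weakness of this rule is that a value line which happens to be a single word
ending in a colon would be mistaken for a key, so a real syntax would probably
need indentation or some other marker to tell the two apart.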

I'm interested to hear people's thoughts on multiline values and on the
unfancy approach to metadata parsing that I (currently) favour.

David


On 17 August 2011 15:17, M Harris <mark at 2011.n0b.org> wrote:


> So, hi all. First time commenting on the list.
>
> I personally think having tags (whether of type "author:" or type "by")
> is useful for two reasons.
> One: It allows multiple tags to be entered. Two, it clears up the
> potential problem listed by Fletcher regarding tags.
>
> by Christoph Freitag
> Affiliation: XYZ
> by Fletcher T. Penney
> Affiliation: ABC
> tags: Markdown, Standardization, MMD, Metadata
> desc: An interesting discussion of how metadata could be included
> usefully in Markdown, whilst being readable etc.
>
>
> Regarding the localisation problem then, I thought that this was a
> solved problem when it came to computing? (At least in the cases of the
> major world languages.) A parser could have a table of equivalent words,
> so in English "by", en français "de" (pardon my French*).
>
> * By which I mean, I'm not sure that's correct, because I'm only a
> learner.
>
> > From: Christoph Freitag <mail at christoph-freitag.de>
> > Fletcher, sorry, but personally -- despite loving MMD (and even having
> > used MMD CMS for a diary) -- I have never liked the way MMD handles
> > metadata. Partly this is because, not being a native English speaker, I
> > dislike English meta descriptors. A localization could resolve this -- but I
> > still think it looks ugly. However, do you actually need descriptors at all?
> > I doubt it:
> >
> > * The title could be anything "at the start" of the document. Blosxom
> >   is a good example. Anything up to the first blank line is the title.
> > * After that, anything between the first blank line and the second
> >   blank line would be treated as additional metadata.
> > * Instead of the "Author:" descriptor, explicitly stated, it should
> >   suffice to write "by". What follows is the name of the author. (Localization
> >   would be easier as only this "keyword" would have to be known to the parser
> >   in a number of languages.)
> > * Dates would be self-explanatory, to a clever parser.
> > * Any list of words separated by commas on a single line would be
> >   treated as tags.
> > * Any more fanciful meta descriptors might be given explicitly just as
> >   in MMD before. This could be left to non-standard, personalized variants of
> >   Markdown.
> >
> > Thus the following would be a valid document:
> >
> > ---
> > Test Document for Automatic Metadata Detection
> >
> > by Christoph Freitag
> > 08/17/2011
> > Markdown, Standardization, MMD, Metadata
> >
> > A Markdown document may contain metadata in a human readable form that
> > the parser converts to a machine readable form of metadata automatically. A
> > casual reader will understand the content directly and without distraction.
> > Bowerbird will love this.
>
>
> > From: "Fletcher T. Penney" <fletcher at fletcherpenney.net>
> >
> > You mention the English-centric nature of MMD metadata. This is
> > certainly true, but no more so than HTML itself. One could certainly
> > localize MMD to use any language you like (the beauty of open source), but
> > to match your proposal in multiple languages would be quite complicated.
> >
> > For example, the following are valid MMD metadata dates, and easily used:
> >
> > date: 8/17/2011
> > date: August 17th, 2011
> > date: 2011-08-17
> > date: 17/8/2011
> > date: 14. Juni 2001
> > date: 8 avril 2000
> >
> > Writing a parser that would correctly catch all of these dates in any
> > language would be quite difficult, and prone to error.
> >
> > You mention tags as being easily recognized, but this is not always
> > true:
> >
> > A sample document
> >
> > by John Smith, MD
> > Director of Palliative Care, Division of General Medicine, Medical
> > University of Somewhere
> >
> > While perhaps not the best example of potential problems, this would be
> > incorrectly interpreted as tags, when the author probably implies that this
> > represents his academic affiliation and would like it to be properly placed
> > after his name on the title page, or on the slide deck if generating via
> > beamer.