Metadata syntax (was Universal syntax for Markdown)

David Sanson dsanson at gmail.com
Wed Aug 17 23:09:23 EDT 2011


First time posting here as well. I've been watching this discussion
with interest. As a user of (extended) markdown, I have long hoped for
a unified standard for (most or all) markdown extensions and a unified
handling of metadata.

It seems to me that one of the issues that arises when we start
thinking about metadata is that there really are two different kinds
of metadata: some metadata (title, author, date) is---at least in many
cases---also part of the *content* of the document. This is the kind
of metadata for which I feel the force of the demand for an elegant
plaintext solution. For some bold suggestions in this direction, see
this [old post by Michael Thompson][1] to the pandoc-discuss list.
Here is one of his examples from that post:

    A Good Man Is Hard To Find

    Flannery O'Connor
    Spring 1952


    The grandmother didn't want to go to Florida. She wanted to visit
    some of her connections in east Tennessee and she was seizing at
    every chance to change Bailey's mind.

Isn't that so much *prettier* than any of the options currently in
play? Email someone a document like that, and they will know exactly
what you mean, and see no distracting markup. No doubt this presents
challenges when it comes to parsing, and I have no idea whether or not
those challenges are surmountable. Clearly some rules would have to be
laid down (Does it have to be centered? Indented? Can I underline the
title à la setext? Do I have to have two blank lines after the date?
Can I leave the date out? etc.). And it raises issues for backwards
compatibility too. But I think it's worth having in view a solution
that achieves a certain degree of perfection along this one dimension.

But then there is the other kind of metadata. Tags, keywords,
baseurls, paths to associated files, directives for webpage templating
software, and so on and so on. This sort of stuff is definitely not
content. It is a bunch of data that I want to associate with the file
for some reason or other. It needs to be indefinitely extensible. It
is frequently tied directly to some specific output format or context.
In other contexts, it probably just needs to be ignored. Blosxom taught
us that it should all be at the top of the document (and successors,
like Jekyll, follow this tradition), but much of it is ugly enough
that it could just as well be banished to the bottom of the document,
where nobody but the author would ever have to look at it.

When it comes to this sort of metadata, I don't see any reason to look
for something elegant, language-independent, and plaintext-y. This is
where it feels like I just want a way of embedding a block of data
within a markdown file, knowing that it won't be treated as content
(and, depending on my processor and the context, knowing that it may
be sucked up and used in various ways). It is here that I agree with
the sentiment that metadata shouldn't be part of the markdown spec,
*but* I think markdown should be smart enough to ignore the metadata,
so that I don't have to strip it out before feeding the document to a
markdown processor.
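Ignoring such blocks could be a small preprocessing step in front of an unmodified markdown processor. The sketch below uses YAML's document markers (`---` to open, `...` to close) and removes every such block, wherever it appears. Because `---` on a line by itself already means something in markdown (a horizontal rule, or a setext heading underline), it adopts a hypothetical rule: an opening `---` only counts if a `...` line follows before any other `---`. The function names are illustrative, not taken from any existing tool.

```python
from typing import List, Optional, Tuple

OPEN = "---"   # YAML document start marker
CLOSE = "..."  # YAML document end marker


def find_close(lines: List[str], start: int) -> Optional[int]:
    """Index of the '...' that closes a block opened at `start`, or None.

    Hypothetical rule: if another '---' line comes first, the opener was
    just ordinary markdown (e.g. a horizontal rule), not a metadata block.
    """
    for j in range(start + 1, len(lines)):
        stripped = lines[j].strip()
        if stripped == CLOSE:
            return j
        if stripped == OPEN:
            return None
    return None


def split_metadata(text: str) -> Tuple[str, List[str]]:
    """Return (markdown with metadata blocks removed, raw block bodies)."""
    lines = text.split("\n")
    content: List[str] = []
    blocks: List[str] = []
    i = 0
    while i < len(lines):
        close = find_close(lines, i) if lines[i].strip() == OPEN else None
        if close is None:
            content.append(lines[i])  # ordinary markdown line, kept as-is
            i += 1
        else:
            blocks.append("\n".join(lines[i + 1:close]))  # delimiters dropped
            i = close + 1
    return "\n".join(content), blocks
```

The markdown half can then go to any existing processor unchanged, and the raw blocks to whatever tool wants them.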

Here is an extreme version of this: extant implementations of citeproc
support JSON as a bibliography format. Imagine they supported YAML.
Then imagine being able to stick something like this at the *end* of
your markdown file,

    ---
    - story:
        title: A Good Man is Hard to Find
        author: Flannery O'Connor
        date: Spring 1952
        key: oconnor1952
    - story:
        title: The Old Man and the Sea
        author: Ernest Hemingway
        date: Sep 1952
        key: hemingway1952
    ...

and then being able to treat the same file as both your markdown file
and your bibliography database, knowing that, when you run it through
the markdown parser, that chunk of metadata will be ignored, and when
you feed it as a database to your citeproc implementation, the
markdown will be ignored. This is just one example of the sort of
flexibility and power that you might get from supporting arbitrary
blocks of data within markdown files.
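Going the other way, a citation tool would skip the prose and read only the delimited block. Real YAML should go through a proper parser (PyYAML's `yaml.safe_load`, for example); to stay dependency-free, the sketch below hand-parses only the flat shape of the records above, a `name:` line (optionally written `- name:`) followed by `key: value` lines. It assumes the block has already been cut out of the file, and the `type` field and both function names are hypothetical.

```python
from typing import Dict, List


def parse_records(block: str) -> List[Dict[str, str]]:
    """Tiny stand-in for a YAML parser; handles only a flat subset.

    A line 'name:' with no value (optionally written '- name:') starts a
    new record whose 'type' is `name`; each later 'key: value' line is added
    to the current record. Lines before the first record are ignored.
    """
    records: List[Dict[str, str]] = []
    for raw in block.split("\n"):
        line = raw.strip()
        if line.startswith("- "):
            line = line[2:].strip()
        if ":" not in line:
            continue  # blank or unrecognised line
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            records.append({"type": key})
        elif records:
            records[-1][key] = value
    return records


def bibliography(block: str) -> Dict[str, Dict[str, str]]:
    """Index records by their 'key' field, as a citation processor might."""
    return {r["key"]: r for r in parse_records(block) if "key" in r}
```

A processor could then resolve a citation such as `oconnor1952` with `bibliography(block)["oconnor1952"]`, never looking at the markdown around it.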

So, here is my *pipe dream* implementation of metadata in markdown:

1. A syntax for clean, language-independent title, author, date (and
?) that looks the way you would have done it on a typewriter or in a
plaintext email.

2. Support for embedding arbitrary metadata inside of appropriate
delimiters (e.g., YAML's '---' and '...') *anywhere* within the
document.
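Item 1 is where the parsing questions raised earlier bite. The sketch below picks one possible set of answers, all of them assumptions: the title block is everything before the first run of two or more blank lines; its non-blank lines are title, author, and an optional date, in that order; and leading whitespace is ignored, so centered or indented lines still count. It also makes the backwards-compatibility cost concrete: an ordinary document that opens with a paragraph of three or fewer lines followed by two blank lines would be misread as a title block. The function name is hypothetical.

```python
from typing import Dict, Tuple

FIELDS = ("title", "author", "date")


def parse_title_block(text: str) -> Tuple[Dict[str, str], str]:
    """Split a typewriter-style title block off the top of `text`.

    Returns ({field: value}, remaining body). If no title block is found,
    returns ({}, text) unchanged.
    """
    lines = text.split("\n")
    for i in range(len(lines) - 1):
        # The first run of two blank lines ends the (candidate) title block.
        if not lines[i].strip() and not lines[i + 1].strip():
            header = [line.strip() for line in lines[:i] if line.strip()]
            if 1 <= len(header) <= len(FIELDS):
                body = "\n".join(lines[i:]).strip("\n")
                return dict(zip(FIELDS, header)), body
            break  # zero or too many lines: not a title block
    return {}, text
```

Fed the Flannery O'Connor example, this yields title, author, and date, and leaves the opening paragraph as the body.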

I would then add that, for simplicity, all markdown processors should
look into the arbitrary metadata for a few common bits of metadata,
namely, title, date, and author (perhaps with proper localizations).
That way, I could write beautiful plaintext markdown, providing title,
author, date as part of the content, if I wanted to, but if I was
lazy, or was using a bunch of metadata and preferred to keep it all in
one place, I could instead just specify that as metadata along with
all the rest. I guess this means that I think markdown should settle
on a preferred metadata format---the one that it will understand
enough to look for title, author, and date---and I guess I think YAML
is the best candidate out there.
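That lookup could be quite small once the metadata has been parsed. In the sketch below, two things are assumptions the message leaves open: the precedence (a value from the plaintext title block wins over one found in a metadata block) and the German aliases standing in for "proper localizations". The inputs are assumed to be plain dictionaries produced by some earlier parsing step, and the names `ALIASES` and `common_fields` are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical localization table: alternative key names a processor might
# accept for the three common fields (German shown as an example).
ALIASES = {
    "title": ("title", "titel"),
    "author": ("author", "autor"),
    "date": ("date", "datum"),
}


def common_fields(title_block: Dict[str, str],
                  metadata_blocks: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Collect title/author/date for a processor to use.

    Assumed precedence: a value from the plaintext title block wins;
    otherwise the first metadata block that provides the field (under any
    alias, compared case-insensitively) is used. Other keys are ignored.
    """
    result = dict(title_block)
    for block in metadata_blocks:
        lowered = {k.lower(): v for k, v in block.items()}
        for field, names in ALIASES.items():
            if field in result:
                continue  # already supplied by an earlier source
            for name in names:
                if name in lowered:
                    result[field] = lowered[name]
                    break
    return {f: result[f] for f in ALIASES if f in result}
```

A lazy author can thus put everything in metadata, while a title block written as content still takes priority.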

Note that this proposal provides for something close to exactly what
MMD does right now, but makes that way of doing things even more
flexible (by allowing metadata not just at the beginning of the
document). And it also scratches the itch expressed in Christoph's
earlier email, by allowing a subset of that metadata---the stuff that
really seems like it is part of the content of the document---to be
expressed in an elegant way without ugly markup or unexpected English
in non-English documents.

Best,
David

[1]: https://groups.google.com/d/msg/pandoc-discuss/Cp8LHq9GZLo/zBoFRL4CupsJ

