RFC: Lazy syntax for paragraphs, blockquotes and lists

Thomas Leitner t_leitner at gmx.at
Fri Sep 3 05:43:49 EDT 2010


Hi everybody,

it was requested that kramdown (a Markdown parser in Ruby, see
<http://kramdown.rubyforge.org>) support the lazy syntax of Markdown.
So I sat down, thought about it, skimmed through the Markdown ML on
issues regarding lazy indentation as done with Markdown and now I have
some rough idea on how to do this in kramdown.

First: I'd like to say that there is no way to satisfy everyone. Lazy
indentation has some pros and cons and we have to find a middle ground!

Second: This is a rather long mail but worth the read, especially if
you want to influence how kramdown implements the lazy syntax!

Third: I have cross-posted this email to the Markdown ML because it
provides a nice explanation of why the behaviour of the lazy syntax in
Markdown.pl might be as it is.



In the beginning there was...
=============================

Markdown was created by John Gruber because he wanted a nice text
format that is inspired by how email messages are written. There is a
requirement that lines in plain text email messages should not be longer
than 78 characters and therefore many mail (transport) programs
hard-wrap text to a specific line length.

I think that this is the reason why we have lazy indentation or
generally long line wrapping in Markdown. If we said that each
paragraph must be one long line, there would obviously be problems when
messages get automatically wrapped by (email) programs. Therefore
Markdown allows paragraphs to continue on the following lines.



The Markdown syntax
===================

This is just a short summary of how and in which elements Markdown
supports lazy indentation (taken more or less from the [Markdown Syntax
Page][1]).


## Blockquotes

A blockquote starts with a `>` character. All following lines with a
`>` character belong to the same blockquote. However, you may be lazy
and put the `>` character only before the first line of a blockquote:

> This is a normal
paragraph in a blockquote.

> The blockquote is continued here!!!

A blank line between two blockquotes does *not* separate the
blockquotes, it's just one large blockquote.


## Lists

As with blockquotes, the content of a list item need not be indented
correctly. For example:

* This is a normal
paragraph in a list item.

This is even allowed for other paragraphs in the list:

* This is a paragraph.

This is a paragraph
with a lazy indentation.



Problems/Ambiguities
====================

The lazy indentation syntax provides Markdown users with many chances
to get some unexpected output... Additionally, since both lists and
blockquotes support lazy indentation it is sometimes not clear what the
outcome is when those two elements are combined.

Here are some issues taken from the Markdown ML.

PA1. First example:

* this is list item
> * this item is in a block quote
more block quoting?

PA2. Second example:

* > list item with quoting
more text here

* > list item with quoting
more text here
* another list item

PA3. Third example:

> > I wrote something
> you replied
and now here is my reply to your reply.

PA4. Fourth example:

> * foo
> > bar
> > baz

The above examples can be interpreted in one way or another. This means
that we won't find a solution that satisfies all needs. We can only try
to find a solution that is based on a general rule which feels natural
to the user and does what most people would expect.

Michel Fortin wrote [this][2] on the Markdown ML regarding the lazy
syntax:


> Basically, I'd eliminate any "half-lazy" syntax were you can be lazy
> about list item indentation while not being lazy on blockquote
> markers. This just creates confusion; syntax markers shouldn't be
> allowed to be lazy.
>
> Removing half-lazy things would also fix a surprising issue with
> blockquotes:
>
> > foo
> > > bar
> > baz
>
> This would be seen as a blockquote containing a "foo" paragraph, a
> nested "bar" blockquote and a "baz" paragraph, instead of the
> completly counter-intuitive output produced today. To make "baz"
> part of the nested blockquote, you would either go the explicit route:
>
> > foo
> > > bar
> > > baz
>
> or the lazy route:
>
> > foo
> > > bar
> baz
>
> but not something in between.




kramdown "lazy" syntax
======================

I thought about how I would like things to work, considering all of the
above and I came to the following solution. Note, however, that I do
*not* recommend using the lazy syntax when writing a document!

Since the problem of the lazy syntax arises from the problem of line
wrapping, why not just use that to specify how the lazy syntax should
work?

Before we go into details consider the following: The kramdown syntax
page lists the following structural block level elements:

* Blank lines
* Paragraphs
* Headers
* Blockquotes
* Code blocks
* Lists (incl. footnote definitions)
* Tables
* Horizontal rules
* Math blocks
* HTML blocks

We can leave out all elements which do not inherently support line
wrapping, namely blank lines (no text to wrap), code blocks (should be
output as is), tables, horizontal rules, math blocks (same as with code
blocks) and HTML blocks.

Headers can also be left out assuming that a header text is not long
enough to trigger line wrapping (this has also been discussed on the
Markdown ML and I think that the consensus was that longer header texts
should be written directly in HTML).

This leaves us with three elements: paragraphs, blockquotes and lists.
However, blockquotes and lists are just "wrappers" around paragraphs
and therefore the only element that really contains any text in a
kramdown (Markdown) document is a paragraph (I also count the compact
list text that is not wrapped in `<p>` tags as a paragraph because
conceptually it is one). So when we know how long lines in paragraphs
are wrapped, the behaviour of long lines in blockquotes and lists is
easy to derive.


## Requirements

There are two requirements regarding line wrapping and "lazy" syntax:

* Line wrapping may be done like it is done by dumb editors, i.e. a long
line is split on whitespace before the maximal line length and the
text continues on the next line (i.e. *no* blank line in-between).
This means that the additional lines belong to the line (and
therefore a certain paragraph) to which line wrapping has been
applied!

* It must be possible to blockquote a kramdown document (which
possibly contains lazy lines) and preserve the structure of the quoted
document.


## Paragraphs

So how to lazy wrap simple paragraphs? This is the easiest one since
the [Markdown syntax description][1] already tells us how: just
hard-wrap your lines and separate multiple consecutive paragraphs with
one or more blank lines.

For example:

This is one long long long long long long long long long line

gets wrapped to:

This is one long long
long long long long
long long long line

So the paragraph rule as stated on the [Markdown syntax page][1] is
actually needed to support being lazy when writing paragraphs - and to
support programs that hard-wrap long lines.
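
A minimal sketch of the hard-wrapping step, using Python's standard
`textwrap` module as a stand-in for a mail program or dumb editor. The
width of 21 is an assumption, chosen only so that the output matches the
example above:

```python
import textwrap

doc = "This is one long long long long long long long long long line"

# width=21 is an assumed limit, picked only to reproduce the example
wrapped = "\n".join(textwrap.wrap(doc, width=21))
print(wrapped)

# Joining the wrapped lines with single spaces restores the paragraph,
# which is why a parser can treat both forms as the same paragraph.
restored = " ".join(wrapped.split("\n"))
```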


## Blockquotes

By following the two requirements as stated above, it is clear what the
lazy syntax for blockquotes has to look like.

The following examples modify this document:

This is one long long long long long long long long long line

BQ1. After blockquoting:

> This is one long long long long long long long long long line

BQ2. After line wrapping and blockquoting:

> This is one long long
> long long long long
> long long long line

BQ3. After line wrapping, blockquoting and blockquoting:

> > This is one long long
> > long long long long
> > long long long line

BQ4. After blockquoting and line wrapping:

> This is one long long
long long long long
long long long line

BQ5. After blockquoting, line wrapping and blockquoting:

> > This is one long long
> long long long long
> long long long line

As can be seen in the last example, the "half-lazy" syntax described by
Michel Fortin arises naturally when blockquoting and line wrapping are
combined in a certain way. However, I think it should not make any
difference whether a document is first line-wrapped and then
blockquoted or the other way around. Therefore I would allow this
"half-lazy" syntax.

What happens if line wrapping is done several times?

BQ5 with additional line wrapping:

> > This is one
long long
> long long
long long
> long long
long line

This looks a bit scary, I admit, but it is still one paragraph embedded
in two blockquotes... I don't suggest that anyone writes his documents
in this way though...
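
The variants in this section can be reproduced mechanically by composing
two operations in different orders. The following is only a sketch:
`blockquote` and `dumb_wrap` are hypothetical helper names, and the
widths 23 and 15 are assumptions chosen so that the results match the
examples above.

```python
import textwrap

def blockquote(text):
    """Prefix every line with a marker, as a mail client does when quoting."""
    return "\n".join("> " + line if line else ">" for line in text.split("\n"))

def dumb_wrap(text, width):
    """Hard-wrap each line on its own; continuation lines get no marker."""
    out = []
    for line in text.split("\n"):
        out.extend(textwrap.wrap(line, width) or [""])
    return "\n".join(out)

doc = "This is one long long long long long long long long long line"
WIDTH = 23  # assumed line limit

wrapped_then_quoted = blockquote(dumb_wrap(doc, WIDTH))         # fully marked
quoted_then_wrapped = dumb_wrap(blockquote(doc), WIDTH)         # lazy
half_lazy = blockquote(dumb_wrap(blockquote(doc), WIDTH))       # "half-lazy"
rewrapped = dumb_wrap(half_lazy, 15)                            # wrapped once more

for variant in (wrapped_then_quoted, quoted_then_wrapped, half_lazy, rewrapped):
    print(variant, end="\n\n")
```

Neither operation knows anything about the other, which is exactly why
the order of application should not change the meaning of the document.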

Due to line wrapping we now also have to require the use of blank lines
between a blockquote and a following paragraph. Otherwise it is
impossible to know whether example BQ4 contains just a blockquote or a
blockquote followed by a paragraph.

I don't think that requiring a blank line is a burden on writers. If
you look through the kramdown or the Markdown ML, you will see that in
nearly all emails quoted text is separated from the response by at
least one blank line.

Note that kramdown would generate two separate blockquotes if they are
separated by a blank line (Markdown.pl merges the blockquotes):

> This is one blockquote with
a long line.

> This is another blockquote
with a long line.

If you run examples BQ1 to BQ5 through Markdown.pl, you will find
that it produces the expected output (as defined above). This is no
coincidence, I think, since Markdown.pl has been designed with email
messages in mind. However, the requirements as stated above
haven't been written down anywhere (at least not that I know of), and
with them the behaviour of Markdown.pl is easily explained.
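
To make the rule concrete, here is a rough sketch of how a single run of
non-blank lines could be read as one paragraph inside nested blockquotes.
This is not kramdown's or Markdown.pl's actual code; the names
`strip_markers` and `parse_quoted_paragraph` are made up for
illustration. It deliberately handles only the single-paragraph case:
the first line fixes the nesting depth, and every following line is a
continuation no matter how many markers it still carries. Lists and
lines that open a deeper blockquote are ignored.

```python
import re

# A blockquote marker: up to three spaces, '>', and one optional space
MARKER = re.compile(r"^ {0,3}> ?")

def strip_markers(line):
    """Return (number of leading '>' markers, remaining text)."""
    depth = 0
    while True:
        match = MARKER.match(line)
        if not match:
            return depth, line
        depth += 1
        line = line[match.end():]

def parse_quoted_paragraph(block):
    """Read a run of non-blank lines as one paragraph.

    The first line decides the blockquote depth; later lines may carry
    fewer markers (lazy) and are simply appended to the paragraph.
    """
    lines = block.split("\n")
    depth, first = strip_markers(lines[0])
    parts = [first.strip()]
    for line in lines[1:]:
        _, text = strip_markers(line)
        parts.append(text.strip())
    return depth, " ".join(parts)

half_lazy = "> > This is one long long\n> long long long long\n> long long long line"
print(parse_quoted_paragraph(half_lazy))
```

Under these assumptions the fully marked, lazy and half-lazy forms of the
same paragraph all yield the same depth and the same text.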


## Lists

The content of lists, footnote definitions and all other content
(except code blocks) that is defined via indentation, also has to
support the lazy syntax.

We will start with this document:

* This is one long long long long long long long long long line

This is one long long long long long long long long long line
* Another very very very very very very very very long line

LI1. After line wrapping:

* This is one long long
long long long long long
long long line

This is one long long
long long long long long
long long line
* Another very very very
very very very very very
long line

So line wrapping inside lists can also be explained in terms of the
requirements. And the line wrapping behaviour is identical to that of
Markdown.pl.



How to interpret the stated problems/ambiguities
================================================

After having specified how the kramdown lazy syntax would work, here is
how the initially given problems would be interpreted:

PA1. A list with one item, followed by a blockquote containing a list
with one item. Markdown.pl interprets it in more or less the same
way but produces invalid HTML.

PA2. A list with three items: the first and the second item contain a
blockquote with a paragraph, the third item contains just text.
Again, Markdown.pl shows the same behaviour.

PA3. Two nested blockquotes containing one paragraph with all the text.
Markdown.pl shows the same behaviour.

PA4. A blockquote containing a) a list with one item and b) a
blockquote with a paragraph containing the text "bar baz".
Markdown.pl's behaviour differs - it puts the inner blockquote
inside the list item - again we have to disregard the invalid HTML
it produces.

There is always the problem with blockquote and list markers: if they
appear inside a paragraph and line wrapping is applied, they may
potentially end up at the beginning of a line... I don't think that
this can be avoided.

Any other problems/ambiguities/edge cases that need to be addressed?



Conclusion
==========

The proposed lazy syntax for kramdown is identical to that of the
original Markdown implementation - some edge cases are handled
differently though. However, in contrast to Markdown.pl more reasons are
given why this lazy syntax is useful and how it arises naturally when
looking at email messages and how they are processed by MTAs and email
programs.

I haven't looked at how to implement this in kramdown but it shouldn't
be too difficult. Before I do that I would like to hear your opinions
on this matter! :-)


Best regards and thanks for staying with me through this long email,
Thomas


[1]: http://daringfireball.net/projects/markdown/syntax
[2]: http://osdir.com/ml/text.markdown.general/2007-05/msg00031.html

