xMarkdown implementation
european bob
bob at wolfwall.com
Mon Mar 29 19:19:05 EST 2004
Aaron Swartz wrote:
> For correcting the HTML, have you considered HTML Tidy? It already has
> tons of logic to deal with this stuff, and I think it's available as a
> Perl plugin.
Yes, I have (a little ;). Basically, xMarkdown is not aimed at
correcting HTML - it's aimed at generating it correctly in the first
place. The way it works is that the content of the document is held
outside the structure; it parses Markdown to generate the structure and
get the content, but it doesn't attempt to map the Markdown into HTML
via a transformative process. The way the structure is implemented, it's
basically impossible to generate something malformed, at the cost of
your content potentially being in the wrong place (xMarkdown moving
closing tags in your raw HTML is a good example) if the input is
"invalid" (for some value of invalid). So it's not really a tool to
"correct"; it's a tool to create. And if what it creates is wrong,
then it's a bug.
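
To make the "content held outside the structure" idea concrete, here is a minimal sketch in Python. It is not the actual xMarkdown code, and the names Node, add, and render are hypothetical. The assumption is that content lives in plain strings attached to a tree, and that tags are only ever written by the serializer, so every opening tag gets its matching close by construction.

```python
from dataclasses import dataclass, field
from typing import List, Union
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Node:
    """One element of the structure; text content is kept as plain strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: List[Union["Node", str]] = field(default_factory=list)

    def add(self, child):
        """Append a child (element or text) and return it for chaining."""
        self.children.append(child)
        return child


def render(node):
    """Serialize a tree to markup. Assumes the structure contains no loops."""
    if isinstance(node, str):
        # Content is always escaped, so it can never introduce markup.
        return escape(node)
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in node.attrs.items())
    inner = "".join(render(child) for child in node.children)
    # Open and close tags are emitted together, so they always balance.
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


doc = Node("div")
p = doc.add(Node("p"))
p.add("1 < 2 & ")
p.add(Node("em")).add("so on")
print(render(doc))  # <div><p>1 &lt; 2 &amp; <em>so on</em></p></div>
```

Note that this sketch does not validate tag or attribute names, and it says nothing about character sets, so it only illustrates the balancing of tags, not a full guarantee.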
I think it would actually be reasonable code to prove formally, too;
because of the type of data structure we're talking about, there's a
relatively simple proof by induction you can do (assuming that there are
no loops, which we do, because it blows up with a big bang otherwise :)
which would show that the output always meets the XML validity rules.
(I'm only saying I think it's provable; I haven't proved it, nor do I
think the current code would necessarily pass such a proof - character
set handling is reasonably suspect, for example.)
Tidy is potentially useful though, and I think you make a very good
point. One of the things I'm very interested in is going beyond what
xMarkdown offers in terms of the 100% XML guarantee - I can output XML,
but there's no real certainty that the XML is actually XHTML (are the
tags valid? are they in the correct context? do they have mandated
attributes? etc.), and being able to extend the guarantee to XHTML would
be very interesting (for example, have you seen the studies of sites
that claim to be valid HTML, but actually aren't? I guess they were
valid at some point, but lost validity later on - it would be nice
not to have to think about it). Now, there are a number of
ways of going about it. I could code a load of rules into a rendering
subsystem, or I could just run it through Tidy (once we have valid XML,
the ambiguity issue is even less of a problem). Maybe the answer is to
meet half-way: extend the guarantee with a few bits of code that hit the
low-hanging fruit, and then pull in Tidy if we detect it's available and
wanted.
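
The "meet half-way" option could be sketched roughly as follows, again in Python and purely as an illustration. The function names cheap_checks and finish and the tiny REQUIRED_ATTRS table are hypothetical, and the sketch assumes un-namespaced element names and a `tidy` executable on the PATH. It runs a few built-in rules first, then hands the output to Tidy only when Tidy is both detected and wanted.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET

# A deliberately tiny, illustrative rule table; not a full XHTML DTD.
REQUIRED_ATTRS = {"img": {"src", "alt"}, "form": {"action"}}


def cheap_checks(xml_text):
    """Return problems found by the built-in low-hanging-fruit rules."""
    problems = []
    for el in ET.fromstring(xml_text).iter():
        missing = REQUIRED_ATTRS.get(el.tag, set()) - set(el.attrib)
        for name in sorted(missing):
            problems.append(f"<{el.tag}> is missing required attribute '{name}'")
    return problems


def finish(xml_text, want_tidy=True, find=shutil.which, run=subprocess.run):
    """Run the cheap checks, then use Tidy only if it is present and wanted."""
    problems = cheap_checks(xml_text)
    tidy = find("tidy") if want_tidy else None
    if tidy is None:
        return xml_text, problems
    # -q: quiet, -asxhtml: emit XHTML, -utf8: UTF-8 for input and output.
    result = run([tidy, "-q", "-asxhtml", "-utf8"], input=xml_text,
                 capture_output=True, encoding="utf-8")
    # Tidy exits with 0 (clean) or 1 (warnings) when its output is usable.
    if result.returncode in (0, 1):
        return result.stdout, problems
    return xml_text, problems
```

Passing `find` and `run` as parameters is only there so the fallback path can be exercised without Tidy installed; when given a fragment, Tidy will typically wrap it in a complete document.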
It would be quite interesting to know how many people use Tidy in page
generation. I know I never have, and I've been running systems that
have been valid XHTML 1.1 for some time. I think I see Tidy much
more as a rectifying tool, and therefore don't think about it in terms
of actually generating content to begin with. I will take a look at it
though; it can only inform me. There are definitely a number of
different approaches you can take toward the same end-goal, and I
strongly believe in diversity of solutions.
-- bob.
More information about the Markdown-discuss
mailing list