Benchmarks with TextMate's manual
john at attacklab.net
Mon Aug 27 20:43:05 EDT 2007
On 8/27/07, Andrea Censi <andrea at censi.org> wrote:
> Maruku takes 8 seconds for parsing (on my PowerBook G4 1.5GHz).
> (please note that Ruby, per se, is much slower than Perl)
> I guess that if you plot [time for parsing] versus [length of the
> document], you get a curve which grows more than linearly for
> Markdown.pl and PHP Markdown.
> This is the same behaviour that I observed in Bluecloth (straight port
> of Markdown.pl in Ruby) -- if I remember well, time was O(length^2).
> By comparison, Maruku, and other real parsers, takes O(length).
One implementation of the current Markdown algorithm can actually beat
Markdown.pl, but it converts the TextMate in under 4 seconds (in
Firefox, on an ancient machine).
John Gruber's focus in the reference implementation has been on
functionality -- not speed. That's good news, because it means
there's a lot of room for improvement. I didn't do any clever
optimizations with Showdown; I just benchmarked, then fixed the
glaring problems. It's not so easy to speed up the Perl version right
now, because the dog-slow text::balanced parser would eclipse any
simple fixes. But the PHP port shouldn't be hard too speed up. In
fact, a good band-aid for the Perl version might be to backport
Michael's variant of the old HTML parsing code -- which builds a big
regexp to handle balanced tags up to four levels deep. (My gut tells
me we can write a simple HTML parser that's even faster than that, and
I'll take a crack at it as soon as I find the time.) Once you have a
fast HTML parser, it's easy to optimize the rest.
So, big-O aside, Markdown's current algorithm can perform a lot better
in the real world than the Perl and PHP implementations indicate; in
fact, it sounds like it can keep up with Maruku on reasonably-sized
documents. But don't get me wrong; I'm actually strongly in favor of
switching to a traditional parsing model -- even though I'd have to
throw out all the work I did porting the current algorithm to
scratch, but quickly realized it would be impossible to get 100%
compatibility without doing a direct, line-by-line port. And that's a
More information about the Markdown-Discuss