Jan's Markdown Preview

A. Pagaltzis pagaltzis at gmx.de
Sat Nov 12 21:40:03 EST 2005


* robert mcgonegal <robert37 at gmail.com> [2005-11-13 02:20]:

> Has anyone created a Perl version of Jan's script?


I have a script which

1. Opens a Markdown source file, assuming it is UTF-8.
2. Appends a long list of initialisms, marked up as

[PHP]: abbr: "PHP: Hypertext Preprocessor"

This is rather a hack and I don’t really like it, but I
didn’t want to hack on Markdown itself to add syntax for
initialisms, so this was the cheapest solution.
3. Converts the whole shebang from Unicode to ASCII with numeric
entities because Markdown balks at Perl Unicode strings.
4. Converts the source to HTML.
5. Sticks the result into a full HTML document, scanning the
result for the first `<h1>` element in order to use it as a
document title. If no `<h1>` is there it uses the path to the
source file.
6. Translates `<a href="abbr:">` links to `<abbr>` tags.
7. Throws the whole thing into HTML Tidy, to make sure it really
is XHTML-valid-as-XML, because Markdown produces badly nested
tags in some circumstances, and in any case it doesn’t do
anything about badly nested tags in its input.
8. Pushes the result down the wire as an honest-to-god
`application/xhtml+xml` XHTML document, which is why the
cleaning in step 7 is necessary. (Note that this means IE users
see naught but a “save file” dialog… I don’t care, but you
might.)
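
Step 3 is probably the least obvious of these, so here is a minimal sketch of it in isolation. It uses only the core Encode module; the sub name `to_ascii_entities` and the sample string are made up for illustration and do not appear in the script.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: turn a Perl character string into plain ASCII,
# replacing every non-ASCII character with a decimal numeric entity.
sub to_ascii_entities {
    # work on a copy: encode() may modify its input when a CHECK value is given
    my ( $text ) = @_;
    return encode( 'us-ascii', $text, Encode::FB_HTMLCREF );
}

print to_ascii_entities( "caf\x{e9} \x{2019}quoted\x{2019}" ), "\n";
# prints: caf&#233; &#8217;quoted&#8217;
```

The result contains nothing but ASCII bytes, so Markdown never sees a Perl Unicode string, and the entities survive into the HTML unchanged.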

I integrated this into Apache’s serving machinery using the
following configuration directives:

AddHandler translate-markdown .mkd
Action translate-markdown /cgi/mkd.cgi

I *think* you can put those in a `.htaccess` file too, should
that be easier. Obviously, the local URL to the script will have
to be adjusted for your needs.
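
For the `.htaccess` route, a sketch of what the file might contain. To my knowledge both directives are allowed there as long as the server's `AllowOverride` setting includes `FileInfo` and mod_actions is loaded, but check this against your own server. The `/cgi/mkd.cgi` URL is simply the one from above, and where the file lives is an assumption.

```apache
# .htaccess in the directory that holds the .mkd files (assumed location)
AddHandler translate-markdown .mkd
Action translate-markdown /cgi/mkd.cgi
```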

The script is a bit too specific to my setup to be useful to
others as is, but it shouldn’t need more than a bit of tweaking.
You will need the Text::Markdown, HTML::TokeParser::Simple and
HTML::Tidy modules from the CPAN.

If you don’t need/want some of the things I require, it could be
shortened significantly, including shedding dependencies.

In any case, here goes:

#!/usr/bin/perl
use strict;
use warnings;

use Encode qw( encode );
use Text::Markdown qw( markdown );
use HTML::TokeParser::Simple;
use HTML::Tidy;

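sub scan_anchor_end {
    my ( $p ) = @_;
    my $html = '';
    while( my $t = $p->get_token ) {
        if( $t->is_start_tag( 'a' ) ) {
            # an <a> without href (e.g. a named anchor) must not trigger a warning
            if ( ( $t->get_attr( 'href' ) || '' ) eq 'abbr:' ) {
                # turn <a href="abbr:" title="..."> into <abbr title="...">
                $t->delete_attr( 'href' );
                my $rewritten = $t->as_is();
                $rewritten =~ s/<a\b/<abbr/;
                # keep the link text; the closing </a> token is dropped
                my ( $subdoc ) = scan_anchor_end( $p );
                $html .= $rewritten . $subdoc . qq{</abbr>};
            }
            else {
                $html .= $t->as_is();
                ( my $subdoc, $t ) = scan_anchor_end( $p );
                $html .= $subdoc;
                # there is no </a> token if the input ended early
                $html .= $t->as_is() if $t;
            }
        }
        elsif( $t->is_end_tag( 'a' ) ) {
            return ( $html, $t );
        }
        else {
            $html .= $t->as_is();
        }
    }
    return $html;
}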

###################################################################

my $source = "# Error\n\nNo document.";

if( open my $fh, '<:utf8', $ENV{ PATH_TRANSLATED } || '' ) {
    local $/;
    $source = <$fh>;

    # append acronym definitions
    open $fh, '<', '/home/ap/.abbr.mkd'
        and $source .= "\n\n" . <$fh>;
}

my $body = markdown( encode( 'us-ascii', $source, Encode::FB_HTMLCREF ) );

my $html = do{
    my $title = $body =~ m{<h1>(.*?)</h1>} ? $1 : $ENV{ REDIRECT_URL };
    <<"END_HTML";
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>$title</title>
<link rel="stylesheet" type="text/css" href="/readings/readings.css" />
<script type="text/javascript" src="/readings/readings.js"></script>
</head>
<body>
$body
</body>
</html>
END_HTML
};

$html = scan_anchor_end( HTML::TokeParser::Simple->new( \$html ) );

my $tidy = HTML::Tidy->new( { config_file => '/home/ap/.tidy.conf.xhtmlize' } );

$tidy->ignore( type => TIDY_WARNING );
$tidy->ignore( type => TIDY_ERROR );

my $document = $tidy->clean( $html );

print "Content-type: application/xhtml+xml\n\n", $document;
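
To make the `abbr:` trick concrete, here is a rough stand-alone illustration of what steps 2 and 6 amount to for a single link. It assumes that a definition such as `[HTML]: abbr: "HyperText Markup Language"` makes Markdown render `[HTML][]` as an `<a>` element with `href="abbr:"` as its first attribute, and it swaps the token parser for a single regex, so it is a simplification and not a replacement for `scan_anchor_end`. The sub name `abbr_links_to_tags` is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite <a href="abbr:" ...>text</a> as <abbr ...>text</abbr>.
# Assumes href is the first attribute and that such links are never nested.
sub abbr_links_to_tags {
    my ( $html ) = @_;
    $html =~ s{<a\s+href="abbr:"([^>]*)>(.*?)</a>}{<abbr$1>$2</abbr>}gs;
    return $html;
}

my $in = '<p><a href="abbr:" title="HyperText Markup Language">HTML</a> and '
       . '<a href="http://example.org/">a real link</a></p>';
print abbr_links_to_tags( $in ), "\n";
# prints: <p><abbr title="HyperText Markup Language">HTML</abbr> and <a href="http://example.org/">a real link</a></p>
```

Ordinary links pass through untouched; only those pointing at the dummy `abbr:` URL lose their `href` and change element name. The token-based version in the script is the sturdier choice because it does not depend on attribute order.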

And the HTML Tidy configuration file I use:

show-warnings: no
show-errors: 0
quiet: yes
force-output: yes

output-xhtml: yes
doctype: strict
add-xml-decl: no

drop-empty-paras: no
drop-proprietary-attributes: yes
enclose-text: yes
enclose-block-text: yes
fix-uri: yes
logical-emphasis: yes
replace-color: yes
numeric-entities: yes

indent: no
indent-attributes: no
markup: yes
wrap: 0

ascii-chars: no
newline: LF
output-encoding: utf8

fix-backslash: yes
tidy-mark: no

Regards,
--
#Aristotle
*AUTOLOAD=*_=sub{s/(.*)::(.*)/print$2,(",$\/"," ")[defined wantarray]/e;$1};
&Just->another->Perl->hacker;


More information about the Markdown-Discuss mailing list