Jan's Markdown Preview
    A. Pagaltzis 
    pagaltzis at gmx.de
       
    Sat Nov 12 21:40:03 EST 2005
    
    
  
* robert mcgonegal <robert37 at gmail.com> [2005-11-13 02:20]:
> Has anyone created a Perl version of Jan's script?
I have a script which
 1. Opens a Markdown source file, assuming it is UTF-8.
 2. Appends a long list of initialisms, marked up as
 
        [PHP]: abbr: "PHP: Hypertext Preprocessor"
    This is rather a hack and I don’t really like it, but I
    didn’t want to hack on Markdown itself to add syntax for
    initialisms, so this was the cheapest solution.
 3. Converts the whole shebang from Unicode to ASCII with numeric
    entities because Markdown balks at Perl Unicode strings.
 4. Converts the source to HTML.
 5. Sticks the result into a full HTML document, scanning the
    result for the first `<h1>` element in order to use it as a
    document title. If no `<h1>` is there it uses the path to the
    source file.
 6. Translates `<a href="abbr:">` links to `<abbr>` tags.
 7. Throws the whole thing into HTML Tidy, to make sure it really
    is XHTML-valid-as-XML, because Markdown produces badly nested
    tags in some circumstances, and in any case it doesn’t do
    anything about badly nested tags in its input.
 8. Pushes the result down the wire as an honest-to-god
    `application/xhtml+xml` XHTML document, which is why the
    cleaning in the previous step is necessary. (Note that this
    means IE users see naught but a “save file” dialog… I don’t
    care, but you might.)
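
Step 3 is probably the least obvious of these, so here is a minimal sketch of that conversion on its own, using only the core Encode module. The sample string is made up for illustration; the `encode` call is the same one the script uses.

```perl
use strict;
use warnings;
use Encode qw( encode );

# A Perl character string with two non-ASCII characters
# (e-acute and a horizontal ellipsis); illustrative data only.
my $text = "caf\x{E9} \x{2026}";

# FB_HTMLCREF replaces every character that us-ascii cannot represent
# with a decimal numeric character reference, and leaves $text untouched.
my $ascii = encode( 'us-ascii', $text, Encode::FB_HTMLCREF );

print "$ascii\n";    # prints: caf&#233; &#8230;
```

The result is plain ASCII bytes, so Markdown never sees a Perl Unicode string, and the browser still renders the original characters from the references.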
I integrated this into Apache’s serving machinery using the
following configuration directives:
	AddHandler translate-markdown .mkd
	Action translate-markdown /cgi/mkd.cgi
I *think* you can put those in a `.htaccess` file too, should
that be easier. Obviously, the local URL to the script will have
to be adjusted for your needs.
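
With `Action`, Apache's mod_actions is documented to hand the CGI script the requested document's filesystem path in `PATH_TRANSLATED`, which is how the script finds the `.mkd` file. A minimal sketch of that lookup follows; `read_source` is a hypothetical helper name used only here, and the fallback text mirrors the one in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp the file at $path as UTF-8, or return a
# small Markdown error document if the path is missing or unreadable.
sub read_source {
    my ( $path ) = @_;
    my $fallback = "# Error\n\nNo document.";
    return $fallback unless defined $path && length $path;
    open my $fh, '<:encoding(UTF-8)', $path or return $fallback;
    local $/;    # slurp mode: read the whole file in one go
    my $source = <$fh>;
    return defined $source ? $source : '';
}

# Under Apache this variable is set by the Action handler; when run by
# hand without it, the fallback document is printed instead.
print read_source( $ENV{ PATH_TRANSLATED } ), "\n";
```

Running the script from a shell with `PATH_TRANSLATED` set to a local file is a quick way to test it without going through Apache.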
The script is a bit too specific to my setup to be useful to
others as is, but it shouldn’t need more than a bit of tweaking.
You will need the Text::Markdown, HTML::TokeParser::Simple and
HTML::Tidy modules from the CPAN.
If you don’t need/want some of the things I require, it could be
shortened significantly, including shedding dependencies.
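
As one example of shedding a dependency, the `<abbr>` translation could be approximated with a single substitution instead of HTML::TokeParser::Simple. This is a simplified sketch, not what the script does: it assumes `href="abbr:"` is always the first attribute and that anchors are never nested, and `abbr_links_to_tags` is a made-up name.

```perl
use strict;
use warnings;

# Simplified stand-in for the token-based rewrite. Turns
#   <a href="abbr:" title="...">X</a>  into  <abbr title="...">X</abbr>
# and leaves every other link alone.
sub abbr_links_to_tags {
    my ( $html ) = @_;
    $html =~ s{<a\s+href="abbr:"([^>]*)>(.*?)</a>}{<abbr$1>$2</abbr>}gs;
    return $html;
}

my $in = '<p><a href="abbr:" title="PHP: Hypertext Preprocessor">PHP</a> and '
       . '<a href="http://example.com/">a link</a></p>';

print abbr_links_to_tags( $in ), "\n";
```

The regex is more fragile than a real tokenizer, which is the trade-off for dropping the module.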
In any case, here goes:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode qw( encode );
    use Text::Markdown qw( markdown );
    use HTML::TokeParser::Simple;
    use HTML::Tidy;
    sub scan_anchor_end {
        my ( $p ) = @_;
        my $html = '';
        while( my $t = $p->get_token ) {
            if( $t->is_start_tag( 'a' ) ) {
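                if ( ( $t->get_attr( 'href' ) || '' ) eq 'abbr:' ) {
                    $t->delete_attr( 'href' );
                    my $rewritten = $t->as_is();
                    $rewritten =~ s/<a\b/<abbr/;
                    # keep the element's content; only the closing </a>
                    # token returned by the recursive call is dropped
                    ( my $subdoc ) = scan_anchor_end( $p );
                    $html .= $rewritten . $subdoc . qq{</abbr>};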
                }
                else {
                    $html .= $t->as_is();
                    ( my $subdoc, $t ) = scan_anchor_end( $p );
                    $html .= $subdoc;
                    # $t is undefined if the input ended before </a>
                    $html .= $t->as_is() if $t;
                }
            }
            elsif( $t->is_end_tag( 'a' ) ) {
                return ( $html, $t );
            }
            else {
                $html .= $t->as_is();
            }
        }
        return $html;
    }
    ###################################################################
    my $source = "# Error\n\nNo document.";
    if( open my $fh, '<:utf8', $ENV{ PATH_TRANSLATED } || '' ) {
        local $/;
        $source = <$fh>;
        # append acronym definitions
        # read with the same encoding layer as the document itself
        open $fh, '<:utf8', '/home/ap/.abbr.mkd'
            and $source .= "\n\n" . <$fh>;
    }
    my $body = markdown( encode( 'us-ascii', $source, Encode::FB_HTMLCREF ) );
    my $html = do{
        my $title = $body =~ m{<h1>(.*?)</h1>} ? $1 : $ENV{ REDIRECT_URL };
        <<"END_HTML";
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title>$title</title>
    <link rel="stylesheet" type="text/css" href="/readings/readings.css" />
    <script type="text/javascript" src="/readings/readings.js"></script>
    </head>
    <body>
    $body
    </body>
    </html>
    END_HTML
    };
    $html = scan_anchor_end( HTML::TokeParser::Simple->new( \$html ) );
    my $tidy = HTML::Tidy->new( { config_file => '/home/ap/.tidy.conf.xhtmlize' } );
    $tidy->ignore( type => TIDY_WARNING );
    $tidy->ignore( type => TIDY_ERROR );
    my $document = $tidy->clean( $html );
    print "Content-type: application/xhtml+xml\n\n", $document;
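
The title lookup in the middle of the script is easy to try in isolation. The sketch below is a variation rather than a copy: `extract_title` is a hypothetical helper, and unlike the script it also tolerates attributes on the `<h1>` and strips inline tags from the heading, so markup does not end up inside `<title>`.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the script above: returns the text of
# the first <h1> with inline tags stripped, or $fallback if there is none.
sub extract_title {
    my ( $body, $fallback ) = @_;
    return $fallback unless $body =~ m{<h1\b[^>]*>(.*?)</h1>}s;
    ( my $title = $1 ) =~ s/<[^>]+>//g;
    return $title;
}

# Illustrative inputs; '/notes.mkd' stands in for the request URL.
print extract_title( "<h1>Notes on <em>Markdown</em></h1>\n<p>Text</p>", '/notes.mkd' ), "\n";
print extract_title( "<p>No heading here</p>", '/notes.mkd' ), "\n";
```

The first call prints `Notes on Markdown` and the second falls back to `/notes.mkd`.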
And the HTML Tidy configuration file I use:
    show-warnings: no
    show-errors: 0
    quiet: yes
    force-output: yes
    output-xhtml: yes
    doctype: strict
    add-xml-decl: no
    drop-empty-paras: no
    drop-proprietary-attributes: yes
    enclose-text: yes
    enclose-block-text: yes
    fix-uri: yes
    logical-emphasis: yes
    replace-color: yes
    numeric-entities: yes
    indent: no
    indent-attributes: no
    markup: yes
    wrap: 0
    ascii-chars: no
    newline: LF
    output-encoding: utf8
    fix-backslash: yes
    tidy-mark: no
Regards,
-- 
#Aristotle
*AUTOLOAD=*_=sub{s/(.*)::(.*)/print$2,(",$\/"," ")[defined wantarray]/e;$1};
&Just->another->Perl->hacker;
    
    