converting html with \xa9 to Markdown and using iconv?

Milian Wolff mail at milianw.de
Thu Mar 22 17:03:29 EDT 2007


On Thursday, 22 March 2007, Jeremy C. Reed wrote:

> The HTML document contains various characters like
>   \xa0
> © \xa9 (Copyright symbol)
> (and others).
>
> I tried using html2text.py but it didn't like these characters.
>
> Any ideas on how I can use iconv or another tool to convert documents like
> this so I can then convert to Markdown?
>
> I don't want to do this manually as I have 500+ documents.
>
> Jeremy C. Reed


If I understand you correctly, you are looking for a converter that supports
UTF-8 / Unicode characters?
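
If the underlying problem is the byte encoding, the re-encoding step (roughly
what `iconv -f ISO-8859-1 -t UTF-8` does) could be sketched in Python as
below. This is only a sketch: it assumes the files are ISO-8859-1, where \xa0
is the non-breaking space and \xa9 the copyright sign, which the original
message does not confirm. The function name `latin1_to_utf8` is made up for
illustration.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes from ISO-8859-1 (an assumed source encoding) to UTF-8."""
    # In ISO-8859-1 every byte value maps to exactly one code point,
    # so this decode step cannot fail.
    text = raw.decode("iso-8859-1")
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = b"Copyright \xa9\xa02007"
    print(latin1_to_utf8(sample))
```

If the files turn out to be in a different encoding (Windows-1252, for
example), the codec name would have to change accordingly.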

My PHP script (a port of html2text.py) leaves those characters unchanged, so
it should work in principle. Try it out at [1].

But it's PHP, so unless you have access to a command line or write a small
PHP script to run locally, it will be of no use to you. The latter should be
pretty easy, though: simply recurse through your files and folders, apply
html2text to each file, and save the output somewhere. You may want to allow
longer execution times for PHP scripts while the batch runs.
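
The batch loop described above can be sketched as follows. The sketch is in
Python rather than PHP, and everything named here is an assumption for
illustration: `convert_tree`, the `convert` callback (a stand-in for
whichever html2text implementation is used), the `.html` / `.md` extensions,
and the ISO-8859-1 default encoding.

```python
from pathlib import Path
from typing import Callable


def convert_tree(src: Path, dst: Path,
                 convert: Callable[[str], str],
                 encoding: str = "iso-8859-1") -> int:
    """Convert every *.html file under src and write *.md files under dst.

    `convert` is a caller-supplied HTML-to-Markdown function (hypothetical
    here). Returns the number of files written.
    """
    count = 0
    for html_path in sorted(src.rglob("*.html")):
        # Decode with the assumed source encoding so that bytes such as
        # \xa0 and \xa9 become proper Unicode characters.
        html = html_path.read_text(encoding=encoding)
        # Mirror the source folder layout under dst, swapping the extension.
        out_path = (dst / html_path.relative_to(src)).with_suffix(".md")
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(convert(html), encoding="utf-8")
        count += 1
    return count
```

Passing the converter in as a function keeps the directory walk independent
of any particular html2text port.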

Another alternative would be to use one of the other converters. I know there
are some, but I don't have their URLs at hand. Maybe someone will be able to
help you.

[1]: http://milianw.de/projects/html2text/

--
Milian Wolff
http://milianw.de

