HTML2.BAT

An HTML to Text File Converter in Batch

This is a batch file that cleanly converts Web documents into plain
text files. If the source document is in UNIX format (LF-only line
endings), you might have to convert it to DOS format first, by loading
it into EDIT and saving it or by other means. The batch does not
interpret codes like <br> or <p>; it goes only by the line breaks in
the source. If it's one of those documents where everything is on
one line, this won't work very well. The WORDWRAP batch can help
if the file comes out with long lines.
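
For anyone without EDIT handy, the UNIX-to-DOS step can be done with a
few lines of script. This is only a sketch in Python, not part of
HTML2.BAT; the function name unix_to_dos is made up for illustration.

```python
def unix_to_dos(data: bytes) -> bytes:
    """Return data with every line ending written as CR+LF (DOS format)."""
    # Collapse any existing CR+LF pairs to LF first so they are not doubled,
    # then expand every LF to CR+LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

To convert a file, read it in binary mode, pass the bytes through
unix_to_dos, and write the result back out in binary mode.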

QBasic must be on the path; it is supplied with MS-DOS versions 5
and above. To use, simply enter:

   html2 infile.htm outfile.txt

If you want to preserve embedded http/ftp hyperlinks then use:

   html2 infile.htm outfile.txt /link

(Substitute the actual file names; they don't have to be .htm and .txt.)

HTML2.BAT removes all HTML tags that begin with a letter, ! or /,
then converts the &lt; &gt; &#60; &#62; &nbsp; &quot; &#38; &amp;
and &middot; codes to their proper characters, hopefully resulting
in readable text or working batch code. It gets a little flaky
when &#.. codes and other & codes are right next to each other,
but I think I can live with that.
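
The two steps described above (strip the tags, then convert the codes)
can be sketched in Python as follows. This is not the code HTML2.BAT
runs; the names TAG, CODES and convert_line are made up for
illustration, and the character chosen for &middot; (U+00B7) is an
assumption. The sketch replaces all codes in a single pass, which is a
different approach from replacing them one at a time and means adjacent
codes cannot interfere with each other.

```python
import re

# A tag is "<" followed by a letter, "!" or "/", up to the next ">".
# A "<" followed by anything else (e.g. "a < b") is left alone.
TAG = re.compile(r"<[A-Za-z!/][^>]*>")

# The codes listed above and the characters they stand for.
CODES = {
    "&lt;": "<", "&#60;": "<",
    "&gt;": ">", "&#62;": ">",
    "&amp;": "&", "&#38;": "&",
    "&quot;": '"',
    "&nbsp;": " ",
    "&middot;": "\u00b7",  # assumption: middle dot character
}
CODE = re.compile("|".join(re.escape(c) for c in CODES))

def convert_line(line: str) -> str:
    """Strip tags from one line, then convert the known codes."""
    line = TAG.sub("", line)
    # One pass over the line: each code is replaced exactly once, so the
    # "&" produced by "&amp;" is never re-read as the start of a new code.
    return CODE.sub(lambda m: CODES[m.group(0)], line)
```

Like the batch, this works one line at a time, so a tag that is split
across two lines in the source is not removed.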

