Converting from HTML
Browsers
Most WWW browsers will convert HTML to plain text, for example the Linemode browser
or Lynx:
www -na "some-URL" > my-text
lynx -dump "some-URL" > my-text
(See the Lynx documentation)
Mosaic, Netscape Navigator, Internet Explorer and other browsers
will let you "save as" plain text, and in some versions also in other formats
including formatted text and PostScript.
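Where no browser is at hand, the same kind of HTML-to-plain-text conversion can be approximated in a few lines of script. The following is only a minimal sketch using Python's standard html.parser module; it is not how Lynx or the Linemode browser work internally, and the names TextExtractor and html_to_text are made up for this illustration. It drops tags, skips script and style content, and breaks lines at a small, assumed set of block-level elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies.

    Hypothetical helper for illustration; the tag sets below are assumptions.
    """

    SKIP = {"script", "style"}
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # start block elements on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # end block elements with a line break

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(html_to_text("<h1>Title</h1><p>Hello &amp; <b>world</b></p>"))
```

Unlike a real browser, this sketch makes no attempt to render tables, lists or links; it only illustrates the tag-removal approach taken by several of the tools listed further down.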
SGML tools
Some SGML tools will allow you to convert HTML to other formats. For instance:
- gf. A general-purpose SGML compiler.
- HyperHelp Bridge from
Bristol Technology will convert to RTF.
- SGML2TeX will convert SGML to TeX on the PC.
- Fred will convert SGML to
HTML, TeX (PostScript), ASCII etc.
- sgml2 will convert SGML to other formats.
- instant from OSF can be used with sgmls to produce
various output formats from standard SGML inputs.
- An HTML to ICADD Transformation Service
translates HTML into the ICADD DTD, suitable for further translation to Braille,
large print or voice synthesis.
Further information is available on
SGML resources and tools.
Other tools
- PostScript
- MS Word / RTF
- html2rtf
translates HTML to RTF. Using this program and the standard Windows help
compiler, you can convert hyperlinked Web pages into Windows HLP files.
Contact: [email protected]
- hh2rtf
is a set of freeware Perl scripts that convert most HtmlHelp-formatted
HTML to WinHelp-ready RTF. Contact: [email protected] (Steve Atkins).
- htm2rtf is another
converter for PC/DOS. Contact: [email protected] (Yves Sagnier)
- html2wrd.zip is a Microsoft Word Basic program to
convert HTML documents (including lists, tables and other formatting) into WinWord documents.
Contact: [email protected] (Yuri M. Lesiuk)
- Here is another tool under development
- Frame
- WordPerfect
- LaTeX etc
- Plain text and setext
- Markup Remover from
Aquatic Moon Software is a Windows application to convert HTML into plain text:
there are several output options.
- Remove-It from GME Systems
is a Windows 3.1x-based HTML tag removal utility.
- Here is a Visual Basic application to do the job.
- HTMSTRIP by Bruce
Guthrie for DOS/Windows processes and removes embedded HTML commands
from Web pages. It reflows paragraphs, processes tables, etc., and writes
the result as straight ASCII text.
- HTMLCon for MS-DOS converts HTML to ASCII.
- HTML Markdown
is a drag-and-drop Macintosh program that converts HTML files into
regular text files.
- An HTML parser in Perl is available which will also
convert HTML to plain text.
- Here is information about some other HTML-to-ASCII converters.
- The dehtml option of htmlchek will produce plain
ASCII from HTML.
- html2setext will convert HTML to setext
(structure-enhanced text), which is human-readable. It is available from
Serious Cybernetics.
Contact: [email protected] (Andrew Pam)
- Other formats
- An enhanced HTML parser handles
HP-PCL and other printer formats
- HTMLhelp converts to WinHelp
- see html2rtf for another route to WinHelp
- and hh2rtf for yet another route to WinHelp
- cphtml converts an annotated HTML file
into a Perl script as an aid to writing CGI scripts.
- HTMLDBF converts HTML pages into DBF files (dBase III+).
Check out word processor filters, some of
which work both ways, and also HTML editors.
__________________________________________________________________
MS,
CERN
19 March 1999