From [email protected] Fri Jul 28 10:11:52 1995
Article: 6946 of comp.infosystems.www.announce
From: [email protected] (Tom Phelps)
Subject: SOFTWARE: RosettaMan, a manual page to HTML converter
Date: Thu, 27 Jul 95 05:48:47 GMT-1:00
Organization: University of California, Berkeley

RosettaMan is distinguished from other manual page to HTML filters in
several ways: it makes the most aggressive analysis of pages (attempting
to identify lists, for instance), it generates several types of output
including HTML, and it recognizes the man pages of many flavors of UNIX,
each of which varies in some way from the others. RosettaMan has been
around for some time and it's quite stable, though for some reason it
has never been announced in this group.

Tom

----------------

RosettaMan is a filter for UNIX manual pages. It takes as input man
pages formatted for a variety of UNIX flavors (i.e., formatted [tn]roff
source) and produces as output a variety of file formats. Currently
RosettaMan accepts man pages as formatted by the following flavors of
UNIX: Hewlett-Packard HP-UX, AT&T System V, SunOS, Sun Solaris, OSF/1,
DEC Ultrix, SGI IRIX, Linux, SCO, FreeBSD; and produces output for the
following formats: printable ASCII only (with page headers and footers
stripped), section and subsection headers only, TkMan, [tn]roff,
Ensemble, HTML, LaTeX, RTF, Perl 5 pod.

RosettaMan improves upon other man page filters in several ways: (1) its
analysis recognizes the structural pieces of man pages, enabling high
quality output, (2) its modular structure permits easy augmentation of
output formats, (3) it accepts man pages formatted with the variant
macros of many different flavors of UNIX, and (4) it doesn't require
modification of or cooperation with any other program.

RosettaMan is a rewrite of TkMan's man page filter, called bs2tk. (If
you haven't heard about TkMan, a hypertext man page browser written in
Tcl/Tk, you can grab it via anonymous ftp from the same place as
RosettaMan.)
Whereas bs2tk generated output only for TkMan, RosettaMan generalizes
the process so that the analysis can be applied to new output formats.
A single analysis engine recognizes section heads, subsection heads,
body text, lists(!), references to other man pages, boldface, italics,
bold italics, and special characters (like bullets), and strips out page
headers and footers. The engine sends signals to the selected output
functions, so an improvement to the engine improves the output quality
of all of them. Output format functions are easy to add, and thus far
average about 75 lines of C code each.

A note for HTML consumers: This filter does real (heuristic) parsing--no
<PRE>! Man page references are turned into hypertext links.

This file is an example of the quality of output produced entirely
automatically (no retouching) by RosettaMan.

Several people have extended World Wide Web servers to format man pages
on the fly. Check the README file in the contrib directory for a list.

CHANGES in 2.2

* when in SEE ALSO, hyphens would confuse the man page-reference finder,
  so re-linebreak if necessary to eliminate them (!)
  (Greg Earle & Uri Guttman)

-- [email protected] --
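
For readers curious how one analysis engine can drive many output
formats, here is a minimal sketch of the idea in C. It is not
RosettaMan's actual source: the signal names, the event struct, and the
functions emit_html, emit_ascii, find_format and render are all
hypothetical, invented for illustration, and the HREF target simply
reuses the reference text as a placeholder. The only point it tries to
show is that each output format is one small function, and the engine
calls whichever one was selected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical structural events; the real signal set is an assumption. */
enum signal { SIG_SECTION, SIG_BODY, SIG_MANREF };

struct event {
    enum signal sig;
    const char *text;
};

/* An output format is one function that renders a single event into buf.
 * It returns what snprintf returns: the length the text needs. */
typedef int (*output_fn)(const struct event *ev, char *buf, size_t size);

static int emit_html(const struct event *ev, char *buf, size_t size)
{
    switch (ev->sig) {
    case SIG_SECTION:
        return snprintf(buf, size, "<H2>%s</H2>\n", ev->text);
    case SIG_MANREF:
        /* Placeholder link target: a real filter would map the
         * reference to a URL of the site's choosing. */
        return snprintf(buf, size, "<A HREF=\"%s\">%s</A>\n",
                        ev->text, ev->text);
    default:
        return snprintf(buf, size, "%s\n", ev->text);
    }
}

/* Plain ASCII ignores the structure and passes the text through. */
static int emit_ascii(const struct event *ev, char *buf, size_t size)
{
    return snprintf(buf, size, "%s\n", ev->text);
}

/* Adding a format means writing one function and adding one row here. */
static const struct {
    const char *name;
    output_fn fn;
} formats[] = {
    { "html", emit_html },
    { "ascii", emit_ascii },
};

/* Look up an output function by name; NULL if the format is unknown. */
static output_fn find_format(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof formats / sizeof formats[0]; i++)
        if (strcmp(formats[i].name, name) == 0)
            return formats[i].fn;
    return NULL;
}

/* Stand-in for the engine: replays already-recognized events through
 * the selected output function. Returns the number of bytes written,
 * or -1 if out is too small. */
static int render(const struct event *evs, size_t n, output_fn emit,
                  char *out, size_t size)
{
    size_t used = 0, i;

    assert(emit != NULL && size > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        int len = emit(&evs[i], out + used, size - used);
        if (len < 0 || (size_t)len >= size - used)
            return -1;
        used += (size_t)len;
    }
    return (int)used;
}
```

To try it, build an array of events, pick a function with
find_format("html") or find_format("ascii"), and pass both to render()
along with an output buffer. Because the recognition step is shared,
fixing it once benefits every row in the table, which is the property
described above.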