Tolerating broken HTML writers

The constructs below are illegal according to SGML, but they're so prevalent that the sample implementation supports them.

Please stop generating HTML in this style!

Document Structure

The BODY element must start with some element. See: an example document where this rule is broken.

Paragraph breaks are not allowed in headers, lists, etc. They may be ignored or treated intelligently.

Multi-paragraph

heading

Unknown Tags

Tags that aren't known to the parser are treated as data by, for example, the MidasWWW-1.0 implementation. They should be ignored. There should be no tags around the word foo: foo.
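A minimal sketch of the "ignore unknown tags" behaviour, using Python's standard html.parser module rather than any of the implementations named here. The KNOWN_TAGS set and the strip_unknown_tags helper are hypothetical names chosen for illustration, and attributes on known tags are dropped to keep the example short.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny set of tags this toy parser "knows".
KNOWN_TAGS = {"p", "h1", "h2", "a", "em"}


class TolerantParser(HTMLParser):
    """Keeps known tags and all character data; drops unknown markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        if tag in KNOWN_TAGS:
            self.out.append(f"<{tag}>")
        # Unknown start tag: ignored, not echoed as data.

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)


def strip_unknown_tags(text):
    parser = TolerantParser()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

With this sketch, the unknown tags around a word disappear while the word itself stays in the text.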

Body Elements

Note that conforming SGML parsers will treat "&", "<", "</", and "<!" as normal text characters when they are not followed by a letter. HTML producers are discouraged from taking advantage of this feature.
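The rule above can be sketched with a regular expression. This is a simplification that implements only the "followed by a letter" condition stated in the text; a full SGML parser recognizes further contexts (comments and numeric character references, for instance). The name markup_delimiters is a hypothetical helper.

```python
import re

# Longer delimiters come first so "</" and "<!" are preferred over "<".
# The lookahead applies only the "followed by a letter" rule from the text.
DELIM_IN_CONTEXT = re.compile(r"(</|<!|<|&)(?=[A-Za-z])")


def markup_delimiters(text):
    """Return (offset, delimiter) pairs that would be read as markup."""
    return [(m.start(), m.group(1)) for m in DELIM_IN_CONTEXT.finditer(text)]
```

Under this sketch, "a < b & c" contains no markup at all, while the "&" in "AT&T" is picked up as the start of an entity reference, which is one reason producers are better off escaping these characters everywhere.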

Anchors

numeric IDs: NeXT and html-mode.el

This anchor's name starts with a digit, which is not a name start character.
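A producer could check anchor names before writing them out. The sketch below assumes the SGML reference concrete syntax, in which a name starts with a letter and continues with letters, digits, "." or "-"; it does not check any length limit. The name is_valid_name is hypothetical.

```python
import re

# Assumption: reference concrete syntax. Name start character: a letter.
# Name characters: letters, digits, "." and "-".
SGML_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*\Z")


def is_valid_name(value):
    """True if value could serve as an SGML name (length limits not checked)."""
    return SGML_NAME.match(value) is not None
```

A name such as "z12" passes, while "12z" fails because its first character is a digit.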

unquoted attribute literals: NeXT and html-mode.el

This anchor's href contains a '#', which is not a name character. It should lead to the NeXT implementation reference below anyway. This anchor's href contains ':' and '/', which are not name characters. It should lead to the SLAC MidasWWW doc anyway.
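On the writing side, the safe habit is to quote any attribute value that contains something other than name characters. The following is a rough sketch under the same reference-concrete-syntax assumption (letters, digits, "." and "-" are name characters); attr_literal is a hypothetical helper, and the example URL in the usage note is made up.

```python
import re

# Assumption: only letters, digits, "." and "-" may appear unquoted.
NAME_CHARS_ONLY = re.compile(r"[A-Za-z0-9.-]+\Z")


def attr_literal(value):
    """Return value in a form that may follow '=' inside a start tag."""
    if NAME_CHARS_ONLY.match(value):
        return value  # safe to leave unquoted
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    # Both quote styles occur: fall back to a numeric character reference.
    return '"' + value.replace('"', "&#34;") + '"'
```

For example, attr_literal("#z1") yields "#z1" wrapped in double quotes, and so does a value like http://example.org/doc, whereas a plain token such as foo is returned unchanged.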

Literal Text Elements

Historical Note

The original semantics of the XMP and LISTING elements is not representable in SGML. From Tags used in HTML:

But in section 7.6 of the SGML standard:

The XMP and LISTING elements are deprecated in favor of the TYPEWRITER element.

Non-standard CDATA parsing: LineMode, MidasWWW, etc.

This example section ends here: </foo . Even though the above ETAGO begins a markup error, this text is in a normal paragraph in conforming implementations.<P> <XMP> Just in case the foo close tag above wasn't recognized:

Known Implementations

The following systems are known to read and/or write HTML. They all have bugs.

Linemode Browser 1.3c
MidasWWW 1.0
MidasWWW parses HTML into its internal data structures, and then offers the option to extract the data and write it to a file. It doesn't always get it right.
NeXT editor
From [email protected]
html-mode.el
From marca@@@
Viola
From Pei Wei @ O'Reilly (@@email address). Any known problems? I hear it's going to use SGMLs.
www_and_frame
@@Go get the latest version -- it should be current with this spec.
perl client
Just heard about it. Haven't tried it. I don't think it supports entities.