Parser Design

HTMLParser is a SAX-like streaming parser that can correct dirty HTML on the fly. It is extremely fast and lightweight: the binary distribution of the jar file is only around 135 KB, and it can easily be brought down to 65 KB for minimal parsing requirements (prior to optimization and obfuscation).

It is also extensible. The parser provides both InternalIterators and ExternalIterators, and it has some interesting PatternStories.
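
To make the difference between the two iterator styles concrete, here is a minimal sketch in plain Java. It does not use HTMLParser's own classes: IteratorStyles, externalIterator, forEachNode, and the two collect helpers are hypothetical names invented for illustration, and the "nodes" are just strings. With an external iterator the caller drives the loop and can stop early; with an internal iterator the collection drives the loop and calls back into the caller's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a parsed document; NOT part of HTMLParser's API.
public class IteratorStyles {
    private final List<String> nodes;

    public IteratorStyles(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    // External iterator: the caller owns the loop and decides when to advance or stop.
    public Iterator<String> externalIterator() {
        return nodes.iterator();
    }

    // Internal iterator: the collection owns the loop and invokes the visitor per node.
    public void forEachNode(Consumer<String> visitor) {
        for (String node : nodes) {
            visitor.accept(node);
        }
    }

    // Collects node names by pulling them one at a time (external style).
    public static List<String> collectExternally(IteratorStyles doc) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = doc.externalIterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    // Collects node names by handing a callback to the document (internal style).
    public static List<String> collectInternally(IteratorStyles doc) {
        List<String> seen = new ArrayList<>();
        doc.forEachNode(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        IteratorStyles doc = new IteratorStyles(Arrays.asList("html", "body", "p"));
        System.out.println(collectExternally(doc));
        System.out.println(collectInternally(doc));
    }
}
```

Both helpers visit the same nodes in the same order. The external style suits callers that need to pause, look ahead, or stop partway through; the internal style keeps traversal logic inside the parser and leaves the caller with only a callback to write.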

--SomikRaha


Last edited on Monday, March 17, 2003 6:18:45 am.