Reverse HTML Rendering
To get back the HTML representation of a parsed web page, call toHtml() on each top-level node; each node renders its own children recursively. Here is one way to do it:
import org.htmlparser.Parser;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.ParserException;

public class ToHtmlDemo
{
    public static void main (String[] args) throws ParserException
    {
        // Parse the page at the given URL.
        Parser parser = new Parser ("http://urlIWantToParse.com");
        StringBuffer html = new StringBuffer (4096);
        // Append the HTML of each top-level node; children are rendered by their parent.
        for (NodeIterator i = parser.elements (); i.hasMoreNodes ();)
            html.append (i.nextNode ().toHtml ());
        System.out.println (html);
    }
}
Often you will want to modify the HTML being reconstructed. In that case, change the tag's attributes before calling toHtml().
For example, if the tag in question is a link tag, and you wish to modify the href, do this:
linkTag.setLink ("http://newUrlString");
linkTag.toHtml ();
This is equivalent to:
linkTag.setAttribute ("href", "http://newUrlString");
linkTag.toHtml ();
The latter form works on any tag, although few tags other than links have an HREF attribute according to the HTML specification.
The toHtml() method applies to all nodes, not just tags. For a tag, it essentially reconstructs the tag from its attributes (at the atomic level) and from its children (at the composite level).
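The attribute-plus-children reconstruction described above can be illustrated with a small self-contained sketch. It does not use the library: SketchNode, SketchText, SketchTag and ToHtmlSketch are hypothetical names invented for illustration, and the sketch assumes every attribute is double-quoted and performs no escaping. It only shows why an attribute changed before the call appears in the rendered output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed node; NOT the org.htmlparser classes.
interface SketchNode {
    String toHtml();
}

// A text node renders as its own text (no escaping in this sketch).
final class SketchText implements SketchNode {
    private final String text;
    SketchText(String text) { this.text = text; }
    public String toHtml() { return text; }
}

// A tag renders its name and attributes, then each child, then its end tag.
final class SketchTag implements SketchNode {
    private String name;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<SketchNode> children = new ArrayList<>();

    SketchTag(String name) { this.name = name; }
    void setTagName(String newName) { name = newName; }
    void setAttribute(String key, String value) { attributes.put(key, value); }
    void add(SketchNode child) { children.add(child); }

    public String toHtml() {
        StringBuilder html = new StringBuilder("<").append(name);
        // Atomic level: the tag's own attributes, always double-quoted here.
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            html.append(' ').append(a.getKey())
                .append("=\"").append(a.getValue()).append('"');
        }
        html.append('>');
        // Composite level: each child renders itself recursively.
        for (SketchNode child : children) {
            html.append(child.toHtml());
        }
        return html.append("</").append(name).append('>').toString();
    }
}

class ToHtmlSketch {
    static String demo() {
        SketchTag link = new SketchTag("a");
        link.setAttribute("href", "http://oldUrlString");
        link.add(new SketchText("click"));
        // Change the attribute before rendering; the output reflects the new value.
        link.setAttribute("href", "http://newUrlString");
        return link.toHtml();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <a href="http://newUrlString">click</a>
    }
}
```

Because the output is rebuilt from the current state of the node tree on every call, any change made before toHtml() is picked up without a separate "refresh" step.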
You can also change the name of the tag like so:
tag.setTagName (newTagName);
There are also several ways to add, remove or change the attributes of a tag. For example, to set the ID attribute to "EditArea" (adding it if it is absent), with a double quote as the quote character, use:
tag.setAttribute ("id", "EditArea", '"');
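Whole tags can be added to and removed from the list of children held by each tag. For example, to add a <P> tag at the same level as another tag (this assumes a version of the library in which Tag is an interface, so the new tag is created through the org.htmlparser.nodes.TagNode implementation):
TagNode newTag = new TagNode ();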
newTag.setTagName ("P");
tag.getParent ().getChildren ().add (newTag);
Be careful: getChildren () may return null for an arbitrary tag, and getParent () returns null for a top-level node, so check both before chaining calls on them.
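The null checks can be sketched without the library. LazyNode and NullSafeAddSketch below are hypothetical names invented for illustration; the sketch assumes a node type whose child list is created lazily, which is one reason a getChildren()-style accessor might return null. It is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type, NOT org.htmlparser.Tag: the child list is created
// lazily, so getChildren() returns null until something has been added.
final class LazyNode {
    private final String name;
    private LazyNode parent;          // null for a top-level node
    private List<LazyNode> children;  // null until the first child is added

    LazyNode(String name) { this.name = name; }
    String getName() { return name; }
    LazyNode getParent() { return parent; }
    void setParent(LazyNode newParent) { parent = newParent; }
    List<LazyNode> getChildren() { return children; }
    void setChildren(List<LazyNode> newChildren) { children = newChildren; }
}

class NullSafeAddSketch {
    // Appends child to tag, creating the child list if it is still null.
    static void addChild(LazyNode tag, LazyNode child) {
        List<LazyNode> kids = tag.getChildren();
        if (kids == null) {
            kids = new ArrayList<>();
            tag.setChildren(kids);
        }
        kids.add(child);
        child.setParent(tag);
    }

    // Appends newTag at the same level as tag; returns false if tag has no parent.
    static boolean addSibling(LazyNode tag, LazyNode newTag) {
        LazyNode parent = tag.getParent();
        if (parent == null) {
            return false;
        }
        addChild(parent, newTag);
        return true;
    }

    public static void main(String[] args) {
        LazyNode body = new LazyNode("BODY");
        LazyNode div = new LazyNode("DIV");
        addChild(body, div);
        System.out.println(addSibling(div, new LazyNode("P")));  // div has a parent
        System.out.println(addSibling(body, new LazyNode("P"))); // body is top-level
    }
}
```

Returning a boolean rather than throwing is a design choice made for this sketch; real code might instead report the missing parent as an error.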