String Extraction
To extract all the text content from a web page, run a TextExtractingVisitor over the parsed page, like so:
import org.htmlparser.Parser;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.TextExtractingVisitor;
public class StringDemo
{
    public static void main (String[] args) throws ParserException
    {
        // Parse the page at this URL (replace it with the page you want)
        Parser parser = new Parser ("http://pageIwantToParse.com");
        // The visitor accumulates the text of every text node it is shown
        TextExtractingVisitor visitor = new TextExtractingVisitor ();
        parser.visitAllNodesWith (visitor);
        System.out.println (visitor.getExtractedText ());
    }
}
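The example above fetches a live page. When the HTML is already in memory (for instance in a unit test), the same visitor can be run over a String. The sketch below assumes HTMLParser's static factory Parser.createParser (String html, String charset); the class name StringSourceDemo, the helper extractText and the "UTF-8" charset are illustrative choices, not part of the original sample.

```java
import org.htmlparser.Parser;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.TextExtractingVisitor;

// Sketch: extract text from HTML held in a String rather than fetched from a URL.
// Assumes Parser.createParser (String html, String charset) is available.
public class StringSourceDemo
{
    // Hypothetical helper: returns the text content of the given HTML.
    static String extractText (String html)
    {
        try
        {
            // "UTF-8" is an assumed charset name for the in-memory page
            Parser parser = Parser.createParser (html, "UTF-8");
            TextExtractingVisitor visitor = new TextExtractingVisitor ();
            parser.visitAllNodesWith (visitor);
            return visitor.getExtractedText ();
        }
        catch (ParserException pe)
        {
            // Wrap the checked exception so callers need no throws clause
            throw new IllegalArgumentException ("could not parse HTML", pe);
        }
    }

    public static void main (String[] args)
    {
        System.out.println (extractText ("<html><body><p>Hello, <b>world</b>!</p></body></html>"));
    }
}
```

Only the text between the tags should remain; the markup itself is dropped.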
For more browser-like behaviour, use the StringBean, which can, for example, leave out link URLs, replace non-breaking spaces and collapse whitespace, like so:
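import org.htmlparser.beans.StringBean;
public class StringDemo
{
    public static void main (String[] args)
    {
        StringBean sb = new StringBean ();
        sb.setLinks (false);                    // don't include link URLs in the text
        sb.setReplaceNonBreakingSpaces (true);  // turn non-breaking spaces into ordinary spaces
        sb.setCollapse (true);                  // collapse whitespace runs, as a browser does
        sb.setURL ("http://pageIwantToParse.com"); // fetch and parse the page
        System.out.println (sb.getStrings ());
    }
}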
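To make the effect of setReplaceNonBreakingSpaces (true) and setCollapse (true) concrete, here is a rough approximation using only java.lang.String methods. It is a hypothetical illustration (the class WhitespaceDemo and its two helpers are invented for this sketch), not StringBean's actual implementation.

```java
// Hypothetical illustration: approximates, with plain String methods, the clean-up
// that StringBean's non-breaking-space and collapse options perform.
// This is not StringBean's own code.
public class WhitespaceDemo
{
    // Replace each non-breaking space (U+00A0) with an ordinary space.
    static String replaceNonBreakingSpaces (String text)
    {
        return text.replace ('\u00a0', ' ');
    }

    // Collapse every run of spaces, tabs and newlines into a single space.
    // Note: the regex class \s does not match U+00A0 by default.
    static String collapse (String text)
    {
        return text.replaceAll ("\\s+", " ");
    }

    public static void main (String[] args)
    {
        String raw = "Price:\u00a0\u00a010 \n\n\t euros";
        // Prints: Price: 10 euros
        System.out.println (collapse (replaceNonBreakingSpaces (raw)));
    }
}
```

The order matters in this sketch: because \s does not match a non-breaking space, collapsing before the replacement would leave doubled spaces behind.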