String Extraction

To extract all the text content from a web page, use the TextExtractingVisitor:

import org.htmlparser.Parser;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.TextExtractingVisitor;
public class StringDemo
{
    public static void main (String[] args) throws ParserException
    {
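        // Point the parser at the page to fetch (placeholder URL).
        Parser parser = new Parser ("http://pageIwantToParse.com");
        TextExtractingVisitor visitor = new TextExtractingVisitor ();
        // Walk every node in the page; the visitor accumulates the text it sees.
        parser.visitAllNodesWith (visitor);
        System.out.println (visitor.getExtractedText());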
    }
}

If you want more browser-like behaviour, use the StringBean:

import org.htmlparser.beans.StringBean;
public class StringDemo
{
    public static void main (String[] args)
    {
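        StringBean sb = new StringBean ();
        sb.setLinks (false);                    // leave link URLs out of the output
        sb.setReplaceNonBreakingSpaces (true);  // turn non-breaking spaces into plain spaces
        sb.setCollapse (true);                  // collapse runs of whitespace
        sb.setURL ("http://pageIwantToParse.com"); // placeholder URL of the page to fetch
        System.out.println (sb.getStrings ());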
    }
}
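
To give a feel for what the setReplaceNonBreakingSpaces and setCollapse options are asking for, here is a rough sketch that uses only the standard Java library. It is not StringBean's implementation: the class WhitespaceSketch and its normalize method are hypothetical names invented for this illustration, and the sketch simplifies by folding line breaks into single spaces as well, which the real bean may treat differently.

```java
public class WhitespaceSketch
{
    // Hypothetical helper, not part of HTMLParser: approximates the effect
    // of replacing non-breaking spaces and collapsing whitespace.
    static String normalize (String text)
    {
        // Replace each non-breaking space (U+00A0) with an ordinary space.
        String replaced = text.replace ('\u00a0', ' ');
        // Collapse every run of whitespace (including newlines) into one
        // space, then drop leading and trailing whitespace.
        return replaced.replaceAll ("\\s+", " ").trim ();
    }

    public static void main (String[] args)
    {
        String raw = "Hello,\u00a0world!\n\n   Second   line.";
        System.out.println (normalize (raw));
    }
}
```

Run on the sample string, this prints "Hello, world! Second line." on a single line.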




Last edited on Tuesday, January 6, 2004 6:36:18 pm.