
org.htmlparser.lexer.nodes
Class StringNode

java.lang.Object
  extended by org.htmlparser.AbstractNode
      extended by org.htmlparser.lexer.nodes.StringNode
All Implemented Interfaces:
Node, Serializable
Direct Known Subclasses:
StringNode

public class StringNode
extends AbstractNode

Normal text in the HTML document is represented by this class.

See Also:
Serialized Form

Field Summary
protected  String mText
          The contents of the string node, or override text.
 
Fields inherited from class org.htmlparser.AbstractNode
children, mPage, nodeBegin, nodeEnd, parent
 
Constructor Summary
StringNode(Page page, int start, int end)
          Constructor takes in the page and the beginning and ending positions.
StringNode(String text)
          Constructor takes in the text string.
 
Method Summary
 void accept(Object visitor)
          Apply the visitor object (of type NodeVisitor) to this node.
 String getText()
          Returns the text of the string line.
 void setText(String text)
          Sets the string contents of the node.
 String toHtml()
          This method makes it easier to reproduce HTML pages (with or without modifications) when using HTML Parser. Applications reproducing HTML can use this method on nodes that are to be used or transferred as they were received, with the original HTML.
 String toPlainTextString()
          Returns a string representation of the node.
 String toString()
          Express this string node as a printable string. This is suitable for display in a debugger or output to a printout.
 
Methods inherited from class org.htmlparser.AbstractNode
collectInto, doSemanticAction, elementBegin, elementEnd, getChildren, getEndPosition, getPage, getParent, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHTML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

mText

protected String mText
The contents of the string node, or override text.

Constructor Detail

StringNode

public StringNode(String text)
Constructor takes in the text string.

Parameters:
text - The string node text. For correct generation of HTML, this should not contain representations of tags (unless they are balanced).

StringNode

public StringNode(Page page,
                  int start,
                  int end)
Constructor takes in the page and the beginning and ending positions.

Parameters:
page - The page this string is on.
start - The beginning position of the string.
end - The ending position of the string.
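
The two constructors suggest two ways a string node can obtain its text: from a span of the page, or from explicitly supplied "override" text held in mText. The following is a minimal sketch of that idea, not the library's implementation. SketchPage and SketchStringNode are hypothetical stand-ins, and the half-open [start, end) offset convention is an assumption, since this page does not state whether the end position is inclusive.

```java
// Hypothetical stand-in for a page; it only stores the source text.
final class SketchPage {
    private final String source;

    SketchPage(String source) {
        this.source = source;
    }

    // Assumption: start is inclusive and end is exclusive.
    String getText(int start, int end) {
        return source.substring(start, end);
    }
}

// Hypothetical stand-in for a string node with optional override text.
final class SketchStringNode {
    private final SketchPage page; // null when the node is built from a string
    private final int start;
    private final int end;
    private String text;           // override text, or null to read from the page

    SketchStringNode(String text) {
        this.page = null;
        this.start = 0;
        this.end = text.length();
        this.text = text;
    }

    SketchStringNode(SketchPage page, int start, int end) {
        this.page = page;
        this.start = start;
        this.end = end;
        this.text = null;
    }

    // Prefer the override text; otherwise read the span from the page.
    String getText() {
        return text != null ? text : page.getText(start, end);
    }

    // Setting text replaces whatever the page span would have produced.
    void setText(String text) {
        this.text = text;
    }
}
```

In this sketch, calling setText on a page-backed node leaves the page untouched and only changes what getText returns afterwards.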

Method Detail

getText

public String getText()
Returns the text of the string line.

Specified by:
getText in interface Node
Overrides:
getText in class AbstractNode

setText

public void setText(String text)
Sets the string contents of the node.

Specified by:
setText in interface Node
Overrides:
setText in class AbstractNode
Parameters:
text - The new text for the node.

toPlainTextString

public String toPlainTextString()
Description copied from class: AbstractNode
Returns a string representation of the node. This method is important because it allows a simple string transformation of a web page, regardless of the type of node.
Typical application code (for extracting only the text from a web page) would then be simplified to:
 Node node;
 for (Enumeration e = parser.elements();e.hasMoreElements();) {
    node = (Node)e.nextElement();
    System.out.println(node.toPlainTextString()); // Or do whatever processing you wish with the plain text string
 }
 

Specified by:
toPlainTextString in interface Node
Specified by:
toPlainTextString in class AbstractNode

toHtml

public String toHtml()
Description copied from class: AbstractNode
This method makes it easier to reproduce HTML pages (with or without modifications) when using HTML Parser. Applications reproducing HTML can use this method on nodes that are to be used or transferred as they were received, with the original HTML.

Specified by:
toHtml in interface Node
Specified by:
toHtml in class AbstractNode

toString

public String toString()
Express this string node as a printable string. This is suitable for display in a debugger or output to a printout. Control characters are replaced by their equivalent escape sequences and the contents are truncated to 80 characters.

Specified by:
toString in interface Node
Specified by:
toString in class AbstractNode
Returns:
A string representation of the string node.
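
The escaping and truncation described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the library's code: PrintableText is a hypothetical helper, only tab, newline and carriage return are escaped, and the output is cut silently at 80 characters without splitting an escape sequence. Which control characters the real method handles, and whether it marks the truncation, is not stated on this page.

```java
// Hypothetical helper, not part of HTML Parser.
final class PrintableText {
    // Limit taken from the toString() description.
    static final int MAX_LENGTH = 80;

    static String toPrintable(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String piece;
            switch (c) {
                case '\t': piece = "\\t"; break;
                case '\n': piece = "\\n"; break;
                case '\r': piece = "\\r"; break;
                default:   piece = String.valueOf(c);
            }
            // Stop before exceeding the limit so an escape is never cut in half.
            if (out.length() + piece.length() > MAX_LENGTH) {
                break;
            }
            out.append(piece);
        }
        return out.toString();
    }
}
```

A string made of a tab and a newline between letters comes back with the two-character sequences \t and \n in their place, which keeps the debugger output on a single line.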

accept

public void accept(Object visitor)
Description copied from interface: Node
Apply the visitor object (of type NodeVisitor) to this node.

Specified by:
accept in interface Node
Specified by:
accept in class AbstractNode
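
Because accept takes an Object, the node presumably casts the argument to the visitor type before dispatching to it. The sketch below shows that pattern with hypothetical stand-ins (SketchVisitor, SketchTextNode, TextCollector); the callback name visitStringNode is an assumption for illustration, and the real NodeVisitor API should be checked in its own documentation.

```java
// Hypothetical visitor type; stands in for NodeVisitor.
interface SketchVisitor {
    void visitStringNode(SketchTextNode node);
}

// Hypothetical text node that mirrors the accept(Object) signature above.
final class SketchTextNode {
    private final String text;

    SketchTextNode(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }

    // Cast the untyped argument, then hand this node to the visitor.
    // A ClassCastException is thrown if the argument is not a SketchVisitor.
    void accept(Object visitor) {
        ((SketchVisitor) visitor).visitStringNode(this);
    }
}

// Example visitor that concatenates the text of every node it visits.
final class TextCollector implements SketchVisitor {
    private final StringBuilder collected = new StringBuilder();

    public void visitStringNode(SketchTextNode node) {
        collected.append(node.getText());
    }

    String getCollected() {
        return collected.toString();
    }
}
```

A caller would create one TextCollector, pass it to accept on each node in turn, and read the accumulated text from getCollected afterwards.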

© 2004 Somik Raha
Mar 14, 2004

HTML Parser is an open source library released under LGPL.