org.htmlparser.tags
Class CompositeTag

java.lang.Object
  extended by org.htmlparser.AbstractNode
      extended by org.htmlparser.lexer.nodes.TagNode
          extended by org.htmlparser.tags.Tag
              extended by org.htmlparser.tags.CompositeTag
All Implemented Interfaces:
Cloneable, Node, Serializable
Direct Known Subclasses:
AppletTag, BodyTag, Bullet, BulletList, Div, FormTag, FrameSetTag, HeadTag, Html, LabelTag, LinkTag, OptionTag, ScriptTag, SelectTag, Span, StyleTag, TableColumn, TableHeader, TableRow, TableTag, TextareaTag, TitleTag

public class CompositeTag
extends Tag

The base class for tags that have an end tag. Provides extra accessors for the children above and beyond what the basic Tag provides. Also handles the conversion of its children for the toHtml method.

See Also:
Serialized Form

Field Summary
protected static CompositeTagScanner mDefaultScanner
          The default scanner for non-composite tags.
protected  TagNode mEndTag
          The tag that causes this tag to finish.
 
Fields inherited from class org.htmlparser.lexer.nodes.TagNode
breakTags, mAttributes
 
Fields inherited from class org.htmlparser.AbstractNode
children, mPage, nodeBegin, nodeEnd, parent
 
Constructor Summary
CompositeTag()
           
 
Method Summary
 void accept(NodeVisitor visitor)
          Tag visiting code.
 Node childAt(int index)
          Get the child at the given index.
 SimpleNodeIterator children()
          Get an iterator over the children of this node.
 void collectInto(NodeList list, NodeFilter filter)
          Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria.
 StringNode[] digupStringNode(String searchText)
          Finds a string node, however embedded it might be, and returns it.
 SimpleNodeIterator elements()
          Return the child tags as an iterator.
 int findPositionOf(Node searchNode)
          Returns the node number of a child node given the node object.
 int findPositionOf(String text)
          Returns the node number of the first node containing the given text.
 int findPositionOf(String text, Locale locale)
          Returns the node number of the first node containing the given text.
 Node getChild(int index)
          Get the child of this node at the given position.
 int getChildCount()
           
 Node[] getChildrenAsNodeArray()
          Get the children as an array of Node objects.
 String getChildrenHTML()
           
 TagNode getEndTag()
           
 TagNode getStartTag()
          Deprecated. The tag *is* the start tag.
 String getStringText()
          Return the text between the start tag and the end tag.
 String getText()
          Return the text contained in this tag.
protected  void putChildrenInto(StringBuffer sb)
           
protected  void putEndTagInto(StringBuffer sb)
           
 void removeChild(int i)
          Remove the child at the position given.
 Tag searchByName(String name)
          Searches all children for a tag with a matching name attribute.
 NodeList searchFor(Class classType, boolean recursive)
          Collect all objects that are of a certain type. Note that this will not check for parent types, and will recurse through child tags only if recursive is true.
 NodeList searchFor(String searchString)
          Searches for all nodes whose text representation contains the search string.
 NodeList searchFor(String searchString, boolean caseSensitive)
          Searches for all nodes whose text representation contains the search string.
 NodeList searchFor(String searchString, boolean caseSensitive, Locale locale)
          Searches for all nodes whose text representation contains the search string.
 void setEndTag(TagNode end)
           
 void setStartTag(TagNode start)
          Deprecated. The tag *is* the start tag.
 String toHtml()
          Render the tag as HTML.
 String toPlainTextString()
          Get the plain text from this node.
 String toString()
          Print the contents of the tag
 void toString(int level, StringBuffer buffer)
           
 
Methods inherited from class org.htmlparser.tags.Tag
accept, clone, getEnders, getEndTagEnders, getIds, getThisScanner, setThisScanner
 
Methods inherited from class org.htmlparser.lexer.nodes.TagNode
breaksFlow, getAttribute, getAttributeEx, getAttributes, getAttributesEx, getEndingLineNumber, getParameter, getParsed, getRawTagName, getStartingLineNumber, getTagBegin, getTagEnd, getTagName, isEmptyXmlTag, isEndTag, removeAttribute, setAttribute, setAttribute, setAttribute, setAttributes, setAttributesEx, setEmptyXmlTag, setTagBegin, setTagEnd, setTagName, setText
 
Methods inherited from class org.htmlparser.AbstractNode
doSemanticAction, elementBegin, elementEnd, getChildren, getEndPosition, getPage, getParent, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHTML
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

mEndTag

protected TagNode mEndTag
The tag that causes this tag to finish. May be a virtual tag generated by the scanning logic.


mDefaultScanner

protected static final CompositeTagScanner mDefaultScanner
The default scanner for non-composite tags.

Constructor Detail

CompositeTag

public CompositeTag()
Method Detail

children

public SimpleNodeIterator children()
Get an iterator over the children of this node.

Returns:
An iterator over the children of this node.

getChild

public Node getChild(int index)
Get the child of this node at the given position.

Parameters:
index - The index in the node list of the child.
Returns:
The child at that index.

getChildrenAsNodeArray

public Node[] getChildrenAsNodeArray()
Get the children as an array of Node objects.

Returns:
The children in an array.

removeChild

public void removeChild(int i)
Remove the child at the position given.

Parameters:
i - The index of the child to remove.

elements

public SimpleNodeIterator elements()
Return the child tags as an iterator. Equivalent to calling getChildren().elements().

Returns:
An iterator over the children.

toPlainTextString

public String toPlainTextString()
Description copied from class: TagNode
Get the plain text from this node.

Specified by:
toPlainTextString in interface Node
Overrides:
toPlainTextString in class TagNode
Returns:
An empty string (tag contents do not display in a browser). If you want this tag's HTML equivalent, use toHtml().

putChildrenInto

protected void putChildrenInto(StringBuffer sb)

putEndTagInto

protected void putEndTagInto(StringBuffer sb)

toHtml

public String toHtml()
Description copied from class: TagNode
Render the tag as HTML. A call to a tag's toHtml() method will render it in HTML.

Specified by:
toHtml in interface Node
Overrides:
toHtml in class TagNode
Returns:
The tag as an HTML fragment.
See Also:
Node.toHtml()
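As a rough illustration of how a composite tag's HTML could be assembled, the sketch below renders a start tag, then the children, then the end tag. It does not use the library: ToHtmlSketch and its nested Node, Text and Composite types are hypothetical stand-ins, and the ordering (as well as the roles given to putChildrenInto and putEndTagInto) is an assumption inferred from the method names on this page, not a confirmed description of the implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the org.htmlparser classes.
final class ToHtmlSketch {
    interface Node {
        String toHtml();
    }

    // A leaf node that renders as its own text.
    static final class Text implements Node {
        private final String text;
        Text(String text) { this.text = text; }
        @Override public String toHtml() { return text; }
    }

    // A tag with children and an optional end tag.
    static final class Composite implements Node {
        private final String startTagText; // contents of the start tag, e.g. "p"
        private final String endTagText;   // contents of the end tag, e.g. "/p", or null if absent
        private final List<Node> children = new ArrayList<>();

        Composite(String startTagText, String endTagText) {
            this.startTagText = startTagText;
            this.endTagText = endTagText;
        }

        Composite add(Node child) {
            children.add(child);
            return this;
        }

        // Assumed role of putChildrenInto: append each child's HTML in document order.
        void putChildrenInto(StringBuilder sb) {
            for (Node child : children) {
                sb.append(child.toHtml());
            }
        }

        // Assumed role of putEndTagInto: append the end tag when there is one.
        void putEndTagInto(StringBuilder sb) {
            if (endTagText != null) {
                sb.append('<').append(endTagText).append('>');
            }
        }

        @Override public String toHtml() {
            StringBuilder sb = new StringBuilder();
            sb.append('<').append(startTagText).append('>');
            putChildrenInto(sb);
            putEndTagInto(sb);
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Composite bold = new Composite("b", "/b").add(new Text("bold"));
        Composite para = new Composite("p", "/p").add(new Text("Some ")).add(bold);
        System.out.println(para.toHtml()); // prints <p>Some <b>bold</b></p>
    }
}
```

Because each child renders itself, nested composite tags are handled by the same loop without any special casing.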

searchByName

public Tag searchByName(String name)
Searches all children for a tag with a matching name attribute. Returns the first match.

Parameters:
name - The name attribute value to match in the tag.
Returns:
The tag matching the name attribute.

searchFor

public NodeList searchFor(String searchString)
Searches for all nodes whose text representation contains the search string. Collects all nodes containing the search string into a NodeList. This search is case-insensitive and the search string and the node text are converted to uppercase using an English locale. For example, if you wish to find any textareas in a form tag containing "hello world", the code would be:

 NodeList nodeList = formTag.searchFor("Hello World");

Parameters:
searchString - Search criterion.
Returns:
A collection of nodes whose string contents or representation have the searchString in them.

searchFor

public NodeList searchFor(String searchString,
                          boolean caseSensitive)
Searches for all nodes whose text representation contains the search string. Collects all nodes containing the search string into a NodeList. For example, if you wish to find any textareas in a form tag containing "hello world", ignoring case, the code would be:

 NodeList nodeList = formTag.searchFor("Hello World", false);

Parameters:
searchString - Search criterion.
caseSensitive - If true this search should be case sensitive. Otherwise, the search string and the node text are converted to uppercase using an English locale.
Returns:
A collection of nodes whose string contents or representation have the searchString in them.

searchFor

public NodeList searchFor(String searchString,
                          boolean caseSensitive,
                          Locale locale)
Searches for all nodes whose text representation contains the search string. Collects all nodes containing the search string into a NodeList. For example, if you wish to find any textareas in a form tag containing "hello world", ignoring case under an English locale, the code would be:

 NodeList nodeList = formTag.searchFor("Hello World", false, Locale.ENGLISH);

Parameters:
searchString - Search criterion.
caseSensitive - If true this search should be case sensitive. Otherwise, the search string and the node text are converted to uppercase using the locale provided.
locale - The locale for uppercase conversion.
Returns:
A collection of nodes whose string contents or representation have the searchString in them.
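The locale parameter matters because uppercase conversion is locale dependent. The following sketch shows only the comparison described above (uppercase both strings with the given locale, then test for containment). CaseInsensitiveSearchSketch and matches are hypothetical names, and the code is a stand-in written against the standard library, not the library's own implementation.

```java
import java.util.Locale;

// Hypothetical helper illustrating the documented comparison; not part of org.htmlparser.
final class CaseInsensitiveSearchSketch {

    // When caseSensitive is false, both strings are converted to uppercase
    // with the supplied locale before the containment test.
    static boolean matches(String text, String search, boolean caseSensitive, Locale locale) {
        if (caseSensitive) {
            return text.contains(search);
        }
        return text.toUpperCase(locale).contains(search.toUpperCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        System.out.println(matches("Hello World", "hello world", false, Locale.ENGLISH)); // true
        System.out.println(matches("Hello World", "hello world", true, Locale.ENGLISH));  // false

        // In a Turkish locale, lowercase 'i' becomes dotted capital 'İ' (U+0130),
        // so "title" no longer matches "TITLE".
        System.out.println(matches("title", "TITLE", false, Locale.ENGLISH)); // true
        System.out.println(matches("title", "TITLE", false, turkish));        // false
    }
}
```

Passing an explicit locale keeps search results stable regardless of the default locale of the machine running the parser.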

searchFor

public NodeList searchFor(Class classType,
                          boolean recursive)
Collect all objects that are of a certain type. Note that this will not check for parent types, and will recurse through child tags only if recursive is true.

Parameters:
classType - The class to search for.
recursive - If true, recursively search through the children.
Returns:
A list of children found.
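One way to read "will not check for parent types" is that a child matches only when its class is exactly classType, so subclasses are skipped. The sketch below models that reading together with the recursive flag. SearchByClassSketch and its nested Node, Link and ImageLink classes are hypothetical stand-ins, and the exact-class comparison is an assumption, not a statement about the library's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the org.htmlparser classes.
final class SearchByClassSketch {
    static class Node {
        final List<Node> children = new ArrayList<>();
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    static class Link extends Node {}
    static class ImageLink extends Link {} // a subclass of Link

    static List<Node> searchFor(Node parent, Class<?> classType, boolean recursive) {
        List<Node> found = new ArrayList<>();
        collect(parent, classType, recursive, found);
        return found;
    }

    private static void collect(Node parent, Class<?> classType, boolean recursive, List<Node> found) {
        for (Node child : parent.children) {
            // Exact class comparison: an ImageLink is not reported when searching for Link.
            if (child.getClass() == classType) {
                found.add(child);
            }
            // Descend into the child's own children only when asked to.
            if (recursive) {
                collect(child, classType, recursive, found);
            }
        }
    }

    public static void main(String[] args) {
        Node div = new Node().add(new Link());
        Node root = new Node().add(new Link()).add(new ImageLink()).add(div);
        System.out.println(searchFor(root, Link.class, false).size()); // prints 1
        System.out.println(searchFor(root, Link.class, true).size());  // prints 2
    }
}
```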

findPositionOf

public int findPositionOf(String text)
Returns the node number of the first node containing the given text. This can be useful to index into the composite tag and get other children. Text is compared without case sensitivity and conversion to uppercase uses an English locale.

Parameters:
text - The text to search for.
Returns:
The node index in the children list of the node containing the text, or -1 if not found.

findPositionOf

public int findPositionOf(String text,
                          Locale locale)
Returns the node number of the first node containing the given text. This can be useful to index into the composite tag and get other children. Text is compared without case sensitivity and conversion to uppercase uses the supplied locale.

Parameters:
text - The text to search for.
locale - The locale for uppercase conversion.
Returns:
The node index in the children list of the node containing the text, or -1 if not found.

findPositionOf

public int findPositionOf(Node searchNode)
Returns the node number of a child node given the node object. This would typically be used in conjunction with digupStringNode, after which the string node's parent can be used to find the string node's position. Faster than calling findPositionOf(text) again. Note that the position is at a linear level alone - there is no recursion in this method.

Parameters:
searchNode - The child node to find.
Returns:
The offset of the child tag or -1 if it was not found.
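The position lookups described above amount to a linear scan over the direct children that returns -1 when nothing matches. The sketch below shows both flavours on plain lists. FindPositionSketch, positionOfText and positionOfNode are hypothetical names, children are modelled as strings or plain objects, and comparing nodes by identity is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helpers illustrating a non-recursive child lookup; not part of org.htmlparser.
final class FindPositionSketch {

    // Index of the first child whose text contains the given text, ignoring case.
    static int positionOfText(List<String> childTexts, String text, Locale locale) {
        String wanted = text.toUpperCase(locale);
        for (int i = 0; i < childTexts.size(); i++) {
            if (childTexts.get(i).toUpperCase(locale).contains(wanted)) {
                return i;
            }
        }
        return -1; // not found
    }

    // Index of the given child object (assumed here to be compared by identity).
    static int positionOfNode(List<?> children, Object searchNode) {
        for (int i = 0; i < children.size(); i++) {
            if (children.get(i) == searchNode) {
                return i;
            }
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("Name:", "Hello World", "hello again");
        System.out.println(positionOfText(texts, "hello", Locale.ENGLISH));   // prints 1
        System.out.println(positionOfText(texts, "missing", Locale.ENGLISH)); // prints -1
    }
}
```

Only direct children are inspected, so a node nested one level deeper is reported as not found.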

childAt

public Node childAt(int index)
Get the child at the given index.

Parameters:
index - The index into the child node list.
Returns:
The child node at the given index, or null if none.

collectInto

public void collectInto(NodeList list,
                        NodeFilter filter)
Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria.

This mechanism makes it easy to write powerful filtering code without collecting embedded tags separately. For example, when extracting all the links on a page, looking only at the top-level nodes is not enough, because many tags (such as form tags) can contain links embedded in them. Without this method, the caller would have to check whether each node is a CompositeTag and walk its children by hand; collectInto() does that recursion for the caller.

Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:

 NodeList collectionList = new NodeList();
 NodeFilter filter = new TagNameFilter ("A");
 for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
      e.nextNode().collectInto(collectionList, filter);
 
Thus, collectionList will hold all the link nodes, irrespective of how deep the links are embedded.

Another way to accomplish the same objective is:

 NodeList collectionList = new NodeList();
 NodeFilter filter = new TagClassFilter (LinkTag.class);
 for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
      e.nextNode().collectInto(collectionList, filter);
 
This is slightly less specific because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.
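To make the recursion concrete, here is a minimal sketch of how a node might add itself when the filter accepts it and then hand the same list and filter to each child. It is not the library's code: CollectIntoSketch and its nested Node class are hypothetical, and java.util.function.Predicate stands in for NodeFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for illustration; this is not the org.htmlparser implementation.
final class CollectIntoSketch {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        // Add this node if it passes the filter, then let every child do the same.
        void collectInto(List<Node> list, Predicate<Node> filter) {
            if (filter.test(this)) {
                list.add(this);
            }
            for (Node child : children) {
                child.collectInto(list, filter);
            }
        }
    }

    public static void main(String[] args) {
        // A form containing one direct link and one link nested inside a div.
        Node form = new Node("FORM")
                .add(new Node("A"))
                .add(new Node("DIV").add(new Node("A")));

        List<Node> links = new ArrayList<>();
        form.collectInto(links, n -> n.name.equals("A"));
        System.out.println(links.size()); // prints 2
    }
}
```

The caller only ever invokes collectInto on top-level nodes; depth is handled by the nodes themselves.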

Specified by:
collectInto in interface Node
Overrides:
collectInto in class AbstractNode

getChildrenHTML

public String getChildrenHTML()

accept

public void accept(NodeVisitor visitor)
Tag visiting code. Invokes accept() on the start tag and then walks the child list invoking accept() on each of the children, finishing up with an accept() call on the end tag. If shouldRecurseSelf() returns true it then asks the visitor to visit itself.

Overrides:
accept in class Tag
Parameters:
visitor - The NodeVisitor object to be signalled for each child and possibly this tag.
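The visiting order described above can be sketched with a visitor that records each call. The code follows the wording of this page (start tag, children, end tag, then the tag itself when shouldRecurseSelf() is true) and is an assumption about ordering, not a copy of the implementation. AcceptOrderSketch, RecordingVisitor and Composite are hypothetical stand-ins for the library's NodeVisitor and CompositeTag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the org.htmlparser classes.
final class AcceptOrderSketch {

    // A visitor that simply records what it is asked to visit.
    static final class RecordingVisitor {
        final List<String> events = new ArrayList<>();
        private final boolean recurseSelf;

        RecordingVisitor(boolean recurseSelf) { this.recurseSelf = recurseSelf; }

        boolean shouldRecurseSelf() { return recurseSelf; }

        void visit(String event) { events.add(event); }
    }

    static final class Composite {
        final String name;
        final List<Composite> children = new ArrayList<>();

        Composite(String name) { this.name = name; }

        Composite add(Composite child) {
            children.add(child);
            return this;
        }

        void accept(RecordingVisitor visitor) {
            visitor.visit("start " + name);        // start tag first
            for (Composite child : children) {
                child.accept(visitor);             // then each child in order
            }
            visitor.visit("end " + name);          // then the end tag
            if (visitor.shouldRecurseSelf()) {
                visitor.visit("self " + name);     // finally the tag itself, if requested
            }
        }
    }

    public static void main(String[] args) {
        Composite table = new Composite("TABLE").add(new Composite("TR"));
        RecordingVisitor visitor = new RecordingVisitor(true);
        table.accept(visitor);
        // prints [start TABLE, start TR, end TR, self TR, end TABLE, self TABLE]
        System.out.println(visitor.events);
    }
}
```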

getChildCount

public int getChildCount()

getStartTag

public TagNode getStartTag()
Deprecated. The tag *is* the start tag.


setStartTag

public void setStartTag(TagNode start)
Deprecated. The tag *is* the start tag.


getEndTag

public TagNode getEndTag()

setEndTag

public void setEndTag(TagNode end)

digupStringNode

public StringNode[] digupStringNode(String searchText)
Finds a string node, however embedded it might be, and returns it. The string node will retain links to its parents, so further navigation is possible.

Parameters:
searchText - The text to search for.
Returns:
The list of string nodes (recursively) found.
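The sketch below illustrates the idea of digging up text nodes at any depth while keeping their parent links, so the caller can navigate back up and ask the parent for the node's position. DigUpSketch and its nested Node and Text classes are hypothetical stand-ins, and the plain case-sensitive contains() test is an assumption, since this page does not say how the text is compared.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the org.htmlparser classes.
final class DigUpSketch {
    static class Node {
        Node parent;
        final List<Node> children = new ArrayList<>();

        // Adding a child records the parent link used for later navigation.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    static final class Text extends Node {
        final String text;
        Text(String text) { this.text = text; }
    }

    // Collect every text node under root, at any depth, whose text contains searchText.
    static List<Text> digUp(Node root, String searchText) {
        List<Text> found = new ArrayList<>();
        for (Node child : root.children) {
            if (child instanceof Text && ((Text) child).text.contains(searchText)) {
                found.add((Text) child);
            }
            found.addAll(digUp(child, searchText));
        }
        return found;
    }

    public static void main(String[] args) {
        Text hit = new Text("hello world");
        Node div = new Node().add(hit);
        Node root = new Node().add(div).add(new Text("bye"));

        List<Text> found = digUp(root, "hello");
        Node parent = found.get(0).parent;
        System.out.println(found.size());                          // prints 1
        System.out.println(parent.children.indexOf(found.get(0))); // prints 0
    }
}
```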

toString

public String toString()
Description copied from class: TagNode
Print the contents of the tag

Specified by:
toString in interface Node
Overrides:
toString in class TagNode

getText

public String getText()
Return the text contained in this tag.

Specified by:
getText in interface Node
Overrides:
getText in class TagNode
Returns:
The complete contents of the tag (within the angle brackets).

getStringText

public String getStringText()
Return the text between the start tag and the end tag.

Returns:
The contents of the CompositeTag.

toString

public void toString(int level,
                     StringBuffer buffer)

© 2004 Somik Raha
Mar 14, 2004

HTML Parser is an open source library released under LGPL.
SourceForge.net