java.lang.Object
  org.htmlparser.AbstractNode
    org.htmlparser.lexer.nodes.TagNode
      org.htmlparser.tags.Tag
        org.htmlparser.tags.CompositeTag
The base class for tags that have an end tag.
Provides extra accessors for the children above and beyond what the basic
Tag provides. Also handles the conversion of its children for
the toHtml method.
| Field Summary | |
protected static CompositeTagScanner |
mDefaultScanner
The default scanner for non-composite tags. |
protected TagNode |
mEndTag
The tag that causes this tag to finish. |
| Fields inherited from class org.htmlparser.lexer.nodes.TagNode |
breakTags, mAttributes |
| Fields inherited from class org.htmlparser.AbstractNode |
children, mPage, nodeBegin, nodeEnd, parent |
| Constructor Summary | |
CompositeTag()
|
|
| Method Summary | |
void |
accept(NodeVisitor visitor)
Tag visiting code. |
Node |
childAt(int index)
Get the child at the given index. |
SimpleNodeIterator |
children()
Get an iterator over the children of this node. |
void |
collectInto(NodeList list,
NodeFilter filter)
Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria. |
StringNode[] |
digupStringNode(String searchText)
Finds a string node, however embedded it might be, and returns it. |
SimpleNodeIterator |
elements()
Return the child tags as an iterator. |
int |
findPositionOf(Node searchNode)
Returns the node number of a child node given the node object. |
int |
findPositionOf(String text)
Returns the node number of the first node containing the given text. |
int |
findPositionOf(String text,
Locale locale)
Returns the node number of the first node containing the given text. |
Node |
getChild(int index)
Get the child of this node at the given position. |
int |
getChildCount()
|
Node[] |
getChildrenAsNodeArray()
Get the children as an array of Node objects. |
String |
getChildrenHTML()
|
TagNode |
getEndTag()
|
TagNode |
getStartTag()
Deprecated. The tag *is* the start tag. |
String |
getStringText()
Return the text between the start tag and the end tag. |
String |
getText()
Return the text contained in this tag. |
protected void |
putChildrenInto(StringBuffer sb)
|
protected void |
putEndTagInto(StringBuffer sb)
|
void |
removeChild(int i)
Remove the child at the position given. |
Tag |
searchByName(String name)
Searches all children for a tag with a matching name attribute. |
NodeList |
searchFor(Class classType,
boolean recursive)
Collect all objects that are of a certain type. Note that this will not check for parent types, and will not recurse through child tags. |
NodeList |
searchFor(String searchString)
Searches for all nodes whose text representation contains the search string. |
NodeList |
searchFor(String searchString,
boolean caseSensitive)
Searches for all nodes whose text representation contains the search string. |
NodeList |
searchFor(String searchString,
boolean caseSensitive,
Locale locale)
Searches for all nodes whose text representation contains the search string. |
void |
setEndTag(TagNode end)
|
void |
setStartTag(TagNode start)
Deprecated. The tag *is* the start tag. |
String |
toHtml()
Render the tag as HTML. |
String |
toPlainTextString()
Get the plain text from this node. |
String |
toString()
Print the contents of the tag. |
void |
toString(int level,
StringBuffer buffer)
|
| Methods inherited from class org.htmlparser.tags.Tag |
accept, clone, getEnders, getEndTagEnders, getIds, getThisScanner, setThisScanner |
| Methods inherited from class org.htmlparser.lexer.nodes.TagNode |
breaksFlow, getAttribute, getAttributeEx, getAttributes, getAttributesEx, getEndingLineNumber, getParameter, getParsed, getRawTagName, getStartingLineNumber, getTagBegin, getTagEnd, getTagName, isEmptyXmlTag, isEndTag, removeAttribute, setAttribute, setAttribute, setAttribute, setAttributes, setAttributesEx, setEmptyXmlTag, setTagBegin, setTagEnd, setTagName, setText |
| Methods inherited from class org.htmlparser.AbstractNode |
doSemanticAction, elementBegin, elementEnd, getChildren, getEndPosition, getPage, getParent, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHTML |
| Methods inherited from class java.lang.Object |
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
protected TagNode mEndTag
protected static final CompositeTagScanner mDefaultScanner
| Constructor Detail |
public CompositeTag()
| Method Detail |
public SimpleNodeIterator children()
public Node getChild(int index)
index - The index in the node list of the child.
public Node[] getChildrenAsNodeArray()
Returns the children as an array of Node objects.
public void removeChild(int i)
i - The index of the child to remove.
public SimpleNodeIterator elements()
public String toPlainTextString()
Get the plain text from this node.
Specified by: toPlainTextString in interface Node
Overrides: toPlainTextString in class TagNode
See Also: toHtml()
protected void putChildrenInto(StringBuffer sb)
protected void putEndTagInto(StringBuffer sb)
public String toHtml()
Render the tag as HTML. A call to the tag's toHtml() method will render it in HTML.
Specified by: toHtml in interface Node
Overrides: toHtml in class TagNode
See Also: Node.toHtml()
public Tag searchByName(String name)
name - Attribute to match in tag
public NodeList searchFor(String searchString)
NodeList nodeList = formTag.searchFor("Hello World");
searchString - Search criterion.
Returns the nodes that have searchString in them.
public NodeList searchFor(String searchString,
boolean caseSensitive)
NodeList nodeList = formTag.searchFor("Hello World", false);
searchString - Search criterion.
caseSensitive - If true this search should be case
sensitive. Otherwise, the search string and the node text are converted
to uppercase using an English locale.
Returns the nodes that have searchString in them.
public NodeList searchFor(String searchString,
boolean caseSensitive,
Locale locale)
NodeList nodeList = formTag.searchFor("Hello World", false, Locale.ENGLISH);
searchString - Search criterion.
caseSensitive - If true this search should be case
sensitive. Otherwise, the search string and the node text are converted
to uppercase using the locale provided.
locale - The locale for uppercase conversion.
Returns the nodes that have searchString in them.
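The case-insensitive comparison described for caseSensitive and locale can be sketched with plain strings. This is only an illustration under assumptions: SearchForSketch, matches and the list-based searchFor are hypothetical names, node text is modeled as a String, and the library's actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in; not part of org.htmlparser.
final class SearchForSketch {

    // True if text contains search. When caseSensitive is false, both sides
    // are converted to uppercase with the same locale before comparing.
    static boolean matches(String text, String search, boolean caseSensitive, Locale locale) {
        if (caseSensitive) {
            return text.contains(search);
        }
        return text.toUpperCase(locale).contains(search.toUpperCase(locale));
    }

    // Collects every text (standing in for a node) that matches the search string.
    static List<String> searchFor(List<String> nodeTexts, String search,
                                  boolean caseSensitive, Locale locale) {
        List<String> hits = new ArrayList<>();
        for (String text : nodeTexts) {
            if (matches(text, search, caseSensitive, locale)) {
                hits.add(text);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("Hello World", "hello world again", "Goodbye");
        System.out.println(searchFor(texts, "Hello World", true, Locale.ENGLISH));
        // prints [Hello World]
        System.out.println(searchFor(texts, "Hello World", false, Locale.ENGLISH));
        // prints [Hello World, hello world again]
    }
}
```

Passing the locale explicitly matters because uppercase conversion is locale dependent; for example, a Turkish locale maps "i" to a dotted capital rather than "I".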
public NodeList searchFor(Class classType,
boolean recursive)
classType - The class to search for.
recursive - If true, recursively search through the children.
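A search by type with an optional recursive descent might look like the sketch below. It is an assumption-laden model, not the library's code: SketchNode, SketchLink and SketchForm are hypothetical stand-ins, the exact-class comparison follows the summary's remark that parent types are not checked, and the descent follows the recursive parameter description rather than the summary's note about not recursing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not part of org.htmlparser.
final class SearchByTypeSketch {

    static class SketchNode {
        final List<SketchNode> children = new ArrayList<>();

        SketchNode add(SketchNode child) {
            children.add(child);
            return this;
        }
    }

    static class SketchLink extends SketchNode { }

    static class SketchForm extends SketchNode { }

    // Returns the children of parent whose class is exactly type.
    // With recursive set, the children's own children are searched as well.
    static List<SketchNode> searchFor(SketchNode parent, Class<?> type, boolean recursive) {
        List<SketchNode> found = new ArrayList<>();
        for (SketchNode child : parent.children) {
            if (child.getClass() == type) { // exact class, deliberately not instanceof
                found.add(child);
            }
            if (recursive) {
                found.addAll(searchFor(child, type, true));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        SketchNode form = new SketchForm().add(new SketchLink());
        SketchNode root = new SketchNode().add(new SketchLink()).add(form);
        System.out.println(searchFor(root, SketchLink.class, false).size()); // prints 1
        System.out.println(searchFor(root, SketchLink.class, true).size());  // prints 2
    }
}
```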
public int findPositionOf(String text)
text - The text to search for.
public int findPositionOf(String text,
Locale locale)
text - The text to search for.
public int findPositionOf(Node searchNode)
searchNode - The child node to find.
public Node childAt(int index)
index - The index into the child node list.
public void collectInto(NodeList list,
NodeFilter filter)
This mechanism allows powerful filtering code to be written very easily,
without handling the collection of embedded tags separately.
For example, when we try to get all the links on a page, it is not possible to
get them from the top level alone, as many tags (like form tags) can contain
links embedded in them. We could get the links out by checking whether the
current node is a CompositeTag and going through its children;
this method provides a convenient way to do that.
Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:
NodeList collectionList = new NodeList();
NodeFilter filter = new TagNameFilter ("A");
for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
e.nextNode().collectInto(collectionList, filter);
Thus, collectionList will hold all the link nodes, irrespective of how
deep the links are embedded.
Another way to accomplish the same objective is:
NodeList collectionList = new NodeList();
NodeFilter filter = new TagClassFilter (LinkTag.class);
for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
e.nextNode().collectInto(collectionList, filter);
This is slightly less specific because the LinkTag class may be
registered for more than one node name, e.g. <LINK> tags too.
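The recursion that makes collectInto() find embedded nodes can be pictured with a small self-contained model. Everything here is a stand-in chosen for illustration: SketchNode and collectByName are hypothetical, and a java.util.function.Predicate plays the role of the NodeFilter. The real method may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in; not part of org.htmlparser.
final class CollectIntoSketch {

    static final class SketchNode {
        final String name;
        final List<SketchNode> children;

        SketchNode(String name, SketchNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        // Adds this node if it passes the filter, then descends into the
        // children so that embedded matches are collected too.
        void collectInto(List<SketchNode> list, Predicate<SketchNode> filter) {
            if (filter.test(this)) {
                list.add(this);
            }
            for (SketchNode child : children) {
                child.collectInto(list, filter);
            }
        }
    }

    // Mirrors the loop over top-level nodes shown in the examples above.
    static List<SketchNode> collectByName(List<SketchNode> topLevel, String name) {
        List<SketchNode> collected = new ArrayList<>();
        for (SketchNode node : topLevel) {
            node.collectInto(collected, candidate -> candidate.name.equals(name));
        }
        return collected;
    }

    public static void main(String[] args) {
        List<SketchNode> page = Arrays.asList(
                new SketchNode("A"),
                new SketchNode("FORM", new SketchNode("P", new SketchNode("A"))));
        System.out.println(collectByName(page, "A").size()); // prints 2
    }
}
```

The caller only iterates over the top-level nodes; the link nested inside the form is found because each node forwards the call to its own children.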
Specified by: collectInto in interface Node
Overrides: collectInto in class AbstractNode
public String getChildrenHTML()
public void accept(NodeVisitor visitor)
Tag visiting code. Invokes accept() on the start tag and then
walks the child list invoking accept() on each
of the children, finishing up with an accept()
call on the end tag. If shouldRecurseSelf()
returns true it then asks the visitor to visit itself.
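The visiting order described above can be traced with a toy model. This is a sketch under assumptions, not the library's implementation: SketchTag and SketchVisitor are hypothetical stand-ins, and each visit is recorded as a string so the order is easy to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; not part of org.htmlparser.
final class AcceptOrderSketch {

    interface SketchVisitor {
        void event(String description);

        boolean shouldRecurseSelf();
    }

    static final class SketchTag {
        final String name;
        final List<SketchTag> children;

        SketchTag(String name, SketchTag... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        // Start tag first, then each child in order, then the end tag,
        // and finally the tag itself if the visitor asks for it.
        void accept(SketchVisitor visitor) {
            visitor.event("start " + name);
            for (SketchTag child : children) {
                child.accept(visitor);
            }
            visitor.event("end " + name);
            if (visitor.shouldRecurseSelf()) {
                visitor.event("self " + name);
            }
        }
    }

    // Runs a recording visitor over the tree and returns the events in order.
    static List<String> trace(SketchTag root, final boolean recurseSelf) {
        final List<String> events = new ArrayList<>();
        root.accept(new SketchVisitor() {
            @Override
            public void event(String description) {
                events.add(description);
            }

            @Override
            public boolean shouldRecurseSelf() {
                return recurseSelf;
            }
        });
        return events;
    }

    public static void main(String[] args) {
        SketchTag div = new SketchTag("DIV", new SketchTag("P"));
        System.out.println(trace(div, true));
        // prints [start DIV, start P, end P, self P, end DIV, self DIV]
    }
}
```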
Overrides: accept in class Tag
visitor - The NodeVisitor object to be signalled
for each child and possibly this tag.
public int getChildCount()
public TagNode getStartTag()
public void setStartTag(TagNode start)
public TagNode getEndTag()
public void setEndTag(TagNode end)
public StringNode[] digupStringNode(String searchText)
searchText -
public String toString()
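Print the contents of the tag.
Specified by: toString in interface Node
Overrides: toString in class TagNode
public String getText()
Return the text contained in this tag.
Specified by: getText in interface Node
Overrides: getText in class TagNode
public String getStringText()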
public void toString(int level,
StringBuffer buffer)
© 2004 Somik Raha Mar 14, 2004