HTML Parser Home Page
java.lang.Object
  org.htmlparser.AbstractNode
AbstractNode, which implements the Node interface, is the base class for all types of nodes, including tags, string elements, etc.
Field Summary
protected NodeList children
    The children of this node.
protected Page mPage
    The page this node came from.
protected int nodeBegin
    The beginning position of the tag in the line.
protected int nodeEnd
    The ending position of the tag in the line.
protected Node parent
    The parent of this node.
Constructor Summary
AbstractNode(Page page, int start, int end)
    Create an abstract node with the page positions given.
Method Summary
abstract void accept(Object visitor)
    Apply the visitor object (of type NodeVisitor) to this node.
void collectInto(NodeList list, NodeFilter filter)
    Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria.
void doSemanticAction()
    Perform the meaning of this tag.
int elementBegin()
    Deprecated. Use getStartPosition().
int elementEnd()
    Deprecated. Use getEndPosition().
NodeList getChildren()
    Get the children of this node.
int getEndPosition()
    Gets the ending position of the node.
Page getPage()
    Get the page this node came from.
Node getParent()
    Get the parent of this node.
int getStartPosition()
    Gets the starting position of the node.
String getText()
    Returns the text of the node.
void setChildren(NodeList children)
    Set the children of this node.
void setEndPosition(int position)
    Sets the ending position of the node.
void setPage(Page page)
    Set the page this node came from.
void setParent(Node node)
    Sets the parent of this node.
void setStartPosition(int position)
    Sets the starting position of the node.
void setText(String text)
    Sets the string contents of the node.
abstract String toHtml()
    Makes it easier to reproduce HTML pages (with or without modifications) when using the HTML parser. Applications reproducing HTML can call this method on nodes that are to be used or transferred as they were received, with the original HTML.
String toHTML()
    Deprecated. Use toHtml() instead.
abstract String toPlainTextString()
    Returns a string representation of the node.
abstract String toString()
    Return the string representation of the node.
Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
protected Page mPage
protected int nodeBegin
protected int nodeEnd
protected Node parent
protected NodeList children
Constructor Detail
public AbstractNode(Page page,
                    int start,
                    int end)
    Create an abstract node with the page positions given.
    Parameters:
    page - The page this tag was read from.
    start - The starting offset of this node within the page.
    end - The ending offset of this node within the page.

Method Detail
public abstract String toPlainTextString()
    Returns a string representation of the node.
    Node node;
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
        node = e.nextNode();
        System.out.println(node.toPlainTextString()); // Or do whatever processing you wish with the plain text string
    }
    Specified by: toPlainTextString in interface Node

public abstract String toHtml()
    Specified by: toHtml in interface Node

public abstract String toString()
    Return the string representation of the node.
    System.out.println(node);
    Specified by: toString in interface Node

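The difference between toHtml() and toPlainTextString() can be illustrated with a small self-contained sketch. The classes below (RenderSketchNode, TextRenderSketch, TagRenderSketch, RenderSketchDemo) are hypothetical stand-ins invented for this example, not the org.htmlparser classes, and the rendering rules are an assumption: toHtml() reproduces the markup, while toPlainTextString() keeps only the text content.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for parsed nodes; these are NOT the org.htmlparser classes.
abstract class RenderSketchNode {
    abstract String toHtml();            // markup as it would appear in the page
    abstract String toPlainTextString(); // text content only
}

// A text node renders the same way in both forms.
class TextRenderSketch extends RenderSketchNode {
    private final String text;

    TextRenderSketch(String text) {
        this.text = text;
    }

    @Override
    String toHtml() {
        return text;
    }

    @Override
    String toPlainTextString() {
        return text;
    }
}

// A tag with children: toHtml() wraps the children in markup,
// toPlainTextString() concatenates only the children's text (assumed behavior).
class TagRenderSketch extends RenderSketchNode {
    private final String name;
    private final List<RenderSketchNode> children = new ArrayList<>();

    TagRenderSketch(String name, RenderSketchNode... kids) {
        this.name = name;
        for (RenderSketchNode kid : kids) {
            children.add(kid);
        }
    }

    @Override
    String toHtml() {
        StringBuilder sb = new StringBuilder("<" + name + ">");
        for (RenderSketchNode child : children) {
            sb.append(child.toHtml());
        }
        return sb.append("</").append(name).append(">").toString();
    }

    @Override
    String toPlainTextString() {
        StringBuilder sb = new StringBuilder();
        for (RenderSketchNode child : children) {
            sb.append(child.toPlainTextString());
        }
        return sb.toString();
    }
}

class RenderSketchDemo {
    // Sample tree: a P tag holding text and a nested B tag.
    static RenderSketchNode sample() {
        return new TagRenderSketch("P",
                new TextRenderSketch("Hello "),
                new TagRenderSketch("B", new TextRenderSketch("world")));
    }
}
```

Calling toHtml() on the sample tree rebuilds the markup, while toPlainTextString() yields just the text, which is the distinction a text-extraction loop relies on.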
public void collectInto(NodeList list,
NodeFilter filter)
This mechanism makes it easy to write powerful filtering code without
collecting embedded tags separately. For example, all the links on a page
cannot be gathered from the top level alone, because many tags (such as
form tags) can contain links embedded in them. We could extract those links
by checking whether the current node is a CompositeTag and then going
through its children; this method provides a convenient way to do that.
Using collectInto(), programs get a lot shorter. The code to extract all links from a page looks like this:
NodeList collectionList = new NodeList();
NodeFilter filter = new TagNameFilter ("A");
for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
e.nextNode().collectInto(collectionList, filter);
Thus, collectionList will hold all the link nodes, irrespective of how
deep the links are embedded.
Another way to accomplish the same objective is:
NodeList collectionList = new NodeList();
NodeFilter filter = new TagClassFilter (LinkTag.class);
for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
e.nextNode().collectInto(collectionList, filter);
This is slightly less specific because the LinkTag class may be
registered for more than one node name, e.g. <LINK> tags too.
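
The recursive idea behind collectInto() can be sketched without the library. The classes SketchNode and CollectIntoSketch below are hypothetical stand-ins, and java.util.function.Predicate takes the place of NodeFilter; the actual implementation may differ. The sketch only shows why a single call per top-level node reaches links at any depth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a parsed node; NOT the org.htmlparser classes.
class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, SketchNode... kids) {
        this.name = name;
        for (SketchNode kid : kids) {
            children.add(kid);
        }
    }

    // Add this node if it passes the filter, then recurse into the children.
    void collectInto(List<SketchNode> list, Predicate<SketchNode> filter) {
        if (filter.test(this)) {
            list.add(this);
        }
        for (SketchNode child : children) {
            child.collectInto(list, filter);
        }
    }
}

class CollectIntoSketch {
    // Collect every node named "A" at or below the given top-level nodes.
    static List<SketchNode> links(List<SketchNode> topLevel) {
        List<SketchNode> found = new ArrayList<>();
        for (SketchNode node : topLevel) {
            node.collectInto(found, n -> n.name.equals("A"));
        }
        return found;
    }

    // Sample page: one top-level link plus one link nested inside FORM > P.
    static List<SketchNode> sample() {
        List<SketchNode> top = new ArrayList<>();
        top.add(new SketchNode("A"));
        top.add(new SketchNode("FORM",
                new SketchNode("P", new SketchNode("A")),
                new SketchNode("INPUT")));
        return top;
    }
}
```

With the sample page, both the top-level link and the one nested two levels inside the form end up in the result list, with no explicit check for composite tags in the calling code.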
    Specified by: collectInto in interface Node

public int elementBegin()
    Deprecated. Use getStartPosition().
    Specified by: elementBegin in interface Node

public int elementEnd()
    Deprecated. Use getEndPosition().
    Specified by: elementEnd in interface Node

public Page getPage()

public void setPage(Page page)
    Parameters:
    page - The page that supplied this node.

public int getStartPosition()
    Specified by: getStartPosition in interface Node

public void setStartPosition(int position)
    Specified by: setStartPosition in interface Node
    Parameters:
    position - The new start position.

public int getEndPosition()
    Specified by: getEndPosition in interface Node

public void setEndPosition(int position)
    Specified by: setEndPosition in interface Node
    Parameters:
    position - The new end position.

public abstract void accept(Object visitor)
    Description copied from interface: Node
    Apply the visitor object (of type NodeVisitor) to this node.
    Specified by: accept in interface Node

public final String toHTML()
    Deprecated. Use toHtml() instead.

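One common way to keep deprecated aliases such as elementBegin(), elementEnd(), and toHTML() working is to have them delegate to their replacements. The sketch below assumes that pattern; this page does not show the actual method bodies, and PositionSketchNode and PositionSketchText are hypothetical classes written only for illustration.

```java
// Hypothetical stand-in for a positioned node; NOT the org.htmlparser source.
abstract class PositionSketchNode {
    protected int nodeBegin;
    protected int nodeEnd;

    PositionSketchNode(int start, int end) {
        nodeBegin = start;
        nodeEnd = end;
    }

    public int getStartPosition() {
        return nodeBegin;
    }

    public int getEndPosition() {
        return nodeEnd;
    }

    /** @deprecated Use getStartPosition(). */
    @Deprecated
    public int elementBegin() {
        return getStartPosition(); // assumed: the alias simply delegates
    }

    /** @deprecated Use getEndPosition(). */
    @Deprecated
    public int elementEnd() {
        return getEndPosition(); // assumed: the alias simply delegates
    }

    public abstract String toHtml();

    /** @deprecated Use toHtml(). Final, so subclasses override only toHtml(). */
    @Deprecated
    public final String toHTML() {
        return toHtml();
    }
}

// Minimal concrete subclass so the sketch can be exercised.
class PositionSketchText extends PositionSketchNode {
    private final String text;

    PositionSketchText(String text, int start, int end) {
        super(start, end);
        this.text = text;
    }

    @Override
    public String toHtml() {
        return text;
    }
}
```

Under this assumption, old callers of the deprecated names and new callers of the replacements always see the same values, and a subclass has a single method per behavior to override.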
public Node getParent()
    CompositeTag.
    Specified by: getParent in interface Node
    null otherwise.

public void setParent(Node node)
    Specified by: setParent in interface Node
    Parameters:
    node - The node that contains this node. Must be a CompositeTag.

public NodeList getChildren()
    Specified by: getChildren in interface Node
    null otherwise.

public void setChildren(NodeList children)
    Specified by: setChildren in interface Node
    Parameters:
    children - The new list of children this node contains.

public String getText()
    Specified by: getText in interface Node

public void setText(String text)
    Specified by: setText in interface Node
    Parameters:
    text - The new text for the node.

public void doSemanticAction()
                      throws ParserException
    Specified by: doSemanticAction in interface Node
    Throws:
    ParserException
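
The parent and children accessors above (getParent(), setParent(), getChildren(), setChildren()) can be pictured with a minimal sketch. TreeSketchNode is a hypothetical class, and its addChild() and depth() helpers are assumptions added for illustration; they are not part of the documented API. The sketch assumes, in line with the "null otherwise" wording above, that a node without a parent or without children reports null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree node; NOT the org.htmlparser classes.
class TreeSketchNode {
    private final String name;
    private TreeSketchNode parent;         // null until attached to a containing node
    private List<TreeSketchNode> children; // null for a node with no children

    TreeSketchNode(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    TreeSketchNode getParent() {
        return parent;
    }

    void setParent(TreeSketchNode node) {
        parent = node;
    }

    List<TreeSketchNode> getChildren() {
        return children;
    }

    void setChildren(List<TreeSketchNode> kids) {
        children = kids;
    }

    // Assumed helper (not in the documented API): attach a child and keep both links in sync.
    void addChild(TreeSketchNode child) {
        if (children == null) {
            children = new ArrayList<>();
        }
        children.add(child);
        child.setParent(this);
    }

    // Assumed helper: walk up the parent links and count the ancestors.
    int depth() {
        int d = 0;
        for (TreeSketchNode p = parent; p != null; p = p.getParent()) {
            d++;
        }
        return d;
    }
}
```

Callers of such accessors need a null check before iterating over children or walking up to a parent, since either link may be unset.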
© 2004 Somik Raha Mar 14, 2004