java.lang.Object
  |
  +--arlut.csd.Util.XMLReader
This class is intended to serve as a stream-oriented proxy, allowing the Ganymede server to read XML entity and character data from a SAX parser entity by entity, rather than through the use of a callback interface, as is traditionally done with SAX.
When instantiated, the XMLReader creates a background thread that receives
SAX events from James Clark's XP XML parser. These SAX events are converted
to XMLItem objects and saved in an internal
buffer. The user of the XMLReader class calls getNextItem() to retrieve
these XMLItem objects from the XMLReader buffer, in order of receipt.
The background parse thread is throttled back as needed to avoid overflowing the XMLReader's internal buffer.
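The push-to-pull conversion described above can be sketched as a small bounded buffer shared between a producer thread and a consumer. This is only an illustration under stated assumptions: `PullProxy`, its `String` items, and the `finish()` and `demo()` methods are hypothetical stand-ins, not the XMLReader implementation, and the producer loop merely imitates a parser firing callbacks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a push-to-pull proxy; not the real XMLReader. */
final class PullProxy {
    private final String[] buffer;   // circular buffer of pending items
    private int enqueuePtr = 0;      // next slot the producer writes
    private int dequeuePtr = 0;      // next slot the consumer reads
    private int bufferContents = 0;  // number of items currently buffered
    private boolean done = false;    // set once the producer has finished

    PullProxy(int bufferSize) {
        buffer = new String[bufferSize];
    }

    /** Producer side: blocks while the buffer is full. */
    synchronized void enqueue(String item) throws InterruptedException {
        while (bufferContents == buffer.length && !done) {
            wait();
        }
        if (done) {
            return; // nothing is accepted after finish()
        }
        buffer[enqueuePtr] = item;
        enqueuePtr = (enqueuePtr + 1) % buffer.length;
        bufferContents++;
        notifyAll();
    }

    /** Producer side: signals that no further items will arrive. */
    synchronized void finish() {
        done = true;
        notifyAll();
    }

    /** Consumer side: blocks while empty, returns null once drained and done. */
    synchronized String getNextItem() throws InterruptedException {
        while (bufferContents == 0 && !done) {
            wait();
        }
        if (bufferContents == 0) {
            return null;
        }
        String item = buffer[dequeuePtr];
        buffer[dequeuePtr] = null;
        dequeuePtr = (dequeuePtr + 1) % buffer.length;
        bufferContents--;
        notifyAll();
        return item;
    }

    /** Runs a producer thread that imitates parser callbacks and pulls its items. */
    static List<String> demo() {
        PullProxy proxy = new PullProxy(2);
        Thread producer = new Thread(() -> {
            try {
                for (String event : new String[] {"<a>", "text", "</a>"}) {
                    proxy.enqueue(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                proxy.finish();
            }
        });
        producer.start();
        List<String> seen = new ArrayList<>();
        try {
            for (String item = proxy.getNextItem(); item != null; item = proxy.getNextItem()) {
                seen.add(item);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

The buffer in this sketch holds only two items, so the producer has to wait for the consumer at least once, which is the throttling behaviour in its simplest form.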
| Field Summary | |
private arlut.csd.Util.XMLItem[] |
buffer
|
private int |
bufferContents
|
private int |
bufferSize
|
private arlut.csd.Util.CircleBuffer |
circleBuffer
|
static boolean |
debug
|
private int |
dequeuePtr
|
private boolean |
done
|
private int |
enqueuePtr
|
private java.io.PrintWriter |
err
|
private arlut.csd.Util.XMLElement |
halfElement
|
private int |
highWaterMark
Set the highWaterMark to something high if on a single processor system, to something low (equal to 0) on a multi-processor native threads system. |
private org.xml.sax.InputSource |
inputSource
|
private java.lang.Thread |
inputThread
|
private org.xml.sax.Locator |
locator
|
private int |
lowWaterMark
Set the lowWaterMark to something low on a single processor system, to something high (equal to bufferSize?) on a multi-processor native threads system. |
private org.xml.sax.Parser |
parser
|
private arlut.csd.Util.XMLItem |
pushback
|
private boolean |
skipWhiteSpace
|
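The highWaterMark and lowWaterMark notes above suggest hysteresis-style throttling, although this page does not spell out the exact rule. The single-threaded model below is one plausible reading, offered as an assumption: a producer that hits a full buffer stays parked until the contents drain to lowWaterMark, and a consumer that hits an empty buffer stays parked until the contents reach highWaterMark. `WaterMarkModel`, `tryEnqueue()` and `tryDequeue()` are hypothetical names.

```java
/** Hypothetical model of water-mark throttling; the real rule may differ. */
final class WaterMarkModel {
    private final int bufferSize;
    private final int lowWaterMark;
    private final int highWaterMark;
    private int bufferContents = 0;
    private boolean producerParked = false; // producer found the buffer full
    private boolean consumerParked = false; // consumer found the buffer empty

    WaterMarkModel(int bufferSize, int lowWaterMark, int highWaterMark) {
        this.bufferSize = bufferSize;
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    /** Returns true if the producer was allowed to add one item. */
    boolean tryEnqueue() {
        // A parked producer resumes only after the buffer drains to the low mark.
        if (producerParked && bufferContents <= lowWaterMark && bufferContents < bufferSize) {
            producerParked = false;
        }
        if (bufferContents == bufferSize) {
            producerParked = true;
        }
        if (producerParked) {
            return false;
        }
        bufferContents++;
        return true;
    }

    /** Returns true if the consumer was allowed to take one item. */
    boolean tryDequeue() {
        // A parked consumer resumes only after the buffer fills to the high mark.
        if (consumerParked && bufferContents >= highWaterMark && bufferContents > 0) {
            consumerParked = false;
        }
        if (bufferContents == 0) {
            consumerParked = true;
        }
        if (consumerParked) {
            return false;
        }
        bufferContents--;
        return true;
    }

    int contents() {
        return bufferContents;
    }
}
```

Under this reading, a lowWaterMark equal to bufferSize and a highWaterMark of 0 let each side resume as soon as a single slot or item is available, which matches the multi-processor advice in the field notes. Widely separated marks make each side run in longer bursts, which fits the single-processor advice.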
| Constructor Summary | |
XMLReader(java.io.PipedOutputStream sourcePipe,
int bufferSize,
boolean skipWhiteSpace)
This constructor takes a PipedOutputStream as a parameter, creates a large matching input pipe to read from, and spins off the XMLReader's parsing thread to process data that is fed into the PipedOutputStream. |
|
XMLReader(java.io.PipedOutputStream sourcePipe,
int bufferSize,
boolean skipWhiteSpace,
java.io.PrintWriter err)
This constructor takes a PipedOutputStream as a parameter, creates a large matching input pipe to read from, and spins off the XMLReader's parsing thread to process data that is fed into the PipedOutputStream. |
|
XMLReader(java.lang.String xmlFilename,
int bufferSize,
boolean skipWhiteSpace)
|
|
XMLReader(java.lang.String xmlFilename,
int bufferSize,
boolean skipWhiteSpace,
java.io.PrintWriter err)
|
|
| Method Summary | |
void |
characters(char[] ch,
int start,
int length)
Receive notification of character data. |
void |
close()
close() causes the XMLReader to terminate its operations as soon as possible. |
private void |
completeElement()
This is a private helper method used to move a completed halfElement XMLElement (which stays half-completed until we know whether the SAX parser will give us an immediately following close element, in which case we want to mark the halfElement as empty and eat the subsequent close) into the XMLReader's primary buffer. |
private arlut.csd.Util.XMLItem |
dequeue()
private dequeue method. |
void |
endDocument()
Receive notification of the end of a document. |
void |
endElement(java.lang.String name)
Receive notification of the end of an element. |
private void |
enqueue(arlut.csd.Util.XMLItem item)
private enqueue method. |
void |
error(org.xml.sax.SAXParseException exception)
Receive notification of a recoverable error. |
void |
fatalError(org.xml.sax.SAXParseException exception)
Receive notification of a non-recoverable error. |
java.lang.String |
getFollowingString(arlut.csd.Util.XMLItem openItem,
boolean skipWhiteSpace)
This method is intended to be called in the situation where we have some text between an open and close tag, as in ' |
arlut.csd.Util.XMLItem |
getNextItem()
getNextItem() returns the next XMLItem
from the XMLReader's buffer. |
arlut.csd.Util.XMLItem |
getNextItem(boolean skipWhiteSpaceChars)
getNextItem() returns the next XMLItem
from the XMLReader's buffer. |
arlut.csd.Util.XMLItem |
getNextTree()
This method reads the next XMLItem from the reader stream and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it. |
arlut.csd.Util.XMLItem |
getNextTree(arlut.csd.Util.XMLItem startingItem)
This method takes an optional XMLItem and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it. |
arlut.csd.Util.XMLItem |
getNextTree(arlut.csd.Util.XMLItem startingItem,
boolean skipWhiteSpace)
This method takes an optional XMLItem and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it. |
void |
ignorableWhitespace(char[] ch,
int start,
int length)
Receive notification of ignorable whitespace in element content. |
boolean |
isDone()
|
boolean |
isNextCharData()
This method returns true if the next thing to be read in the input stream is non-whitespace character data rather than an open or close element tag. |
arlut.csd.Util.XMLItem |
peekNextItem()
peekNextItem() returns the next XMLItem
from the XMLReader's buffer. |
arlut.csd.Util.XMLItem |
peekNextItem(boolean skipWhiteSpaceChars)
peekNextItem() returns the next XMLItem
from the XMLReader's buffer. |
private void |
pourIntoBuffer(arlut.csd.Util.XMLItem item)
|
void |
processingInstruction(java.lang.String target,
java.lang.String data)
Receive notification of a processing instruction. |
void |
pushbackItem(arlut.csd.Util.XMLItem item)
pushbackItem() may be used to push the most recently read XMLItem back onto the XMLReader's buffer. |
void |
run()
|
void |
setDocumentLocator(org.xml.sax.Locator locator)
The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. |
void |
startDocument()
Receive notification of the beginning of a document. |
void |
startElement(java.lang.String name,
org.xml.sax.AttributeList atts)
Receive notification of the beginning of an element. |
void |
warning(org.xml.sax.SAXParseException exception)
Receive notification of a warning. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
public static final boolean debug
private org.xml.sax.Parser parser
private org.xml.sax.InputSource inputSource
private org.xml.sax.Locator locator
private final arlut.csd.Util.XMLItem[] buffer
private int enqueuePtr
private int dequeuePtr
private int bufferContents
private int bufferSize
private int lowWaterMark
private int highWaterMark
private java.lang.Thread inputThread
private boolean done
private arlut.csd.Util.XMLItem pushback
private arlut.csd.Util.XMLElement halfElement
private boolean skipWhiteSpace
private java.io.PrintWriter err
private arlut.csd.Util.CircleBuffer circleBuffer
| Constructor Detail |
public XMLReader(java.lang.String xmlFilename,
int bufferSize,
boolean skipWhiteSpace)
throws java.io.IOException
Parameters:
xmlFilename - Name of the file to read
bufferSize - How many items the XMLReader will buffer in its data structures at one time
skipWhiteSpace - If true, the no-param getNextItem() and peekNextItem() methods will jump over any all-whitespace character data between other elements.
public XMLReader(java.lang.String xmlFilename,
int bufferSize,
boolean skipWhiteSpace,
java.io.PrintWriter err)
throws java.io.IOException
Parameters:
xmlFilename - Name of the file to read
bufferSize - How many items the XMLReader will buffer in its data structures at one time
skipWhiteSpace - If true, the no-param getNextItem() and peekNextItem() methods will jump over any all-whitespace character data between other elements.
err - A PrintWriter object to send debugging/error output to
public XMLReader(java.io.PipedOutputStream sourcePipe,
int bufferSize,
boolean skipWhiteSpace)
throws java.io.IOException
Parameters:
sourcePipe - the PipedOutputStream object that XML characters are fed into
bufferSize - How many items the XMLReader will buffer in its data structures at one time
skipWhiteSpace - If true, the no-param getNextItem() and peekNextItem() methods will jump over any all-whitespace character data between other elements.
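The "matching input pipe" mentioned for the pipe-based constructors can be illustrated with the standard java.io pipe classes. This is a sketch rather than the XMLReader's own code: the 64 KB pipe size, the `PipePairSketch` class, and the `roundTrip()` helper are assumptions made for the example.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of pairing a caller-owned PipedOutputStream with a reader-side pipe. */
final class PipePairSketch {
    static String roundTrip(String xml) {
        try {
            PipedOutputStream sourcePipe = new PipedOutputStream();
            // The reading side connects its own input pipe to the caller's output pipe.
            PipedInputStream in = new PipedInputStream(sourcePipe, 64 * 1024);

            // The writer runs on its own thread; closing the pipe signals end of input.
            Thread writer = new Thread(() -> {
                try (PipedOutputStream out = sourcePipe) {
                    out.write(xml.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();

            byte[] data = in.readAllBytes(); // blocks until the writer closes the pipe
            writer.join();
            return new String(data, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Writing and reading happen on different threads here because a piped stream pair used from a single thread can deadlock once the pipe fills.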
public XMLReader(java.io.PipedOutputStream sourcePipe,
int bufferSize,
boolean skipWhiteSpace,
java.io.PrintWriter err)
throws java.io.IOException
Parameters:
sourcePipe - the PipedOutputStream object that XML characters are fed into
bufferSize - How many items the XMLReader will buffer in its data structures at one time
skipWhiteSpace - If true, the no-param getNextItem() and peekNextItem() methods will jump over any all-whitespace character data between other elements.
err - A PrintWriter object to send debugging/error output to
| Method Detail |
public arlut.csd.Util.XMLItem getNextItem(boolean skipWhiteSpaceChars)
getNextItem() returns the next XMLItem
from the XMLReader's buffer. If the background thread's parsing has fallen
behind, getNextItem() will block until either data is made available from
the parse thread, or the XMLReader is closed.
getNextItem() returns null when there are no more XML elements or character data to be read from the XMLReader stream.
Parameters:
skipWhiteSpaceChars - if true, getNextItem() will silently eat any all-whitespace character data.
public arlut.csd.Util.XMLItem getNextItem()
getNextItem() returns the next XMLItem
from the XMLReader's buffer. If the background thread's parsing has fallen
behind, getNextItem() will block until either data is made available from
the parse thread, or the XMLReader is closed.
getNextItem() returns null when there are no more XML elements or character data to be read from the XMLReader stream.
public arlut.csd.Util.XMLItem peekNextItem(boolean skipWhiteSpaceChars)
peekNextItem() returns the next XMLItem
from the XMLReader's buffer. If the background thread's parsing has fallen
behind, peekNextItem() will block until either data is made available from
the parse thread, or the XMLReader is closed.
peekNextItem() returns null when there are no more XML elements or character data to be read from the XMLReader stream.
Parameters:
skipWhiteSpaceChars - if true, peekNextItem() will silently eat any all-whitespace character data. Any all-whitespace character data eaten in this way will be taken out of the XMLReader buffer, and no subsequent peekNextItem() or getNextItem(), with skipWhiteSpaceChars true or false, will return that item.
public arlut.csd.Util.XMLItem peekNextItem()
peekNextItem() returns the next XMLItem
from the XMLReader's buffer. If the background thread's parsing has fallen
behind, peekNextItem() will block until either data is made available from
the parse thread, or the XMLReader is closed.
peekNextItem() returns null when there are no more XML elements or character data to be read from the XMLReader stream.
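The difference between peeking and reading, and the way a whitespace-skipping peek permanently discards what it skips, can be sketched as follows. `PeekSketch` is a hypothetical, non-blocking stand-in that uses plain strings in place of XMLItem objects; the real methods block on the parse thread as described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical, non-blocking stand-in for the peek/get pair. */
final class PeekSketch {
    private final Deque<String> items;

    PeekSketch(List<String> source) {
        items = new ArrayDeque<>(source);
    }

    /** Returns the next item without consuming it, or null when exhausted. */
    String peekNextItem(boolean skipWhiteSpaceChars) {
        if (skipWhiteSpaceChars) {
            // Skipped whitespace is removed for good, not merely hidden.
            while (!items.isEmpty() && items.peekFirst().trim().isEmpty()) {
                items.removeFirst();
            }
        }
        return items.peekFirst();
    }

    /** Returns and consumes the next item, or null when exhausted. */
    String getNextItem(boolean skipWhiteSpaceChars) {
        String next = peekNextItem(skipWhiteSpaceChars);
        if (next != null) {
            items.removeFirst();
        }
        return next;
    }
}
```

After a call to `peekNextItem(true)`, a following `peekNextItem(false)` in this sketch no longer sees the whitespace that was skipped, mirroring the parameter description above.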
public void pushbackItem(arlut.csd.Util.XMLItem item)
pushbackItem() may be used to push the most recently read XMLItem back onto the XMLReader's buffer. The XMLReader code guarantees that there will be room to handle a single item pushback, but two pushbacks in a row with no getNextItem() call in between will cause an exception to be thrown.
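A single reserved pushback slot of the kind described above might look like the following. The class name `PushbackSketch`, the `String` items, and the choice of `IllegalStateException` are assumptions; the text only says that "an exception" is thrown on a second consecutive pushback.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in showing a one-item pushback slot. */
final class PushbackSketch {
    private final Iterator<String> source;
    private String pushback; // the single reserved slot, null when free

    PushbackSketch(List<String> items) {
        source = items.iterator();
    }

    /** Returns the pushed-back item if there is one, otherwise the next source item. */
    String getNextItem() {
        if (pushback != null) {
            String item = pushback;
            pushback = null;
            return item;
        }
        return source.hasNext() ? source.next() : null;
    }

    /** Accepts one item; a second pushback without an intervening read fails. */
    void pushbackItem(String item) {
        if (pushback != null) {
            throw new IllegalStateException("pushback slot already occupied");
        }
        pushback = item;
    }
}
```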
public java.lang.String getFollowingString(arlut.csd.Util.XMLItem openItem,
boolean skipWhiteSpace)
This method is intended to be called in the situation where we
have some text between an open and close tag, as in '
getFollowingString() does not expect there to be any other XML elements between the open and close element in the stream.
getFollowingString() expects the openElement to have already been consumed from the reader at the time that it is called, and will consume the close element before returning.
If there is no character data between openElement and the matching closeElement, null will be returned.
public arlut.csd.Util.XMLItem getNextTree()
This method reads the next XMLItem from the reader stream and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it. All XMLItems in the tree will be linked using the getParent() and getChildren() methods supported by every XMLItem class.
If getNextTree returns a multi-node tree, all XMLCloseElements read from the reader stream will be eaten, and will not appear in the tree returned. The XMLCloseElements are used to determine where the list of children should end, and so are implicitly captured in the tree returned. If any XMLError or XMLEndDocument items are found while searching for the completion of an open element's tree, that will be returned directly, and all items loaded from the reader in building the tree will be thrown away. XMLWarning elements will be returned at the point at which they were encountered in the tree parsing.
This method is recursive, and so may cause a StackOverflowError to be thrown if the XML under the startingItem is extremely deeply nested.
This variant of getNextTree() uses the default skipWhiteSpace setting for this XMLReader.
public arlut.csd.Util.XMLItem getNextTree(arlut.csd.Util.XMLItem startingItem)
This method takes an optional XMLItem and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it. All XMLItems in the tree will be linked using the getParent() and getChildren() methods supported by every XMLItem class.
If getNextTree returns a multi-node tree, all XMLCloseElements read from the reader stream will be eaten, and will not appear in the tree returned. The XMLCloseElements are used to determine where the list of children should end, and so are implicitly captured in the tree returned. If any XMLError or XMLEndDocument items are found while searching for the completion of an open element's tree, that will be returned directly, and all items loaded from the reader in building the tree will be thrown away. XMLWarning elements will be returned at the point at which they were encountered in the tree parsing.
This method is recursive, and so may cause a StackOverflowError to be thrown if the XML under the startingItem is extremely deeply nested.
Note that the startingItem is optional, and if it is present, it must be the last XMLItem read from this XMLReader. getNextTree() assumes that the XMLReader is primed to read the first XMLItem following the startingItem if startingItem is provided. If startingItem is not provided, getNextTree() will read the next item from the XMLReader, and make that the root of the tree returned. If the next item is not a non-empty XML element start tag, the next item will be returned by itself.
This variant of getNextTree() uses the default skipWhiteSpace setting for this XMLReader.
public arlut.csd.Util.XMLItem getNextTree(arlut.csd.Util.XMLItem startingItem,
boolean skipWhiteSpace)
This method takes an optional XMLItem and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it. All XMLItems in the tree will be linked using the getParent() and getChildren() methods supported by every XMLItem class.
If getNextTree returns a multi-node tree, all XMLCloseElements read from the reader stream will be eaten, and will not appear in the tree returned. The XMLCloseElements are used to determine where the list of children should end, and so are implicitly captured in the tree returned. If any XMLError or XMLEndDocument items are found while searching for the completion of an open element's tree, that will be returned directly, and all items loaded from the reader in building the tree will be thrown away. XMLWarning elements will be returned at the point at which they were encountered in the tree parsing.
This method is recursive, and so may cause a StackOverflowError to be thrown if the XML under the startingItem is extremely deeply nested.
Note that the startingItem is optional, and if it is present, it must be the last XMLItem read from this XMLReader. getNextTree() assumes that the XMLReader is primed to read the first XMLItem following the startingItem if startingItem is provided. If startingItem is not provided, getNextTree() will read the next item from the XMLReader, and make that the root of the tree returned. If the next item is not a non-empty XML element start tag, the next item will be returned by itself.
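The recursive tree assembly described for getNextTree() can be sketched over a flat token stream. Everything here is a hypothetical stand-in: `Node` plays the role of an XMLItem with parent and child links, tokens of the form `<x>`, `</x>` and `<x/>` stand for open, close and empty elements, and any token starting with `!` stands for an XMLError or XMLEndDocument item.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of recursive tree building from a flat item stream. */
final class TreeSketch {
    /** Stand-in for an XMLItem that can be linked into a tree. */
    static final class Node {
        final String label;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    private static boolean isOpen(String token) {
        return token.startsWith("<") && !token.startsWith("</") && !token.endsWith("/>");
    }

    private static boolean isTerminal(String token) {
        return token.startsWith("!"); // stand-in for XMLError / XMLEndDocument
    }

    /** Reads the next item and, if it is a non-empty element, its whole subtree. */
    static Node getNextTree(Iterator<String> reader) {
        if (!reader.hasNext()) {
            return null;
        }
        return build(new Node(reader.next()), reader);
    }

    private static Node build(Node start, Iterator<String> reader) {
        if (!isOpen(start.label)) {
            return start; // not a non-empty open element: returned by itself
        }
        while (reader.hasNext()) {
            String token = reader.next();
            if (token.startsWith("</")) {
                return start; // the close element is eaten, not stored
            }
            if (isTerminal(token)) {
                return new Node(token); // partial tree is discarded
            }
            Node child = build(new Node(token), reader);
            if (isTerminal(child.label)) {
                return child; // propagate the error item to the caller
            }
            child.parent = start;
            start.children.add(child);
        }
        return new Node("!end"); // input ran out before the matching close
    }
}
```

Each level of nesting adds one stack frame in `build()`, which is why very deep documents can exhaust the stack, as the StackOverflowError warning above notes.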
public boolean isNextCharData()
This method returns true if the next thing to be read in the input stream is non-whitespace character data rather than an open or close element tag.
Calling this method has the side effect that if the next data in the stream is a block of all-whitespace character data, that all-whitespace character data will be silently eaten.
This method goes well with getFollowingString(); you can call this method first to verify that the next data is indeed char data, then call getFollowingString() to get all of it.
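The pairing of isNextCharData() with getFollowingString() suggested above can be sketched with a simplified stand-in. `FollowingStringSketch` is hypothetical: it uses strings for items, treats anything not starting with `<` as character data, omits the openItem and skipWhiteSpace parameters, and does not verify that the close tag matches.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for the isNextCharData / getFollowingString pairing. */
final class FollowingStringSketch {
    private final Deque<String> items;

    FollowingStringSketch(List<String> source) {
        items = new ArrayDeque<>(source);
    }

    private static boolean isCharData(String item) {
        return !item.startsWith("<");
    }

    String getNextItem() {
        return items.pollFirst();
    }

    /** True if non-whitespace character data is next; eats leading all-whitespace data. */
    boolean isNextCharData() {
        while (!items.isEmpty()
                && isCharData(items.peekFirst())
                && items.peekFirst().trim().isEmpty()) {
            items.removeFirst();
        }
        return !items.isEmpty() && isCharData(items.peekFirst());
    }

    /**
     * Assumes the open tag has already been read. Returns the text before the
     * close tag, or null if there is none, and consumes the close tag.
     */
    String getFollowingString() {
        String text = null;
        String next = items.pollFirst();
        if (next != null && isCharData(next)) {
            text = next;
            items.pollFirst(); // consume the close tag that follows the text
        }
        return text;
    }
}
```

A caller would read the open tag with `getNextItem()`, check `isNextCharData()`, and only then call `getFollowingString()` to collect the text and step past the close tag.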
public void close()
close() causes the XMLReader to terminate its operations as soon as possible. Once close() has been called, the background XML parser will terminate with a SAXException the next time a SAX callback is performed.
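Aborting a parse from inside a callback, as close() is described as doing, can be demonstrated with the JDK's bundled SAX parser. This sketch uses the SAX2 `DefaultHandler` rather than the SAX1 `DocumentHandler` interface this class implements, and `CloseSketch`, its `done` flag, and `parseUntilClosed()` are hypothetical names chosen for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that stops the parser once it has been closed. */
final class CloseSketch extends DefaultHandler {
    private volatile boolean done = false;
    int elementsSeen = 0;

    void close() {
        done = true;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (done) {
            // Throwing from a callback is how the parse gets cut short.
            throw new SAXException("reader closed");
        }
        elementsSeen++;
        if (elementsSeen == 2) {
            close(); // simulates a consumer closing the reader mid-parse
        }
    }

    /** Parses the document and reports how many elements were seen before the abort. */
    static int parseUntilClosed(String xml) {
        CloseSketch handler = new CloseSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Expected once close() has been called: the parse stops here.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.elementsSeen;
    }
}
```

The parser only notices the flag at its next callback, which is why the description above says termination happens "the next time a SAX callback is performed" rather than immediately.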
public boolean isDone()
public void run()
Specified by:
run in interface java.lang.Runnable
private final void pourIntoBuffer(arlut.csd.Util.XMLItem item)
throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException
private final void completeElement()
throws org.xml.sax.SAXException
This is a private helper method used to move a completed halfElement XMLElement (which stays half-completed until we know whether the SAX parser will give us an immediately following close element, in which case we want to mark the halfElement as empty and eat the subsequent close) into the XMLReader's primary buffer.
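The half-element technique described above can be sketched as follows: an open element is held back until the next event shows whether it is immediately closed. `HalfElementSketch` is a hypothetical stand-in that records items as strings (`<x>`, `<x/>`, `</x>`, or text) instead of XMLItem objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of deferring an open element to detect empty elements. */
final class HalfElementSketch {
    private final List<String> buffer = new ArrayList<>(); // stand-in for the primary buffer
    private String halfElement; // name of the pending open element, or null

    void startElement(String name) {
        completeElement(); // any earlier pending element is now known to be non-empty
        halfElement = name;
    }

    void endElement(String name) {
        if (halfElement != null) {
            // The close follows the open directly: emit one empty element, eat the close.
            buffer.add("<" + halfElement + "/>");
            halfElement = null;
            return;
        }
        buffer.add("</" + name + ">");
    }

    void characters(String text) {
        completeElement();
        buffer.add(text);
    }

    /** Moves a pending open element into the buffer as a non-empty element. */
    private void completeElement() {
        if (halfElement != null) {
            buffer.add("<" + halfElement + ">");
            halfElement = null;
        }
    }

    List<String> items() {
        return buffer;
    }
}
```

With this arrangement a consumer can recognise an empty element from a single item instead of having to read ahead for the matching close.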
Throws:
org.xml.sax.SAXException
public void setDocumentLocator(org.xml.sax.Locator locator)
The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. Typically, the application will use this information for reporting its own errors (such as character content that does not match an application's business rules). The information returned by the locator is probably not sufficient for use with a search engine.
Note that the locator will return correct information only during the invocation of the events in this interface. The application should not attempt to use it at any other time.
Specified by:
setDocumentLocator in interface org.xml.sax.DocumentHandler
Parameters:
locator - An object that can return the location of any SAX document event.
See Also:
Locator
public void startDocument()
throws org.xml.sax.SAXException
The SAX parser will invoke this method only once, before any other methods in this interface or in DTDHandler (except for setDocumentLocator).
Specified by:
startDocument in interface org.xml.sax.DocumentHandler
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
public void endDocument()
throws org.xml.sax.SAXException
The SAX parser will invoke this method only once, and it will be the last method invoked during the parse. The parser shall not invoke this method until it has either abandoned parsing (because of an unrecoverable error) or reached the end of input.
Specified by:
endDocument in interface org.xml.sax.DocumentHandler
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
public void startElement(java.lang.String name,
org.xml.sax.AttributeList atts)
throws org.xml.sax.SAXException
The Parser will invoke this method at the beginning of every element in the XML document; there will be a corresponding endElement() event for every startElement() event (even when the element is empty). All of the element's content will be reported, in order, before the corresponding endElement() event.
If the element name has a namespace prefix, the prefix will still be attached. Note that the attribute list provided will contain only attributes with explicit values (specified or defaulted): #IMPLIED attributes will be omitted.
Specified by:
startElement in interface org.xml.sax.DocumentHandler
Parameters:
name - The element type name.
atts - The attributes attached to the element, if any.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
endElement(java.lang.String), AttributeList
public void endElement(java.lang.String name)
throws org.xml.sax.SAXException
The SAX parser will invoke this method at the end of every element in the XML document; there will be a corresponding startElement() event for every endElement() event (even when the element is empty).
If the element name has a namespace prefix, the prefix will still be attached to the name.
Specified by:
endElement in interface org.xml.sax.DocumentHandler
Parameters:
name - The element type name
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
public void characters(char[] ch,
int start,
int length)
throws org.xml.sax.SAXException
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.
The application must not attempt to read from the array outside of the specified range.
Note that some parsers will report whitespace using the ignorableWhitespace() method rather than this one (validating parsers must do so).
Specified by:
characters in interface org.xml.sax.DocumentHandler
Parameters:
ch - The characters from the XML document.
start - The start position in the array.
length - The number of characters to read from the array.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
ignorableWhitespace(char[], int, int), Locator
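The rule that a handler may only read the (start, length) window of the array passed to characters() can be shown with two small helpers. `CharDataSketch` and its method names are hypothetical; the whitespace test is included because the skipWhiteSpace options on this class depend on telling all-whitespace chunks apart, though how XMLReader does that internally is not stated here.

```java
/** Hypothetical helpers for handling a characters() chunk safely. */
final class CharDataSketch {
    /** Copies only the reported window out of the parser's array. */
    static String chunkToString(char[] ch, int start, int length) {
        return new String(ch, start, length);
    }

    /** True if every character inside the reported window is whitespace. */
    static boolean isAllWhitespace(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Copying the window into a String matters because a parser is free to reuse its array for later events.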
public void ignorableWhitespace(char[] ch,
int start,
int length)
throws org.xml.sax.SAXException
Validating Parsers must use this method to report each chunk of ignorable whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating parsers may also use this method if they are capable of parsing and using content models.
SAX parsers may return all contiguous whitespace in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.
The application must not attempt to read from the array outside of the specified range.
Specified by:
ignorableWhitespace in interface org.xml.sax.DocumentHandler
Parameters:
ch - The characters from the XML document.
start - The start position in the array.
length - The number of characters to read from the array.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
characters(char[], int, int)
public void processingInstruction(java.lang.String target,
java.lang.String data)
throws org.xml.sax.SAXException
The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element.
A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.
Specified by:
processingInstruction in interface org.xml.sax.DocumentHandler
Parameters:
target - The processing instruction target.
data - The processing instruction data, or null if none was supplied.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
public void warning(org.xml.sax.SAXParseException exception)
throws org.xml.sax.SAXException
SAX parsers will use this method to report conditions that are not errors or fatal errors as defined by the XML 1.0 recommendation. The default behaviour is to take no action.
The SAX parser must continue to provide normal parsing events after invoking this method: it should still be possible for the application to process the document through to the end.
Specified by:
warning in interface org.xml.sax.ErrorHandler
Parameters:
exception - The warning information encapsulated in a SAX parse exception.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
SAXParseException
public void error(org.xml.sax.SAXParseException exception)
throws org.xml.sax.SAXException
This corresponds to the definition of "error" in section 1.2 of the W3C XML 1.0 Recommendation. For example, a validating parser would use this callback to report the violation of a validity constraint. The default behaviour is to take no action.
The SAX parser must continue to provide normal parsing events after invoking this method: it should still be possible for the application to process the document through to the end. If the application cannot do so, then the parser should report a fatal error even if the XML 1.0 recommendation does not require it to do so.
Specified by:
error in interface org.xml.sax.ErrorHandler
Parameters:
exception - The error information encapsulated in a SAX parse exception.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
SAXParseException
public void fatalError(org.xml.sax.SAXParseException exception)
throws org.xml.sax.SAXException
This corresponds to the definition of "fatal error" in section 1.2 of the W3C XML 1.0 Recommendation. For example, a parser would use this callback to report the violation of a well-formedness constraint.
The application must assume that the document is unusable after the parser has invoked this method, and should continue (if at all) only for the sake of collecting additional error messages: in fact, SAX parsers are free to stop reporting any other events once this method has been invoked.
Specified by:
fatalError in interface org.xml.sax.ErrorHandler
Parameters:
exception - The error information encapsulated in a SAX parse exception.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
SAXParseException
private void enqueue(arlut.csd.Util.XMLItem item)
throws java.lang.InterruptedException
Throws:
java.lang.InterruptedException
private arlut.csd.Util.XMLItem dequeue()