arlut.csd.Util
Class XMLReader

java.lang.Object
  |
  +--arlut.csd.Util.XMLReader
All Implemented Interfaces:
org.xml.sax.DocumentHandler, org.xml.sax.ErrorHandler, java.lang.Runnable

public class XMLReader
extends java.lang.Object
implements org.xml.sax.DocumentHandler, org.xml.sax.ErrorHandler, java.lang.Runnable

This class is intended to serve as a stream-oriented proxy, allowing the Ganymede server to read XML entity and character data from a SAX parser entity by entity, rather than through the use of a callback interface, as is traditionally done with SAX.

When instantiated, the XMLReader creates a background thread that receives SAX events from James Clark's XP XML parser. These SAX events are converted to XMLItem objects and saved in an internal buffer. The user of the XMLReader class calls getNextItem() to retrieve these XMLItem objects from the XMLReader buffer, in order of receipt.

The background parse thread is throttled back as needed to avoid overflowing the XMLReader's internal buffer.
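
The pull-style arrangement described above can be illustrated with a small stand-alone sketch. It is not the Ganymede implementation: it assumes the JDK's bundled SAX2 parser and DefaultHandler in place of XP and the SAX1 DocumentHandler, uses plain strings instead of XMLItem objects, and the class name PullProxySketch and its readAll() helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for XMLReader, not the Ganymede source: a background
// SAX parse thread fills a bounded queue that the caller drains item by item.
class PullProxySketch extends DefaultHandler implements Runnable {
    private static final String END = "END";   // sentinel: no more items
    private final BlockingQueue<String> buffer;
    private final String xml;
    private boolean exhausted;

    PullProxySketch(String xml, int bufferSize) {
        this.xml = xml;
        this.buffer = new ArrayBlockingQueue<>(bufferSize);
        Thread parseThread = new Thread(this, "xml-parse");
        parseThread.setDaemon(true);
        parseThread.start();
    }

    @Override
    public void run() {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            enqueue("error " + e.getMessage());
        } finally {
            enqueue(END);
        }
    }

    // put() blocks while the queue is full, which throttles the parse thread.
    private void enqueue(String item) {
        try {
            buffer.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        enqueue("open " + qName);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        enqueue("close " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        enqueue("chars " + new String(ch, start, length));
    }

    /** Blocks until an item is available; returns null once the stream is exhausted. */
    String getNextItem() {
        if (exhausted) return null;
        try {
            String item = buffer.take();
            if (END.equals(item)) {
                exhausted = true;
                return null;
            }
            return item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Drains the reader the way a caller of getNextItem() would.
    static List<String> readAll(String xml, int bufferSize) {
        PullProxySketch reader = new PullProxySketch(xml, bufferSize);
        List<String> items = new ArrayList<>();
        for (String item = reader.getNextItem(); item != null; item = reader.getNextItem()) {
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(readAll("<a><b/></a>", 2));
    }
}
```

The bounded queue plays the role of the internal buffer: the parse thread stalls in put() when the consumer falls behind, and the consumer stalls in take() when the parser falls behind.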


Field Summary
private  arlut.csd.Util.XMLItem[] buffer
           
private  int bufferContents
           
private  int bufferSize
           
private  arlut.csd.Util.CircleBuffer circleBuffer
           
static boolean debug
           
private  int dequeuePtr
           
private  boolean done
           
private  int enqueuePtr
           
private  java.io.PrintWriter err
           
private  arlut.csd.Util.XMLElement halfElement
           
private  int highWaterMark
          Set the highWaterMark to something high if on a single processor system, to something low (equal to 0) on a multi-processor native threads system.
private  org.xml.sax.InputSource inputSource
           
private  java.lang.Thread inputThread
           
private  org.xml.sax.Locator locator
           
private  int lowWaterMark
          Set the lowWaterMark to something low on a single processor system, to something high (equal to bufferSize?) on a multi-processor native threads system.
private  org.xml.sax.Parser parser
           
private  arlut.csd.Util.XMLItem pushback
           
private  boolean skipWhiteSpace
           
 
Constructor Summary
XMLReader(java.io.PipedOutputStream sourcePipe, int bufferSize, boolean skipWhiteSpace)
          This constructor takes a PipedOutputStream as a parameter, creates a large matching input pipe to read from, and spins off the XMLReader's parsing thread to process data that is fed into the PipedOutputStream.
XMLReader(java.io.PipedOutputStream sourcePipe, int bufferSize, boolean skipWhiteSpace, java.io.PrintWriter err)
          This constructor takes a PipedOutputStream as a parameter, creates a large matching input pipe to read from, and spins off the XMLReader's parsing thread to process data that is fed into the PipedOutputStream.
XMLReader(java.lang.String xmlFilename, int bufferSize, boolean skipWhiteSpace)
           
XMLReader(java.lang.String xmlFilename, int bufferSize, boolean skipWhiteSpace, java.io.PrintWriter err)
           
 
Method Summary
 void characters(char[] ch, int start, int length)
          Receive notification of character data.
 void close()
          close() causes the XMLReader to terminate its operations as soon as possible.
private  void completeElement()
          This is a private helper method used to move a completed halfElement XMLElement (which stays half-completed until we know whether the SAX parser will give us an immediately following close element, in which case we want to mark the halfElement as empty and eat the subsequent close) into the XMLReader's primary buffer.
private  arlut.csd.Util.XMLItem dequeue()
          private dequeue method.
 void endDocument()
          Receive notification of the end of a document.
 void endElement(java.lang.String name)
          Receive notification of the end of an element.
private  void enqueue(arlut.csd.Util.XMLItem item)
          private enqueue method.
 void error(org.xml.sax.SAXParseException exception)
          Receive notification of a recoverable error.
 void fatalError(org.xml.sax.SAXParseException exception)
          Receive notification of a non-recoverable error.
 java.lang.String getFollowingString(arlut.csd.Util.XMLItem openItem, boolean skipWhiteSpace)
          This method is intended to be called in the situation where we have some text between an open and close tag, as in '<tag>Some string</tag>'.
 arlut.csd.Util.XMLItem getNextItem()
          getNextItem() returns the next XMLItem from the XMLReader's buffer.
 arlut.csd.Util.XMLItem getNextItem(boolean skipWhiteSpaceChars)
          getNextItem() returns the next XMLItem from the XMLReader's buffer.
 arlut.csd.Util.XMLItem getNextTree()
          This method reads the next XMLItem from the reader stream and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it.
 arlut.csd.Util.XMLItem getNextTree(arlut.csd.Util.XMLItem startingItem)
          This method takes an optional XMLItem and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it.
 arlut.csd.Util.XMLItem getNextTree(arlut.csd.Util.XMLItem startingItem, boolean skipWhiteSpace)
          This method takes an optional XMLItem and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it.
 void ignorableWhitespace(char[] ch, int start, int length)
          Receive notification of ignorable whitespace in element content.
 boolean isDone()
           
 boolean isNextCharData()
          This method returns true if the next thing to be read in the input stream is non-whitespace character data rather than an open or close element tag.
 arlut.csd.Util.XMLItem peekNextItem()
          peekNextItem() returns the next XMLItem from the XMLReader's buffer.
 arlut.csd.Util.XMLItem peekNextItem(boolean skipWhiteSpaceChars)
          peekNextItem() returns the next XMLItem from the XMLReader's buffer.
private  void pourIntoBuffer(arlut.csd.Util.XMLItem item)
           
 void processingInstruction(java.lang.String target, java.lang.String data)
          Receive notification of a processing instruction.
 void pushbackItem(arlut.csd.Util.XMLItem item)
          pushbackItem() may be used to push the most recently read XMLItem back onto the XMLReader's buffer.
 void run()
           
 void setDocumentLocator(org.xml.sax.Locator locator)
          The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error.
 void startDocument()
          Receive notification of the beginning of a document.
 void startElement(java.lang.String name, org.xml.sax.AttributeList atts)
          Receive notification of the beginning of an element.
 void warning(org.xml.sax.SAXParseException exception)
          Receive notification of a warning.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

debug

public static final boolean debug
See Also:
Constant Field Values

parser

private org.xml.sax.Parser parser

inputSource

private org.xml.sax.InputSource inputSource

locator

private org.xml.sax.Locator locator

buffer

private final arlut.csd.Util.XMLItem[] buffer

enqueuePtr

private int enqueuePtr

dequeuePtr

private int dequeuePtr

bufferContents

private int bufferContents

bufferSize

private int bufferSize

lowWaterMark

private int lowWaterMark
Set the lowWaterMark to something low on a single processor system, to something high (equal to bufferSize?) on a multi-processor native threads system.


highWaterMark

private int highWaterMark
Set the highWaterMark to something high if on a single processor system, to something low (equal to 0) on a multi-processor native threads system.
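
The page does not say how the two marks are consulted. One plausible reading, sketched below purely as an assumption, is that they are wake-up thresholds on a circular buffer: the reader is only notified once the contents reach highWaterMark, and the parse thread is only notified once the contents drain to lowWaterMark. The class WaterMarkBufferSketch and its pump() helper are hypothetical and are not taken from the XMLReader source.

```java
// Hypothetical circular buffer, not the Ganymede source. Treating the two
// marks as wake-up thresholds is an assumption made for illustration.
class WaterMarkBufferSketch {
    private final int[] slots;
    private final int lowWaterMark;
    private final int highWaterMark;
    private int enqueuePtr;
    private int dequeuePtr;
    private int contents;
    private boolean done;

    WaterMarkBufferSketch(int bufferSize, int lowWaterMark, int highWaterMark) {
        if (bufferSize < 1 || lowWaterMark < 0 || highWaterMark > bufferSize) {
            throw new IllegalArgumentException("marks must allow both sides to be woken");
        }
        this.slots = new int[bufferSize];
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    synchronized void enqueue(int item) throws InterruptedException {
        while (!done && contents == slots.length) wait();   // full: throttle the producer
        if (done) return;                                   // closed: drop further input
        slots[enqueuePtr] = item;
        enqueuePtr = (enqueuePtr + 1) % slots.length;
        contents++;
        if (contents >= highWaterMark) notifyAll();         // wake the reader once enough is queued
    }

    /** Returns null once the buffer has been closed and drained. */
    synchronized Integer dequeue() throws InterruptedException {
        while (!done && contents == 0) wait();              // empty: wait for the producer
        if (contents == 0) return null;
        int item = slots[dequeuePtr];
        dequeuePtr = (dequeuePtr + 1) % slots.length;
        contents--;
        if (contents <= lowWaterMark) notifyAll();          // wake the producer once mostly drained
        return item;
    }

    synchronized void close() {
        done = true;
        notifyAll();
    }

    // Pushes 0..n-1 through the buffer from a second thread and sums what arrives.
    static int pump(int n, int bufferSize, int low, int high) {
        WaterMarkBufferSketch buf = new WaterMarkBufferSketch(bufferSize, low, high);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) buf.enqueue(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                buf.close();
            }
        });
        producer.setDaemon(true);
        producer.start();
        int sum = 0;
        try {
            for (Integer item = buf.dequeue(); item != null; item = buf.dequeue()) sum += item;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(pump(100, 8, 2, 6));   // batchy hand-offs, fewer wake-ups
        System.out.println(pump(100, 8, 8, 0));   // wake the other side on every change
    }
}
```

Under this reading, a high highWaterMark and low lowWaterMark let each thread run in long bursts, which suits a single processor, while a highWaterMark of 0 and a lowWaterMark of bufferSize wake the other thread on every change, which suits true parallel threads.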


inputThread

private java.lang.Thread inputThread

done

private boolean done

pushback

private arlut.csd.Util.XMLItem pushback

halfElement

private arlut.csd.Util.XMLElement halfElement

skipWhiteSpace

private boolean skipWhiteSpace

err

private java.io.PrintWriter err

circleBuffer

private arlut.csd.Util.CircleBuffer circleBuffer
Constructor Detail

XMLReader

public XMLReader(java.lang.String xmlFilename,
                 int bufferSize,
                 boolean skipWhiteSpace)
          throws java.io.IOException
Parameters:
xmlFilename - Name of the file to read
bufferSize - How many items the XMLReader will buffer in its data structures at one time
skipWhiteSpace - If true, the no-param getNextItem() and peekNextItem() methods will jump over any all-whitespace character data between other elements.

XMLReader

public XMLReader(java.lang.String xmlFilename,
                 int bufferSize,
                 boolean skipWhiteSpace,
                 java.io.PrintWriter err)
          throws java.io.IOException
Parameters:
xmlFilename - Name of the file to read
bufferSize - How many items the XMLReader will buffer in its data structures at one time
skipWhiteSpace - If true, the no-param getNextItem() and peekNextItem() methods will jump over any all-whitespace character data between other elements.
err - A PrintWriter object to send debugging/error output to

XMLReader

public XMLReader(java.io.PipedOutputStream sourcePipe,
                 int bufferSize,
                 boolean skipWhiteSpace)
          throws java.io.IOException
This constructor takes a PipedOutputStream as a parameter, creates a large matching input pipe to read from, and spins off the XMLReader's parsing thread to process data that is fed into the PipedOutputStream.

Parameters:
sourcePipe - the PipedOutputStream object that XML characters are fed into
bufferSize - How many items the XMLReader will buffer in its data structures at one time
skipWhiteSpace - If true, the no-param getNextItem() and peekNextItem() methods will jump over any all-whitespace character data between other elements.

XMLReader

public XMLReader(java.io.PipedOutputStream sourcePipe,
                 int bufferSize,
                 boolean skipWhiteSpace,
                 java.io.PrintWriter err)
          throws java.io.IOException
This constructor takes a PipedOutputStream as a parameter, creates a large matching input pipe to read from, and spins off the XMLReader's parsing thread to process data that is fed into the PipedOutputStream.

Parameters:
sourcePipe - the PipedOutputStream object that XML characters are fed into
bufferSize - How many items the XMLReader will buffer in its data structures at one time
skipWhiteSpace - If true, the no-param getNextItem() and peekNextItem() methods will jump over any all-whitespace character data between other elements.
err - A PrintWriter object to send debugging/error output to
Method Detail

getNextItem

public arlut.csd.Util.XMLItem getNextItem(boolean skipWhiteSpaceChars)

getNextItem() returns the next XMLItem from the XMLReader's buffer. If the background thread's parsing has fallen behind, getNextItem() will block until either data is made available from the parse thread, or the XMLReader is closed.

getNextItem() returns null when there are no more XML elements or character data to be read from the XMLReader stream.

Parameters:
skipWhiteSpaceChars - if true, getNextItem() will silently eat any all-whitespace character data.

getNextItem

public arlut.csd.Util.XMLItem getNextItem()

getNextItem() returns the next XMLItem from the XMLReader's buffer. If the background thread's parsing has fallen behind, getNextItem() will block until either data is made available from the parse thread, or the XMLReader is closed.

getNextItem() returns null when there are no more XML elements or character data to be read from the XMLReader stream.


peekNextItem

public arlut.csd.Util.XMLItem peekNextItem(boolean skipWhiteSpaceChars)

peekNextItem() returns the next XMLItem from the XMLReader's buffer. If the background thread's parsing has fallen behind, peekNextItem() will block until either data is made available from the parse thread, or the XMLReader is closed.

peekNextItem() returns null when there are no more XML elements or character data to be read from the XMLReader stream.

Parameters:
skipWhiteSpaceChars - if true, peekNextItem() will silently eat any all-whitespace character data. Any all-whitespace character data eaten in this way will be taken out of the XMLReader buffer, and no subsequent peekNextItem() or getNextItem(), with skipWhiteSpaceChars true or false, will return that item.

peekNextItem

public arlut.csd.Util.XMLItem peekNextItem()

peekNextItem() returns the next XMLItem from the XMLReader's buffer. If the background thread's parsing has fallen behind, peekNextItem() will block until either data is made available from the parse thread, or the XMLReader is closed.

peekNextItem() returns null when there are no more XML elements or character data to be read from the XMLReader stream.


pushbackItem

public void pushbackItem(arlut.csd.Util.XMLItem item)

pushbackItem() may be used to push the most recently read XMLItem back onto the XMLReader's buffer. The XMLReader code guarantees that there will be room to handle a single item pushback, but two pushbacks in a row with no getNextItem() call in between will cause an exception to be thrown.
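
A single-slot pushback of the kind described here might look like the following sketch. The class PushbackSketch is hypothetical, items are plain strings rather than XMLItem objects, and the choice of IllegalStateException is an assumption, since the page does not name the exception type.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical one-slot pushback over any item source; not the Ganymede source.
class PushbackSketch {
    private final Iterator<String> source;
    private String pushback;   // the single pushback slot, null when empty

    PushbackSketch(Iterator<String> source) {
        this.source = source;
    }

    String getNextItem() {
        if (pushback != null) {            // a pushed-back item is served first
            String item = pushback;
            pushback = null;
            return item;
        }
        return source.hasNext() ? source.next() : null;
    }

    void pushbackItem(String item) {
        if (pushback != null) {
            // Exception type is an assumption; the documentation only says "an exception".
            throw new IllegalStateException("pushback slot already occupied");
        }
        pushback = item;
    }

    public static void main(String[] args) {
        PushbackSketch reader = new PushbackSketch(Arrays.asList("a", "b").iterator());
        String first = reader.getNextItem();
        reader.pushbackItem(first);                 // un-read the item
        System.out.println(reader.getNextItem());   // same item comes back
        System.out.println(reader.getNextItem());
    }
}
```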


getFollowingString

public java.lang.String getFollowingString(arlut.csd.Util.XMLItem openItem,
                                           boolean skipWhiteSpace)

This method is intended to be called in the situation where we have some text between an open and close tag, as in '<tag>Some string</tag>'.

getFollowingString() does not expect there to be any other XML elements between the open and close element in the stream.

getFollowingString() expects the openElement to have already been consumed from the reader at the time that it is called, and will consume the close element before returning.

If there is no character data between openElement and the matching closeElement, null will be returned.
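
The contract above (open tag already consumed, close tag consumed before returning, null when there is no character data) can be sketched over a simple item queue. The Item model and class name below are hypothetical, throwing on a nested element is an assumption since the page leaves that case unspecified, and skipWhiteSpace is assumed to mean that all-whitespace chunks are dropped.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical item model, not arlut.csd.Util.XMLItem.
class FollowingStringSketch {
    enum Kind { OPEN, CHARS, CLOSE }

    static final class Item {
        final Kind kind;
        final String text;   // element name for OPEN/CLOSE, character data for CHARS

        Item(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** Assumes the open tag was already consumed; consumes up to and including the close tag. */
    static String getFollowingString(Deque<Item> reader, boolean skipWhiteSpace) {
        StringBuilder result = new StringBuilder();
        boolean sawCharData = false;
        for (Item item = reader.poll(); item != null; item = reader.poll()) {
            if (item.kind == Kind.CLOSE) break;                 // the close tag is eaten here
            if (item.kind != Kind.CHARS) {
                throw new IllegalStateException("unexpected element " + item.text);
            }
            if (skipWhiteSpace && item.text.trim().isEmpty()) continue;
            result.append(item.text);
            sawCharData = true;
        }
        return sawCharData ? result.toString() : null;
    }

    public static void main(String[] args) {
        Deque<Item> reader = new ArrayDeque<>();
        reader.add(new Item(Kind.CHARS, "Some string"));
        reader.add(new Item(Kind.CLOSE, "name"));
        System.out.println(getFollowingString(reader, true));
    }
}
```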


getNextTree

public arlut.csd.Util.XMLItem getNextTree()

This method reads the next XMLItem from the reader stream and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it. All XMLItems in the tree will be linked using the getParent() and getChildren() methods supported by every XMLItem class.

If getNextTree() returns a multi-node tree, all XMLCloseElements read from the reader stream will be eaten, and will not appear in the tree returned. The XMLCloseElements are used to determine where the list of children should end, and so are implicitly captured in the tree returned. If an XMLError or XMLEndDocument item is found while searching for the completion of an open element's tree, that item will be returned directly, and all items loaded from the reader in building the tree will be thrown away. XMLWarning elements will be returned at the point at which they were encountered in the tree parsing.

This method is recursive, and so may cause a StackOverflowError to be thrown if the XML under the root element is extremely deeply nested.

This variant of getNextTree() uses the default skipWhiteSpace setting for this XMLReader.


getNextTree

public arlut.csd.Util.XMLItem getNextTree(arlut.csd.Util.XMLItem startingItem)

This method takes an optional XMLItem and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it. All XMLItems in the tree will be linked using the getParent() and getChildren() methods supported by every XMLItem class.

If getNextTree() returns a multi-node tree, all XMLCloseElements read from the reader stream will be eaten, and will not appear in the tree returned. The XMLCloseElements are used to determine where the list of children should end, and so are implicitly captured in the tree returned. If an XMLError or XMLEndDocument item is found while searching for the completion of an open element's tree, that item will be returned directly, and all items loaded from the reader in building the tree will be thrown away. XMLWarning elements will be returned at the point at which they were encountered in the tree parsing.

This method is recursive, and so may cause a StackOverflowError to be thrown if the XML under the startingItem is extremely deeply nested.

Note that the startingItem is optional, and if it is present, it must be the last XMLItem read from this XMLReader. getNextTree() assumes that the XMLReader is primed to read the first XMLItem following the startingItem if startingItem is provided. If startingItem is not provided, getNextTree() will read the next item from the XMLReader, and make that the root of the tree returned. If the next item is not a non-empty XML element start tag, the next item will be returned by itself.

This variant of getNextTree() uses the default skipWhiteSpace setting for this XMLReader.


getNextTree

public arlut.csd.Util.XMLItem getNextTree(arlut.csd.Util.XMLItem startingItem,
                                          boolean skipWhiteSpace)

This method takes an optional XMLItem and, if it is a non-empty XMLElement, will return that element as the root node of a tree of all elements contained under it. All XMLItems in the tree will be linked using the getParent() and getChildren() methods supported by every XMLItem class.

If getNextTree() returns a multi-node tree, all XMLCloseElements read from the reader stream will be eaten, and will not appear in the tree returned. The XMLCloseElements are used to determine where the list of children should end, and so are implicitly captured in the tree returned. If an XMLError or XMLEndDocument item is found while searching for the completion of an open element's tree, that item will be returned directly, and all items loaded from the reader in building the tree will be thrown away. XMLWarning elements will be returned at the point at which they were encountered in the tree parsing.

This method is recursive, and so may cause a StackOverflowError to be thrown if the XML under the startingItem is extremely deeply nested.

Note that the startingItem is optional, and if it is present, it must be the last XMLItem read from this XMLReader. getNextTree() assumes that the XMLReader is primed to read the first XMLItem following the startingItem if startingItem is provided. If startingItem is not provided, getNextTree() will read the next item from the XMLReader, and make that the root of the tree returned. If the next item is not a non-empty XML element start tag, the next item will be returned by itself.
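
The recursive tree construction described for getNextTree() can be sketched as follows. The Node model is a hypothetical simplification of the XMLItem classes (public parent and children fields stand in for getParent() and getChildren()), the reader is a plain Iterator, and error and warning items are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical simplification of the XMLItem hierarchy; not the Ganymede source.
class TreeSketch {
    enum Kind { OPEN, EMPTY, CHARS, CLOSE, END_DOCUMENT }

    static final class Node {
        final Kind kind;
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }
    }

    static Node getNextTree(Iterator<Node> reader, Node startingItem) {
        Node root = startingItem != null ? startingItem
                : (reader.hasNext() ? reader.next() : null);
        if (root == null || root.kind != Kind.OPEN) {
            return root;                       // not a non-empty element: returned by itself
        }
        while (true) {
            Node child = getNextTree(reader, null);   // recursion builds each subtree
            if (child == null || child.kind == Kind.END_DOCUMENT) {
                return child;                  // tree never completed: partial tree is discarded
            }
            if (child.kind == Kind.CLOSE) {
                return root;                   // close tag ends the child list and is eaten
            }
            child.parent = root;
            root.children.add(child);
        }
    }

    // Item stream for <a><b/>text<c><d/></c></a>
    static Node sample() {
        List<Node> items = Arrays.asList(
                new Node(Kind.OPEN, "a"), new Node(Kind.EMPTY, "b"), new Node(Kind.CHARS, "text"),
                new Node(Kind.OPEN, "c"), new Node(Kind.EMPTY, "d"), new Node(Kind.CLOSE, "c"),
                new Node(Kind.CLOSE, "a"));
        return getNextTree(items.iterator(), null);
    }

    public static void main(String[] args) {
        Node root = sample();
        System.out.println(root.name + " has " + root.children.size() + " children");
    }
}
```

Because each nested open element costs one stack frame, the depth of recursion tracks the depth of the XML, which is the source of the StackOverflowError caveat above.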


isNextCharData

public boolean isNextCharData()

This method returns true if the next thing to be read in the input stream is non-whitespace character data rather than an open or close element tag.

Calling this method has the side effect that if the next data in the stream is a block of all-whitespace character data, that all-whitespace character data will be silently eaten.

This method goes well with getFollowingString(); you can call this method first to verify that the next data is indeed char data, then call getFollowingString() to get all of it.


close

public void close()

close() causes the XMLReader to terminate its operations as soon as possible. Once close() has been called, the background XML parser will terminate with a SAXException the next time a SAX callback is performed.
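
Terminating a SAX parse from inside a callback is a common technique, and a minimal sketch of it follows. It assumes the JDK's SAX2 DefaultHandler rather than the SAX1 DocumentHandler this class implements; the class CloseSketch and its closeAfter parameter (which simulates a caller invoking close() part-way through the parse) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler showing how a "closed" flag can stop a SAX parse.
class CloseSketch extends DefaultHandler {
    private volatile boolean done;
    private final int closeAfter;   // hypothetical: simulate close() after this many elements
    int elementsSeen;

    CloseSketch(int closeAfter) {
        this.closeAfter = closeAfter;
    }

    void close() {
        done = true;
    }

    // Every callback checks the flag, so the parse stops at the next SAX event.
    private void checkOpen() throws SAXException {
        if (done) throw new SAXException("reader closed, halting parse");
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        checkOpen();
        elementsSeen++;
        if (elementsSeen == closeAfter) close();
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        checkOpen();
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        checkOpen();
    }

    static int parseUntilClosed(String xml, int closeAfter) {
        CloseSketch handler = new CloseSketch(closeAfter);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Expected when the handler was closed mid-parse.
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.elementsSeen;
    }

    public static void main(String[] args) {
        System.out.println(parseUntilClosed("<a><b/><c/><d/></a>", 2));
    }
}
```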


isDone

public boolean isDone()

run

public void run()
Specified by:
run in interface java.lang.Runnable

pourIntoBuffer

private final void pourIntoBuffer(arlut.csd.Util.XMLItem item)
                           throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

completeElement

private final void completeElement()
                            throws org.xml.sax.SAXException

This is a private helper method used to move a completed halfElement XMLElement (which stays half-completed until we know whether the SAX parser will give us an immediately following close element, in which case we want to mark the halfElement as empty and eat the subsequent close) into the XMLReader's primary buffer.

Throws:
org.xml.sax.SAXException
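
The half-element idea can be sketched as a handler that holds each start tag back until the next event reveals whether the element is empty. This is an illustration only, assuming the JDK's SAX2 DefaultHandler; the class HalfElementSketch, its string items, and the "open"/"empty"/"close" labels are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that folds "open immediately followed by close" into one empty item.
class HalfElementSketch extends DefaultHandler {
    final List<String> items = new ArrayList<>();
    private String halfElement;   // start tag seen, but not yet known to be empty or not

    // Commits a pending start tag as a normal (non-empty) open element.
    private void completeElement() {
        if (halfElement != null) {
            items.add("open " + halfElement);
            halfElement = null;
        }
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        completeElement();        // whatever was pending has content, so it is not empty
        halfElement = qName;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        completeElement();
        items.add("chars " + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (halfElement != null) {
            items.add("empty " + halfElement);   // close followed open directly: eat the close
            halfElement = null;
        } else {
            items.add("close " + qName);
        }
    }

    static List<String> parse(String xml) {
        HalfElementSketch handler = new HalfElementSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.items;
    }

    public static void main(String[] args) {
        System.out.println(parse("<a><b/><c><d/></c></a>"));
    }
}
```

Note that SAX reports `<b/>` and `<b></b>` identically, so both forms come out as a single empty item in this sketch.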

setDocumentLocator

public void setDocumentLocator(org.xml.sax.Locator locator)

The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. Typically, the application will use this information for reporting its own errors (such as character content that does not match an application's business rules). The information returned by the locator is probably not sufficient for use with a search engine.

Note that the locator will return correct information only during the invocation of the events in this interface. The application should not attempt to use it at any other time.

Specified by:
setDocumentLocator in interface org.xml.sax.DocumentHandler
Parameters:
locator - An object that can return the location of any SAX document event.
See Also:
Locator
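
As an illustration of using the locator only during callbacks, the sketch below records the line number reported at each start tag. It assumes the JDK's SAX2 DefaultHandler rather than the SAX1 DocumentHandler; the class LocatorSketch and its elementLines() helper are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that keeps the parser's Locator and queries it inside callbacks.
class LocatorSketch extends DefaultHandler {
    private Locator locator;
    final Map<String, Integer> lines = new LinkedHashMap<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;   // saved once, before startDocument()
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // The locator is only meaningful while a callback is executing.
        int line = (locator == null) ? -1 : locator.getLineNumber();
        lines.put(qName, line);
    }

    static Map<String, Integer> elementLines(String xml) {
        LocatorSketch handler = new LocatorSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        System.out.println(elementLines("<a>\n<b/>\n</a>"));
    }
}
```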

startDocument

public void startDocument()
                   throws org.xml.sax.SAXException
Receive notification of the beginning of a document.

The SAX parser will invoke this method only once, before any other methods in this interface or in DTDHandler (except for setDocumentLocator).

Specified by:
startDocument in interface org.xml.sax.DocumentHandler
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.

endDocument

public void endDocument()
                 throws org.xml.sax.SAXException
Receive notification of the end of a document.

The SAX parser will invoke this method only once, and it will be the last method invoked during the parse. The parser shall not invoke this method until it has either abandoned parsing (because of an unrecoverable error) or reached the end of input.

Specified by:
endDocument in interface org.xml.sax.DocumentHandler
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.

startElement

public void startElement(java.lang.String name,
                         org.xml.sax.AttributeList atts)
                  throws org.xml.sax.SAXException
Receive notification of the beginning of an element.

The Parser will invoke this method at the beginning of every element in the XML document; there will be a corresponding endElement() event for every startElement() event (even when the element is empty). All of the element's content will be reported, in order, before the corresponding endElement() event.

If the element name has a namespace prefix, the prefix will still be attached. Note that the attribute list provided will contain only attributes with explicit values (specified or defaulted): #IMPLIED attributes will be omitted.

Specified by:
startElement in interface org.xml.sax.DocumentHandler
Parameters:
name - The element type name.
atts - The attributes attached to the element, if any.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
endElement(java.lang.String), AttributeList

endElement

public void endElement(java.lang.String name)
                throws org.xml.sax.SAXException
Receive notification of the end of an element.

The SAX parser will invoke this method at the end of every element in the XML document; there will be a corresponding startElement() event for every endElement() event (even when the element is empty).

If the element name has a namespace prefix, the prefix will still be attached to the name.

Specified by:
endElement in interface org.xml.sax.DocumentHandler
Parameters:
name - The element type name
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Receive notification of character data.

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.

The application must not attempt to read from the array outside of the specified range.

Note that some parsers will report whitespace using the ignorableWhitespace() method rather than this one (validating parsers must do so).

Specified by:
characters in interface org.xml.sax.DocumentHandler
Parameters:
ch - The characters from the XML document.
start - The start position in the array.
length - The number of characters to read from the array.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
ignorableWhitespace(char[], int, int), Locator
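
Because a parser may deliver contiguous text in several chunks, handlers commonly accumulate characters() calls and flush at the next element boundary. A minimal sketch of that pattern, assuming the JDK's SAX2 DefaultHandler and a hypothetical class CharChunkSketch, follows.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that joins character chunks into one string per text run.
class CharChunkSketch extends DefaultHandler {
    final List<String> texts = new ArrayList<>();
    private final StringBuilder pending = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);   // read only the range the parser specified
    }

    // Emits the accumulated text, if any, as a single item.
    private void flush() {
        if (pending.length() > 0) {
            texts.add(pending.toString());
            pending.setLength(0);
        }
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        flush();
    }

    static List<String> collectText(String xml) {
        CharChunkSketch handler = new CharChunkSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.texts;
    }

    public static void main(String[] args) {
        System.out.println(collectText("<a>one &amp; two<b/>three</a>"));
    }
}
```

The result is the same however the parser happens to split the text, for example around the entity reference in the sample input.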

ignorableWhitespace

public void ignorableWhitespace(char[] ch,
                                int start,
                                int length)
                         throws org.xml.sax.SAXException
Receive notification of ignorable whitespace in element content.

Validating Parsers must use this method to report each chunk of ignorable whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating parsers may also use this method if they are capable of parsing and using content models.

SAX parsers may return all contiguous whitespace in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.

The application must not attempt to read from the array outside of the specified range.

Specified by:
ignorableWhitespace in interface org.xml.sax.DocumentHandler
Parameters:
ch - The characters from the XML document.
start - The start position in the array.
length - The number of characters to read from the array.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
characters(char[], int, int)

processingInstruction

public void processingInstruction(java.lang.String target,
                                  java.lang.String data)
                           throws org.xml.sax.SAXException
Receive notification of a processing instruction.

The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element.

A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.

Specified by:
processingInstruction in interface org.xml.sax.DocumentHandler
Parameters:
target - The processing instruction target.
data - The processing instruction data, or null if none was supplied.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.

warning

public void warning(org.xml.sax.SAXParseException exception)
             throws org.xml.sax.SAXException
Receive notification of a warning.

SAX parsers will use this method to report conditions that are not errors or fatal errors as defined by the XML 1.0 recommendation. The default behaviour is to take no action.

The SAX parser must continue to provide normal parsing events after invoking this method: it should still be possible for the application to process the document through to the end.

Specified by:
warning in interface org.xml.sax.ErrorHandler
Parameters:
exception - The warning information encapsulated in a SAX parse exception.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
SAXParseException

error

public void error(org.xml.sax.SAXParseException exception)
           throws org.xml.sax.SAXException
Receive notification of a recoverable error.

This corresponds to the definition of "error" in section 1.2 of the W3C XML 1.0 Recommendation. For example, a validating parser would use this callback to report the violation of a validity constraint. The default behaviour is to take no action.

The SAX parser must continue to provide normal parsing events after invoking this method: it should still be possible for the application to process the document through to the end. If the application cannot do so, then the parser should report a fatal error even if the XML 1.0 recommendation does not require it to do so.

Specified by:
error in interface org.xml.sax.ErrorHandler
Parameters:
exception - The error information encapsulated in a SAX parse exception.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
SAXParseException

fatalError

public void fatalError(org.xml.sax.SAXParseException exception)
                throws org.xml.sax.SAXException
Receive notification of a non-recoverable error.

This corresponds to the definition of "fatal error" in section 1.2 of the W3C XML 1.0 Recommendation. For example, a parser would use this callback to report the violation of a well-formedness constraint.

The application must assume that the document is unusable after the parser has invoked this method, and should continue (if at all) only for the sake of collecting additional error messages: in fact, SAX parsers are free to stop reporting any other events once this method has been invoked.

Specified by:
fatalError in interface org.xml.sax.ErrorHandler
Parameters:
exception - The error information encapsulated in a SAX parse exception.
Throws:
org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
SAXParseException
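
The three ErrorHandler callbacks can be exercised with a small collector such as the one sketched below. It assumes the JDK's bundled parser and the SAX2 DefaultHandler (which implements ErrorHandler); the class ErrorCollectorSketch and its check() helper are hypothetical, and how XMLReader itself surfaces these events is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every report instead of acting on it.
class ErrorCollectorSketch extends DefaultHandler {
    final List<String> reports = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        reports.add("warning at line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
        reports.add("error at line " + e.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException e) {
        // Recorded only; the parser is still free to stop after a fatal error.
        reports.add("fatal at line " + e.getLineNumber());
    }

    static List<String> check(String xml) {
        ErrorCollectorSketch handler = new ErrorCollectorSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // The parser abandoned the document; the report is already recorded.
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.reports;
    }

    public static void main(String[] args) {
        System.out.println(check("<a><b></a>"));   // mismatched tags: not well-formed
        System.out.println(check("<a/>"));
    }
}
```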

enqueue

private void enqueue(arlut.csd.Util.XMLItem item)
              throws java.lang.InterruptedException
private enqueue method. Will block on the internal XMLItem buffer if the circular buffer is full.

Throws:
java.lang.InterruptedException

dequeue

private arlut.csd.Util.XMLItem dequeue()
private dequeue method. Assumes that the calling code will check bounds.