Serialized Form


Package org.htmlparser

Class org.htmlparser.AbstractNode extends Object implements Serializable

Serialized Fields

mPage

Page mPage
The page this node came from.


nodeBegin

int nodeBegin
The beginning position of the tag in the line.


nodeEnd

int nodeEnd
The ending position of the tag in the line.


parent

Node parent
The parent of this node.


children

NodeList children
The children of this node.

Class org.htmlparser.Parser extends Object implements Serializable

Serialized Fields

mFeedback

ParserFeedback mFeedback
Feedback object.


mLexer

Lexer mLexer
The html lexer associated with this parser.

Class org.htmlparser.PrototypicalNodeFactory extends Object implements Serializable

Serialized Fields

mBlastocyst

Map mBlastocyst
The list of tags to return at the top level. The list is keyed by tag name.
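A map of prototypes keyed by tag name can be pictured with the following minimal sketch. It assumes, as the class name suggests but this page does not state, that new nodes are produced by cloning a registered prototype. The names `PrototypeRegistrySketch`, `TagPrototype`, `register` and `create` are hypothetical and are not part of the HTML Parser API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of a prototype map keyed by tag name.
final class PrototypeRegistrySketch {
    // Stand-in for a tag object that can be copied.
    static class TagPrototype implements Cloneable {
        final String name;

        TagPrototype(String name) {
            this.name = name;
        }

        @Override
        public TagPrototype clone() {
            try {
                return (TagPrototype) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: we implement Cloneable
            }
        }
    }

    // Assumption: keys are normalised to upper case so lookups ignore case.
    private final Map<String, TagPrototype> prototypes = new HashMap<>();

    void register(TagPrototype prototype) {
        prototypes.put(prototype.name.toUpperCase(Locale.ROOT), prototype);
    }

    // Returns a copy of the registered prototype, or a generic tag if none is registered.
    TagPrototype create(String tagName) {
        TagPrototype prototype = prototypes.get(tagName.toUpperCase(Locale.ROOT));
        return prototype == null ? new TagPrototype(tagName) : prototype.clone();
    }
}
```

Handing out a copy on every lookup keeps the registered prototype itself unmodified, which is the usual motivation for the pattern.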

Class org.htmlparser.RemarkNode extends org.htmlparser.lexer.nodes.RemarkNode implements Serializable

Class org.htmlparser.StringNode extends org.htmlparser.lexer.nodes.StringNode implements Serializable

Class org.htmlparser.StringNodeFactory extends PrototypicalNodeFactory implements Serializable

Serialized Fields

mDecode

boolean mDecode
Flag to tell the parser to decode strings returned by StringNode's toPlainTextString. Decoding is performed by org.htmlparser.util.Translate.decode().


mRemoveEscapes

boolean mRemoveEscapes
Flag to tell the parser to remove escape characters, such as \n and \t, from strings returned by StringNode's toPlainTextString. Escape character removal is performed by org.htmlparser.util.ParserUtils.removeEscapeCharacters().


mConvertNonBreakingSpaces

boolean mConvertNonBreakingSpaces
Flag to tell the parser to convert non-breaking spaces (from &nbsp; to a space " "). If true, this happens inside StringNode's toPlainTextString.
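The effect of the three flags can be approximated with a stand-alone sketch. It does not call the library; `PlainTextOptions` and `apply` are hypothetical names, the entity table is deliberately tiny, and the order in which the steps run is an assumption.

```java
// Hypothetical stand-in for the post-processing the three flags describe.
final class PlainTextOptions {
    boolean decode;                   // mirrors mDecode
    boolean removeEscapes;            // mirrors mRemoveEscapes
    boolean convertNonBreakingSpaces; // mirrors mConvertNonBreakingSpaces

    String apply(String text) {
        String result = text;
        if (decode) {
            // Only four entities are handled here; a real decoder covers far more.
            result = result.replace("&nbsp;", "\u00a0")
                           .replace("&lt;", "<")
                           .replace("&gt;", ">")
                           .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
        }
        if (removeEscapes) {
            // Assumption: "escape characters" means newline, tab and carriage return.
            result = result.replace("\n", "").replace("\t", "").replace("\r", "");
        }
        if (convertNonBreakingSpaces) {
            result = result.replace('\u00a0', ' ');
        }
        return result;
    }

    public static void main(String[] args) {
        PlainTextOptions options = new PlainTextOptions();
        options.decode = true;
        options.removeEscapes = true;
        options.convertNonBreakingSpaces = true;
        System.out.println(options.apply("a&nbsp;&lt;b&gt;\n\tc")); // prints: a <b>c
    }
}
```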


Package org.htmlparser.beans

Class org.htmlparser.beans.BeanyBaby extends JFrame implements Serializable

Serialized Fields

mTrail

Vector mTrail
Bread crumb trail of visited URLs.


mCrumb

int mCrumb
Current position on the bread crumb trail.


mLinkBean

HTMLLinkBean mLinkBean

mForward

JMenuItem mForward

mBack

JMenuItem mBack

mCollapse

JCheckBoxMenuItem mCollapse

mTextField

JTextField mTextField

mSplitPane

JSplitPane mSplitPane

mLinks

JCheckBoxMenuItem mLinks

mStringBean

HTMLTextBean mStringBean

mNobreak

JCheckBoxMenuItem mNobreak

Class org.htmlparser.beans.HTMLLinkBean extends JList implements Serializable

Serialized Fields

mBean

LinkBean mBean
The underlying bean that provides our htmlparser specific properties.

Class org.htmlparser.beans.HTMLTextBean extends JTextArea implements Serializable

Serialized Fields

mBean

StringBean mBean
The underlying bean that provides our htmlparser specific properties.

Class org.htmlparser.beans.LinkBean extends Object implements Serializable

Serialized Fields

mPropertySupport

PropertyChangeSupport mPropertySupport
Bound property support.


mLinks

URL[] mLinks
The links extracted from the URL.


mParser

Parser mParser
The parser used to extract links.

Class org.htmlparser.beans.StringBean extends NodeVisitor implements Serializable

Serialized Fields

mPropertySupport

PropertyChangeSupport mPropertySupport
Bound property support.


mParser

Parser mParser
The parser used to extract strings.


mStrings

String mStrings
The strings extracted from the URL.


mLinks

boolean mLinks
If true the link URLs are embedded in the text output.


mReplaceSpace

boolean mReplaceSpace
If true regular space characters are substituted for non-breaking spaces in the text output.


mCollapse

boolean mCollapse
If true sequences of whitespace characters are replaced with a single space character.


mBuffer

StringBuffer mBuffer
The buffer text is stored in while traversing the HTML.


mIsScript

boolean mIsScript
Set true when traversing a SCRIPT tag.


mIsPre

boolean mIsPre
Set true when traversing a PRE tag.


mIsStyle

boolean mIsStyle
Set true when traversing a STYLE tag.
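The mReplaceSpace and mCollapse flags describe two simple text transformations. The sketch below shows one way they could be combined. `TextCleanupSketch.clean` is a hypothetical helper, not a StringBean method, and it does not model the special handling that the mIsPre, mIsScript and mIsStyle state flags imply.

```java
// Hypothetical helper illustrating the two whitespace-related flags.
final class TextCleanupSketch {
    static String clean(String text, boolean replaceSpace, boolean collapse) {
        String result = text;
        if (replaceSpace) {
            // Substitute a regular space for each non-breaking space (U+00A0).
            result = result.replace('\u00a0', ' ');
        }
        if (collapse) {
            // Replace each run of whitespace with one space. By default Java's \s
            // does not match U+00A0, so the substitution above must run first
            // if non-breaking spaces should be collapsed as well.
            result = result.replaceAll("\\s+", " ");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(clean("a \n\t b\u00a0\u00a0c", true, true)); // prints: a b c
    }
}
```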


Package org.htmlparser.lexer

Class org.htmlparser.lexer.Cursor extends Object implements Serializable

Serialized Fields

mPosition

int mPosition
This cursor's position.


mPage

Page mPage
This cursor's page.

Class org.htmlparser.lexer.Lexer extends Object implements Serializable

Serialized Fields

mPage

Page mPage
The page lexemes are retrieved from.


mCursor

Cursor mCursor
The current position on the page.


mFactory

NodeFactory mFactory
The factory for new nodes.

Class org.htmlparser.lexer.Page extends Object implements Serializable

Serialization Methods

readObject

private void readObject(ObjectInputStream in)
                 throws IOException,
                        ClassNotFoundException
Deserialize the page. For details see writeObject().


writeObject

private void writeObject(ObjectOutputStream out)
                  throws IOException
Serialize the page. There are two modes for serializing a page, depending on its connected state. If connected, the URL and the current offset are saved; if disconnected, the underlying source is saved.
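The two-mode scheme can be sketched with Java's standard private writeObject/readObject hooks. This is an illustration of the idea only, not the Page implementation: `TwoModeDocument`, its fields, and the use of a leading boolean to record the mode are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class that serializes differently depending on its connected state.
final class TwoModeDocument implements Serializable {
    private static final long serialVersionUID = 1L;

    // All state is transient because it is written by hand below.
    private transient String url;    // non-null only while "connected"
    private transient int offset;    // how far reading had progressed
    private transient String source; // the text itself, used when disconnected

    private TwoModeDocument() {
    }

    static TwoModeDocument connected(String url, int offset) {
        TwoModeDocument doc = new TwoModeDocument();
        doc.url = url;
        doc.offset = offset;
        return doc;
    }

    static TwoModeDocument disconnected(String source) {
        TwoModeDocument doc = new TwoModeDocument();
        doc.source = source;
        return doc;
    }

    boolean isConnected() { return url != null; }
    String url() { return url; }
    int offset() { return offset; }
    String source() { return source; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeBoolean(isConnected()); // records which mode follows
        if (isConnected()) {
            out.writeObject(url);
            out.writeInt(offset);
        } else {
            out.writeObject(source);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (in.readBoolean()) {
            url = (String) in.readObject();
            offset = in.readInt();
        } else {
            source = (String) in.readObject();
        }
    }

    // Serializes and deserializes in memory; checked exceptions are wrapped for brevity.
    static TwoModeDocument roundTrip(TwoModeDocument doc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(doc);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TwoModeDocument) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Saving only the URL and offset keeps the stream small for a connected page, at the cost of needing to reconnect after deserialization; saving the source makes the stream self-contained.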

Serialized Fields

mUrl

String mUrl
The URL this page is coming from. Cached value of getConnection().toExternalForm() or setUrl().


mSource

Source mSource
The source of characters.


mIndex

PageIndex mIndex
Character positions of the first character in each line.


mProcessor

LinkProcessor mProcessor
The processor of relative links on this page. Holds any overridden base HREF.

Class org.htmlparser.lexer.PageIndex extends Object implements Serializable

Serialized Fields

mCount

int mCount
The number of valid elements.


mIndices

int[] mIndices
The elements.


mPage

Page mPage
The page associated with this index.
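An index of line-start positions, stored as a partially filled int array plus a count, allows a character offset to be mapped to a row and column with a binary search. The sketch below illustrates that idea under stated assumptions: `LineIndexSketch` and its methods are hypothetical names, positions are assumed to be added in ascending order, and lines are assumed to be separated by '\n' only.

```java
import java.util.Arrays;

// Hypothetical line index: sorted offsets of the first character of each line.
final class LineIndexSketch {
    private int[] indices = new int[4]; // capacity, like mIndices
    private int count = 0;              // number of valid elements, like mCount

    // Positions must be added in ascending order.
    void addLineStart(int position) {
        if (count == indices.length) {
            indices = Arrays.copyOf(indices, count * 2);
        }
        indices[count++] = position;
    }

    // Builds an index for text whose lines are separated by '\n'.
    static LineIndexSketch fromText(String text) {
        LineIndexSketch index = new LineIndexSketch();
        index.addLineStart(0);
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                index.addLineStart(i + 1);
            }
        }
        return index;
    }

    // Zero-based line containing the given character position.
    int row(int position) {
        int found = Arrays.binarySearch(indices, 0, count, position);
        // A miss returns -(insertionPoint) - 1; the containing line is insertionPoint - 1.
        return found >= 0 ? found : -found - 2;
    }

    // Zero-based column of the position within its line.
    int column(int position) {
        return position - indices[row(position)];
    }
}
```

For the text "ab\ncd\n\nxyz" the line starts are 0, 3, 6 and 7, so position 4 (the character 'd') maps to row 1, column 1.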

Class org.htmlparser.lexer.Source extends Reader implements Serializable

Serialization Methods

readObject

private void readObject(ObjectInputStream in)
                 throws IOException,
                        ClassNotFoundException

writeObject

private void writeObject(ObjectOutputStream out)
                  throws IOException

Serialized Fields

mEncoding

String mEncoding
The character set in use.


mBuffer

char[] mBuffer
The characters read so far.


mLevel

int mLevel
The number of valid characters in the buffer.


mOffset

int mOffset
The offset of the next character returned by read().


mMark

int mMark
The bookmark.


Package org.htmlparser.lexer.nodes

Class org.htmlparser.lexer.nodes.Attribute extends Object implements Serializable

Serialized Fields

mName

String mName
The name of this attribute. The part before the equals sign, or the stand-alone attribute. This will be null if the attribute is whitespace.


mAssignment

String mAssignment
The assignment string of the attribute. The equals sign. This will be null if the attribute is a stand-alone attribute.


mValue

String mValue
The value of the attribute. The part after the equals sign. This will be null if the attribute is an empty or stand-alone attribute.


mQuote

char mQuote
The quote, if any, surrounding the value of the attribute. This will be zero if there are no quotes around the value.
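The four fields can be read as the pieces of an attribute's source text. The following sketch reassembles that text from the pieces. `AttributeSketch` and `toText` are hypothetical names, and whitespace attributes (where the name is null) are not modeled.

```java
// Hypothetical model of the four attribute parts described above.
final class AttributeSketch {
    final String name;       // part before the equals sign; null for whitespace
    final String assignment; // the equals sign; null for a stand-alone attribute
    final String value;      // part after the equals sign; null if empty or stand-alone
    final char quote;        // surrounding quote character, or 0 if unquoted

    AttributeSketch(String name, String assignment, String value, char quote) {
        this.name = name;
        this.assignment = assignment;
        this.value = value;
        this.quote = quote;
    }

    // Reassembles the attribute text from whichever parts are present.
    String toText() {
        StringBuilder text = new StringBuilder();
        if (name != null) {
            text.append(name);
        }
        if (assignment != null) {
            text.append(assignment);
        }
        if (value != null) {
            if (quote != 0) {
                text.append(quote);
            }
            text.append(value);
            if (quote != 0) {
                text.append(quote);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Full attribute: prints href="index.html"
        System.out.println(new AttributeSketch("href", "=", "index.html", '"').toText());
        // Stand-alone attribute: prints checked
        System.out.println(new AttributeSketch("checked", null, null, (char) 0).toText());
    }
}
```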

Class org.htmlparser.lexer.nodes.PageAttribute extends Attribute implements Serializable

Serialized Fields

mPage

Page mPage
The page this attribute is extracted from.


mNameStart

int mNameStart
The starting offset of the name within the page. If negative, the name is considered null.


mNameEnd

int mNameEnd
The ending offset of the name within the page.


mValueStart

int mValueStart
The starting offset of the value within the page. If negative, the value is considered null.


mValueEnd

int mValueEnd
The ending offset of the value within the page.
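Storing offsets rather than strings means the name and value can be cut out of the page text on demand. A minimal sketch, assuming the ending offset is exclusive (this page does not say) and using the hypothetical helper `OffsetSlice.slice`:

```java
// Hypothetical helper: extracts text between two offsets, treating a negative start as null.
final class OffsetSlice {
    static String slice(CharSequence page, int start, int end) {
        if (start < 0) {
            return null; // a negative start means "no name" or "no value"
        }
        return page.subSequence(start, end).toString(); // end is exclusive (assumption)
    }

    public static void main(String[] args) {
        String page = "<a href=\"x.html\">";
        System.out.println(slice(page, 3, 7));  // prints: href
        System.out.println(slice(page, 9, 15)); // prints: x.html
    }
}
```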

Class org.htmlparser.lexer.nodes.RemarkNode extends AbstractNode implements Serializable

Serialized Fields

mText

String mText
The contents of the remark node, or override text.

Class org.htmlparser.lexer.nodes.StringNode extends AbstractNode implements Serializable

Serialized Fields

mText

String mText
The contents of the string node, or override text.

Class org.htmlparser.lexer.nodes.TagNode extends AbstractNode implements Serializable

Serialized Fields

mAttributes

Vector mAttributes
The tag attributes. Objects of type Attribute.


Package org.htmlparser.lexerapplications.thumbelina

Class org.htmlparser.lexerapplications.thumbelina.Picture extends Rectangle implements Serializable

Serialized Fields

mURL

URL mURL
The URL for the picture.


mImage

Image mImage
The image for the picture.


mOrigin

Point mOrigin
The upper left hand corner of the image. This doesn't change, even if the image is cropped. For example, if the left half of the image is obscured by another, the Rectangle fields x, y, width and height will change, but the origin remains the same.

Class org.htmlparser.lexerapplications.thumbelina.PicturePanel extends JPanel implements Serializable

Serialized Fields

mThumbelina

Thumbelina mThumbelina
The thumbelina object in use.


mMosaic

TileSet mMosaic
The display mosaic.


mPreferredSize

Dimension mPreferredSize
The preferred size of this component. Initially null; caches the result of calculatePreferredSize().

Class org.htmlparser.lexerapplications.thumbelina.Thumbelina extends JPanel implements Serializable

Serialized Fields

mUrls

ArrayList mUrls
URLs to visit.


mVisited

HashMap mVisited
URLs visited.


mRequested

HashMap mRequested
Images requested.


mTracked

HashMap mTracked
Images being tracked currently.


mThread

Thread mThread
Background thread.


mActive

boolean mActive
Activity state. true means URLs are being processed; false means they are not.


mSequencer

Sequencer mSequencer
The picture sequencer.


mPicturePanel

PicturePanel mPicturePanel
The central area for pictures.


mPropertySupport

PropertyChangeSupport mPropertySupport
Bound property support.


mCurrentURL

String mCurrentURL
The URL currently being examined.


mDiscardCGI

boolean mDiscardCGI
If true, does not follow links containing cgi calls.


mDiscardQueries

boolean mDiscardQueries
If true, does not follow links containing queries (?).


mBackgroundToggle

JCheckBox mBackgroundToggle
Background thread checkbox in status bar.


mHistory

JList mHistory
History list.


mPicturePanelScroller

JScrollPane mPicturePanelScroller
Scroller for the picture panel.


mHistoryScroller

JScrollPane mHistoryScroller
Scroller for the history list.


mMainArea

JSplitPane mMainArea
Main panel in central area.


mPowerBar

JPanel mPowerBar
Status bar.


mQueueProgress

JProgressBar mQueueProgress
Image request queue monitor in status bar.


mReadyProgress

JProgressBar mReadyProgress
Image ready queue monitor in status bar.


mRunToggle

JCheckBox mRunToggle
Sequencer thread toggle in status bar.


mSpeedSlider

JSlider mSpeedSlider
Sequencer speed slider in status bar.


mUrlText

JTextField mUrlText
URL report in status bar.


mQueueSize

JLabel mQueueSize
URL queue size display in status bar.


mVisitedSize

JLabel mVisitedSize
URL visited count display in status bar.

Class org.htmlparser.lexerapplications.thumbelina.ThumbelinaFrame extends JFrame implements Serializable

Serialized Fields

mMenu

JMenuBar mMenu
Main menu.


mURL

JMenu mURL
URL submenu.


mOpen

JMenuItem mOpen
Open menu item.


mGoogle

JMenuItem mGoogle
Google menu item.


mSeparator1

JSeparator mSeparator1
MRU list separator #1.


mSeparator2

JSeparator mSeparator2
MRU list separator #2.


mExit

JMenuItem mExit
Exit menu item.


mView

JMenu mView
View submenu.


mStatusVisible

JCheckBoxMenuItem mStatusVisible
Status bar visible menu item.


mHistoryVisible

JCheckBoxMenuItem mHistoryVisible
History list visible menu item.


mCommand

JMenu mCommand
Command menu.


mReset

JMenuItem mReset
Reset menu item.


mClear

JMenuItem mClear
Clear menu item.


mHelp

JMenu mHelp
Help submenu.


mAbout

JMenuItem mAbout
About menu item.


Package org.htmlparser.scanners

Class org.htmlparser.scanners.CompositeTagScanner extends TagScanner implements Serializable

Class org.htmlparser.scanners.JspScanner extends TagScanner implements Serializable

Class org.htmlparser.scanners.ScriptScanner extends CompositeTagScanner implements Serializable

Class org.htmlparser.scanners.StyleScanner extends CompositeTagScanner implements Serializable

Class org.htmlparser.scanners.TagScanner extends Object implements Serializable


Package org.htmlparser.tags

Class org.htmlparser.tags.AppletTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.BaseHrefTag extends Tag implements Serializable

Class org.htmlparser.tags.BodyTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.Bullet extends CompositeTag implements Serializable

Class org.htmlparser.tags.BulletList extends CompositeTag implements Serializable

Class org.htmlparser.tags.CompositeTag extends Tag implements Serializable

Serialized Fields

mEndTag

TagNode mEndTag
The tag that causes this tag to finish. May be a virtual tag generated by the scanning logic.

Class org.htmlparser.tags.Div extends CompositeTag implements Serializable

Class org.htmlparser.tags.DoctypeTag extends Tag implements Serializable

Class org.htmlparser.tags.FormTag extends CompositeTag implements Serializable

Serialized Fields

mFormLocation

String mFormLocation
This is the derived form location, based on action.

Class org.htmlparser.tags.FrameSetTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.FrameTag extends Tag implements Serializable

Class org.htmlparser.tags.HeadTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.Html extends CompositeTag implements Serializable

Class org.htmlparser.tags.ImageTag extends Tag implements Serializable

Serialized Fields

imageURL

String imageURL
Holds the set value of the SRC attribute, since this can differ from the attribute value due to relative references resolved by the scanner.

Class org.htmlparser.tags.InputTag extends Tag implements Serializable

Class org.htmlparser.tags.JspTag extends Tag implements Serializable

Class org.htmlparser.tags.LabelTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.LinkTag extends CompositeTag implements Serializable

Serialized Fields

mLink

String mLink
The URL the link points to.


mailLink

boolean mailLink
Set to true when the link was a mailto: URL.


javascriptLink

boolean javascriptLink
Set to true when the link was a javascript: URL.

Class org.htmlparser.tags.MetaTag extends Tag implements Serializable

Class org.htmlparser.tags.OptionTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.ScriptTag extends CompositeTag implements Serializable

Serialized Fields

mCode

String mCode
Script code if different from the page contents.

Class org.htmlparser.tags.SelectTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.Span extends CompositeTag implements Serializable

Class org.htmlparser.tags.StyleTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.TableColumn extends CompositeTag implements Serializable

Class org.htmlparser.tags.TableHeader extends CompositeTag implements Serializable

Class org.htmlparser.tags.TableRow extends CompositeTag implements Serializable

Class org.htmlparser.tags.TableTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.Tag extends TagNode implements Serializable

Serialized Fields

mScanner

TagScanner mScanner
The scanner for this tag.

Class org.htmlparser.tags.TextareaTag extends CompositeTag implements Serializable

Class org.htmlparser.tags.TitleTag extends CompositeTag implements Serializable


Package org.htmlparser.util

Class org.htmlparser.util.ChainedException extends Exception implements Serializable

Serialized Fields

throwable

Throwable throwable

Class org.htmlparser.util.CharacterReference extends Object implements Serializable

Serialized Fields

mCharacter

int mCharacter
The character value as an integer.


mKernel

String mKernel
This entity reference kernel. The text between the ampersand and the semicolon.
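The relationship between the two fields can be shown with a small sketch: the kernel is wrapped in an ampersand and a semicolon to form the entity reference, and the integer is the code point it stands for. `CharRefSketch`, `entity` and `literal` are hypothetical names.

```java
// Hypothetical pairing of an entity kernel with its character value.
final class CharRefSketch {
    final String kernel; // text between '&' and ';', for example "copy"
    final int character; // the code point the reference stands for

    CharRefSketch(String kernel, int character) {
        this.kernel = kernel;
        this.character = character;
    }

    // The reference as it appears in HTML source.
    String entity() {
        return "&" + kernel + ";";
    }

    // The character the reference decodes to.
    String literal() {
        return new String(Character.toChars(character));
    }

    public static void main(String[] args) {
        CharRefSketch amp = new CharRefSketch("amp", 38);
        System.out.println(amp.entity() + " -> " + amp.literal()); // prints: &amp; -> &
    }
}
```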

Class org.htmlparser.util.DefaultParserFeedback extends Object implements Serializable

Serialized Fields

mMode

int mMode
Verbosity level. Corresponds to constructor arguments:
   DEBUG = 2;
   NORMAL = 1;
   QUIET = 0;
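
A verbosity level of this kind typically gates how much each reporting call emits. The sketch below reuses the three constant values listed above; everything else, including the class name `FeedbackSketch`, its methods, and the exact rule for what each level prints, is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical feedback sink whose output depends on a verbosity mode.
final class FeedbackSketch {
    static final int QUIET = 0;
    static final int NORMAL = 1;
    static final int DEBUG = 2;

    private final int mode;
    final List<String> lines = new ArrayList<>(); // collected instead of printed

    FeedbackSketch(int mode) {
        if (mode < QUIET || mode > DEBUG) {
            throw new IllegalArgumentException("mode must be QUIET, NORMAL or DEBUG");
        }
        this.mode = mode;
    }

    // Assumption: messages are suppressed only in QUIET mode.
    void info(String message) {
        if (mode != QUIET) {
            lines.add("INFO: " + message);
        }
    }

    // Assumption: the cause is reported only in DEBUG mode.
    void error(String message, Exception cause) {
        if (mode != QUIET) {
            lines.add("ERROR: " + message);
            if (mode == DEBUG && cause != null) {
                lines.add("CAUSE: " + cause);
            }
        }
    }
}
```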
 

Class org.htmlparser.util.EncodingChangeException extends ParserException implements Serializable

Class org.htmlparser.util.LinkProcessor extends Object implements Serializable

Serialized Fields

baseUrl

String baseUrl
Overriding base URL. If set, this is used instead of a provided base URL in extract().
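
The override rule described here amounts to "use the stored base if there is one, otherwise the base supplied by the caller". A minimal sketch using java.net.URI for the resolution step; `BaseOverrideSketch` and `resolve` are hypothetical names, and the real extract() method may handle malformed input differently.

```java
import java.net.URI;

// Hypothetical resolver with an optional overriding base URL.
final class BaseOverrideSketch {
    String baseOverride; // mirrors baseUrl; null means no override is set

    // Resolves a possibly relative link against the effective base.
    String resolve(String link, String providedBase) {
        String base = baseOverride != null ? baseOverride : providedBase;
        return URI.create(base).resolve(link).toString();
    }

    public static void main(String[] args) {
        BaseOverrideSketch resolver = new BaseOverrideSketch();
        // prints: http://example.com/a/c.html
        System.out.println(resolver.resolve("c.html", "http://example.com/a/b.html"));
        resolver.baseOverride = "http://other.example/x/";
        // prints: http://other.example/x/c.html
        System.out.println(resolver.resolve("c.html", "http://example.com/a/b.html"));
    }
}
```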

Class org.htmlparser.util.NodeList extends Object implements Serializable

Serialized Fields

nodeData

Node[] nodeData

size

int size

capacity

int capacity

capacityIncrement

int capacityIncrement

numberOfAdjustments

int numberOfAdjustments

Class org.htmlparser.util.ParserException extends ChainedException implements Serializable

Class org.htmlparser.util.SpecialHashtable extends Hashtable implements Serializable


© 2004 Somik Raha
Mar 14, 2004

HTML Parser is an open source library released under LGPL.