java.lang.Object
  net.sourceforge.apphere.util.html.HtmlStreamTokenizer
HtmlStreamTokenizer is an HTML parser that is similar to the StreamTokenizer class but is specialized for HTML streams. This class is useful when you need to parse the structure of an HTML document.
    import adc.parser.*;

    HtmlStreamTokenizer tok = new HtmlStreamTokenizer(inputstream);
    HtmlTag tag = new HtmlTag();   // reused for every tag token
    while (tok.nextToken() != HtmlStreamTokenizer.TT_EOF) {
        int ttype = tok.getTokenType();
        if (ttype == HtmlStreamTokenizer.TT_TAG) {
            // parseTag() is static; it fills the caller-supplied tag object
            HtmlStreamTokenizer.parseTag(tok.getStringValue(), tag);
            System.out.println("tag: " + tag.toString());
        } else if (ttype == HtmlStreamTokenizer.TT_TEXT) {
            System.out.println("text: " + tok.getStringValue());
        } else if (ttype == HtmlStreamTokenizer.TT_COMMENT) {
            // the string value holds the comment body without its delimiters
            System.out.println("comment: <!--" + tok.getStringValue() + "-->");
        }
    }
One motivation for having parseTag() take an HtmlTag argument, rather than return a newly created HtmlTag, is that you can pass in an instance of your own tag class derived from HtmlTag.
See Also: adc.parser.HtmlTag, adc.parser.Table
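The out-parameter design described above can be illustrated with a small standalone sketch. It does not use the real HtmlTag or parseTag(); `SimpleTag`, `LinkTag`, and `parseInto` are hypothetical stand-ins invented for this example, and the parsing handles only a tag name plus double-quoted attributes. The point is only that a method which fills a caller-supplied object works unchanged with a caller-defined subclass.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomTagSketch {
    /** Hypothetical stand-in for HtmlTag: a tag name plus its attributes. */
    static class SimpleTag {
        String name = "";
        final Map<String, String> attributes = new LinkedHashMap<>();

        void reset() {
            name = "";
            attributes.clear();
        }
    }

    /** A caller-defined subclass that adds its own behaviour. */
    static class LinkTag extends SimpleTag {
        boolean isLink() {
            return name.equalsIgnoreCase("a") && attributes.containsKey("href");
        }
    }

    private static final Pattern NAME =
            Pattern.compile("^\\s*<\\s*/?\\s*([A-Za-z][A-Za-z0-9]*)");
    private static final Pattern ATTR =
            Pattern.compile("([A-Za-z][A-Za-z0-9-]*)\\s*=\\s*\"([^\"]*)\"");

    /**
     * Hypothetical stand-in for parseTag(): fills the supplied tag rather than
     * creating one, so any subclass of SimpleTag can be passed in.
     */
    static void parseInto(StringBuffer sbuf, SimpleTag tag) {
        tag.reset();
        String text = sbuf.toString();
        Matcher nameMatcher = NAME.matcher(text);
        if (!nameMatcher.find()) {
            throw new IllegalArgumentException("malformed tag: " + text);
        }
        tag.name = nameMatcher.group(1);
        // Scan the remainder of the tag for name="value" pairs.
        Matcher attrMatcher = ATTR.matcher(text);
        attrMatcher.region(nameMatcher.end(), text.length());
        while (attrMatcher.find()) {
            tag.attributes.put(attrMatcher.group(1).toLowerCase(), attrMatcher.group(2));
        }
    }

    public static void main(String[] args) {
        LinkTag link = new LinkTag();   // the caller chooses the concrete class
        parseInto(new StringBuffer("<a href=\"x.html\" title=\"Home page\">"), link);
        System.out.println(link.name + " isLink=" + link.isLink()
                + " attributes=" + link.attributes);
    }
}
```

Had `parseInto` returned a new object instead, it would have had to decide the concrete class itself, and the caller could not have substituted `LinkTag`.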
Field Summary
static int | TT_COMMENT | comment token.
static int | TT_EOF | end of stream.
static int | TT_TAG | tag token.
static int | TT_TEXT | text token.
Constructor Summary
HtmlStreamTokenizer(java.io.InputStream in)
HtmlStreamTokenizer(java.io.Reader in)
Method Summary
int | getLineNumber()
java.lang.StringBuffer | getStringValue()
int | getTokenType()
java.lang.StringBuffer | getWhiteSpace()
int | nextToken()
static void | parseTag(java.lang.StringBuffer sbuf, HtmlTag tag) | Takes an HtmlTag argument, rather than returning a newly created HtmlTag object, so that you can supply your own tag class derived from HtmlTag if desired.
static java.lang.String | unescape(java.lang.String buf) | Replaces HTML escape sequences with their character equivalents.
static void | unescape(java.lang.StringBuffer buf) | Replaces HTML escape sequences with their character equivalents.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final int TT_EOF
public static final int TT_TEXT
public static final int TT_TAG
public static final int TT_COMMENT
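To make the four token types concrete, here is a minimal standalone sketch of a tokenizer that classifies input into tag, text, comment, and end-of-stream tokens. It is not this class's implementation: `TokenTypeSketch` is a hypothetical name, the constant values are invented (this page does not list the real ones), it reads from a String rather than a stream, and it assumes a tag token's value includes its angle brackets while a comment token's value excludes the `<!--` and `-->` delimiters.

```java
public class TokenTypeSketch {
    // Invented values for illustration; the real constants' values are not documented here.
    static final int TT_EOF = -1;
    static final int TT_TEXT = -2;
    static final int TT_TAG = -3;
    static final int TT_COMMENT = -4;

    private final String input;
    private int pos = 0;
    private int tokenType = TT_EOF;
    private String value = "";

    TokenTypeSketch(String input) {
        this.input = input;
    }

    int getTokenType() {
        return tokenType;
    }

    String getStringValue() {
        return value;
    }

    /** Advances to the next token and returns its type. */
    int nextToken() {
        if (pos >= input.length()) {
            value = "";
            return tokenType = TT_EOF;
        }
        if (input.startsWith("<!--", pos)) {
            // Comment: the value is the body between the delimiters.
            int end = input.indexOf("-->", pos + 4);
            int stop = end < 0 ? input.length() : end;
            value = input.substring(pos + 4, stop);
            pos = end < 0 ? input.length() : end + 3;
            return tokenType = TT_COMMENT;
        }
        if (input.charAt(pos) == '<') {
            // Tag: everything up to and including the closing '>'.
            int end = input.indexOf('>', pos);
            int stop = end < 0 ? input.length() : end + 1;
            value = input.substring(pos, stop);
            pos = stop;
            return tokenType = TT_TAG;
        }
        // Text: everything up to the next '<' or the end of input.
        int end = input.indexOf('<', pos);
        int stop = end < 0 ? input.length() : end;
        value = input.substring(pos, stop);
        pos = stop;
        return tokenType = TT_TEXT;
    }

    public static void main(String[] args) {
        TokenTypeSketch tok = new TokenTypeSketch("<p>Hi<!-- note --></p>");
        while (tok.nextToken() != TT_EOF) {
            System.out.println(tok.getTokenType() + ": " + tok.getStringValue());
        }
    }
}
```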
Constructor Detail
public HtmlStreamTokenizer(java.io.Reader in)
Parameters:
in - input reader
public HtmlStreamTokenizer(java.io.InputStream in)
Parameters:
in - input stream
Method Detail
public final int getTokenType()
public final java.lang.StringBuffer getStringValue()
public final java.lang.StringBuffer getWhiteSpace()
public int getLineNumber()
public int nextToken() throws java.io.IOException
Throws:
java.io.IOException - if error reading input stream.
public static void parseTag(java.lang.StringBuffer sbuf, HtmlTag tag) throws HtmlException
Parameters:
sbuf - text buffer to parse
tag - parse the text buffer and store the result in this object
Throws:
HtmlException - if malformed tag.
public static java.lang.String unescape(java.lang.String buf)
Parameters:
buf - text buffer to unescape
public static void unescape(java.lang.StringBuffer buf)
Parameters:
buf - will remove all HTML escape sequences from this buffer
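As an illustration of what replacing HTML escape sequences involves, the following standalone sketch decodes a handful of named entities and numeric character references. It is not this class's implementation: `UnescapeSketch` is a hypothetical name, the entity table is deliberately tiny, and leaving unrecognized sequences untouched is an assumption made for the example. The two overloads mirror the String-returning and in-place StringBuffer signatures listed above.

```java
import java.util.Map;

public class UnescapeSketch {
    // A small subset of named entities, chosen for illustration only.
    private static final Map<String, Character> NAMED = Map.of(
            "amp", '&', "lt", '<', "gt", '>', "quot", '"');

    /** Returns a copy of buf with recognized escape sequences decoded. */
    static String unescape(String buf) {
        StringBuilder out = new StringBuilder(buf.length());
        int i = 0;
        while (i < buf.length()) {
            char c = buf.charAt(i);
            int semi = (c == '&') ? buf.indexOf(';', i + 1) : -1;
            if (semi > i + 1) {
                String decoded = decode(buf.substring(i + 1, semi));
                if (decoded != null) {
                    out.append(decoded);
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c);   // ordinary character or unrecognized sequence
            i++;
        }
        return out.toString();
    }

    /** In-place variant: rewrites the contents of the supplied buffer. */
    static void unescape(StringBuffer buf) {
        String result = unescape(buf.toString());
        buf.setLength(0);
        buf.append(result);
    }

    /** Decodes the text between '&' and ';', or returns null if unrecognized. */
    private static String decode(String entity) {
        if (entity.startsWith("#")) {
            try {
                boolean hex = entity.length() > 1
                        && (entity.charAt(1) == 'x' || entity.charAt(1) == 'X');
                int codePoint = hex
                        ? Integer.parseInt(entity.substring(2), 16)
                        : Integer.parseInt(entity.substring(1), 10);
                return Character.isValidCodePoint(codePoint)
                        ? new String(Character.toChars(codePoint))
                        : null;
            } catch (NumberFormatException e) {
                return null;
            }
        }
        Character named = NAMED.get(entity);
        return named == null ? null : String.valueOf(named);
    }

    public static void main(String[] args) {
        System.out.println(unescape("a &amp; b &lt;c&gt; &#65;&#x42;"));
        StringBuffer sb = new StringBuffer("&quot;quoted&quot;");
        unescape(sb);
        System.out.println(sb);
    }
}
```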