java.lang.Object
  hu.midori.kosmos.server.util.WebCrawlingUtils
public class WebCrawlingUtils
Utility methods for web crawling.
Constructor Summary | |
---|---|
protected | WebCrawlingUtils(): This class should never be instantiated. |
Method Summary | |
---|---|
static org.w3c.dom.Document | downloadHtmlDom(java.net.URL url): Downloads and tidies up an HTML document from the given URL and returns it as DOM. |
static org.w3c.dom.Document | downloadXmlDom(java.net.URL url): Downloads an XML document from the given URL and returns it as DOM. |
static java.lang.String | eliminateEmptyValues(java.lang.String value): Eliminates the empty items from a scraped value string to make the tokenizer happy. |
static org.w3c.dom.Node | findDomNodeByAttribute(org.w3c.dom.NodeList nodes, java.lang.String attribName, java.lang.String attribValue): Returns the node with the given attribute value from the given list or null if not found. |
static org.w3c.dom.Document | parseStringDom(java.lang.String xmlString): Parses an XML document from the given string. |
static java.util.List | runXQuery(org.w3c.dom.Document dom, java.lang.String query): Runs an XQuery on the given DOM and returns the full result. |
static int | runXQueryInt(org.w3c.dom.Document dom, java.lang.String query): Runs an XQuery on the given DOM and returns a single int as result. |
static java.lang.String | runXQueryString(org.w3c.dom.Document dom, java.lang.String query): Runs an XQuery on the given DOM and returns a single String as result. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
protected WebCrawlingUtils()
This class should never be instantiated.
Method Detail |
---|
public static org.w3c.dom.Document parseStringDom(java.lang.String xmlString) throws java.lang.Exception
Parses an XML document from the given string.
Throws: java.lang.Exception
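The source of parseStringDom is not shown on this page, so the following is only a minimal sketch of how a string could be parsed into a DOM with the JDK's built-in JAXP parser. The class name ParseStringDomSketch is hypothetical, and the real method may configure the parser differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for illustration; not the actual WebCrawlingUtils source.
class ParseStringDomSketch {

    // Parses the given XML text into a DOM tree using the JDK's JAXP parser.
    static Document parseStringDom(String xmlString) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Reading through a StringReader avoids guessing a byte encoding.
        return builder.parse(new InputSource(new StringReader(xmlString)));
    }

    public static void main(String[] args) throws Exception {
        Document dom = parseStringDom("<items><item id=\"a\">first</item></items>");
        System.out.println(dom.getDocumentElement().getTagName()); // prints "items"
    }
}
```

Malformed input makes the parser throw org.xml.sax.SAXException, which is consistent with the broad `throws java.lang.Exception` clause documented above.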
public static org.w3c.dom.Document downloadXmlDom(java.net.URL url) throws java.lang.Exception
Downloads an XML document from the given URL and returns it as DOM.
Throws: java.lang.Exception
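As a rough illustration of what downloadXmlDom describes, the sketch below opens a stream on a URL and hands it to the JAXP parser. It is an assumption about the general approach, not the library's code; the class name DownloadXmlDomSketch is hypothetical, and the demo reads a temporary file through a file: URL so that it runs without network access.

```java
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical stand-in for illustration; not the actual WebCrawlingUtils source.
class DownloadXmlDomSketch {

    // Opens the URL, parses the response body as XML and returns the DOM.
    static Document downloadXmlDom(URL url) throws Exception {
        // try-with-resources closes the stream even if parsing fails.
        try (InputStream in = url.openStream()) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // A temporary file stands in for a remote document.
        Path tmp = Files.createTempFile("sketch", ".xml");
        Files.write(tmp, "<feed><entry/></feed>".getBytes(StandardCharsets.UTF_8));
        Document dom = downloadXmlDom(tmp.toUri().toURL());
        System.out.println(dom.getDocumentElement().getTagName()); // prints "feed"
        Files.delete(tmp);
    }
}
```

A production crawler would probably also set connection and read timeouts, which URL.openStream() does not do by itself.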
public static org.w3c.dom.Document downloadHtmlDom(java.net.URL url) throws java.lang.Exception
Downloads and tidies up an HTML document from the given URL and returns it as DOM.
Throws: java.lang.Exception
public static org.w3c.dom.Node findDomNodeByAttribute(org.w3c.dom.NodeList nodes, java.lang.String attribName, java.lang.String attribValue)
Returns the node with the given attribute value from the given list or null if not found.
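The lookup described for findDomNodeByAttribute can be sketched as a linear scan over the list. This is an assumption about the behaviour, not the original code: the sketch returns the first match, compares attribute values with an exact string comparison, and uses the hypothetical class name FindByAttributeSketch and helper parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in for illustration; not the actual WebCrawlingUtils source.
class FindByAttributeSketch {

    // Returns the first node whose attribute attribName equals attribValue, or null.
    static Node findDomNodeByAttribute(NodeList nodes, String attribName, String attribValue) {
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
            if (attrs == null) {
                continue;
            }
            Node attr = attrs.getNamedItem(attribName);
            if (attr != null && attr.getNodeValue().equals(attribValue)) {
                return node;
            }
        }
        return null; // documented result when nothing matches
    }

    // Small helper (hypothetical) so that the demo is self-contained.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse("<ul><li class=\"odd\">one</li><li class=\"even\">two</li></ul>");
        NodeList items = dom.getElementsByTagName("li");
        System.out.println(findDomNodeByAttribute(items, "class", "even").getTextContent()); // prints "two"
        System.out.println(findDomNodeByAttribute(items, "class", "none") == null);          // prints "true"
    }
}
```

Because the method can return null, callers need a null check before dereferencing the result.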
public static java.util.List runXQuery(org.w3c.dom.Document dom, java.lang.String query) throws net.sf.saxon.trans.XPathException
Runs an XQuery on the given DOM and returns the full result.
Throws: net.sf.saxon.trans.XPathException
public static int runXQueryInt(org.w3c.dom.Document dom, java.lang.String query) throws net.sf.saxon.trans.XPathException
Runs an XQuery on the given DOM and returns a single int as result.
Throws: net.sf.saxon.trans.XPathException
public static java.lang.String runXQueryString(org.w3c.dom.Document dom, java.lang.String query) throws net.sf.saxon.trans.XPathException
Runs an XQuery on the given DOM and returns a single String as result.
Throws: net.sf.saxon.trans.XPathException
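The runXQuery methods rely on Saxon, which is not part of the JDK, so the sketch below deliberately substitutes the JDK's javax.xml.xpath (XPath 1.0) to illustrate the "single String" and "single int" result pattern. It is not XQuery and not the library's implementation; the class SingleValueQuerySketch and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in: JDK XPath 1.0 instead of Saxon XQuery.
class SingleValueQuerySketch {

    // Evaluates the expression against the DOM and returns its string value.
    static String queryString(Document dom, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, dom);
    }

    // Evaluates the expression as a number and truncates it to an int.
    static int queryInt(Document dom, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double number = (Double) xpath.evaluate(expression, dom, XPathConstants.NUMBER);
        // A non-numeric result is NaN, which intValue() turns into 0.
        return number.intValue();
    }

    public static void main(String[] args) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<items><item>first</item><item>second</item></items>")));
        System.out.println(queryString(dom, "/items/item[2]")); // prints "second"
        System.out.println(queryInt(dom, "count(//item)"));     // prints "2"
    }
}
```

With the documented methods, the equivalent calls would pass an XQuery string instead of an XPath expression and would surface failures as net.sf.saxon.trans.XPathException.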
public static java.lang.String eliminateEmptyValues(java.lang.String value)
Eliminates the empty items from a scraped value string to make the tokenizer happy. For example, ||xxx| will be transformed to | |xxx|.
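The transformation in the example (||xxx| becoming | |xxx|) can be reproduced by inserting a space between adjacent delimiters. The sketch below assumes that "|" is the delimiter, that a single space is the placeholder, and that the tokenizer in question behaves like java.util.StringTokenizer, which silently skips empty tokens. None of this is confirmed by the page, and the class name EliminateEmptyValuesSketch is hypothetical.

```java
import java.util.StringTokenizer;

// Hypothetical stand-in for illustration; not the actual WebCrawlingUtils source.
class EliminateEmptyValuesSketch {

    // Puts a space between every pair of adjacent '|' delimiters.
    static String eliminateEmptyValues(String value) {
        String result = value;
        // One replace() pass handles non-overlapping pairs only, so "|||"
        // needs a second pass; loop until no adjacent delimiters remain.
        while (result.contains("||")) {
            result = result.replace("||", "| |");
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "||xxx|";
        String fixed = eliminateEmptyValues(raw);
        System.out.println(fixed); // prints "| |xxx|"
        // StringTokenizer drops empty tokens, so the raw string loses a column.
        System.out.println(new StringTokenizer(raw, "|").countTokens());   // prints "1"
        System.out.println(new StringTokenizer(fixed, "|").countTokens()); // prints "2"
    }
}
```

The placeholder keeps the token positions aligned with the scraped columns, at the cost of callers having to treat a single-space token as an empty value.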