edu.splat.wikicat.xmlparsers
Class CategoryExtractor
java.lang.Object
org.xml.sax.helpers.DefaultHandler
edu.splat.wikicat.xmlparsers.CategoryExtractor
- All Implemented Interfaces:
- org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
public class CategoryExtractor
- extends org.xml.sax.helpers.DefaultHandler
Extracts categories from a Wikipedia XML dump by analyzing Category articles and outputting
"cat1\tcat2" pairs, where cat1 is a parent of cat2.
Output is currently written to standard output,
so redirect it to a file if you want to keep it.
Usage: java edu.splat.wikicat.xmlparsers.CategoryExtractor wiki.dump.xml
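The description above can be illustrated with a minimal sketch. This is not the source of CategoryExtractor, which is not shown on this page. It assumes the dump uses the MediaWiki export elements <page>, <title> and <text>, and that a parent category appears in a Category article's text as a [[Category:Name]] link. The class name CategoryPairSketch, the extract helper and the regular expression are hypothetical and used for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; the real CategoryExtractor may work differently.
public class CategoryPairSketch extends DefaultHandler {
    // Assumed markup for a category link: [[Category:Name]] or [[Category:Name|sort key]]
    private static final Pattern LINK = Pattern.compile("\\[\\[Category:([^\\]|]+)");

    private final List<String> pairs = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private String title;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Start a fresh buffer for every element; only <title> and <text> are read back.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one run of text in several chunks, so append rather than assign.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            title = buffer.toString().trim();
        } else if (qName.equals("text") && title != null && title.startsWith("Category:")) {
            // The article being read is the child; each category link in its text names a parent.
            String child = title.substring("Category:".length());
            Matcher m = LINK.matcher(buffer);
            while (m.find()) {
                pairs.add(m.group(1).trim() + "\t" + child);
            }
        }
    }

    // Parses an XML string and returns the "parent\tchild" lines found in it.
    public static List<String> extract(String xml) throws Exception {
        CategoryPairSketch handler = new CategoryPairSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.pairs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki><page><title>Category:Physics</title>"
                + "<revision><text>[[Category:Science]]</text></revision></page></mediawiki>";
        for (String pair : extract(xml)) {
            System.out.println(pair);
        }
    }
}
```

The sketch collects pairs in a list so they can be inspected, whereas the class documented here prints its output directly. Buffering in characters matters because a SAX parser is allowed to split one text node across several calls.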
- Author:
- mhart
Method Summary
void characters(char[] ch, int start, int length)
void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
static void main(java.lang.String[] args)
void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes attributes)
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CategoryExtractor
public CategoryExtractor()
startElement
public void startElement(java.lang.String uri,
java.lang.String localName,
java.lang.String qName,
org.xml.sax.Attributes attributes)
throws org.xml.sax.SAXException
- Specified by:
startElement
in interface org.xml.sax.ContentHandler
- Overrides:
startElement
in class org.xml.sax.helpers.DefaultHandler
- Throws:
org.xml.sax.SAXException
endElement
public void endElement(java.lang.String uri,
java.lang.String localName,
java.lang.String qName)
throws org.xml.sax.SAXException
- Specified by:
endElement
in interface org.xml.sax.ContentHandler
- Overrides:
endElement
in class org.xml.sax.helpers.DefaultHandler
- Throws:
org.xml.sax.SAXException
characters
public void characters(char[] ch,
int start,
int length)
- Specified by:
characters
in interface org.xml.sax.ContentHandler
- Overrides:
characters
in class org.xml.sax.helpers.DefaultHandler
main
public static void main(java.lang.String[] args)