edu.splat.wikicat.xmlparsers
Class CategoryExtractor

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by edu.splat.wikicat.xmlparsers.CategoryExtractor
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class CategoryExtractor
extends org.xml.sax.helpers.DefaultHandler

Extracts categories from Wikipedia by analyzing Category articles and then outputting tab-separated lines of the form "cat1\tcat2", where cat1 is a parent of cat2.

For now, the output is written to standard output, so redirect it to a file.

Usage: java edu.splat.wikicat.xmlparsers.CategoryExtractor wiki.dump.xml
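
The source for this class is not shown here, so the following is only a minimal sketch of how a SAX handler could produce the "cat1\tcat2" pairs described above. It assumes a dump layout with title and text elements inside each page, and assumes that parent categories appear in the page text as [[Category:Name]] links. The class name CategorySketch, the helper extract, and the regular expression are hypothetical and are not part of CategoryExtractor.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch; not the actual CategoryExtractor source. */
public class CategorySketch extends DefaultHandler {
    // Assumed link syntax: [[Category:Name]] or [[Category:Name|sort key]]
    private static final Pattern LINK = Pattern.compile("\\[\\[Category:([^\\]|]+)");
    private static final String PREFIX = "Category:";

    private final StringBuilder buffer = new StringBuilder();
    private final List<String> pairs = new ArrayList<>();
    private String title;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Begin every element with an empty text buffer.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append rather than assign.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            title = buffer.toString().trim();
        } else if ("text".equals(qName) && title != null && title.startsWith(PREFIX)) {
            // Only Category articles are analyzed; the page itself is the child.
            String child = title.substring(PREFIX.length());
            Matcher m = LINK.matcher(buffer);
            while (m.find()) {
                pairs.add(m.group(1).trim() + "\t" + child); // parent, tab, child
            }
        }
    }

    /** Parses an XML string and returns the collected "parent\tchild" lines. */
    public static List<String> extract(String xml) throws Exception {
        CategorySketch handler = new CategorySketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.pairs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki><page><title>Category:Dogs</title>"
                + "<revision><text>[[Category:Mammals]] [[Category:Pets|Dogs]]</text></revision>"
                + "</page></mediawiki>";
        for (String line : extract(xml)) {
            System.out.println(line);
        }
    }
}
```

Buffering in characters matters because a SAX parser is free to split a long text node across several callbacks; matching links only in endElement avoids missing a link that straddles two chunks.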

Author:
mhart

Constructor Summary
CategoryExtractor()
           
 
Method Summary
 void characters(char[] ch, int start, int length)
           
 void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
           
static void main(java.lang.String[] args)
           
 void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes attributes)
           
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CategoryExtractor

public CategoryExtractor()
Method Detail

startElement

public void startElement(java.lang.String uri,
                         java.lang.String localName,
                         java.lang.String qName,
                         org.xml.sax.Attributes attributes)
                  throws org.xml.sax.SAXException
Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException

endElement

public void endElement(java.lang.String uri,
                       java.lang.String localName,
                       java.lang.String qName)
                throws org.xml.sax.SAXException
Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler

main

public static void main(java.lang.String[] args)
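
The page does not document main beyond the usage line, so the following is only a sketch of how a command-line driver for a SAX handler is commonly wired up, assuming the single argument is the path to the dump file. DriverSketch, PageCounter, and countPages are hypothetical names; the stand-in handler merely counts page elements (an assumed element name) instead of extracting categories.

```java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical driver; the real CategoryExtractor.main may differ. */
public class DriverSketch {
    /** Stand-in handler that only counts page elements. */
    static class PageCounter extends DefaultHandler {
        int pages;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("page".equals(qName)) {
                pages++;
            }
        }
    }

    /** Streams the file through a SAX parser and returns the number of pages seen. */
    public static int countPages(File dump) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PageCounter handler = new PageCounter();
        parser.parse(dump, handler); // event-based parsing; no full document tree is built
        return handler.pages;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java DriverSketch wiki.dump.xml");
            System.exit(1);
        }
        System.out.println(countPages(new File(args[0])));
    }
}
```

Because the handler receives events as the file is read, this style of driver suits large dumps that would not fit in memory as a parsed tree.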