edu.splat.wikicat.xmlparsers
Class ExtractArticlesWithCategory

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by edu.splat.wikicat.xmlparsers.ExtractArticlesWithCategory
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class ExtractArticlesWithCategory
extends org.xml.sax.helpers.DefaultHandler

SAX handler that takes a list of categories, read from a file with one category per line, and extracts from the wiki XML input every article that has at least one of those categories. Each matching article is written to the output file on its own line.

Usage: java edu.splat.wikicat.xmlparsers.ExtractArticlesWithCategory wiki.xml categories.file output.file

Author:
mhart

Constructor Summary
ExtractArticlesWithCategory(java.io.File output, java.util.Vector<java.lang.String> categories)
          Creates an instance of this parser
 
Method Summary
 void characters(char[] ch, int start, int length)
           
 void endDocument()
           
 void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
           
static void main(java.lang.String[] args)
          Run program
static java.lang.String normalizeCategory(java.lang.String text)
          Normalizes the category by making it lowercase and replacing _ with spaces
 void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes attributes)
           
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExtractArticlesWithCategory

public ExtractArticlesWithCategory(java.io.File output,
                                   java.util.Vector<java.lang.String> categories)
                            throws java.io.IOException
Creates an instance of this parser

Parameters:
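output - the file to which matching articles are written, one per line
categories - the categories to match; an article is extracted if it has at least one of them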
Throws:
java.io.IOException
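
The constructor expects the categories as a java.util.Vector<java.lang.String>, while the class description says they come from a file with one category per line. The following is a minimal sketch of that conversion, not the code used by main. The class name CategoryFileSketch and both of its methods are hypothetical, and skipping blank lines and trimming whitespace are assumptions about what is wanted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Vector;

/** Hypothetical helper for illustration; not part of edu.splat.wikicat. */
class CategoryFileSketch {

    /** Turns raw lines into a category list, trimming whitespace and dropping blank lines. */
    static Vector<String> parseLines(List<String> lines) {
        Vector<String> categories = new Vector<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                categories.add(trimmed);
            }
        }
        return categories;
    }

    /** Reads a one-category-per-line file (assumed to be UTF-8) into a Vector. */
    static Vector<String> readCategories(Path file) {
        try {
            return parseLines(Files.readAllLines(file, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, the returned Vector could be passed as the categories argument, together with a java.io.File for the output argument.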

Method Detail

normalizeCategory

public static java.lang.String normalizeCategory(java.lang.String text)
Normalizes the category by making it lowercase and replacing _ with spaces
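
As an illustration of the behaviour described above, here is a minimal sketch that reproduces it. The class name NormalizeCategorySketch is hypothetical, and the use of Locale.ROOT is an assumption; the actual implementation may lowercase differently.

```java
import java.util.Locale;

/** Hypothetical stand-in for illustration; not the source of ExtractArticlesWithCategory. */
class NormalizeCategorySketch {

    static String normalizeCategory(String text) {
        // Assumption: lowercase with a fixed locale so the result does not depend on the JVM default.
        return text.toLowerCase(Locale.ROOT).replace('_', ' ');
    }

    public static void main(String[] args) {
        System.out.println(normalizeCategory("Computer_Science")); // prints "computer science"
    }
}
```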

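Parameters:
text - the category name to normalize
Returns:
the normalized category name, in lowercase with underscores replaced by spaces
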
startElement

public void startElement(java.lang.String uri,
                         java.lang.String localName,
                         java.lang.String qName,
                         org.xml.sax.Attributes attributes)
                  throws org.xml.sax.SAXException
Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException

endElement

public void endElement(java.lang.String uri,
                       java.lang.String localName,
                       java.lang.String qName)
                throws org.xml.sax.SAXException
Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
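
The startElement, endElement and characters callbacks above carry no description, so the following is only a sketch of how a SAX handler of this kind is commonly written, not the class's actual logic. It assumes MediaWiki-style input in which each article has a title element and a text element, and in which categories appear in the text as [[Category:Name]] links. The class name CategoryFilterSketch and all of its members are hypothetical. Unlike the real class, it collects matching titles in memory instead of writing articles to a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in handler for illustration; not the source of ExtractArticlesWithCategory. */
class CategoryFilterSketch extends DefaultHandler {
    private static final String MARKER = "[[Category:"; // assumed wikitext category syntax

    private final Set<String> wanted; // normalized category names to match
    private final List<String> matches = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private String title = "";

    CategoryFilterSketch(Set<String> wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        buffer.setLength(0); // discard text that belonged to the enclosing element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append rather than assign.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            title = buffer.toString().trim();
        } else if ("text".equals(qName) && hasWantedCategory(buffer.toString())) {
            matches.add(title);
        }
        buffer.setLength(0);
    }

    /** Scans the article text for category links and checks each against the wanted set. */
    private boolean hasWantedCategory(String wikiText) {
        int from = wikiText.indexOf(MARKER);
        while (from >= 0) {
            int nameStart = from + MARKER.length();
            int end = wikiText.indexOf("]]", nameStart);
            if (end < 0) {
                return false; // unterminated link: nothing more to scan
            }
            String name = wikiText.substring(nameStart, end);
            int pipe = name.indexOf('|'); // drop an optional sort key after '|'
            if (pipe >= 0) {
                name = name.substring(0, pipe);
            }
            if (wanted.contains(normalize(name))) {
                return true;
            }
            from = wikiText.indexOf(MARKER, end);
        }
        return false;
    }

    static String normalize(String category) {
        return category.trim().toLowerCase(Locale.ROOT).replace('_', ' ');
    }

    /** Parses an in-memory XML string and returns the titles of matching articles. */
    static List<String> extract(String xml, Set<String> wanted) {
        CategoryFilterSketch handler = new CategoryFilterSketch(wanted);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.matches;
    }
}
```

Buffering in characters and deciding in endElement is the usual SAX pattern, because the parser is free to split a single run of text across several characters calls.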

endDocument

public void endDocument()
Specified by:
endDocument in interface org.xml.sax.ContentHandler
Overrides:
endDocument in class org.xml.sax.helpers.DefaultHandler

main

public static void main(java.lang.String[] args)
Run program

Parameters:
args - command-line arguments: the wiki XML file, the categories file, and the output file, in that order