edu.splat.wikicat.xmlparsers
Class ExtractArticlesWithCategory
java.lang.Object
org.xml.sax.helpers.DefaultHandler
edu.splat.wikicat.xmlparsers.ExtractArticlesWithCategory
- All Implemented Interfaces:
- org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
public class ExtractArticlesWithCategory
- extends org.xml.sax.helpers.DefaultHandler
Given a file listing one category per line, extracts the articles that belong to
at least one of those categories and writes each matching article on a separate
line of the output file.
Usage: java edu.splat.wikicat.xmlparsers.ExtractArticlesWithCategory wiki.xml categories.file output.file
- Author:
- mhart
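The extraction described above can be illustrated with a small SAX handler. This is only a sketch and not the source of this class: the class name CategoryFilterSketch, the element names title and text, and the [[Category:Name]] link syntax are assumptions about a MediaWiki-style dump, and matches are collected in a list rather than written to an output file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the documented class; names and XML layout are assumptions.
public class CategoryFilterSketch extends DefaultHandler {
    // Assumed wikitext syntax: [[Category:Name]] or [[Category:Name|sort key]]
    private static final Pattern CATEGORY_LINK =
            Pattern.compile("\\[\\[Category:([^\\]|]+)");

    private final Set<String> wanted;                       // normalized category names
    private final List<String> matches = new ArrayList<>(); // titles of matching articles
    private final StringBuilder buffer = new StringBuilder();
    private String title;

    public CategoryFilterSketch(Set<String> wanted) {
        this.wanted = wanted;
    }

    // Lowercase and replace underscores with spaces, as the documentation describes.
    public static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).replace('_', ' ');
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        buffer.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            title = buffer.toString();
        } else if (qName.equals("text")) {
            Matcher m = CATEGORY_LINK.matcher(buffer);
            while (m.find()) {
                if (wanted.contains(normalize(m.group(1)))) {
                    matches.add(title); // record the article once, then stop scanning
                    break;
                }
            }
        }
    }

    // Parses the XML string and returns the titles of articles in a wanted category.
    public static List<String> extract(String xml, Set<String> wanted) throws Exception {
        CategoryFilterSketch handler = new CategoryFilterSketch(wanted);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.matches;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki>"
                + "<page><title>Alan Turing</title><text>Bio [[Category:Computer_Scientists]]</text></page>"
                + "<page><title>Paris</title><text>City [[Category:Capitals]]</text></page>"
                + "</mediawiki>";
        Set<String> wanted = new HashSet<>();
        wanted.add("computer scientists");
        System.out.println(extract(xml, wanted)); // prints [Alan Turing]
    }
}
```

The buffer is reset in startElement and read in endElement because a SAX parser is free to deliver the text of one element through several characters calls.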
Constructor Summary
ExtractArticlesWithCategory(java.io.File output,
java.util.Vector<java.lang.String> categories)
Creates an instance of this parser.
Method Summary
void
characters(char[] ch,
int start,
int length)
void
endDocument()
void
endElement(java.lang.String uri,
java.lang.String localName,
java.lang.String qName)
static void
main(java.lang.String[] args)
Runs the program.
static java.lang.String
normalizeCategory(java.lang.String text)
Normalizes the category by making it lowercase and replacing underscores with spaces.
void
startElement(java.lang.String uri,
java.lang.String localName,
java.lang.String qName,
org.xml.sax.Attributes attributes)
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ExtractArticlesWithCategory
public ExtractArticlesWithCategory(java.io.File output,
java.util.Vector<java.lang.String> categories)
throws java.io.IOException
- Creates an instance of this parser.
- Parameters:
output
- the file to which matching articles are written
categories
- the categories to match articles against
- Throws:
java.io.IOException
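As an illustration of how the categories argument might be prepared, the sketch below reads one category per line into a Vector. The class name CategoryListSketch and the method readCategories are hypothetical, and trimming lines and skipping blank ones is an assumption rather than documented behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Vector;

// Hypothetical helper; not part of the documented package.
public class CategoryListSketch {
    // Reads one category per line, trimming whitespace and skipping blank lines (assumed behavior).
    public static Vector<String> readCategories(BufferedReader reader) throws IOException {
        Vector<String> categories = new Vector<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                categories.add(line);
            }
        }
        return categories;
    }

    public static void main(String[] args) throws IOException {
        String sample = "Computer_Scientists\n\nCapitals\n";
        Vector<String> categories =
                readCategories(new BufferedReader(new StringReader(sample)));
        System.out.println(categories); // prints [Computer_Scientists, Capitals]
        // With the documented class on the classpath, the vector could then be passed on:
        // new ExtractArticlesWithCategory(new java.io.File("output.file"), categories);
    }
}
```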
normalizeCategory
public static java.lang.String normalizeCategory(java.lang.String text)
- Normalizes the category by making it lowercase and replacing underscores with spaces.
- Parameters:
text
- the category name to normalize
- Returns:
the normalized category name
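The normalization described here could look roughly like the following. The method body is not shown in this documentation, so this is a guess at an equivalent implementation; the wrapper class NormalizeCategorySketch and the use of Locale.ROOT are assumptions.

```java
import java.util.Locale;

// Hypothetical re-implementation for illustration only.
public class NormalizeCategorySketch {
    // Lowercases the text and replaces every underscore with a space.
    public static String normalizeCategory(String text) {
        return text.toLowerCase(Locale.ROOT).replace('_', ' ');
    }

    public static void main(String[] args) {
        System.out.println(normalizeCategory("Computer_Science")); // prints computer science
    }
}
```

Normalizing both the requested categories and the categories found in each article lets "Computer_Science" and "computer science" compare as equal.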
startElement
public void startElement(java.lang.String uri,
java.lang.String localName,
java.lang.String qName,
org.xml.sax.Attributes attributes)
throws org.xml.sax.SAXException
- Specified by:
startElement
in interface org.xml.sax.ContentHandler
- Overrides:
startElement
in class org.xml.sax.helpers.DefaultHandler
- Throws:
org.xml.sax.SAXException
endElement
public void endElement(java.lang.String uri,
java.lang.String localName,
java.lang.String qName)
throws org.xml.sax.SAXException
- Specified by:
endElement
in interface org.xml.sax.ContentHandler
- Overrides:
endElement
in class org.xml.sax.helpers.DefaultHandler
- Throws:
org.xml.sax.SAXException
characters
public void characters(char[] ch,
int start,
int length)
- Specified by:
characters
in interface org.xml.sax.ContentHandler
- Overrides:
characters
in class org.xml.sax.helpers.DefaultHandler
endDocument
public void endDocument()
- Specified by:
endDocument
in interface org.xml.sax.ContentHandler
- Overrides:
endDocument
in class org.xml.sax.helpers.DefaultHandler
main
public static void main(java.lang.String[] args)
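- Runs the program.
- Parameters:
args
- command-line arguments: the wiki XML file, the categories file, and the output file, in that order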