be.re.repo.mod
Class ZippedDocumentTextExtractor
java.lang.Object
be.re.repo.mod.ZippedDocumentTextExtractor
public class ZippedDocumentTextExtractor
- extends Object
Implements a mechanism to extract text from zipped documents containing XML
entities. Suitable formats include ODF, ePub and Office Open XML. The
documents are processed in a streaming fashion.
- Author:
- Werner Donné
ZippedDocumentTextExtractor
public ZippedDocumentTextExtractor()
create
public static Reader create(InputStream in,
ZippedDocumentTextExtractor.FilterFactory filterFactory,
String[] entryPatterns)
throws IOException
- Retrieves text from a document.
- Parameters:
in
- the original document stream.
filterFactory
- a factory to create a filter that is selective about
which elements contribute to the text or that can transform the text. It
may be null.
entryPatterns
- the regular expressions that select the ZIP-entries
based on their name. If the array is empty no entries will be selected at
all.
- Returns:
- The extracted text stream.
- Throws:
IOException
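
The general idea behind create can be sketched with JDK classes only. The
following is a minimal illustration, not the implementation of
ZippedDocumentTextExtractor: the class ZipTextSketch and its methods extract
and zipOf are hypothetical names, the filter factory is left out, the result
is a String rather than a Reader, and it is an assumption that a pattern has
to match the complete entry name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; it does not use or reproduce be.re.repo.mod.
final class ZipTextSketch {
  // Concatenates the character data of every ZIP entry whose name matches a pattern.
  static String extract(InputStream in, String[] entryPatterns) throws IOException {
    Pattern[] patterns = new Pattern[entryPatterns.length];
    for (int i = 0; i < patterns.length; i++) {
      patterns[i] = Pattern.compile(entryPatterns[i]);
    }
    XMLInputFactory factory = XMLInputFactory.newFactory();
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
    factory.setProperty(XMLInputFactory.IS_COALESCING, true); // report CDATA as text
    StringBuilder text = new StringBuilder();
    // The entries are read sequentially from the stream, without a temporary file.
    try (ZipInputStream zip = new ZipInputStream(in)) {
      for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
        if (!entry.isDirectory() && matches(entry.getName(), patterns)) {
          appendText(factory, zip, text);
        }
      }
    }
    return text.toString();
  }

  // Assumption: a pattern must match the whole entry name. An empty array selects nothing.
  private static boolean matches(String name, Pattern[] patterns) {
    for (Pattern pattern : patterns) {
      if (pattern.matcher(name).matches()) {
        return true;
      }
    }
    return false;
  }

  private static void appendText(XMLInputFactory factory, InputStream entry, StringBuilder text)
      throws IOException {
    // Shield the ZIP stream, because an XML parser may close its input at the end.
    InputStream shielded = new FilterInputStream(entry) {
      @Override
      public void close() {
        // Deliberately empty: the remaining entries still have to be read.
      }
    };
    try {
      XMLStreamReader reader = factory.createXMLStreamReader(shielded);
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.CHARACTERS) {
          text.append(reader.getText());
        }
      }
      reader.close();
    } catch (XMLStreamException e) {
      throw new IOException(e);
    }
  }

  // Demo helper: builds an in-memory ZIP from alternating entry names and contents.
  static byte[] zipOf(String... namesAndContents) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (int i = 0; i < namesAndContents.length; i += 2) {
        zip.putNextEntry(new ZipEntry(namesAndContents[i]));
        zip.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
      }
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] zip = zipOf(
        "content.xml", "<doc><p>Hello</p><p>World</p></doc>",
        "styles.xml", "<s>ignored</s>");
    System.out.println(
        extract(new ByteArrayInputStream(zip), new String[] {"content\\.xml"}));
  }
}
```

In the sketch only content.xml contributes text, because styles.xml matches
none of the patterns. A filter as produced by a FilterFactory would sit
between the XML events and the collected text, for example to skip certain
elements.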