org.alfresco.repo.content.transform
Class TextMiningContentTransformer
java.lang.Object
org.alfresco.repo.content.transform.ContentTransformerHelper
org.alfresco.repo.content.transform.AbstractContentTransformer2
org.alfresco.repo.content.transform.TextMiningContentTransformer
- All Implemented Interfaces:
- ContentWorker, ContentTransformer
public class TextMiningContentTransformer
- extends AbstractContentTransformer2
This badly named transformer turns Microsoft Word documents
(Word 6, 95, 97, 2000, 2003) into plain text.
It doesn't currently use Apache Tika to do this, pending TIKA-408.
When Apache POI 3.7 beta 2 has been released, we can switch to Tika
and then handle Word 6, Word 95, Word 97, 2000, 2003, 2007 and 2010 formats.
TODO Switch to Tika in November 2010 once 3.4 is out
Method Summary
boolean
isTransformable(java.lang.String sourceMimetype,
java.lang.String targetMimetype,
TransformationOptions options)
Currently the only transformation performed is that of text extraction from Word documents.
void
transformInternal(org.alfresco.service.cmr.repository.ContentReader reader,
org.alfresco.service.cmr.repository.ContentWriter writer,
TransformationOptions options)
Method to be implemented by subclasses wishing to make use of the common infrastructural code
provided by this class.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
TextMiningContentTransformer
public TextMiningContentTransformer()
isTransformable
public boolean isTransformable(java.lang.String sourceMimetype,
java.lang.String targetMimetype,
TransformationOptions options)
- Currently the only transformation performed is that of text extraction from Word documents.
- Parameters:
sourceMimetype
- the source mimetype
options
- the transformation options
- Returns:
- boolean true if this content transformer can satisfy the mimetypes and options specified, false otherwise
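The check described above can be illustrated with a minimal, self-contained sketch. It assumes the transformer accepts only Word input ("application/msword") and plain-text output ("text/plain"). The class name WordToTextCheck and the two constants are hypothetical stand-ins, not Alfresco's own, and the TransformationOptions argument is left out for brevity.

```java
public class WordToTextCheck {
    // Assumed mimetype strings for illustration only; the real transformer
    // would use whatever constants Alfresco defines for these types.
    public static final String MIMETYPE_WORD = "application/msword";
    public static final String MIMETYPE_TEXT_PLAIN = "text/plain";

    /**
     * Returns true only for the Word-to-plain-text pairing.
     * Constant-first equals() keeps the check null-safe.
     */
    public static boolean isTransformable(String sourceMimetype, String targetMimetype) {
        return MIMETYPE_WORD.equals(sourceMimetype)
                && MIMETYPE_TEXT_PLAIN.equals(targetMimetype);
    }

    public static void main(String[] args) {
        System.out.println(isTransformable("application/msword", "text/plain"));
        System.out.println(isTransformable("application/pdf", "text/plain"));
    }
}
```

Any other source or target mimetype, including null, is rejected, which matches the statement that text extraction from Word documents is the only transformation performed.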
transformInternal
public void transformInternal(org.alfresco.service.cmr.repository.ContentReader reader,
org.alfresco.service.cmr.repository.ContentWriter writer,
TransformationOptions options)
throws java.lang.Exception
- Description copied from class:
AbstractContentTransformer2
- Method to be implemented by subclasses wishing to make use of the common infrastructural code
provided by this class.
- Specified by:
transformInternal
in class AbstractContentTransformer2
- Parameters:
reader
- the source of the content to transform
writer
- the target to which to write the transformed content
options
- a map of options to use when performing the transformation. The map
will never be null.
- Throws:
java.lang.Exception
- exceptions will be handled by this class - subclasses can throw anything
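The contract described above, in which the base class handles whatever the subclass throws, resembles a template method. The following is a rough sketch of that pattern only, not Alfresco's implementation: BaseTransformer, UpperCaseTransformer and run are hypothetical names, and plain java.io streams stand in for ContentReader and ContentWriter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class TransformSketch {

    /** Hypothetical stand-in for the abstract base class. */
    public abstract static class BaseTransformer {
        /** Public entry point: delegates to the subclass and wraps any failure. */
        public final void transform(InputStream in, OutputStream out) {
            try {
                transformInternal(in, out);
            } catch (Exception e) {
                // The base class owns error handling, so subclasses may throw anything.
                throw new IllegalStateException("Transformation failed", e);
            }
        }

        protected abstract void transformInternal(InputStream in, OutputStream out)
                throws Exception;
    }

    /** Toy subclass: upper-cases UTF-8 text instead of extracting Word content. */
    public static class UpperCaseTransformer extends BaseTransformer {
        @Override
        protected void transformInternal(InputStream in, OutputStream out) throws Exception {
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            out.write(text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Convenience helper that runs the toy transformer over a string. */
    public static String run(String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new UpperCaseTransformer().transform(
                new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)), out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The subclass declares throws Exception and leaves error handling to the caller-facing method, which is the division of responsibility the Throws note describes.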
Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.