org.alfresco.repo.content.transform
Class TextMiningContentTransformer

java.lang.Object
  extended by org.alfresco.repo.content.transform.ContentTransformerHelper
      extended by org.alfresco.repo.content.transform.AbstractContentTransformer2
          extended by org.alfresco.repo.content.transform.TextMiningContentTransformer
All Implemented Interfaces:
ContentWorker, ContentTransformer

public class TextMiningContentTransformer
extends AbstractContentTransformer2

This badly named transformer turns Microsoft Word documents (Word 6, 95, 97, 2000, 2003) into plain text. It does not currently use Apache Tika to do this, pending TIKA-408. Once Apache POI 3.7 beta 2 has been released, we can switch to Tika and then handle the Word 6, 95, 97, 2000, 2003, 2007 and 2010 formats. TODO: Switch to Tika in November 2010 once 3.4 is out.


Constructor Summary
TextMiningContentTransformer()
           
 
Method Summary
 boolean isTransformable(java.lang.String sourceMimetype, java.lang.String targetMimetype, TransformationOptions options)
          Currently the only transformation performed is that of text extraction from Word documents.
 void transformInternal(org.alfresco.service.cmr.repository.ContentReader reader, org.alfresco.service.cmr.repository.ContentWriter writer, TransformationOptions options)
          Method to be implemented by subclasses wishing to make use of the common infrastructural code provided by this class.
 
Methods inherited from class org.alfresco.repo.content.transform.AbstractContentTransformer2
checkTransformable, getTransformationTime, recordTime, register, setRegistry, toString, transform, transform, transform
 
Methods inherited from class org.alfresco.repo.content.transform.ContentTransformerHelper
getMimetype, getMimetypeService, isExplicitTransformation, setExplicitTransformations, setMimetypeService
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.alfresco.repo.content.transform.ContentTransformer
isExplicitTransformation
 

Constructor Detail

TextMiningContentTransformer

public TextMiningContentTransformer()
Method Detail

isTransformable

public boolean isTransformable(java.lang.String sourceMimetype,
                               java.lang.String targetMimetype,
                               TransformationOptions options)
Currently the only transformation performed is that of text extraction from Word documents.

Parameters:
sourceMimetype - the source mimetype
targetMimetype - the target mimetype
options - the transformation options
Returns:
boolean true if this content transformer can satisfy the mimetypes and options specified, false otherwise
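
As an illustration of the check described above, here is a minimal stand-alone sketch, not the Alfresco source. The class name WordToTextCheck is hypothetical, the options parameter is left out for brevity, and the two mimetype strings are assumed to be the values the repository uses for Word and plain text content.

```java
// Hypothetical stand-alone sketch; this is not the Alfresco implementation.
final class WordToTextCheck {
    // Assumed mimetype strings for Word (.doc) and plain text content.
    static final String MIMETYPE_WORD = "application/msword";
    static final String MIMETYPE_TEXT_PLAIN = "text/plain";

    // Returns true only for the Word -> plain text pair; null arguments yield false.
    static boolean isTransformable(String sourceMimetype, String targetMimetype) {
        return MIMETYPE_WORD.equals(sourceMimetype)
                && MIMETYPE_TEXT_PLAIN.equals(targetMimetype);
    }

    public static void main(String[] args) {
        System.out.println(isTransformable("application/msword", "text/plain")); // prints "true"
        System.out.println(isTransformable("application/pdf", "text/plain"));    // prints "false"
    }
}
```

Comparing against the constants rather than the arguments keeps the check null-safe, so an unknown or missing mimetype is simply reported as not transformable.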

transformInternal

public void transformInternal(org.alfresco.service.cmr.repository.ContentReader reader,
                              org.alfresco.service.cmr.repository.ContentWriter writer,
                              TransformationOptions options)
                       throws java.lang.Exception
Description copied from class: AbstractContentTransformer2
Method to be implemented by subclasses wishing to make use of the common infrastructural code provided by this class.

Specified by:
transformInternal in class AbstractContentTransformer2
Parameters:
reader - the source of the content to transform
writer - the target to which to write the transformed content
options - a map of options to use when performing the transformation. The map will never be null.
Throws:
java.lang.Exception - exceptions will be handled by this class - subclasses can throw anything
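
The contract above (the subclass implements transformInternal and may throw anything, while the base class supplies the shared handling) can be sketched with simplified stand-in types. Everything here is hypothetical: SketchTransformer and UpperCaseSketchTransformer are not Alfresco classes, plain streams replace ContentReader and ContentWriter, the options argument is omitted, and wrapping failures in IllegalStateException is an assumption rather than the documented behaviour of AbstractContentTransformer2.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical, simplified stand-in for the base transformer class.
abstract class SketchTransformer {
    // Public entry point: the shared exception handling lives here.
    public final void transform(InputStream in, OutputStream out) {
        try {
            transformInternal(in, out);
        } catch (Exception e) {
            // Subclasses may throw anything; this sketch wraps it in an unchecked exception.
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    // The only method a subclass has to supply.
    protected abstract void transformInternal(InputStream in, OutputStream out) throws Exception;
}

// Hypothetical subclass: upper-cases UTF-8 text instead of extracting Word content.
final class UpperCaseSketchTransformer extends SketchTransformer {
    @Override
    protected void transformInternal(InputStream in, OutputStream out) throws Exception {
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        out.write(text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
    }
}
```

A caller only ever invokes transform; the subclass stays free of error-handling boilerplate, which is the point of the split described in the method documentation.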


Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.