org.alfresco.repo.content.transform
Class TikaPoweredContentTransformer

java.lang.Object
  extended by org.alfresco.repo.content.transform.ContentTransformerHelper
      extended by org.alfresco.repo.content.transform.AbstractContentTransformer2
          extended by org.alfresco.repo.content.transform.TikaPoweredContentTransformer
All Implemented Interfaces:
ContentWorker, ContentTransformer
Direct Known Subclasses:
ArchiveContentTransformer, MailContentTransformer, PdfBoxContentTransformer, PoiContentTransformer, PoiHssfContentTransformer, PoiOOXMLContentTransformer, TextMiningContentTransformer, TikaAutoContentTransformer, TikaSpringConfiguredContentTransformer

public abstract class TikaPoweredContentTransformer
extends AbstractContentTransformer2

Provides helpful services for ContentTransformer implementations that are powered by Apache Tika. To use Tika to transform some content into text, HTML or XML, either create a subclass of this class or use the auto-detecting transformer (TikaAutoContentTransformer). For now, all transformers are registered as regular, rather than explicit, transformations. This should allow you to register your own explicit transformers and have them take priority.
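The subclassing pattern described above can be sketched without Alfresco or Tika on the classpath. Every name in the following block (SketchParser, TikaPoweredSketch, PlainTextSketchTransformer) is hypothetical: SketchParser stands in for org.apache.tika.parser.Parser, and the base class only models the two constructors and the abstract getParser() hook. A real subclass would extend TikaPoweredContentTransformer and return a Tika parser instead, for example org.apache.tika.parser.pdf.PDFParser for PDF sources.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for org.apache.tika.parser.Parser so the sketch
// compiles with the JDK alone.
interface SketchParser {
    String name();
}

// Hypothetical, simplified model of the base class: it keeps the source
// mimetypes and leaves the choice of parser to subclasses.
abstract class TikaPoweredSketch {
    protected final List<String> sourceMimeTypes;

    protected TikaPoweredSketch(List<String> sourceMimeTypes) {
        this.sourceMimeTypes = Collections.unmodifiableList(sourceMimeTypes);
    }

    // Array form, delegating to the List constructor (an assumption about
    // how the two real constructors relate).
    protected TikaPoweredSketch(String[] sourceMimeTypes) {
        this(Arrays.asList(sourceMimeTypes));
    }

    // Mirrors the abstract getParser() hook of the real class.
    protected abstract SketchParser getParser();
}

// Hypothetical subclass: declares the mimetypes it reads and the parser to use.
class PlainTextSketchTransformer extends TikaPoweredSketch {
    PlainTextSketchTransformer() {
        super(new String[] { "text/plain", "text/csv" });
    }

    @Override
    protected SketchParser getParser() {
        return () -> "plain-text-parser";
    }
}
```

The subclass only answers two questions, which source mimetypes it reads and which parser handles them; the shared base class is meant to supply everything else.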


Field Summary
protected static java.lang.String LINE_BREAK
          Windows carriage return line feed pair.
protected  java.util.List sourceMimeTypes
           
static java.lang.String WRONG_FORMAT_MESSAGE_ID
           
 
Constructor Summary
protected TikaPoweredContentTransformer(java.util.List sourceMimeTypes)
           
protected TikaPoweredContentTransformer(java.lang.String[] sourceMimeTypes)
           
 
Method Summary
protected  org.apache.tika.parser.ParseContext buildParseContext(org.apache.tika.metadata.Metadata metadata, java.lang.String targetMimeType, TransformationOptions options)
          By default, returns a ParseContext that does not recurse into embedded documents.
protected  org.xml.sax.ContentHandler getContentHandler(java.lang.String targetMimeType, java.io.Writer output)
          Returns an appropriate Tika ContentHandler for the requested content type.
protected abstract  org.apache.tika.parser.Parser getParser()
          Returns the correct Tika Parser to process the document.
 boolean isTransformable(java.lang.String sourceMimetype, java.lang.String targetMimetype, TransformationOptions options)
          Can we do the requested transformation via Tika? We support transforming to HTML, XML or text.
 void transformInternal(org.alfresco.service.cmr.repository.ContentReader reader, org.alfresco.service.cmr.repository.ContentWriter writer, TransformationOptions options)
          Method to be implemented by subclasses wishing to make use of the common infrastructural code provided by this class.
 
Methods inherited from class org.alfresco.repo.content.transform.AbstractContentTransformer2
checkTransformable, getTransformationTime, recordTime, register, setRegistry, toString, transform, transform, transform
 
Methods inherited from class org.alfresco.repo.content.transform.ContentTransformerHelper
getMimetype, getMimetypeService, isExplicitTransformation, setExplicitTransformations, setMimetypeService
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.alfresco.repo.content.transform.ContentTransformer
isExplicitTransformation
 

Field Detail

sourceMimeTypes

protected java.util.List sourceMimeTypes

LINE_BREAK

protected static final java.lang.String LINE_BREAK
Windows carriage return line feed pair.

See Also:
Constant Field Values

WRONG_FORMAT_MESSAGE_ID

public static final java.lang.String WRONG_FORMAT_MESSAGE_ID
See Also:
Constant Field Values

Constructor Detail

TikaPoweredContentTransformer

protected TikaPoweredContentTransformer(java.util.List sourceMimeTypes)

TikaPoweredContentTransformer

protected TikaPoweredContentTransformer(java.lang.String[] sourceMimeTypes)

Method Detail

getParser

protected abstract org.apache.tika.parser.Parser getParser()
Returns the correct Tika Parser to process the document. If you don't know which parser you need, use TikaAutoContentTransformer, which makes use of Tika's auto-detection.


isTransformable

public boolean isTransformable(java.lang.String sourceMimetype,
                               java.lang.String targetMimetype,
                               TransformationOptions options)
Can we do the requested transformation via Tika? We support transforming to HTML, XML or text.

Parameters:
sourceMimetype - the source mimetype
targetMimetype - the target mimetype
options - the transformation options
Returns:
boolean true if this content transformer can satisfy the mimetypes and options specified, false otherwise
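A minimal sketch of this check, assuming it accepts any source mimetype listed in sourceMimeTypes and that "HTML, XML or text" correspond to text/html, text/xml and text/plain. The exact mimetype constants the real method compares against are an assumption, TikaTransformabilitySketch is a hypothetical name, and the TransformationOptions argument is ignored here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the isTransformable() check described above.
final class TikaTransformabilitySketch {
    // Assumed mimetypes for the three supported targets.
    static final List<String> SUPPORTED_TARGETS =
            Arrays.asList("text/plain", "text/html", "text/xml");

    private final List<String> sourceMimeTypes;

    TikaTransformabilitySketch(List<String> sourceMimeTypes) {
        this.sourceMimeTypes = sourceMimeTypes;
    }

    // Both conditions must hold: a known source and a supported target.
    boolean isTransformable(String sourceMimetype, String targetMimetype) {
        return sourceMimeTypes.contains(sourceMimetype)
                && SUPPORTED_TARGETS.contains(targetMimetype);
    }
}
```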

getContentHandler

protected org.xml.sax.ContentHandler getContentHandler(java.lang.String targetMimeType,
                                                       java.io.Writer output)
                                                throws javax.xml.transform.TransformerConfigurationException
Returns an appropriate Tika ContentHandler for the requested content type. Normally the default is sufficient, but if you need fine-grained control over how the Tika events become text, override this method and supply your own.

Throws:
javax.xml.transform.TransformerConfigurationException
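One way such a handler could be chosen, sketched with JDK classes only: plain text gets a handler that keeps character data alone, while HTML and XML are serialised through the JDK's SAXTransformerFactory, which is also where a TransformerConfigurationException can originate. ContentHandlerSketch and TextOnlyHandler are hypothetical; TextOnlyHandler stands in for a Tika handler such as BodyContentHandler, and the mimetype strings are assumptions.

```java
import java.io.IOException;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ContentHandlerSketch {

    // Hypothetical stand-in for a text-only Tika handler: copies only the
    // character data of the SAX stream to the writer, dropping all markup.
    static final class TextOnlyHandler extends DefaultHandler {
        private final Writer output;

        TextOnlyHandler(Writer output) {
            this.output = output;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                output.write(ch, start, length);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    // Picks a handler for the target mimetype (mimetype values are assumed).
    static ContentHandler getContentHandler(String targetMimeType, Writer output)
            throws TransformerConfigurationException {
        if ("text/plain".equals(targetMimeType)) {
            return new TextOnlyHandler(output);
        }
        // HTML and XML: serialise the SAX events with the JDK identity transformer.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        String method = "text/html".equals(targetMimeType) ? "html" : "xml";
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
        handler.setResult(new StreamResult(output));
        return handler;
    }
}
```

An override in a subclass would follow the same shape, returning its own ContentHandler for the target mimetypes it wants to control.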

buildParseContext

protected org.apache.tika.parser.ParseContext buildParseContext(org.apache.tika.metadata.Metadata metadata,
                                                                java.lang.String targetMimeType,
                                                                TransformationOptions options)
By default, returns a ParseContext that does not recurse into embedded documents.
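Tika's ParseContext acts as a type-keyed map of objects handed to the parser, and in Tika recursion into embedded documents is typically enabled by registering a parser under Parser.class (context.set(Parser.class, parser)). The sketch below models that with hypothetical stand-ins (ContextParser, SketchParseContext, NonRecursiveSketch, RecursiveSketch) so it runs with the JDK alone; it also drops the Metadata and TransformationOptions arguments of the real method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.tika.parser.Parser.
interface ContextParser { }

// Hypothetical stand-in for org.apache.tika.parser.ParseContext:
// maps a type to the object registered for that type.
final class SketchParseContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key) {
        return key.cast(entries.get(key));
    }
}

// Default behaviour: an empty context, so no parser is available for
// embedded documents and parsing does not recurse.
class NonRecursiveSketch {
    private final ContextParser parser = new ContextParser() { };

    protected ContextParser getParser() {
        return parser;
    }

    protected SketchParseContext buildParseContext(String targetMimeType) {
        return new SketchParseContext();
    }
}

// Override: register the transformer's own parser so embedded documents
// can be handed back to it.
class RecursiveSketch extends NonRecursiveSketch {
    @Override
    protected SketchParseContext buildParseContext(String targetMimeType) {
        SketchParseContext context = super.buildParseContext(targetMimeType);
        context.set(ContextParser.class, getParser());
        return context;
    }
}
```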


transformInternal

public void transformInternal(org.alfresco.service.cmr.repository.ContentReader reader,
                              org.alfresco.service.cmr.repository.ContentWriter writer,
                              TransformationOptions options)
                       throws java.lang.Exception
Description copied from class: AbstractContentTransformer2
Method to be implemented by subclasses wishing to make use of the common infrastructural code provided by this class.

Specified by:
transformInternal in class AbstractContentTransformer2
Parameters:
reader - the source of the content to transform
writer - the target to which to write the transformed content
options - the options to use when performing the transformation. These will never be null.
Throws:
java.lang.Exception - exceptions will be handled by this class; subclasses can throw anything
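The steps a Tika-backed transformInternal() plausibly goes through (obtain the parser, create metadata, wrap the target stream in a Writer, build a ContentHandler, then call parse) can be sketched with JDK classes only. FlowParser, PlainTextFlowParser and TransformFlowSketch are hypothetical stand-ins for the Tika Parser, a concrete parser and this class; the real method reads from a ContentReader, writes to a ContentWriter and also builds a ParseContext, and whether it performs exactly these steps in this order is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for org.apache.tika.parser.Parser.
interface FlowParser {
    void parse(InputStream in, ContentHandler handler, Map<String, String> metadata)
            throws IOException, SAXException;
}

// Hypothetical parser: treats the whole input as one UTF-8 paragraph.
final class PlainTextFlowParser implements FlowParser {
    @Override
    public void parse(InputStream in, ContentHandler handler, Map<String, String> metadata)
            throws IOException, SAXException {
        metadata.put("Content-Type", "text/plain");
        char[] text = new String(in.readAllBytes(), StandardCharsets.UTF_8).toCharArray();
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endDocument();
    }
}

final class TransformFlowSketch {
    // Text-only handler, standing in for whatever getContentHandler() returns.
    static ContentHandler textHandler(Writer output) {
        return new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                try {
                    output.write(ch, start, length);
                } catch (IOException e) {
                    throw new SAXException(e);
                }
            }
        };
    }

    // Wires source stream -> parser -> handler -> writer; both streams are
    // closed (and the writer flushed) once parsing finishes or fails.
    static Map<String, String> transform(FlowParser parser, InputStream in,
                                         OutputStream out, String encoding)
            throws IOException, SAXException {
        Map<String, String> metadata = new HashMap<>();
        try (InputStream input = in; Writer writer = new OutputStreamWriter(out, encoding)) {
            parser.parse(input, textHandler(writer), metadata);
        }
        return metadata;
    }
}
```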


Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.