org.alfresco.repo.content.transform
Class TikaPoweredContentTransformer
java.lang.Object
org.alfresco.repo.content.transform.ContentTransformerHelper
org.alfresco.repo.content.transform.AbstractContentTransformer2
org.alfresco.repo.content.transform.TikaPoweredContentTransformer
- All Implemented Interfaces:
- ContentWorker, ContentTransformer
- Direct Known Subclasses:
- PdfBoxContentTransformer, PoiContentTransformer, PoiHssfContentTransformer, PoiOOXMLContentTransformer, TikaAutoContentTransformer
public abstract class TikaPoweredContentTransformer
- extends AbstractContentTransformer2
Provides helpful services for ContentTransformer
implementations which are powered by Apache Tika.
To use Tika to transform some content into Text, HTML or XML, either create an
implementation of this class or use the Auto Detect transformer
(TikaAutoContentTransformer).
For now, all transformers are registered as regular, rather than explicit,
transformations. This should allow you to register your own explicit
transformers and have them take priority.
Method Summary
protected org.xml.sax.ContentHandler getContentHandler(java.lang.String targetMimeType, java.io.Writer output)
- Returns an appropriate Tika ContentHandler for the requested content type.
protected abstract org.apache.tika.parser.Parser getParser()
- Returns the correct Tika Parser to process the document.
boolean isTransformable(java.lang.String sourceMimetype, java.lang.String targetMimetype, TransformationOptions options)
- Can we do the requested transformation via Tika? We support transforming to HTML, XML or Text.
void transformInternal(org.alfresco.service.cmr.repository.ContentReader reader, org.alfresco.service.cmr.repository.ContentWriter writer, TransformationOptions options)
- Method to be implemented by subclasses wishing to make use of the common infrastructural code provided by this class.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
sourceMimeTypes
protected java.util.List sourceMimeTypes
LINE_BREAK
protected static final java.lang.String LINE_BREAK
- Windows carriage return line feed pair.
- See Also:
- Constant Field Values
WRONG_FORMAT_MESSAGE_ID
public static final java.lang.String WRONG_FORMAT_MESSAGE_ID
- See Also:
- Constant Field Values
TikaPoweredContentTransformer
protected TikaPoweredContentTransformer(java.util.List sourceMimeTypes)
TikaPoweredContentTransformer
protected TikaPoweredContentTransformer(java.lang.String[] sourceMimeTypes)
getParser
protected abstract org.apache.tika.parser.Parser getParser()
- Returns the correct Tika Parser to process
the document.
If you don't know which you want, use TikaAutoContentTransformer,
which makes use of the Tika auto-detection.
isTransformable
public boolean isTransformable(java.lang.String sourceMimetype,
java.lang.String targetMimetype,
TransformationOptions options)
- Can we do the requested transformation via Tika?
We support transforming to HTML, XML or Text.
- Parameters:
sourceMimetype
- the source mimetype
targetMimetype
- the target mimetype
options
- the transformation options
- Returns:
- true if this content transformer can satisfy the mimetypes and options specified, false otherwise
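The check described above can be sketched roughly as follows. This is a minimal illustration, not the Alfresco implementation: the class name TikaTransformableCheck is hypothetical, the three mimetype strings are assumed values for Text, HTML and XML, and the test against the configured source mimetypes is an assumption based on the sourceMimeTypes field.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not the Alfresco class: it mirrors only the check described above.
class TikaTransformableCheck {
    // Assumed mimetype strings for the three documented targets (Text, HTML, XML).
    static final List<String> TARGET_MIME_TYPES =
            Arrays.asList("text/plain", "text/html", "text/xml");

    private final List<String> sourceMimeTypes;

    TikaTransformableCheck(List<String> sourceMimeTypes) {
        this.sourceMimeTypes = sourceMimeTypes;
    }

    // True only when the source is one this transformer was configured for
    // and the target is one of the three text-like formats.
    boolean isTransformable(String sourceMimetype, String targetMimetype) {
        return sourceMimeTypes.contains(sourceMimetype)
                && TARGET_MIME_TYPES.contains(targetMimetype);
    }
}
```

With a check built for "application/pdf", a request for "text/plain" would be accepted, while a request for "image/png" or an unlisted source type would be rejected.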
getContentHandler
protected org.xml.sax.ContentHandler getContentHandler(java.lang.String targetMimeType,
java.io.Writer output)
throws javax.xml.transform.TransformerConfigurationException
- Returns an appropriate Tika ContentHandler for the
requested content type. Normally you'll let this
work as default, but if you need fine-grained
control of how the Tika events become text then
override and supply your own.
- Throws:
javax.xml.transform.TransformerConfigurationException
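As a rough sketch of what a handler choice like this could look like, the block below uses only JDK classes. The class name HandlerSketch, the mimetype strings, and the rule that each closing "p" element ends a line are assumptions made for illustration; the actual default behaviour may differ. Obtaining a TransformerHandler from the JDK's SAXTransformerFactory is one call that can throw TransformerConfigurationException, which would be consistent with the declared exception.

```java
import java.io.IOException;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Alfresco or Tika.
class HandlerSketch {
    // Picks a SAX ContentHandler that writes to 'output' in the requested format.
    static ContentHandler handlerFor(String targetMimeType, Writer output)
            throws TransformerConfigurationException {
        if ("text/plain".equals(targetMimeType)) {
            return new PlainTextHandler(output);
        }
        // For markup targets, let the JDK serialise the SAX events.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        String method = "text/html".equals(targetMimeType) ? "html" : "xml";
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
        handler.setResult(new StreamResult(output));
        return handler;
    }

    // Custom handler: keeps character data, drops markup, ends each paragraph
    // with a carriage return line feed pair.
    static class PlainTextHandler extends DefaultHandler {
        private final Writer output;

        PlainTextHandler(Writer output) {
            this.output = output;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                output.write(ch, start, length);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if ("p".equals(localName)) {
                try {
                    output.write("\r\n");
                } catch (IOException e) {
                    throw new SAXException(e);
                }
            }
        }
    }
}
```

A subclass that needs different text output could return something like PlainTextHandler from its own getContentHandler override and leave the markup targets to the default.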
transformInternal
public void transformInternal(org.alfresco.service.cmr.repository.ContentReader reader,
org.alfresco.service.cmr.repository.ContentWriter writer,
TransformationOptions options)
throws java.lang.Exception
- Description copied from class:
AbstractContentTransformer2
- Method to be implemented by subclasses wishing to make use of the common infrastructural code
provided by this class.
- Specified by:
transformInternal
in class AbstractContentTransformer2
- Parameters:
reader
- the source of the content to transform
writer
- the target to which to write the transformed content
options
- a map of options to use when performing the transformation. The map
will never be null.
- Throws:
java.lang.Exception
- exceptions will be handled by this class - subclasses can throw anything
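The contract described above, where the subclass hook may throw anything and the base class deals with it, follows the template-method pattern. The sketch below shows the general shape only. Every name in it (TransformerTemplateSketch, UpperCaseSketch, the String and StringBuilder parameters, the IllegalStateException wrapper) is a hypothetical stand-in chosen to keep the example self-contained, and none of it reflects how AbstractContentTransformer2 actually reports failures.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical stand-ins that mirror the shape described above, not Alfresco's classes.
abstract class TransformerTemplateSketch {
    // Public entry point: runs the subclass hook and converts any failure
    // into a single unchecked exception type.
    final void transform(String source, StringBuilder target) {
        try {
            transformInternal(source, target);
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    // Subclass hook: free to throw anything.
    protected abstract void transformInternal(String source, StringBuilder target)
            throws Exception;
}

class UpperCaseSketch extends TransformerTemplateSketch {
    @Override
    protected void transformInternal(String source, StringBuilder target) throws Exception {
        if (source == null) {
            throw new IOException("no content to read");
        }
        target.append(source.toUpperCase(Locale.ROOT));
    }
}
```

Callers of transform only ever see the wrapper exception, which is why the hook can declare a plain throws Exception.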
Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.