public abstract class TikaPoweredMetadataExtracter extends AbstractMappingMetadataExtracter
author:   -- cm:author
title:    -- cm:title
subject:  -- cm:description
created:  -- cm:created
comments:
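As a rough illustration of how the raw keys listed above relate to their target properties, the following sketch re-keys a map of raw values through a fixed mapping. The class `DefaultMappingSketch` and its `applyMapping` method are hypothetical names, not part of the Alfresco API, and dropping unmapped keys such as `comments` is an assumption based only on the listing above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of the Alfresco API. */
final class DefaultMappingSketch {

    // Raw key -> target property, copied from the class description.
    // "comments" has no target listed there, so it is left unmapped here.
    static final Map<String, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("author", "cm:author");
        MAPPING.put("title", "cm:title");
        MAPPING.put("subject", "cm:description");
        MAPPING.put("created", "cm:created");
    }

    /** Re-keys raw values by target property; unmapped raw keys are skipped (assumption). */
    static Map<String, Object> applyMapping(Map<String, Object> raw) {
        Map<String, Object> mapped = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : raw.entrySet()) {
            String target = MAPPING.get(entry.getKey());
            if (target != null) {
                mapped.put(target, entry.getValue());
            }
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("author", "A. Writer");
        raw.put("subject", "Quarterly report");
        raw.put("comments", "draft");
        System.out.println(applyMapping(raw));
    }
}
```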
| Modifier and Type | Class and Description |
|---|---|
| protected static class | TikaPoweredMetadataExtracter.HeadContentHandler: This content handler will capture entries from within the header of the Tika content XHTML, but ignore the rest. |
| protected static class | TikaPoweredMetadataExtracter.MapCaptureContentHandler: This content handler will grab all tags and attributes, and record the textual content of the last seen one of them. |
| protected static class | TikaPoweredMetadataExtracter.NullContentHandler: A content handler that ignores all the content it finds. |
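To make the idea behind a head-only content handler concrete, here is a minimal sketch built on the JDK's SAX classes. It is not the implementation of `TikaPoweredMetadataExtracter.HeadContentHandler`; the class name `HeadOnlyHandlerSketch`, the `captureHead` helper, and the choice to collect only character data are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: keeps text seen inside <head>, ignores everything else. */
final class HeadOnlyHandlerSketch extends DefaultHandler {
    private final StringBuilder headText = new StringBuilder();
    private boolean inHead = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("head".equals(qName)) {
            inHead = true;   // start capturing
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("head".equals(qName)) {
            inHead = false;  // stop capturing; the body is ignored
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inHead) {
            headText.append(ch, start, length);
        }
    }

    String headText() {
        return headText.toString().trim();
    }

    /** Parses an XHTML string and returns the text found inside its head element. */
    static String captureHead(String xhtml) {
        HeadOnlyHandlerSketch handler = new HeadOnlyHandlerSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XHTML", e);
        }
        return handler.headText();
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title>Report</title></head>"
                     + "<body><p>Body text is ignored.</p></body></html>";
        System.out.println(captureHead(xhtml));
    }
}
```

A handler that ignores all content, in the spirit of the NullContentHandler entry, would be the same class with every callback left empty.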
Inherited nested classes: MetadataExtracter.OverwritePolicy

| Modifier and Type | Field and Description |
|---|---|
| protected static java.lang.String | KEY_AUTHOR |
| protected static java.lang.String | KEY_COMMENTS |
| protected static java.lang.String | KEY_CREATED |
| protected static java.lang.String | KEY_DESCRIPTION |
| protected static java.lang.String | KEY_SUBJECT |
| protected static java.lang.String | KEY_TITLE |
| protected static org.apache.commons.logging.Log | logger |
Inherited fields: NAMESPACE_PROPERTY_PREFIX, supportedDateFormats

| Constructor and Description |
|---|
| TikaPoweredMetadataExtracter(java.util.ArrayList supportedMimeTypes) |
| TikaPoweredMetadataExtracter(java.util.HashSet supportedMimeTypes) |
| Modifier and Type | Method and Description |
|---|---|
| protected static java.util.ArrayList | buildSupportedMimetypes(java.lang.String[] explicitTypes, org.apache.tika.parser.Parser[] tikaParsers): Builds up a list of supported mime types by merging an explicit list with any that Tika also claims to support. |
| protected java.util.Map | extractRaw(org.alfresco.service.cmr.repository.ContentReader reader): Override to provide the raw extracted metadata values. |
| protected java.util.Map | extractSpecific(org.apache.tika.metadata.Metadata metadata, java.util.Map properties, java.util.Map headers): Allows implementation-specific mappings to be done. |
| protected abstract org.apache.tika.parser.Parser | getParser(): Returns the correct Tika Parser to process the document. |
| protected java.util.Date | makeDate(java.lang.String dateStr): Version which also tries the ISO-8601 formats (in order), and similar formats, which Tika makes use of. |
| protected boolean | needHeaderContents(): Do we care about the contents of the extracted header, or nothing at all? |
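The makeDate entry in the method summary mentions trying ISO-8601 and similar formats in order. A minimal sketch of that "first pattern that parses wins" approach follows. The class name `MakeDateSketch`, the specific pattern list, the UTC default, and returning `null` on failure are all assumptions for illustration; the real method may use different formats and may fall back to its superclass instead.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical sketch of trying several date patterns in order. */
final class MakeDateSketch {

    // Assumed pattern list, most specific first; not taken from the Alfresco source.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss'Z'",  // UTC timestamp with a literal Z suffix
        "yyyy-MM-dd'T'HH:mm:ssZ",    // timestamp with a numeric offset such as +0100
        "yyyy-MM-dd"                 // date only
    };

    /** Returns the first successful parse, or null if no pattern matches (assumption). */
    static Date makeDate(String dateStr) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern);
            format.setLenient(false);
            // Used when the text itself carries no offset.
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            try {
                return format.parse(dateStr);
            } catch (ParseException e) {
                // This pattern did not match; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(makeDate("2010-05-01T10:00:00Z"));
        System.out.println(makeDate("2010-05-01"));
        System.out.println(makeDate("not a date"));
    }
}
```

A new `SimpleDateFormat` is created per call because the class is not thread-safe; a production version would more likely cache per-thread instances or use `java.time`.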
Inherited methods: checkIsSupported, extract, extract, extract, filterSystemProperties, getDefaultMapping, getExtractionTime, getMapping, getMimetypeService, getReliability, init, isSupported, newRawMap, putRawValue, readMappingProperties, readMappingProperties, register, setDictionaryService, setFailOnTypeConversion, setInheritDefaultMapping, setMapping, setMappingProperties, setMimetypeService, setOverwritePolicy, setOverwritePolicy, setRegistry, setSupportedDateFormats, setSupportedMimetypes

protected static org.apache.commons.logging.Log logger
protected static final java.lang.String KEY_AUTHOR
protected static final java.lang.String KEY_TITLE
protected static final java.lang.String KEY_SUBJECT
protected static final java.lang.String KEY_CREATED
protected static final java.lang.String KEY_DESCRIPTION
protected static final java.lang.String KEY_COMMENTS
public TikaPoweredMetadataExtracter(java.util.ArrayList supportedMimeTypes)
public TikaPoweredMetadataExtracter(java.util.HashSet supportedMimeTypes)
protected static java.util.ArrayList buildSupportedMimetypes(java.lang.String[] explicitTypes,
org.apache.tika.parser.Parser[] tikaParsers)
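The method summary describes buildSupportedMimetypes as merging an explicit list with the types Tika claims to support. The sketch below shows one way such a merge could work without depending on Tika: each parser is stood in for by the set of type names it claims. The class `SupportedMimetypesSketch`, the `merge` method, and the ordering and de-duplication behaviour are assumptions, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of merging explicit mime types with parser-claimed ones. */
final class SupportedMimetypesSketch {

    /**
     * Explicit types come first; types claimed by each parser follow.
     * Duplicates are dropped and first-seen order is kept (assumption).
     */
    static ArrayList<String> merge(String[] explicitTypes, List<Set<String>> typesPerParser) {
        Set<String> merged = new LinkedHashSet<>(Arrays.asList(explicitTypes));
        for (Set<String> claimed : typesPerParser) {
            merged.addAll(claimed);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        Set<String> pdfParserTypes = new LinkedHashSet<>(Arrays.asList("application/pdf"));
        Set<String> textParserTypes = new LinkedHashSet<>(Arrays.asList("text/plain", "text/html"));
        List<Set<String>> perParser = Arrays.asList(pdfParserTypes, textParserTypes);
        System.out.println(merge(new String[] {"application/pdf", "image/png"}, perParser));
    }
}
```

With real Tika parsers, the per-parser sets would presumably come from each parser's declared supported media types rather than from hand-written strings.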
protected java.util.Date makeDate(java.lang.String dateStr)
Overrides: makeDate in class AbstractMappingMetadataExtracter

protected abstract org.apache.tika.parser.Parser getParser()
Returns the correct Tika Parser to process the document. See TikaAutoMetadataExtracter, which makes use of the Tika auto-detection.

protected boolean needHeaderContents()
protected java.util.Map extractSpecific(org.apache.tika.metadata.Metadata metadata,
java.util.Map properties,
java.util.Map headers)
protected java.util.Map extractRaw(org.alfresco.service.cmr.repository.ContentReader reader)
throws java.lang.Throwable
Description copied from class: AbstractMappingMetadataExtracter

Override to provide the raw extracted metadata values. Even if the default mapping doesn't handle all properties, it is possible for each instance of the extracter to be configured differently, and more or fewer of the properties may be used in different installations.

Raw values must not be trimmed or removed for any reason. Null values and empty strings are ... OverwritePolicy

Properties extracted, and their meanings and types, should be thoroughly described in the class-level javadocs of the extracter implementation, for example:
editor: - the document editor       --> cm:author
title:  - the document title        --> cm:title
user1:  - the document summary
user2:  - the document description  --> cm:description
user3:  -
user4:  -
Specified by: extractRaw in class AbstractMappingMetadataExtracter
Parameters: reader - the document to extract the values from. This stream provided by the reader must be closed if accessed directly.
Throws: java.lang.Throwable
See Also: AbstractMappingMetadataExtracter.getDefaultMapping()

Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.