java.lang.Object
  org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter
    org.alfresco.repo.content.metadata.TikaPoweredMetadataExtracter
public abstract class TikaPoweredMetadataExtracter
The parent of all Metadata Extractors that use Apache Tika under the hood. It handles the common parts of processing the files and the common mappings. Individual extractors extend this class to add custom mappings.
    author:   -- cm:author
    title:    -- cm:title
    subject:  -- cm:description
    created:  -- cm:created
    comments:
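
The listing above pairs raw metadata keys with content-model properties. As a rough illustration only, the sketch below applies such a mapping with plain `java.util` maps. The class `DefaultMappingSketch` and its methods are hypothetical and are not part of the Alfresco API; the key strings are taken from the listing, and dropping unmapped keys is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; not an Alfresco class. */
final class DefaultMappingSketch {

    /** Raw key -> content-model property, copied from the class-level listing. */
    static Map<String, String> defaultMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("author", "cm:author");
        mapping.put("title", "cm:title");
        mapping.put("subject", "cm:description");
        mapping.put("created", "cm:created");
        // "comments" has no target in the listing, so it stays unmapped here.
        return mapping;
    }

    /** Renames raw keys to their mapped property; unmapped keys are dropped (assumption). */
    static Map<String, Object> applyMapping(Map<String, Object> raw) {
        Map<String, String> mapping = defaultMapping();
        Map<String, Object> mapped = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : raw.entrySet()) {
            String target = mapping.get(entry.getKey());
            if (target != null) {
                mapped.put(target, entry.getValue());
            }
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("author", "A. Writer");
        raw.put("comments", "draft");
        System.out.println(applyMapping(raw));
    }
}
```

In the real class the mapping is configurable per instance, so a fixed table like this one only mirrors the defaults shown in the listing.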
Nested Class Summary | |
---|---|
protected static class | TikaPoweredMetadataExtracter.HeadContentHandler: This content handler will capture entries from within the header of the Tika content XHTML, but ignore the rest. |
protected static class | TikaPoweredMetadataExtracter.MapCaptureContentHandler: This content handler will grab all tags and attributes, and record the textual content of the last seen one of them. |
protected static class | TikaPoweredMetadataExtracter.NullContentHandler: A content handler that ignores all the content it finds. |
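
To make the idea of a head-only content handler concrete, here is a minimal sketch built on the JDK's SAX classes. It is not the code of `HeadContentHandler`: the class name `HeadOnlyHandlerSketch` is hypothetical, and treating an "entry" as a `<meta name="..." content="..."/>` pair is an assumption made for this example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: records meta entries inside <head>, ignores everything else. */
final class HeadOnlyHandlerSketch extends DefaultHandler {
    private final Map<String, String> entries = new LinkedHashMap<>();
    private boolean inHead = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("head".equals(qName)) {
            inHead = true;
        } else if (inHead && "meta".equals(qName)) {
            // Assumption: an "entry" is a meta element with name and content attributes.
            String name = atts.getValue("name");
            String content = atts.getValue("content");
            if (name != null && content != null) {
                entries.put(name, content);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("head".equals(qName)) {
            inHead = false; // anything after </head> is ignored
        }
    }

    /** Parses well-formed XHTML and returns the entries found in its head. */
    static Map<String, String> parse(String xhtml) {
        HeadOnlyHandlerSketch handler = new HeadOnlyHandlerSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XHTML", e);
        }
        return handler.entries;
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><meta name=\"author\" content=\"A. Writer\"/></head>"
                + "<body><p>Body text is ignored.</p></body></html>";
        System.out.println(parse(xhtml));
    }
}
```

A handler that ignores all content, in the spirit of `NullContentHandler`, would be the degenerate case: a `DefaultHandler` subclass that overrides nothing.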
Nested classes/interfaces inherited from interface org.alfresco.repo.content.metadata.MetadataExtracter |
---|
MetadataExtracter.OverwritePolicy |
Field Summary | |
---|---|
protected static java.lang.String | KEY_AUTHOR |
protected static java.lang.String | KEY_COMMENTS |
protected static java.lang.String | KEY_CREATED |
protected static java.lang.String | KEY_DESCRIPTION |
protected static java.lang.String | KEY_SUBJECT |
protected static java.lang.String | KEY_TITLE |
protected static org.apache.commons.logging.Log | logger |
Fields inherited from class org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter |
---|
NAMESPACE_PROPERTY_PREFIX, supportedDateFormats |
Constructor Summary | |
---|---|
TikaPoweredMetadataExtracter(java.util.ArrayList supportedMimeTypes) | |
TikaPoweredMetadataExtracter(java.util.HashSet supportedMimeTypes) | |
Method Summary | |
---|---|
protected static java.util.ArrayList | buildSupportedMimetypes(java.lang.String[] explicitTypes, org.apache.tika.parser.Parser tikaParser): Builds a list of supported mime types by merging an explicit list with any that Tika also claims to support. |
protected java.util.Map | extractRaw(org.alfresco.service.cmr.repository.ContentReader reader): Override to provide the raw extracted metadata values. |
protected java.util.Map | extractSpecific(org.apache.tika.metadata.Metadata metadata, java.util.Map properties, java.util.Map headers): Allows implementation-specific mappings to be done. |
protected abstract org.apache.tika.parser.Parser | getParser(): Returns the correct Tika Parser to process the document. |
protected java.util.Date | makeDate(java.lang.String dateStr): Variant that also tries the ISO-8601 formats (in order), and similar formats, which Tika makes use of. |
protected boolean | needHeaderContents(): Do we care about the contents of the extracted header, or nothing at all? |
Methods inherited from class org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter |
---|
checkIsSupported, extract, extract, extract, filterSystemProperties, getDefaultMapping, getExtractionTime, getMapping, getMimetypeService, getReliability, init, isSupported, newRawMap, putRawValue, readMappingProperties, readMappingProperties, register, setDictionaryService, setFailOnTypeConversion, setInheritDefaultMapping, setMapping, setMappingProperties, setMimetypeService, setOverwritePolicy, setOverwritePolicy, setRegistry, setSupportedDateFormats, setSupportedMimetypes |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static org.apache.commons.logging.Log logger
protected static final java.lang.String KEY_AUTHOR
protected static final java.lang.String KEY_TITLE
protected static final java.lang.String KEY_SUBJECT
protected static final java.lang.String KEY_CREATED
protected static final java.lang.String KEY_DESCRIPTION
protected static final java.lang.String KEY_COMMENTS
Constructor Detail |
---|
public TikaPoweredMetadataExtracter(java.util.ArrayList supportedMimeTypes)
public TikaPoweredMetadataExtracter(java.util.HashSet supportedMimeTypes)
Method Detail |
---|
protected static java.util.ArrayList buildSupportedMimetypes(java.lang.String[] explicitTypes, org.apache.tika.parser.Parser tikaParser)
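
The method summary describes this method as merging an explicit list of mime types with those the Tika parser claims to support. A minimal sketch of that kind of merge follows. To stay self-contained it takes a plain `Set<String>` in place of a Tika `Parser`; the class `MimetypeMergeSketch`, the method `merge`, and the choice to drop duplicates while keeping first-seen order are all assumptions of this example, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch; the Set parameter stands in for the types a parser reports. */
final class MimetypeMergeSketch {

    static ArrayList<String> merge(String[] explicitTypes, Set<String> parserTypes) {
        // A LinkedHashSet drops duplicates while keeping first-seen order,
        // so explicitly listed types come first.
        Set<String> merged = new LinkedHashSet<>(Arrays.asList(explicitTypes));
        merged.addAll(parserTypes);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        Set<String> parserTypes = new LinkedHashSet<>(Arrays.asList("text/plain", "image/png"));
        System.out.println(merge(new String[] {"application/pdf", "text/plain"}, parserTypes));
    }
}
```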
protected java.util.Date makeDate(java.lang.String dateStr)
Overrides: makeDate in class AbstractMappingMetadataExtracter
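
The method summary says this version also tries ISO-8601 and similar formats in order. The sketch below shows one way to try several parsers in sequence with `java.time`. The specific formats, their order, the use of UTC for values without an offset, and the `null` result on failure are assumptions of this example; the page does not list the formats the real method tries. `IsoDateFallbackSketch` is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of "try each format in order". */
final class IsoDateFallbackSketch {

    // Most specific format first; values without an offset are assumed to be UTC.
    private static final List<Function<String, Date>> PARSERS = Arrays.asList(
            s -> Date.from(OffsetDateTime.parse(s).toInstant()),
            s -> Date.from(LocalDateTime.parse(s).toInstant(ZoneOffset.UTC)),
            s -> Date.from(LocalDate.parse(s).atStartOfDay(ZoneOffset.UTC).toInstant()));

    /** Returns the first successful parse, or null if no format matches. */
    static Date makeDate(String dateStr) {
        for (Function<String, Date> parser : PARSERS) {
            try {
                return parser.apply(dateStr);
            } catch (DateTimeParseException e) {
                // Not this format; fall through to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(makeDate("2010-06-01T12:30:00Z"));
        System.out.println(makeDate("2010-06-01"));
        System.out.println(makeDate("not a date"));
    }
}
```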
protected abstract org.apache.tika.parser.Parser getParser()
Returns the correct Tika Parser to process the document. See TikaAutoMetadataExtracter, which makes use of the Tika auto-detection.
protected boolean needHeaderContents()
protected java.util.Map extractSpecific(org.apache.tika.metadata.Metadata metadata, java.util.Map properties, java.util.Map headers)
protected java.util.Map extractRaw(org.alfresco.service.cmr.repository.ContentReader reader) throws java.lang.Throwable
Description copied from class: AbstractMappingMetadataExtracter
Override to provide the raw extracted metadata values. Even if the default mapping doesn't handle all properties, each instance of the extracter can be configured differently, so more or fewer of the properties may be used in different installations.
Raw values must not be trimmed or removed for any reason. Null values and empty strings are
OverwritePolicy
Properties extracted and their meanings and types should be thoroughly described in the class-level javadocs of the extracter implementation, for example:
    editor: - the document editor        -->  cm:author
    title:  - the document title         -->  cm:title
    user1:  - the document summary
    user2:  - the document description   -->  cm:description
    user3:  -
    user4:  -
Specified by: extractRaw in class AbstractMappingMetadataExtracter
Parameters: reader - the document to extract the values from. The stream provided by the reader must be closed if accessed directly.
Throws: java.lang.Throwable
See Also: AbstractMappingMetadataExtracter.getDefaultMapping()
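
Two points from the description above lend themselves to a small sketch: a stream obtained from the reader must be closed, and raw values must not be trimmed. The example below uses try-with-resources for the first and stores values verbatim for the second. Everything in it is hypothetical: `StreamSource` is a stand-in for the reader argument, and the `key=value` line format is invented purely so the example has something to parse.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the stream-handling contract; not Alfresco code. */
final class ExtractRawSketch {

    /** Stand-in for the reader argument; assumed for this example only. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    static Map<String, Object> extractRaw(StreamSource source) {
        Map<String, Object> raw = new LinkedHashMap<>();
        // try-with-resources closes the reader, and with it the underlying stream,
        // even if parsing fails part-way through.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(source.open(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq > 0) {
                    // Values are stored exactly as read: no trimming, no filtering.
                    raw.put(line.substring(0, eq), line.substring(eq + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw;
    }

    public static void main(String[] args) {
        byte[] content = "title= Annual Report \nauthor=".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractRaw(() -> new ByteArrayInputStream(content)));
    }
}
```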