org.alfresco.repo.content.metadata
Class OfficeMetadataExtracter

java.lang.Object
  extended by org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter
      extended by org.alfresco.repo.content.metadata.TikaPoweredMetadataExtracter
          extended by org.alfresco.repo.content.metadata.OfficeMetadataExtracter
All Implemented Interfaces:
ContentWorker, MetadataExtracter

public class OfficeMetadataExtracter
extends TikaPoweredMetadataExtracter

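Metadata extracter for Office file formats. It uses the POI library to extract the following properties. Where a default mapping exists, the target Alfresco property is shown on the right; properties with no target are extracted but not mapped by default: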

   author:             --      cm:author
   title:              --      cm:title
   subject:            --      cm:description
   createDateTime:     --      cm:created
   lastSaveDateTime:   --      cm:modified
   comments:
   editTime:
   format:
   keywords:
   lastAuthor:
   lastPrinted:
   osVersion:
   thumbnail:
   pageCount:
   wordCount:
 
Extraction is performed through Apache Tika.
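
The mapping table above can be pictured as a simple lookup from raw extracted keys to target property names. The following is a minimal, standalone sketch of that idea only: the class OfficeMappingSketch and its methods are hypothetical, plain strings stand in for qualified property names, and none of it is the Alfresco API or the extracter's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the default mapping listed above.
// Not part of Alfresco; names and types are assumptions for this sketch.
public class OfficeMappingSketch {

    // Raw key -> target property, taken from the table in the class description.
    public static Map<String, String> defaultMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("author", "cm:author");
        mapping.put("title", "cm:title");
        mapping.put("subject", "cm:description");
        mapping.put("createDateTime", "cm:created");
        mapping.put("lastSaveDateTime", "cm:modified");
        return mapping;
    }

    // Keeps only raw values whose key has a mapping; unmapped keys
    // (for example wordCount) are dropped in this sketch.
    public static Map<String, String> applyMapping(Map<String, String> raw) {
        Map<String, String> mapping = defaultMapping();
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            String target = mapping.get(entry.getKey());
            if (target != null) {
                result.put(target, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("author", "Jane Doe");
        raw.put("wordCount", "1200");
        System.out.println(applyMapping(raw)); // prints {cm:author=Jane Doe}
    }
}
```

In a real deployment the mapping would normally be changed through the inherited setMapping or setMappingProperties methods rather than in code like this.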


Nested Class Summary
 
Nested classes/interfaces inherited from class org.alfresco.repo.content.metadata.TikaPoweredMetadataExtracter
TikaPoweredMetadataExtracter.HeadContentHandler, TikaPoweredMetadataExtracter.MapCaptureContentHandler, TikaPoweredMetadataExtracter.NullContentHandler
 
Nested classes/interfaces inherited from interface org.alfresco.repo.content.metadata.MetadataExtracter
MetadataExtracter.OverwritePolicy
 
Field Summary
static java.lang.String KEY_CREATE_DATETIME
           
static java.lang.String KEY_EDIT_TIME
           
static java.lang.String KEY_FORMAT
           
static java.lang.String KEY_KEYWORDS
           
static java.lang.String KEY_LAST_AUTHOR
           
static java.lang.String KEY_LAST_PRINTED
           
static java.lang.String KEY_LAST_SAVE_DATETIME
           
static java.lang.String KEY_OS_VERSION
           
static java.lang.String KEY_PAGE_COUNT
           
static java.lang.String KEY_PARAGRAPH_COUNT
           
static java.lang.String KEY_THUMBNAIL
           
static java.lang.String KEY_WORD_COUNT
           
static java.util.ArrayList SUPPORTED_MIMETYPES
           
 
Fields inherited from class org.alfresco.repo.content.metadata.TikaPoweredMetadataExtracter
KEY_AUTHOR, KEY_COMMENTS, KEY_CREATED, KEY_DESCRIPTION, KEY_SUBJECT, KEY_TITLE, logger
 
Fields inherited from class org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter
NAMESPACE_PROPERTY_PREFIX, supportedDateFormats
 
Constructor Summary
OfficeMetadataExtracter()
           
 
Method Summary
protected  java.util.Map extractSpecific(org.apache.tika.metadata.Metadata metadata, java.util.Map properties, java.util.Map headers)
          Allows implementation-specific mappings to be applied.
protected  org.apache.tika.parser.Parser getParser()
          Returns the correct Tika Parser to process the document.
 
Methods inherited from class org.alfresco.repo.content.metadata.TikaPoweredMetadataExtracter
buildSupportedMimetypes, extractRaw, makeDate, needHeaderContents
 
Methods inherited from class org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter
checkIsSupported, extract, extract, extract, filterSystemProperties, getDefaultMapping, getExtractionTime, getMapping, getMimetypeService, getReliability, init, isSupported, newRawMap, putRawValue, readMappingProperties, readMappingProperties, register, setDictionaryService, setFailOnTypeConversion, setInheritDefaultMapping, setMapping, setMappingProperties, setMimetypeService, setOverwritePolicy, setOverwritePolicy, setRegistry, setSupportedDateFormats, setSupportedMimetypes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

KEY_CREATE_DATETIME

public static final java.lang.String KEY_CREATE_DATETIME
See Also:
Constant Field Values

KEY_LAST_SAVE_DATETIME

public static final java.lang.String KEY_LAST_SAVE_DATETIME
See Also:
Constant Field Values

KEY_EDIT_TIME

public static final java.lang.String KEY_EDIT_TIME
See Also:
Constant Field Values

KEY_FORMAT

public static final java.lang.String KEY_FORMAT
See Also:
Constant Field Values

KEY_KEYWORDS

public static final java.lang.String KEY_KEYWORDS
See Also:
Constant Field Values

KEY_LAST_AUTHOR

public static final java.lang.String KEY_LAST_AUTHOR
See Also:
Constant Field Values

KEY_LAST_PRINTED

public static final java.lang.String KEY_LAST_PRINTED
See Also:
Constant Field Values

KEY_OS_VERSION

public static final java.lang.String KEY_OS_VERSION
See Also:
Constant Field Values

KEY_THUMBNAIL

public static final java.lang.String KEY_THUMBNAIL
See Also:
Constant Field Values

KEY_PAGE_COUNT

public static final java.lang.String KEY_PAGE_COUNT
See Also:
Constant Field Values

KEY_PARAGRAPH_COUNT

public static final java.lang.String KEY_PARAGRAPH_COUNT
See Also:
Constant Field Values

KEY_WORD_COUNT

public static final java.lang.String KEY_WORD_COUNT
See Also:
Constant Field Values

SUPPORTED_MIMETYPES

public static java.util.ArrayList SUPPORTED_MIMETYPES

Constructor Detail

OfficeMetadataExtracter

public OfficeMetadataExtracter()

Method Detail

getParser

protected org.apache.tika.parser.Parser getParser()
Description copied from class: TikaPoweredMetadataExtracter
Returns the correct Tika Parser to process the document. If you do not know which parser you need, use TikaAutoMetadataExtracter, which relies on Tika's auto-detection.

Specified by:
getParser in class TikaPoweredMetadataExtracter

extractSpecific

protected java.util.Map extractSpecific(org.apache.tika.metadata.Metadata metadata,
                                        java.util.Map properties,
                                        java.util.Map headers)
Description copied from class: TikaPoweredMetadataExtracter
Allows implementation-specific mappings to be applied.

Overrides:
extractSpecific in class TikaPoweredMetadataExtracter
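
The override relationship described for extractSpecific resembles a template-method hook: the base class collects the common values, then hands the property map to the subclass so it can add format-specific ones. The sketch below models that pattern with standalone, hypothetical classes (BaseExtracter, OfficeLikeExtracter) and plain string maps. It assumes this calling order for illustration and does not reproduce the real Alfresco or Tika types or signatures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the extractSpecific hook; not the Alfresco classes.
public class ExtractSpecificSketch {

    // Stand-in for a Tika-powered base class (assumption for this sketch).
    public abstract static class BaseExtracter {

        // Template method: copy common keys, then let the subclass add its own.
        public final Map<String, String> extractRaw(Map<String, String> parsed) {
            Map<String, String> properties = new LinkedHashMap<>();
            copyIfPresent(parsed, "author", properties);
            copyIfPresent(parsed, "title", properties);
            return extractSpecific(parsed, properties);
        }

        // Hook: the default adds nothing and returns the map unchanged.
        protected Map<String, String> extractSpecific(Map<String, String> parsed,
                                                      Map<String, String> properties) {
            return properties;
        }

        // Copies a value only when the parsed metadata actually contains it.
        protected static void copyIfPresent(Map<String, String> from, String key,
                                            Map<String, String> to) {
            String value = from.get(key);
            if (value != null) {
                to.put(key, value);
            }
        }
    }

    // Stand-in for an Office-specific subclass adding extra keys.
    public static class OfficeLikeExtracter extends BaseExtracter {
        @Override
        protected Map<String, String> extractSpecific(Map<String, String> parsed,
                                                      Map<String, String> properties) {
            copyIfPresent(parsed, "lastAuthor", properties);
            copyIfPresent(parsed, "wordCount", properties);
            return properties;
        }
    }

    // Convenience entry point for trying the sketch.
    public static Map<String, String> run(Map<String, String> parsed) {
        return new OfficeLikeExtracter().extractRaw(parsed);
    }
}
```

Keeping the hook separate from the common extraction means a subclass only has to name the extra keys it cares about.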


Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.