org.alfresco.repo.content.metadata
Class PdfBoxMetadataExtracter
java.lang.Object
org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter
org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter
- All Implemented Interfaces:
- ContentWorker, MetadataExtracter
public class PdfBoxMetadataExtracter
- extends AbstractMappingMetadataExtracter
Metadata extractor for PDF documents.
author: -- cm:author
title: -- cm:title
subject: -- cm:description
created: -- cm:created
Any custom property: -- [not mapped]
Tika note: all of these fields (plus a few others) are present
in the Tika metadata.
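The mapping listed above can be illustrated with a small, self-contained sketch. It is not the extracter's actual implementation: the class name `PdfMappingSketch` and the method `applyMapping` are hypothetical, and property names are modelled as plain strings rather than Alfresco qualified names. It only shows the idea that mapped raw keys are carried over to their target properties while unmapped custom keys are skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: models the documented default mapping with plain strings.
// The class and method names here are assumptions, not part of Alfresco.
final class PdfMappingSketch {

    // Raw key -> target property, as listed in the class description.
    static final Map<String, String> DEFAULT_MAPPING = new LinkedHashMap<>();
    static {
        DEFAULT_MAPPING.put("author", "cm:author");
        DEFAULT_MAPPING.put("title", "cm:title");
        DEFAULT_MAPPING.put("subject", "cm:description");
        DEFAULT_MAPPING.put("created", "cm:created");
    }

    // Returns only the raw values whose keys have a mapping; other keys are skipped.
    static Map<String, Object> applyMapping(Map<String, Object> raw) {
        Map<String, Object> mapped = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : raw.entrySet()) {
            String target = DEFAULT_MAPPING.get(entry.getKey());
            if (target != null) {
                mapped.put(target, entry.getValue());
            }
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("author", "Jane Doe");
        raw.put("subject", "Quarterly report");
        raw.put("customField", "not mapped by default");
        System.out.println(applyMapping(raw));
    }
}
```

In a real installation the mapping is configurable per extracter instance (see setMapping and setMappingProperties in the inherited methods), so the fixed map above stands in for the default only.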
Method Summary
java.util.Map extractRaw(ContentReader reader)
Override to provide the raw extracted metadata values.
Methods inherited from class org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter
checkIsSupported, extract, extract, extract, getDefaultMapping, getExtractionTime, getMapping, getMimetypeService, getReliability, init, isSupported, newRawMap, putRawValue, readMappingProperties, readMappingProperties, register, setDictionaryService, setFailOnTypeConversion, setInheritDefaultMapping, setMapping, setMappingProperties, setMimetypeService, setOverwritePolicy, setOverwritePolicy, setRegistry, setSupportedDateFormats, setSupportedMimetypes
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
pdfLogger
protected static org.apache.commons.logging.Log pdfLogger
SUPPORTED_MIMETYPES
public static java.lang.String[] SUPPORTED_MIMETYPES
PdfBoxMetadataExtracter
public PdfBoxMetadataExtracter()
extractRaw
public java.util.Map extractRaw(ContentReader reader)
throws java.lang.Throwable
- Description copied from class:
AbstractMappingMetadataExtracter
- Override to provide the raw extracted metadata values. An extracter should extract
as many of the available properties as is realistically possible. Even if the
default mapping doesn't handle all properties, each instance of the extracter can be
configured differently, so more or fewer of the properties may be used in different
installations.
Raw values must not be trimmed or removed for any reason. Null values, empty
strings and non-serializable values are handled as follows:
- Null: Removed
- Empty String: Passed to the OverwritePolicy
- Non Serializable: Converted to String or fails if that is not possible
Properties extracted and their meanings and types should be thoroughly described in
the class-level javadocs of the extracter implementation, for example:
editor: - the document editor --> cm:author
title: - the document title --> cm:title
user1: - the document summary
user2: - the document description --> cm:description
user3: -
user4: -
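The handling of null, empty and non-serializable raw values described above can be sketched as follows. This is a simplified model under stated assumptions, not the framework's code: `RawValueSketch` and `normalize` are hypothetical names, and the sketch only shows which values survive into the raw map, leaving the later OverwritePolicy decision out.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: hypothetical helper modelling the documented value rules.
final class RawValueSketch {

    static Map<String, Serializable> normalize(Map<String, Object> extracted) {
        Map<String, Serializable> raw = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : extracted.entrySet()) {
            Object value = entry.getValue();
            if (value == null) {
                continue; // Null: removed
            }
            if (value instanceof Serializable) {
                // Includes empty strings, which are kept untrimmed and passed on.
                raw.put(entry.getKey(), (Serializable) value);
            } else {
                // Non-serializable: converted to its String form.
                raw.put(entry.getKey(), String.valueOf(value));
            }
        }
        return raw;
    }

    public static void main(String[] args) {
        Map<String, Object> extracted = new LinkedHashMap<>();
        extracted.put("author", null);
        extracted.put("title", "");
        extracted.put("subject", new Object() {
            @Override public String toString() { return "a summary"; }
        });
        System.out.println(normalize(extracted).keySet());
    }
}
```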
- Specified by:
extractRaw
in class AbstractMappingMetadataExtracter
- Parameters:
reader - the document to extract the values from. The stream provided by
the reader must be closed if accessed directly.
- Returns:
- Returns a map of document property values keyed by property name.
- Throws:
java.lang.Throwable
- See Also:
AbstractMappingMetadataExtracter.getDefaultMapping()
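The requirement in the parameter notes, that a stream obtained from the reader must be closed when accessed directly, can be met with try-with-resources. The sketch below uses a hypothetical `StreamSource` interface as a stand-in for the reader and counts bytes instead of parsing a PDF; neither is part of the Alfresco API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: shows the stream-closing pattern, not real PDF extraction.
final class StreamClosingSketch {

    // Hypothetical stand-in for a content reader that hands out a stream.
    interface StreamSource {
        InputStream open();
    }

    static Map<String, Object> extractRaw(StreamSource source) {
        Map<String, Object> raw = new LinkedHashMap<>();
        // try-with-resources closes the stream even if reading throws.
        try (InputStream in = source.open()) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            raw.put("byteCount", count); // placeholder for real metadata values
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw;
    }

    public static void main(String[] args) {
        System.out.println(extractRaw(() -> new ByteArrayInputStream(new byte[] {1, 2, 3})));
    }
}
```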
Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.