java.lang.Object
  org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter
    org.alfresco.repo.content.metadata.HtmlMetadataExtracter
public class HtmlMetadataExtracter
extends AbstractMappingMetadataExtracter
Extracts the following values from HTML documents:
author: -- cm:author
title: -- cm:title
description: -- cm:description

TIKA note: all metadata will be present, but we will need to search for the variant names ourselves, as Tika puts them in as-is.
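The Tika note above implies that a key such as `author` may arrive under a different spelling or casing. The following is a minimal sketch of one way such a lookup could work, assuming the metadata is available as a plain `Map<String, String>`. The class name `VariantLookupSketch`, the helper `firstVariant`, and the variant names used are all hypothetical and are not taken from this class or from Tika.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VariantLookupSketch {

    /**
     * Returns the value of the first key that matches one of the given variant
     * names, ignoring case, or null if none of the variants is present.
     * The value is returned untouched (no trimming).
     */
    static String firstVariant(Map<String, String> metadata, List<String> variants) {
        for (String variant : variants) {
            for (Map.Entry<String, String> entry : metadata.entrySet()) {
                if (entry.getKey().equalsIgnoreCase(variant)) {
                    return entry.getValue();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical metadata with keys left exactly as a parser reported them.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Author", "Jane Doe");
        metadata.put("DESCRIPTION", "A sample page");

        // The variant lists are assumptions chosen for illustration only.
        System.out.println(firstVariant(metadata, Arrays.asList("author", "dc.creator")));
        System.out.println(firstVariant(metadata, Arrays.asList("description", "dc.description")));
        System.out.println(firstVariant(metadata, Arrays.asList("title")));
    }
}
```

The variants are tried in list order, so the preferred spelling can be placed first.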
Nested Class Summary |
---|
Nested classes/interfaces inherited from interface org.alfresco.repo.content.metadata.MetadataExtracter |
---|
MetadataExtracter.OverwritePolicy |
Field Summary | |
---|---|
static java.util.Set | MIMETYPES |
Fields inherited from class org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter |
---|
logger, NAMESPACE_PROPERTY_PREFIX, supportedDateFormats |
Constructor Summary | |
---|---|
HtmlMetadataExtracter() | |
Method Summary | |
---|---|
protected java.util.Map | extractRaw(org.alfresco.service.cmr.repository.ContentReader reader) Override to provide the raw extracted metadata values. |
Methods inherited from class org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter |
---|
checkIsSupported, extract, extract, extract, filterSystemProperties, getDefaultMapping, getExtractionTime, getMapping, getMimetypeService, getReliability, init, isSupported, makeDate, newRawMap, putRawValue, readMappingProperties, readMappingProperties, register, setDictionaryService, setFailOnTypeConversion, setInheritDefaultMapping, setMapping, setMappingProperties, setMimetypeService, setOverwritePolicy, setOverwritePolicy, setRegistry, setSupportedDateFormats, setSupportedMimetypes |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final java.util.Set MIMETYPES
Constructor Detail |
---|
public HtmlMetadataExtracter()
Method Detail |
---|
protected java.util.Map extractRaw(org.alfresco.service.cmr.repository.ContentReader reader) throws java.lang.Throwable
Description copied from class: AbstractMappingMetadataExtracter
Override to provide the raw extracted metadata values. Even if the default mapping doesn't handle all properties, each instance of the extracter can be configured differently, so more or fewer of the properties may be used in different installations.
Raw values must not be trimmed or removed for any reason. Null values and empty strings are
OverwritePolicy
Properties extracted, together with their meanings and types, should be thoroughly described in the class-level javadocs of the extracter implementation, for example:

editor: - the document editor --> cm:author
title: - the document title --> cm:title
user1: - the document summary
user2: - the document description --> cm:description
user3: -
user4: -
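To illustrate how a configured mapping decides which raw values end up as properties, here is a simplified sketch. It assumes raw keys and target property names are plain strings; the class `RawMappingSketch` and the method `applyMapping` are hypothetical and are not the conversion that AbstractMappingMetadataExtracter performs internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RawMappingSketch {

    /**
     * Copies each raw value to the property name it is mapped to.
     * Raw keys without a mapping entry are skipped, which is why the same
     * raw values can yield more or fewer properties per installation.
     */
    static Map<String, String> applyMapping(Map<String, String> raw, Map<String, String> mapping) {
        Map<String, String> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            String target = mapping.get(entry.getKey());
            if (target != null) {
                properties.put(target, entry.getValue());
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Raw values using the example keys from the description above (sample data).
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("editor", "Jane Doe");
        raw.put("title", "Quarterly Report");
        raw.put("user1", "A short summary");
        raw.put("user2", "A longer description");

        // One possible mapping; user1 is deliberately left unmapped.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("editor", "cm:author");
        mapping.put("title", "cm:title");
        mapping.put("user2", "cm:description");

        System.out.println(applyMapping(raw, mapping));
    }
}
```

In this sketch `user1` is extracted but not mapped, so it does not appear in the result; a different installation could add a mapping entry for it without changing the extraction step.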
Overrides: extractRaw in class AbstractMappingMetadataExtracter
Parameters: reader - the document to extract the values from. The stream provided by the reader must be closed if accessed directly.
Throws: java.lang.Throwable
See Also: AbstractMappingMetadataExtracter.getDefaultMapping()
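As a rough illustration of what producing raw values for the keys `author`, `title`, and `description` might look like, the sketch below reads them from an HTML string with regular expressions. It is not the implementation of this class: it takes a `String` instead of a ContentReader, the class name `HtmlRawMetadataSketch` is hypothetical, and regex matching is a simplification that a real HTML parser would replace. Values are stored exactly as found, in line with the rule that raw values must not be trimmed.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlRawMetadataSketch {

    // Matches the text between <title> and </title>, across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches <meta name="..." content="..."> with name before content.
    // Groups: 1 = name quote, 2 = name, 3 = content quote, 4 = content.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name\\s*=\\s*([\"'])(.*?)\\1\\s+content\\s*=\\s*([\"'])(.*?)\\3",
            Pattern.CASE_INSENSITIVE);

    /** Collects raw, untrimmed values under the keys title, author and description. */
    static Map<String, Serializable> extractRaw(String html) {
        Map<String, Serializable> raw = new HashMap<>();

        Matcher title = TITLE.matcher(html);
        if (title.find()) {
            raw.put("title", title.group(1)); // stored as-is, whitespace included
        }

        Matcher meta = META.matcher(html);
        while (meta.find()) {
            String name = meta.group(2).toLowerCase(Locale.ROOT);
            if (name.equals("author") || name.equals("description")) {
                raw.put(name, meta.group(4));
            }
        }
        return raw;
    }

    public static void main(String[] args) {
        String html = "<html><head><title> My Page </title>"
                + "<meta name=\"Author\" content=\"Jane Doe\">"
                + "<meta name=\"description\" content=\"A sample page\">"
                + "</head><body></body></html>";
        System.out.println(extractRaw(html));
    }
}
```

The sketch only recognises `meta` tags whose `name` attribute precedes `content`; handling other attribute orders or malformed markup is left to a proper parser.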