org.alfresco.encoding
Interface CharactersetFinder

All Known Implementing Classes:
AbstractCharactersetFinder, BomCharactersetFinder, GuessEncodingCharsetFinder

public interface CharactersetFinder

Interface for classes that are able to read a text-based input stream and determine the character encoding.

There are quite a few libraries that do this, but none is perfect. The implementation is therefore abstracted behind this interface so that finders can be plugged in as required.

Implementations should have a default (no-argument) constructor and be completely thread-safe and stateless. This allows an instance to be constructed once and held indefinitely to do the detection work.
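A minimal sketch of what such an implementation might look like, assuming a simple byte order mark (BOM) check. The names BomFinderSketch and SimpleBomFinder are hypothetical, the nested CharactersetFinder is a local stand-in declaring the two methods documented on this page, and this is not the code of Alfresco's BomCharactersetFinder. The finder keeps no instance fields, so one instance can be shared across threads, and the stream variant restores the stream with mark/reset before returning.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomFinderSketch {

    /** Local stand-in declaring the same two methods as org.alfresco.encoding.CharactersetFinder. */
    interface CharactersetFinder {
        Charset detectCharset(InputStream is);
        Charset detectCharset(byte[] buffer);
    }

    /**
     * Hypothetical finder that recognises UTF-8 and UTF-16 byte order marks.
     * It has no instance fields, so one instance can be shared by any number of threads.
     */
    static final class SimpleBomFinder implements CharactersetFinder {
        private static final int MAX_BOM_LENGTH = 3;

        /** The no-argument constructor the interface contract asks for. */
        public SimpleBomFinder() {
        }

        @Override
        public Charset detectCharset(InputStream is) {
            if (!is.markSupported()) {
                throw new IllegalArgumentException("stream must support mark/reset");
            }
            is.mark(MAX_BOM_LENGTH);
            try {
                byte[] head = new byte[MAX_BOM_LENGTH];
                int n = is.readNBytes(head, 0, head.length);
                return detectCharset(Arrays.copyOf(head, n));
            } catch (IOException e) {
                return null;
            } finally {
                try {
                    is.reset(); // restore the caller's position
                } catch (IOException ignored) {
                    // the stream is unusable; nothing further to restore
                }
            }
        }

        @Override
        public Charset detectCharset(byte[] buffer) {
            if (startsWith(buffer, 0xEF, 0xBB, 0xBF)) {
                return StandardCharsets.UTF_8;
            }
            if (startsWith(buffer, 0xFE, 0xFF)) {
                return StandardCharsets.UTF_16BE;
            }
            if (startsWith(buffer, 0xFF, 0xFE)) {
                return StandardCharsets.UTF_16LE; // note: also the start of a UTF-32LE BOM
            }
            return null; // no BOM: leave the decision to the caller
        }

        private static boolean startsWith(byte[] buffer, int... prefix) {
            if (buffer == null || buffer.length < prefix.length) {
                return false;
            }
            for (int i = 0; i < prefix.length; i++) {
                if ((buffer[i] & 0xFF) != prefix[i]) {
                    return false;
                }
            }
            return true;
        }
    }
}
```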

Where the encoding cannot be determined, it is left to the client to decide what to do. Some implementations may guess an encoding or fall back to a default; it is up to each implementation to specify its behaviour.
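Because a null result hands the decision back to the client, a caller might consult several finders in turn and fall back to a default of its own choosing. The sketch below assumes that policy; FallbackSketch, detectOrDefault and the FixedFinder test stub are hypothetical names, and the nested interface is a local stand-in for this one.

```java
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FallbackSketch {

    /** Local stand-in declaring the same two methods as org.alfresco.encoding.CharactersetFinder. */
    interface CharactersetFinder {
        Charset detectCharset(InputStream is);
        Charset detectCharset(byte[] buffer);
    }

    /** Hypothetical stub that always reports the same answer (possibly null). */
    static final class FixedFinder implements CharactersetFinder {
        private final Charset answer;

        FixedFinder(Charset answer) {
            this.answer = answer;
        }

        @Override
        public Charset detectCharset(InputStream is) {
            return answer;
        }

        @Override
        public Charset detectCharset(byte[] buffer) {
            return answer;
        }
    }

    /**
     * Client-side policy: ask each finder in order and use the first non-null
     * answer; if no finder is sure, use the caller's fallback.
     */
    static Charset detectOrDefault(List<? extends CharactersetFinder> finders,
                                   byte[] head, Charset fallback) {
        for (CharactersetFinder finder : finders) {
            Charset detected = finder.detectCharset(head);
            if (detected != null) {
                return detected;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<CharactersetFinder> finders = List.of(
                new FixedFinder(null), new FixedFinder(StandardCharsets.UTF_16LE));
        System.out.println(detectOrDefault(finders, new byte[0], StandardCharsets.ISO_8859_1)); // prints UTF-16LE
    }
}
```

Ordering the list from most to least reliable finder (for example, a BOM check before a statistical guess) lets a certain answer win before a guess is consulted.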

Since:
2.1

Method Summary
 java.nio.charset.Charset detectCharset(byte[] buffer)
          Attempt to detect the character set encoding for the given buffer.
 java.nio.charset.Charset detectCharset(java.io.InputStream is)
          Attempt to detect the character set encoding for the given input stream.
 

Method Detail

detectCharset

java.nio.charset.Charset detectCharset(java.io.InputStream is)
Attempt to detect the character set encoding for the given input stream. The input stream will not be altered or closed by this method, and must therefore support marking. If the available input stream does not support marking, it can be wrapped in a BufferedInputStream.

The current state of the stream will be restored before the method returns.

Parameters:
is - an input stream that must support marking
Returns:
the encoding of the stream, or null if the encoding cannot be identified
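From the caller's side, the marking requirement can be met by wrapping any stream that lacks it in a BufferedInputStream, then decoding from the start once detection returns. A sketch under those assumptions: readText and AsciiPeekFinder are hypothetical (the latter is a toy finder that only recognises 7-bit ASCII), and the nested interface is a local stand-in for this one.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkResetCallerSketch {

    /** Local stand-in declaring the same two methods as org.alfresco.encoding.CharactersetFinder. */
    interface CharactersetFinder {
        Charset detectCharset(InputStream is);
        Charset detectCharset(byte[] buffer);
    }

    /** Hypothetical toy finder: peeks at up to 4 bytes and reports US-ASCII if all are 7-bit. */
    static final class AsciiPeekFinder implements CharactersetFinder {
        @Override
        public Charset detectCharset(InputStream is) {
            if (!is.markSupported()) {
                throw new IllegalArgumentException("stream must support mark/reset");
            }
            try {
                is.mark(4);
                byte[] head = new byte[4];
                int n = is.readNBytes(head, 0, head.length);
                is.reset(); // put the stream back where the caller left it
                return detectCharset(Arrays.copyOf(head, n));
            } catch (IOException e) {
                return null; // unreadable stream; the caller's own read will fail too
            }
        }

        @Override
        public Charset detectCharset(byte[] buffer) {
            for (byte b : buffer) {
                if (b < 0) { // high bit set: not 7-bit ASCII
                    return null;
                }
            }
            return buffer.length > 0 ? StandardCharsets.US_ASCII : null; // empty: no evidence
        }
    }

    /** Wraps the stream if needed, detects the charset, then decodes the whole stream from its start. */
    static String readText(InputStream raw, CharactersetFinder finder, Charset fallback) {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        Charset detected = finder.detectCharset(in);
        Charset charset = detected != null ? detected : fallback;
        try {
            // The finder reset the stream, so readAllBytes() starts at the first byte.
            return new String(in.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.US_ASCII);
        // A stream that reports no mark support, standing in for e.g. a network stream.
        InputStream noMark = new FilterInputStream(new ByteArrayInputStream(bytes)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
        System.out.println(readText(noMark, new AsciiPeekFinder(), StandardCharsets.UTF_8)); // prints hello
    }
}
```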

detectCharset

java.nio.charset.Charset detectCharset(byte[] buffer)
Attempt to detect the character set encoding for the given buffer.

Parameters:
buffer - the first n bytes of the character stream
Returns:
the encoding of the buffer, or null if the encoding cannot be identified
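One heuristic an implementation could use for the byte-array variant is a strict UTF-8 validity check: if the bytes decode as UTF-8 without error, report UTF-8, otherwise return null. This is an assumed technique for illustration, not necessarily what GuessEncodingCharsetFinder does, and Utf8GuessSketch is a hypothetical name. Because the buffer holds only the first n bytes of the stream, the decoder is run with endOfInput set to false so that a multi-byte character cut off at the end is not mistaken for an error.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8GuessSketch {

    /**
     * Hypothetical byte[] detector: reports UTF-8 if the buffer decodes cleanly
     * as UTF-8, otherwise null. Pure 7-bit ASCII also passes, which is harmless
     * because ASCII text decodes identically as UTF-8.
     */
    static Charset detectCharset(byte[] buffer) {
        if (buffer == null || buffer.length == 0) {
            return null; // nothing to go on
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(buffer);
        // UTF-8 never produces more chars than input bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(buffer.length);
        // endOfInput = false: a sequence truncated at the end yields underflow, not an error.
        CoderResult result = decoder.decode(in, out, false);
        return result.isError() ? null : StandardCharsets.UTF_8;
    }
}
```

Text in a single-byte encoding such as ISO-8859-1 usually fails this check once it contains a non-ASCII byte followed by ASCII, but the check alone cannot say which single-byte encoding was used, so returning null and leaving the choice to the client fits the contract above.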


Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.