org.alfresco.repo.avm.util
Class UriSchemeNameMatcher

java.lang.Object
  extended by org.alfresco.repo.avm.util.UriSchemeNameMatcher
All Implemented Interfaces:
java.io.Serializable, org.alfresco.util.NameMatcher

public class UriSchemeNameMatcher
extends java.lang.Object
implements org.alfresco.util.NameMatcher, java.io.Serializable

A NameMatcher that matches an incoming URL against a list of schemes (less formally known as "protocols"), case insensitively. The formal spec for parsing URIs is RFC 3986.

Perhaps someday it might be worthwhile to create a specific parser for each registered scheme-specific part and validate that; for now, we'll just be more lax and assume the URI is always scheme-qualified. This matcher will look no further than the leading colon, and declare "no match" otherwise. The discussion below explains why.

See: http://tools.ietf.org/html/rfc3986

  The following regex parses URIs:
       ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

  Given the following URI:   
        http://www.ics.uci.edu/pub/ietf/uri/#Related

  The captured subexpressions are:

        $1 = http:
        $2 = http
        $3 = //www.ics.uci.edu
        $4 = www.ics.uci.edu
        $5 = /pub/ietf/uri/
        $6 = 
        $7 = 
        $8 = #Related
        $9 = Related   
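
As a rough illustration, the regex quoted above (it comes from Appendix B of RFC 3986) can be run with java.util.regex. The class name Rfc3986RegexDemo and its parse helper are made up for this sketch and are not part of Alfresco. Note that Java reports an optional group that did not participate in the match as null rather than as an empty string, so $6 and $7 come back as null for the example URI.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the Alfresco API.
class Rfc3986RegexDemo {
    // The URI-parsing regex from RFC 3986, Appendix B.
    static final Pattern URI_PATTERN = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns groups $1..$9 at indexes 1..9 (index 0 is unused).
    // Optional groups that did not participate in the match are null.
    static String[] parse(String uri) {
        Matcher m = URI_PATTERN.matcher(uri);
        if (!m.matches()) {
            // Only reachable for input containing line terminators.
            throw new IllegalArgumentException("not parseable: " + uri);
        }
        String[] groups = new String[10];
        for (int i = 1; i <= 9; i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = parse("http://www.ics.uci.edu/pub/ietf/uri/#Related");
        for (int i = 1; i <= 9; i++) {
            System.out.println("$" + i + " = " + g[i]);
        }
    }
}
```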

   NOTE:
      A URI can be non-scheme qualified because $1 is optional.  Therefore,
      the following are all examples of valid non-scheme qualified URIs:

         ""
         "moo@cow.com"
         "moo@cow.com?wow"
         "moo@cow.com?wow#zow"
         "moo@cow.com#zow"
         "/"
         "/moo/cow"
         "/moo/cow?wow"
         "/moo/cow?wow#zow"
         "/moo/cow#zow"
         "//moo/cow"
         "//moo.com/cow"
         "//moo.com/cow/"
         "//moo.com/cow?wow"
         "//moo.com/cow?wow#zow"
         "//moo.com/cow#zow"
         "//moo.com:8080/cow"
         "//moo.com:/cow"
         "//moo.com:8080/cow?wow"
         "//moo.com:8080/cow?wow#zow"
         "//moo.com:8080/cow#zow"
         "///moo/cow"
         "///moo/cow?wow"
         "///moo/cow?wow#zow"
         "///moo/cow#zow"

      And so forth...
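
A small sketch of how the optional $1 plays out in practice: with the same RFC 3986 regex, $2 is simply absent (null in Java) for each of the non-scheme qualified forms listed above, and present only when the input starts with a scheme followed by a colon. The class name SchemeQualifiedCheck and the helper schemeOf are assumptions made for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the Alfresco API.
class SchemeQualifiedCheck {
    // The URI-parsing regex from RFC 3986, Appendix B.
    private static final Pattern URI_PATTERN = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns $2 (the scheme), or null when the URI is not scheme-qualified.
    static String schemeOf(String uri) {
        Matcher m = URI_PATTERN.matcher(uri);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "", "moo@cow.com", "/moo/cow?wow#zow",
            "//moo.com:8080/cow", "///moo/cow", "mailto:moo@cow.com"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> scheme: " + schemeOf(s));
        }
    }
}
```

Observe that "//moo.com:8080/cow" contains a colon yet has no scheme: the scheme group may not contain '/', so the colon there belongs to the authority.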
      
 

  Thus the business end of things, as far as scheme matching goes, is $2.
  Most schemes will have a $3 that starts with '//', but not all.
  Specifically, the following have no network path ('//') segment,
  or aren't required to (source: http://en.wikipedia.org/wiki/URI_scheme):
  

      cid data dns fax go h323 iax2 mailto mid news pres sip
      sips tel urn xmpp about aim callto feed magnet msnim 
      psyc skype sms stream xfire ymsgr

  
Visually the parts are as follows:
 
         foo://example.com:10042/over/there?name=ferret#nose
         \_/   \_______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose
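
For comparison, the same two URIs can be taken apart with the JDK's java.net.URI. This is only a sketch to make the diagram concrete; the class name UriPartsDemo and the parts helper are invented for it. One caveat: java.net.URI treats "urn:example:animal:ferret:nose" as an opaque URI, so what the diagram labels as the path is exposed through getSchemeSpecificPart(), and getPath() returns null.

```java
import java.net.URI;

// Hypothetical demo class; not part of the Alfresco API.
class UriPartsDemo {
    // Returns { scheme, authority, path, query, fragment } as seen by java.net.URI.
    static String[] parts(String uri) {
        URI u = URI.create(uri);
        return new String[] {
            u.getScheme(), u.getAuthority(), u.getPath(), u.getQuery(), u.getFragment()
        };
    }

    public static void main(String[] args) {
        String[] labels = { "scheme", "authority", "path", "query", "fragment" };

        String[] p = parts("foo://example.com:10042/over/there?name=ferret#nose");
        for (int i = 0; i < labels.length; i++) {
            System.out.println(labels[i] + " = " + p[i]);
        }

        // Opaque URI: only the scheme and the scheme-specific part are defined.
        URI urn = URI.create("urn:example:animal:ferret:nose");
        System.out.println("scheme = " + urn.getScheme());
        System.out.println("scheme-specific part = " + urn.getSchemeSpecificPart());
        System.out.println("opaque = " + urn.isOpaque());
    }
}
```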

 
This is useful for classifying URLs for things like whether or not they're supported by an application. For example, the LinkValidationService supports http and https, is willing to ignore certain well-formed URLs, but treats URLs with unknown and unsupported protocols as broken. Concretely, we'd like to avoid treating something like the following one as being non-broken even though you can't apply GET or HEAD to it.
 Email
 
As of June 2007, IANA had over 70 registered and provisional protocols listed at http://www.iana.org/assignments/uri-schemes.html, but sometimes people create their own too (e.g., cvs). Here's the official list:

    aaa aaas acap afs cap cid crid data dav dict dns dtn fax file
    ftp go gopher h323 http https iax2 icap im imap info ipp iris
    iris.beep iris.lwz iris.xpc iris.xpcs ldap mailserver mailto
    mid modem msrp msrps mtqp mupdate news nfs nntp opaquelocktoken
    pop pres prospero rtsp service shttp sip sips snmp soap.beep
    soap.beeps tag tel telnet tftp thismessage tip tn3270 tv urn
    vemmi wais xmlrpc.beep xmlrpc.beeps xmpp z39.50r z39.50s
 

See Also:
Serialized Form

Constructor Summary
UriSchemeNameMatcher()
          Default constructor.
 
Method Summary
 boolean matches(java.lang.String uri)
          Returns true if the URL's protocol is in the list of protocols being matched.
 void setExtensions(java.util.List protocols)
          Set the protocols case insensitively (canonicalized to lower-case).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UriSchemeNameMatcher

public UriSchemeNameMatcher()
Default constructor.

Method Detail

setExtensions

public void setExtensions(java.util.List protocols)
Set the protocols case insensitively (canonicalized to lower-case).

Parameters:
protocols - the list of protocols (schemes) to match against

matches

public boolean matches(java.lang.String uri)
Returns true if the URL's protocol is in the list of protocols being matched. Everything up to but not including the initial colon is treated as the protocol.

Specified by:
matches in interface org.alfresco.util.NameMatcher
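
The behavior described for setExtensions and matches can be approximated as follows. This is a minimal sketch based only on the descriptions on this page, not the actual Alfresco source: the class SchemeMatcherSketch, its setSchemes method, and the decision to reject an input with no colon or with a leading colon are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in; NOT the real UriSchemeNameMatcher implementation.
class SchemeMatcherSketch {
    private final Set<String> schemes = new HashSet<>();

    // Stores the schemes canonicalized to lower case.
    void setSchemes(List<String> protocols) {
        schemes.clear();
        for (String p : protocols) {
            schemes.add(p.toLowerCase(Locale.ROOT));
        }
    }

    // Looks no further than the first colon; no usable prefix means "no match".
    boolean matches(String uri) {
        int colon = uri.indexOf(':');
        if (colon <= 0) {
            return false; // assumption: not scheme-qualified, so never a match
        }
        String scheme = uri.substring(0, colon).toLowerCase(Locale.ROOT);
        return schemes.contains(scheme);
    }

    public static void main(String[] args) {
        SchemeMatcherSketch m = new SchemeMatcherSketch();
        m.setSchemes(Arrays.asList("HTTP", "https"));
        System.out.println(m.matches("HtTp://example.com/"));  // scheme "http" is configured
        System.out.println(m.matches("mailto:moo@cow.com"));   // scheme "mailto" is not
        System.out.println(m.matches("/moo/cow"));             // no colon at all
    }
}
```

In this sketch an input such as "//moo.com:8080/cow" yields the prefix "//moo.com", which is not a configured scheme, so it is reported as "no match" without any further parsing.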


Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.