org.alfresco.repo.avm.util
Class UriSchemeNameMatcher
java.lang.Object
org.alfresco.repo.avm.util.UriSchemeNameMatcher
- All Implemented Interfaces:
- java.io.Serializable, org.alfresco.util.NameMatcher
public class UriSchemeNameMatcher
- extends java.lang.Object
- implements org.alfresco.util.NameMatcher, java.io.Serializable
A NameMatcher that matches an incoming URL against a list of schemes
(less formally known as "protocols"), case insensitively.
The formal spec for parsing URIs is RFC 3986.
Perhaps someday, it might be worthwhile to create a specific
parser for each registered scheme-specific part, and validate
that; for now, we'll just be more lax, and assume the URI
is always scheme-qualified. This matcher will look no further
than the leading colon, and declare "no match" otherwise.
The discussion below explains why.
See http://tools.ietf.org/html/rfc3986, which gives the
following regex for parsing URIs:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
Given the following URI:
http://www.ics.uci.edu/pub/ietf/uri/#Related
The captured subexpressions are:
$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 =
$7 =
$8 = #Related
$9 = Related
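The capture groups listed above can be reproduced with java.util.regex. The following is a minimal sketch, not part of this class: the name Rfc3986RegexDemo and its split helper are hypothetical and exist only for illustration. Note that Java reports the unmatched groups $6 and $7 as null rather than as empty strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of org.alfresco.repo.avm.util.
final class Rfc3986RegexDemo {
    // The URI-splitting regular expression quoted above, escaped for a Java string.
    static final Pattern URI_PATTERN = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns groups $0..$9; groups that did not take part in the match are null. */
    static String[] split(String uri) {
        Matcher m = URI_PATTERN.matcher(uri);
        if (!m.matches()) {
            // Only reachable for unusual input such as embedded line terminators.
            throw new IllegalArgumentException("Not splittable: " + uri);
        }
        String[] groups = new String[10];
        for (int i = 0; i <= 9; i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = split("http://www.ics.uci.edu/pub/ietf/uri/#Related");
        for (int i = 1; i <= 9; i++) {
            System.out.println("$" + i + " = " + g[i]);
        }
    }
}
```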
NOTE:
A URI can be non-scheme qualified because $1 is optional. Therefore,
the following are all examples of valid non-scheme qualified URIs:
""
"moo@cow.com"
"moo@cow.com?wow"
"moo@cow.com?wow#zow"
"moo@cow.com#zow"
"/"
"/moo/cow"
"/moo/cow?wow"
"/moo/cow?wow#zow"
"/moo/cow#zow"
"//moo/cow"
"//moo.com/cow"
"//moo.com/cow/"
"//moo.com/cow?wow"
"//moo.com/cow?wow#zow"
"//moo.com/cow#zow"
"//moo.com:8080/cow"
"//moo.com:/cow"
"//moo.com:8080/cow?wow"
"//moo.com:8080/cow?wow#zow"
"//moo.com:8080/cow#zow"
"///moo/cow"
"///moo/cow?wow"
"///moo/cow?wow#zow"
"///moo/cow#zow"
And so forth...
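As a rough check of the list above, the same regex can be used to confirm that $2 is absent for such inputs. This is only a sketch under the assumption that a null group means "no scheme"; the class SchemeProbe and its schemeOf method are hypothetical names, not part of this API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class SchemeProbe {
    private static final Pattern URI_PATTERN = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns $2 (the scheme) or null when the URI is not scheme-qualified. */
    static String schemeOf(String uri) {
        Matcher m = URI_PATTERN.matcher(uri);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "", "moo@cow.com", "/moo/cow?wow#zow",
            "//moo.com:8080/cow", "///moo/cow", "mailto:moo@cow.com"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + schemeOf(s));
        }
    }
}
```

Inputs such as "//moo.com:8080/cow" do contain a colon, but it falls inside the authority ($4), so $2 stays unset.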
Thus the business end of things as far as scheme matching goes is $2.
Most schemes will have a $3 that starts with '//', but not all.
Specifically, the following have no network path ('//') segment,
or aren't required to (source: http://en.wikipedia.org/wiki/URI_scheme):
cid data dns fax go h323 iax2 mailto mid news pres sip
sips tel urn xmpp about aim callto feed magnet msnim
psyc skype sms stream xfire ymsgr
Visually the parts are as follows:
   foo://example.com:10042/over/there?name=ferret#nose
   \_/   \_______________/\_________/ \_________/ \__/
    |            |            |            |       |
 scheme      authority       path        query  fragment
    |   ______________________|_
   / \ /                        \
   urn:example:animal:ferret:nose
This is useful for classifying URLs for things like whether or not
they're supported by an application.
For example, the LinkValidationService supports http and https,
is willing to ignore certain well-formed URLs, but treats URLs
with unknown and unsupported protocols as broken. Concretely,
we'd like to avoid treating something like the following one
as being non-broken even though you can't apply GET or HEAD
to it.
Email
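The three-way classification described above might be sketched as follows. This is an illustration only, not the LinkValidationService's code: the class LinkClassifierSketch, the Verdict enum, and the choice of which schemes are passed in as "ignored" are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the real LinkValidationService is not shown in this page.
final class LinkClassifierSketch {
    enum Verdict { SUPPORTED, IGNORED, BROKEN, NO_SCHEME }

    /**
     * Classifies a URL by the text before its first colon.
     * Both scheme sets are expected to contain lower-case names.
     */
    static Verdict classify(String url, Set<String> supported, Set<String> ignored) {
        int colon = url.indexOf(':');
        if (colon < 0) {
            return Verdict.NO_SCHEME; // assumption: left for the caller to resolve
        }
        String scheme = url.substring(0, colon).toLowerCase(Locale.ROOT);
        if (supported.contains(scheme)) {
            return Verdict.SUPPORTED;
        }
        if (ignored.contains(scheme)) {
            return Verdict.IGNORED;
        }
        return Verdict.BROKEN;
    }

    public static void main(String[] args) {
        Set<String> supported = new HashSet<>(Arrays.asList("http", "https"));
        Set<String> ignored = new HashSet<>(Arrays.asList("mailto")); // assumed for the demo
        System.out.println(classify("HTTP://example.com/", supported, ignored));
        System.out.println(classify("mailto:moo@cow.com", supported, ignored));
        System.out.println(classify("cvs:something", supported, ignored));
    }
}
```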
As of June 2007, IANA had over 70 registered and provisional protocols
listed at http://www.iana.org/assignments/uri-schemes.html but sometimes
people create their own too (e.g.: cvs). Here's the official list:
aaa aaas acap afs cap cid crid data dav dict dns dtn fax file
ftp go gopher h323 http https iax2 icap im imap info ipp iris
iris.beep iris.lwz iris.xpc iris.xpcs ldap mailserver mailto
mid modem msrp msrps mtqp mupdate news nfs nntp opaquelocktoken
pop pres prospero rtsp service shttp sip sips snmp soap.beep
soap.beeps tag tel telnet tftp thismessage tip tn3270 tv urn
vemmi wais xmlrpc.beep xmlrpc.beeps xmpp z39.50r z39.50s
- See Also:
- Serialized Form
Method Summary
boolean matches(java.lang.String uri)
    Returns true if the URL's protocol is among those being matched.
void setExtensions(java.util.List protocols)
    Set the protocols case insensitively (canonicalized to lower-case).
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
UriSchemeNameMatcher
public UriSchemeNameMatcher()
- Default constructor.
setExtensions
public void setExtensions(java.util.List protocols)
- Set the protocols case insensitively (canonicalized to lower-case).
- Parameters:
protocols
-
matches
public boolean matches(java.lang.String uri)
- Returns true if the URL's protocol is among those
being matched. Everything up to but not including
the initial colon is treated as the protocol.
- Specified by:
matches
in interface org.alfresco.util.NameMatcher
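The behavior documented for setExtensions and matches could be approximated as below. This is a sketch based only on the descriptions on this page, not the Alfresco source: the class SchemeMatcherSketch and its method setSchemes are hypothetical stand-ins, and returning false for a null argument is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration; not the actual UriSchemeNameMatcher.
final class SchemeMatcherSketch {
    private final Set<String> schemes = new HashSet<>();

    /** Stores the given schemes, canonicalized to lower-case. */
    void setSchemes(List<String> names) {
        schemes.clear();
        for (String name : names) {
            schemes.add(name.toLowerCase(Locale.ROOT));
        }
    }

    /** Looks no further than the first colon; no colon means no match. */
    boolean matches(String uri) {
        if (uri == null) {
            return false; // assumption: null never matches
        }
        int colon = uri.indexOf(':');
        if (colon < 0) {
            return false;
        }
        return schemes.contains(uri.substring(0, colon).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        SchemeMatcherSketch matcher = new SchemeMatcherSketch();
        matcher.setSchemes(Arrays.asList("HTTP", "https"));
        System.out.println(matcher.matches("HtTp://example.com/"));
        System.out.println(matcher.matches("mailto:moo@cow.com"));
        System.out.println(matcher.matches("//moo.com:8080/cow"));
    }
}
```

Because only the text before the first colon is examined, a scheme-less input such as "//moo.com:8080/cow" yields the candidate "//moo.com", which is not a registered name and so does not match.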
Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.