Email addresses, URLs and telephone numbers are examples of strings that have to match a
certain pattern, typically defined by a regular expression. It's quite common to have attributes
with such values. For readability, it's preferable to use predefined datatypes, like
Email, URL and PhoneNumber, in an information design
model, instead of defining them as string-valued attributes with a pattern constraint.
In a (JavaScript or Java) model class definition, these attributes have to be implemented as string-valued attributes with a pattern constraint. It is, however, desirable to have built-in support for them, like in the data/document format definition language XML Schema.
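As a minimal sketch of what such an implementation could look like in JavaScript, the following class checks a string-valued attribute against a pattern in its setter. The class name Person, the check function checkEmail and the deliberately loose EMAIL_PATTERN are hypothetical names chosen for illustration only; they are not taken from any library.

```javascript
// Hypothetical, deliberately loose pattern: something@something.something
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

class Person {
  constructor({ name, email }) {
    this.name = name;
    this.email = email; // assigned through the setter below
  }
  // Returns an error message, or an empty string if the value is valid.
  static checkEmail(value) {
    if (typeof value !== "string" || !EMAIL_PATTERN.test(value)) {
      return "Not a valid email address: " + value;
    }
    return "";
  }
  set email(value) {
    const message = Person.checkEmail(value);
    if (message) throw new RangeError(message);
    this._email = value;
  }
  get email() {
    return this._email;
  }
}
```

Keeping the check in a static function allows the same constraint to be evaluated both in the setter and, for instance, on user input before an object is created.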
The format of phone numbers varies considerably across countries.
Consequently, it doesn't make sense to try to capture all these
formatting options with one pattern. The HTML5 specification doesn't
define any syntax rules for phone numbers in the context of input
elements of type "tel", except that they can't include line breaks.
The precise formatting assumptions and the corresponding validation are
left to the application developer, who may employ a specific pattern for
enforcing a particular national or international phone number format.
A simple international phone number format has been defined by the International Telecommunication Union's ITU-T E.123 recommendation. It requires that phone numbers include a leading plus sign and allows only spaces to separate groups of digits. The ITU has also limited the length of international phone numbers to at most 15 digits (in its E.164 recommendation). Since the shortest international phone numbers in use contain seven digits, we can use these two constraints in the following JavaScript definition of a regular expression for international phone numbers:
var ituPhoneNumberPattern = /^\+(?:[0-9] ?){6,14}[0-9]$/;
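The following sketch shows how this pattern behaves on a few sample strings. The function name isItuPhoneNumber is a hypothetical helper introduced here; the pattern is repeated inside it so that the example is self-contained.

```javascript
// Hypothetical helper wrapping the pattern shown above.
function isItuPhoneNumber(str) {
  // A plus sign, then 7 to 15 digits, each digit except the last
  // optionally followed by a single space.
  const pattern = /^\+(?:[0-9] ?){6,14}[0-9]$/;
  return typeof str === "string" && pattern.test(str);
}

console.log(isItuPhoneNumber("+49 30 1234567"));     // true
console.log(isItuPhoneNumber("030 1234567"));        // false: no leading plus sign
console.log(isItuPhoneNumber("+123456"));            // false: only six digits
console.log(isItuPhoneNumber("+49-30-1234567"));     // false: hyphens are not allowed
console.log(isItuPhoneNumber("+1234567890123456"));  // false: sixteen digits
```

Note that the pattern checks only the overall shape of the string. It does not check whether the country code exists or whether the number is actually assigned.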
The format of email addresses was originally defined by the
Internet Engineering Task Force (IETF), but due to several weaknesses
it has been redefined in the HTML5 specification in the context of
introducing input elements of type "email". The HTML5 specification
provides a regular expression for email addresses, which can be used
for validating strings that are supposed to be email addresses.
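A validation function based on this regular expression could be sketched as follows. The pattern below is a transcription of the one given in the HTML specification's definition of a valid email address and should be compared with the current specification text before being relied upon; the function name isHtml5EmailAddress is a hypothetical name chosen for this example.

```javascript
// Transcribed from the HTML specification (verify against the current text).
const html5EmailPattern =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper: true if str has the shape of an email address.
function isHtml5EmailAddress(str) {
  return typeof str === "string" && html5EmailPattern.test(str);
}

console.log(isHtml5EmailAddress("jane.doe@example.org"));  // true
console.log(isHtml5EmailAddress("jane doe@example.org"));  // false: contains a space
console.log(isHtml5EmailAddress("jane@example..org"));     // false: empty domain label
console.log(isHtml5EmailAddress("jane@localhost"));        // true: a dot is not required
```

As the last sample shows, this pattern accepts a domain part without a dot, so an application that expects public email addresses may want an additional check.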
The format of URLs was originally defined by the Internet Engineering Task Force (IETF),
but due to several weaknesses and the historical confusion about URLs
versus URIs versus IRIs, it has been redefined in the URL specification of the
WHATWG (Web Hypertext Application Technology Working Group). The syntax rules
defined in this specification also define the validity of URL strings in
input elements of type "url" introduced by HTML5.
There are many proposals for capturing the complex URL parsing rules
with a regular expression, but most of them are either too strict or
too lax. The solution by Diego Perini stands out and seems to be a
pretty good approximation. Wherever the built-in JavaScript object type
URL is supported, a more precise validation check is possible by trying
to create a URL object from a given URL string and catching the
exception that is thrown when parsing fails (a TypeError according to
the current URL specification).
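Such a check could be sketched as follows, assuming an environment that provides the built-in URL constructor (current browsers and Node.js do). The function name isParsableUrl is a hypothetical helper. It catches any exception rather than testing for a specific exception type, although current engines throw a TypeError when the string cannot be parsed.

```javascript
// Hypothetical helper: true if the URL parser accepts str as an absolute URL.
function isParsableUrl(str) {
  try {
    new URL(str); // throws if str cannot be parsed without a base URL
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isParsableUrl("https://example.org/path?q=1"));  // true
console.log(isParsableUrl("mailto:jane@example.org"));       // true
console.log(isParsableUrl("example.org"));                   // false: no scheme
console.log(isParsableUrl("not a url"));                     // false
```

This tests whether a string can be parsed, which is somewhat more lenient than strict validity, since the parser repairs certain defects (for example by percent-encoding spaces in the path). An application that accepts only web addresses may additionally inspect the protocol property of the created URL object.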