2. String Patterns

Email addresses, URLs and telephone numbers are examples of strings that have to match a certain pattern, typically defined by a regular expression. It's quite common to have attributes with such values. For readability, it's preferable to use predefined datatypes, like Email, URL and PhoneNumber, in an information design model, instead of defining them as string-valued attributes with a pattern constraint.

In a (JavaScript or Java) model class definition, these attributes have to be implemented as string-valued attributes with a pattern constraint. It is, however, desirable to have built-in support for them, like in the data/document format definition language XML Schema.
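As a minimal sketch of what "a string-valued attribute with a pattern constraint" can look like in a JavaScript model class, consider the following. The class name Person, the attribute email and the pattern simpleEmailPattern are hypothetical and chosen for illustration only; the pattern is deliberately simplistic and is not a real email syntax check.

```javascript
// Hypothetical placeholder pattern: "something@something" without whitespace
var simpleEmailPattern = /^\S+@\S+$/;

// Hypothetical model class with one pattern-constrained string attribute
class Person {
  constructor( name, email) {
    this.name = name;
    this.email = email;  // assigned via the setter below, so the check runs
  }
  get email() {
    return this._email;
  }
  set email( value) {
    // the pattern constraint: reject non-strings and non-matching strings
    if (typeof value !== "string" || !simpleEmailPattern.test( value)) {
      throw new RangeError("Not a valid email address: " + value);
    }
    this._email = value;
  }
}
```

Because the check is placed in the setter, it is applied both when an object is created and whenever the attribute is later changed.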

2.1. Telephone numbers

There is much variation in the format of phone numbers across different countries. Consequently, it doesn't make sense to try to capture all these different formatting options with one pattern. The HTML5 specification doesn't define any syntax rules for phone numbers in the context of input elements of type "tel", except that they can't include line breaks. The precise formatting assumptions and the corresponding validation are left to the application developer, who may employ a specific pattern for enforcing a particular national or international phone number format.

A simple international phone number format has been defined by the International Telecommunication Union's recommendation ITU-T E.123. It requires that phone numbers include a leading plus sign and allows only spaces to separate groups of digits. ITU has also limited the length of international phone numbers to at most 15 digits (in its recommendation E.164). Since the shortest international phone numbers in use contain seven digits, we can combine these two constraints in the following JavaScript definition of a regular expression for international phone numbers:

var ituPhoneNumberPattern = /^\+(?:[0-9] ?){6,14}[0-9]$/;
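The pattern reads as follows: a plus sign, then 6 to 14 digits that may each be followed by a single space, then one final digit, which gives 7 to 15 digits in total. The following sketch shows how it could be used in a check function. The function name isItuPhoneNumber is a hypothetical helper introduced here for illustration.

```javascript
// Pattern from the text: "+", then 7 to 15 digits, optionally separated by single spaces
var ituPhoneNumberPattern = /^\+(?:[0-9] ?){6,14}[0-9]$/;

// Hypothetical helper: true only for strings that match the pattern
function isItuPhoneNumber( value) {
  return typeof value === "string" && ituPhoneNumberPattern.test( value);
}

console.log( isItuPhoneNumber("+49 30 1234567"));     // true
console.log( isItuPhoneNumber("030 1234567"));        // false: no leading plus sign
console.log( isItuPhoneNumber("+123456"));            // false: only six digits
console.log( isItuPhoneNumber("+1 (555) 123-4567"));  // false: only spaces may separate digits
```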

2.2. Email addresses

The format of email addresses was originally defined by the Internet Engineering Task Force (IETF), but due to several weaknesses it has been redefined in the HTML5 specification in the context of introducing input elements of type "email". The HTML5 specification provides a regular expression for email addresses, which can be used for validating strings that are supposed to be email addresses.
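The following sketch shows how such a check could look in JavaScript. The regular expression is reproduced here from the HTML specification's definition of a valid email address to the best of our knowledge and should be compared with the current text of the specification before it is relied upon. The names emailPattern and isEmailAddress are hypothetical and introduced only for this example.

```javascript
// Email pattern as given in the HTML specification (to be verified against the current spec text)
var emailPattern = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper: true only for strings that match the pattern
function isEmailAddress( value) {
  return typeof value === "string" && emailPattern.test( value);
}

console.log( isEmailAddress("jane.doe@example.com"));  // true
console.log( isEmailAddress("jane.doe@"));             // false: no domain part
console.log( isEmailAddress("jane doe@example.com"));  // false: space in the local part
console.log( isEmailAddress("admin@localhost"));       // true: a dot in the domain is not required
```

Notice that this pattern accepts a domain part without a dot, such as "localhost". An application that only wants to accept addresses with a fully qualified domain name would have to add its own check.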

2.3. URLs

The format of URLs was originally defined by the Internet Engineering Task Force (IETF), but due to several weaknesses and the historical confusion about URLs versus URIs versus IRIs, it has been redefined in the URL specification of the Web Hypertext Application Technology Working Group (WHATWG). The syntax rules defined in this specification also define the validity of URL strings in input elements of type "url" introduced by HTML5.

There are many proposals for how to capture the complex URL parsing rules with a regular expression, but most of them are either too strict or too lax. The solution by Diego Perini stands out and seems to be a pretty good approximation. Wherever the built-in JavaScript object type URL is supported, a precise validation check is possible by trying to create a URL object with a given URL string, and then catching the TypeError that the URL constructor throws when the string cannot be parsed as a URL.
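The constructor-based check can be sketched as follows, assuming an environment that provides the built-in URL object (current browsers and Node.js). In current implementations the URL constructor signals an unparsable string by throwing a TypeError. The function name isValidUrl is a hypothetical helper introduced for illustration.

```javascript
// Hypothetical helper: true if the string can be parsed as an absolute URL
function isValidUrl( value) {
  try {
    new URL( value);  // throws if the string cannot be parsed
    return true;
  } catch (e) {
    // a TypeError signals a parsing failure; anything else is unexpected
    if (e instanceof TypeError) return false;
    throw e;
  }
}

console.log( isValidUrl("https://example.com/path?q=1"));  // true
console.log( isValidUrl("not a url"));                     // false
```

This check accepts any URL scheme, not only "http" and "https". If an application wants to restrict the allowed schemes, it can additionally inspect the protocol property of the created URL object.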