4. Excursion: Formalizing Information Models with RDF and OWL


The Resource Description Framework (RDF), together with its extension RDF Schema, is a logical formalism that allows

formalizing information models in the form of RDF vocabularies consisting of class definitions and property definitions, where both class names and property names are URIs (representing globally unique identifiers);
representing propositional information (in the form of statements about individuals) on the Web, embedded in web pages or in the form of special web data sources.

RDF is the basis of the Semantic Web. It has several syntaxes, including the textual XML-based syntax of RDF/XML and the visual syntax of RDF Graphs.

4.1. RDF vocabularies

Consider the Book class defined in the following class diagram

The corresponding RDF vocabulary, with one class definition and three property definitions, is defined in the following RDF graph:

In an RDF graph, nodes with an elliptic shape represent "resources" (like properties and classes), and arrows represent relationships defined by a property. Each arrow between two nodes represents a statement (also called "triple"). For instance the rdfs:range arrow between year and xs:int represents the statement that the range of the property year is the XML Schema datatype xs:int, where xs is a namespace prefix for the XML Schema namespace.

Notice that RDF has the predefined meta-classes rdfs:Class and rdf:Property, used to define classes and their properties with the help of the predefined property rdf:type. For instance, the rdf:type arrow between year and rdf:Property represents the statement that year is of type rdf:Property, that is, it is defined to be an RDF property.
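The statements of such a graph can also be written down as plain triples. The following Python sketch lists the statements of the Book vocabulary as tuples and answers two simple questions about them. The tuple representation and the helper function names are assumptions made for illustration only; they are not part of RDF.

```python
# Each RDF statement ("triple") is modeled as a (subject, property, value) tuple.
# Prefixed names such as "ex:Book" are kept as plain strings for readability.
book_vocabulary = [
    ("ex:Book", "rdf:type", "rdfs:Class"),
    ("ex:isbn", "rdf:type", "rdf:Property"),
    ("ex:isbn", "rdfs:domain", "ex:Book"),
    ("ex:isbn", "rdfs:range", "xs:string"),
    ("ex:title", "rdf:type", "rdf:Property"),
    ("ex:title", "rdfs:domain", "ex:Book"),
    ("ex:title", "rdfs:range", "xs:string"),
    ("ex:year", "rdf:type", "rdf:Property"),
    ("ex:year", "rdfs:domain", "ex:Book"),
    ("ex:year", "rdfs:range", "xs:int"),
]


def values_of(triples, subject, prop):
    """Return all values stated for the given subject and property."""
    return [o for (s, p, o) in triples if s == subject and p == prop]


def properties_of_class(triples, cls):
    """Return the properties whose rdfs:domain is the given class."""
    return sorted(s for (s, p, o) in triples if p == "rdfs:domain" and o == cls)
```

For instance, values_of(book_vocabulary, "ex:year", "rdfs:range") yields the single value "xs:int", which corresponds to the rdfs:range arrow discussed above.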

RDF graphs are a formalism for theoretical purposes and can be used for illustrating simple examples. Unlike UML class diagrams, they are not useful for visually expressing realistic vocabularies, because they are convoluted and visually more complex than necessary.

The domain of a property has to be defined explicitly in an RDF vocabulary (with an rdfs:domain property statement), as opposed to a UML class diagram where it is defined implicitly. While it is natural to define properties in the context of a class, as in UML, RDF allows defining properties independently of any class.

The RDF/XML syntax allows publishing an RDF vocabulary on the Web. For instance, the simple Book vocabulary defined in the RDF graph above, can be represented by the following RDF/XML document:

<?xml version="1.0"?>
<!-- xml:base is set explicitly, so that "Book" and "#Book" resolve to http://example.org/ex1#Book -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:ex="http://example.org/ex1#"
         xml:base="http://example.org/ex1">
 <rdfs:Class rdf:ID="Book"/>
 <rdf:Property rdf:ID="isbn">
  <rdfs:domain rdf:resource="#Book"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
 </rdf:Property>
 <rdf:Property rdf:ID="title">
  <rdfs:domain rdf:resource="#Book"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
 </rdf:Property>
 <rdf:Property rdf:ID="year">
  <rdfs:domain rdf:resource="#Book"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>
 </rdf:Property>
</rdf:RDF>

Notice that the values of the rdf:resource attribute must be URIs. If an attribute value is a fragment identifier like #Book, it represents a relative URI and is resolved into a full URI by appending the fragment identifier to the in-scope base URI, which may be defined with the xml:base attribute.
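The resolution of a fragment identifier against a base URI can be tried out with the Python standard library. The following lines are only a sketch: the base URI http://example.org/ex1 is an assumption, standing for a value that could be declared with xml:base, and the function name resolve is chosen for illustration.

```python
from urllib.parse import urljoin

# Assumed base URI, as it could be declared with xml:base="http://example.org/ex1".
BASE_URI = "http://example.org/ex1"


def resolve(reference, base=BASE_URI):
    """Resolve a (possibly relative) URI reference against the base URI."""
    return urljoin(base, reference)


# A fragment identifier is appended to the base URI.
print(resolve("#Book"))
# An absolute URI is left unchanged.
print(resolve("http://www.w3.org/2001/XMLSchema#string"))
```

Under the assumed base URI, the reference #Book is resolved to http://example.org/ex1#Book.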

If an attribute value is an absolute URI like "http://www.w3.org/2001/XMLSchema#string", it contains a full namespace URI (like "http://www.w3.org/2001/XMLSchema"), even if a namespace prefix (like "xsd" or "xs") is defined for it. This is because namespace prefixes can only be used for XML element and attribute names, but not for attribute values, which unfortunately makes RDF/XML hard to read for human users.

Notice that the RDF formalization of our simple UML class model above has several shortcomings:

It does not express the constraints that all three properties are mandatory and single-valued, which they are by default in UML.
It does not express the constraints that the ISBN property, as a standard identifier (or primary key) attribute, is mandatory and unique.

We show how to solve these two issues with the greater expressivity of OWL below.

4.2. RDF fact statements

The propositional information items, or fact statements, expressible with RDF are

classification statements like "ex:Book is a rdfs:Class" or "urn:isbn:006251587X is a ex:Book", and
property statements of the sort "the ex:isbn property value of urn:isbn:006251587X is '006251587X'".

Consequently, for a UML object definition like

we obtain several RDF fact statements:

the classification statement

<rdf:Description rdf:about="urn:isbn:006251587X">
 <rdf:type rdf:resource="http://example.org/ex1#Book"/>
</rdf:Description>

which can alternatively be expressed in a more concise way as

<ex:Book rdf:about="urn:isbn:006251587X"/>

the three property statements

<rdf:Description rdf:about="urn:isbn:006251587X">
 <ex:isbn>006251587X</ex:isbn>
 <ex:title>Weaving the Web</ex:title>
 <ex:year>2000</ex:year>
</rdf:Description>

which can also be merged into one rdf:Description element:

<rdf:Description rdf:about="urn:isbn:006251587X">
 <rdf:type rdf:resource="http://example.org/ex1#Book"/>
 <ex:isbn>006251587X</ex:isbn>
 <ex:title>Weaving the Web</ex:title>
 <ex:year>2000</ex:year>
</rdf:Description>
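To see which statements such a description contains, it can be read with a generic XML parser. The following Python sketch uses the standard library's ElementTree and handles only flat rdf:Description elements like the one above; it is not a full RDF/XML parser. The wrapping rdf:RDF element and the namespace URI http://example.org/ex1# for the ex prefix are assumptions, chosen such that ex:isbn expands to http://example.org/ex1#isbn.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The merged description, wrapped in an rdf:RDF root element that declares
# the namespace prefixes (the "ex" namespace URI is an assumption).
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ex1#">
 <rdf:Description rdf:about="urn:isbn:006251587X">
  <rdf:type rdf:resource="http://example.org/ex1#Book"/>
  <ex:isbn>006251587X</ex:isbn>
  <ex:title>Weaving the Web</ex:title>
  <ex:year>2000</ex:year>
 </rdf:Description>
</rdf:RDF>
"""


def statements(rdf_xml):
    """Extract (subject, property, value) triples from flat rdf:Description elements."""
    root = ET.fromstring(rdf_xml.strip())
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            # ElementTree names have the form "{namespace}local";
            # joining both parts gives the full property URI.
            prop = child.tag[1:].replace("}", "", 1)
            resource = child.get(f"{{{RDF_NS}}}resource")
            value = resource if resource is not None else (child.text or "")
            triples.append((subject, prop, value))
    return triples


for triple in statements(RDF_XML):
    print(triple)
```

The sketch yields one classification statement (with the property rdf:type) and three property statements, all about the same resource urn:isbn:006251587X.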

4.3. Expressing structured data in web documents

There are many use cases for machine-readable data (e.g., about people, events, products, etc.) embedded in web documents. For instance, search engines like Google can use such structured data for providing more meaningful search results.

Structured data, or meta-data, can be embedded in a web document by either adding a JSON-LD script element containing it, or by annotating the document's content, e.g., the HTML elements of a web page, with RDFa.

Very limited annotation approaches, called "microformats" (proposed around 2005), are the historic predecessors of the general annotation language RDFa, which is derived from RDF. Some microformats, like vCard and vEvent, are still in use today, but they are increasingly being replaced by one of the two general formats, RDFa and JSON-LD.

The main author of HTML5, Ian Hickson, has proposed an alternative general annotation language, called microdata, with the goal of simplifying RDFa and remedying its usability issues (in particular, by dropping its use of XML namespaces). Despite the (rather unfortunate) choice of using different names for the same annotation concepts (like "itemprop" instead of "property"), Hickson's microdata proposal succeeded in showing

how to get essentially the same annotation functionality at lower usability costs, and
how to integrate annotations with the DOM.

Since Hickson ended his collaboration with the W3C, the microdata proposal did not succeed in obtaining an official W3C status, and web browsers have discontinued their support for it. However, it triggered a W3C proposal to use the RDFa Lite subset of RDFa, which "can be applied to most simple to moderate structured data markup tasks, without burdening the authors with additional complexities".

We present a simple example for using structured data in a web page. Consider the following HTML fragment:

<p>
 My name is Carly Rae Jepsen. 
 Call me maybe at 1-800-2437715.
</p>

For this content, we may want to code the information that

the available information is about an entity of type Person, which has been defined as a class by the search engine standard vocabulary schema.org;
the name of the person is "Carly Rae Jepsen";
the telephone number of the person is "1-800-2437715".

Using the RDFa attributes typeof, vocab and property, we can code this information by adding the following annotations to the HTML content:

<p typeof="Person" vocab="http://schema.org/">
 My name is <span property="name">Carly Rae Jepsen</span>. 
 Call me maybe at <span property="telephone">1-800-2437715</span>.
</p>
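How a consumer of such annotations might read them can be sketched with Python's standard HTML parser. The following code is a deliberately simplified illustration and not a conforming RDFa processor: it assumes a single annotated element with typeof and vocab, and property values that contain no nested markup. The class name RdfaLiteSketch and the function name extract are chosen for illustration.

```python
from html.parser import HTMLParser

ANNOTATED_HTML = """
<p typeof="Person" vocab="http://schema.org/">
 My name is <span property="name">Carly Rae Jepsen</span>.
 Call me maybe at <span property="telephone">1-800-2437715</span>.
</p>
"""


class RdfaLiteSketch(HTMLParser):
    """Collects the type and the property values from flat RDFa Lite markup."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self._current_property = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            # The type name is expanded with the vocabulary URI.
            self.item["@type"] = attrs.get("vocab", "") + attrs["typeof"]
        if "property" in attrs:
            self._current_property = attrs["property"]
            self.item[self._current_property] = ""

    def handle_data(self, data):
        # Text is collected only while inside an element with a property attribute.
        if self._current_property is not None:
            self.item[self._current_property] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property value.
        self._current_property = None


def extract(html):
    """Return the annotated item as a dictionary."""
    parser = RdfaLiteSketch()
    parser.feed(html)
    parser.close()
    return parser.item


print(extract(ANNOTATED_HTML))
```

The result is a record with the type http://schema.org/Person and the two property values for name and telephone.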

Using JSON-LD, as recommended by Google, we need to add a script element of type "application/ld+json" containing the meta-data:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "Carly Rae Jepsen",
  "telephone": "1-800-2437715"
}
</script>

The propositional information expressed with RDFa annotations and JSON-LD corresponds to the following RDF/XML code:

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:schema="http://schema.org/">
  <rdf:type rdf:resource="http://schema.org/Person"/>
  <schema:name>Carly Rae Jepsen</schema:name>
  <schema:telephone>1-800-2437715</schema:telephone>
</rdf:Description>
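The correspondence between the JSON-LD record and RDF statements can be made explicit with a few lines of Python. This is only a sketch under simplifying assumptions: a real JSON-LD processor retrieves and evaluates the context document, while the code below simply prefixes each term with the context URI, which is sufficient for the schema.org terms used here. The blank node label _:person and the function name to_statements are hypothetical.

```python
import json

JSON_LD = """
{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "Carly Rae Jepsen",
  "telephone": "1-800-2437715"
}
"""


def to_statements(json_ld_text, subject="_:person"):
    """Turn a flat JSON-LD record into (subject, property, value) triples."""
    data = json.loads(json_ld_text)
    # Assumption: every term expands to the context URI followed by the term.
    vocab = data["@context"].rstrip("/") + "/"
    triples = [(subject, "rdf:type", vocab + data["@type"])]
    for key, value in data.items():
        if not key.startswith("@"):  # skip JSON-LD keywords like @context
            triples.append((subject, vocab + key, value))
    return triples


for triple in to_statements(JSON_LD):
    print(triple)
```

Since the record has no identifier, the subject of the three statements is an anonymous resource, just like the rdf:Description element without an rdf:about attribute.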

4.4. OWL vocabularies and constraints

OWL extends RDF by adding many additional language elements for expressing constraints, equalities and derived classes and properties in the context of defining vocabularies. Facts are expressed as in RDF (e.g., with rdf:Description).

OWL provides its own predefined language elements for defining classes and properties:

The predefined class owl:Class is a subclass of rdfs:Class.
The predefined class owl:DatatypeProperty is a subclass of rdf:Property. It classifies attributes. Therefore, the values of an owl:DatatypeProperty are data literals.
The predefined class owl:ObjectProperty is a subclass of rdf:Property. It classifies reference properties corresponding to unidirectional binary associations. Since the values of a reference property are object references, the values of an owl:ObjectProperty are object references in the form of resource URIs.

We only show, with the help of an example, that an OWL vocabulary can represent a class diagram more faithfully than the corresponding RDF vocabulary, because it allows expressing certain constraints.

Consider the standard identifier attribute isbn defined in the Book class. In an RDF vocabulary, this attribute is defined in the following way:

<rdf:Property rdf:ID="isbn">
 <rdfs:domain rdf:resource="#Book"/>
 <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</rdf:Property>

There are two issues with this RDF definition of an attribute:

It doesn't make it explicit that the property defined is an attribute, and not a reference property. This can only be inferred by finding out that the range class is a datatype, and not an object type.
It does not constrain the attribute to have exactly one value, as implied by the defaults of UML class diagram semantics.
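The inference mentioned in the first issue can be sketched as follows. The Python code below guesses whether a property is an attribute by checking whether its declared range is an XML Schema datatype. It assumes that all datatypes come from the XML Schema namespace; the publisher property and the Publisher class are hypothetical and only serve as a contrasting example of a reference property.

```python
XSD_NS = "http://www.w3.org/2001/XMLSchema#"

# Declared ranges (rdfs:range) of some properties.
# The publisher property is hypothetical and not part of the Book vocabulary.
ranges = {
    "http://example.org/ex1#isbn": XSD_NS + "string",
    "http://example.org/ex1#year": XSD_NS + "int",
    "http://example.org/ex1#publisher": "http://example.org/ex1#Publisher",
}


def is_attribute(property_uri):
    """Guess whether a property is an attribute: its range must be an XML Schema datatype."""
    return ranges[property_uri].startswith(XSD_NS)


for uri in ranges:
    kind = "attribute" if is_attribute(uri) else "reference property"
    print(uri, "is a", kind)
```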

Using OWL, we can remedy these shortcomings of RDF. The following OWL property definition makes it explicit that the property http://example.org/ex1#isbn is an attribute, while the added OWL restriction defines an "exactly one" cardinality constraint for it:

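<owl:DatatypeProperty rdf:ID="isbn">
 <rdfs:domain rdf:resource="#Book"/>
 <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</owl:DatatypeProperty>
<!-- The restriction is attached to the Book class as an anonymous superclass -->
<owl:Class rdf:about="#Book">
 <rdfs:subClassOf>
  <owl:Restriction>
   <owl:onProperty rdf:resource="#isbn"/>
   <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:cardinality>
  </owl:Restriction>
 </rdfs:subClassOf>
</owl:Class>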

Since the ISBN attribute of the Book class has been designated as the standard identifier attribute in the UML class diagram above, we should define a uniqueness constraint for it. We can do this by including an owl:hasKey element within the class definition:

<owl:Class rdf:ID="Book">
 <owl:hasKey rdf:parseType="Collection">
  <owl:DatatypeProperty rdf:about="#isbn"/>
 </owl:hasKey>
</owl:Class>
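Read as integrity constraints, in the sense of a UML class model, the "exactly one" cardinality constraint and the key constraint for isbn can be checked against a set of book records. The following Python sketch shows such a check. Notice that this closed-world reading is an assumption: an OWL reasoner works under the open-world assumption and would draw inferences from these axioms rather than report missing values. The record structure and the function name check_books are chosen for illustration.

```python
from collections import Counter


def check_books(books):
    """Return a list of violation messages for the two isbn constraints."""
    violations = []
    # "Exactly one" cardinality constraint: each book needs one isbn value.
    for book_id, props in books.items():
        isbn_values = props.get("isbn", [])
        if len(isbn_values) != 1:
            violations.append(
                f"{book_id}: expected exactly 1 isbn value, found {len(isbn_values)}"
            )
    # Key (uniqueness) constraint: no isbn value may be shared by two books.
    counts = Counter(
        isbn for props in books.values() for isbn in set(props.get("isbn", []))
    )
    for isbn, number_of_books in counts.items():
        if number_of_books > 1:
            violations.append(f"isbn {isbn} is used by {number_of_books} books")
    return violations


# Hypothetical records: b2 lacks an isbn value, b1 and b3 share one.
books = {
    "b1": {"isbn": ["006251587X"], "title": ["Weaving the Web"]},
    "b2": {"isbn": []},
    "b3": {"isbn": ["006251587X"]},
}
for message in check_books(books):
    print(message)
```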

4.5. Usability issues of RDF and OWL

Both RDF and OWL have many usability issues. OWL in particular is so difficult to use that it will discourage most potential users.

Because OWL was created by a community that is more concerned with formal logic than with information modeling, and that is not familiar with the concepts and terminology established in information modeling, it introduces many unfamiliar terms for concepts that had already been established and named. It even contains duplicate names: an attribute is in most places called a "data property", but in some places it is called a "datatype property" (specifically in OWL/RDF).

Usability issues of RDF are:

For historical reasons, RDF comes with a strange jargon. In particular, its "subject"-"predicate"-"object" terminology is confusing for information modelers.
For historical reasons, RDF comes with two different XML namespaces, typically in the form of the two namespace prefixes "rdf" and "rdfs". The history of a language should not be imposed on its syntax. Users shouldn't have to bother about which prefix to use.
RDF is using the uncommon term "IRI" (as an abbreviation of "Internationalized Resource Identifier"), following the unfortunate naming history from "URL" via "URI" to "IRI", while the WHATWG's URL Living Standard has reverted to the term "URL".
For practical purposes, RDF is incomplete:
1. it does not make an explicit syntactic distinction between attributes (having a datatype as range) and reference properties (having an object type as range);
2. it does not allow expressing simple class definitions, which include mandatory value and single-value constraints, in an RDF vocabulary.
OWL is needed for getting these fundamental features.

Usability issues of OWL are:

it uses an uncommon terminology: e.g., "data property" instead of attribute, "restriction" instead of constraint;
some of its elements have confusing names: e.g., "ObjectIntersectionOf" does not denote an intersection of objects, but of object types, and "DataSomeValuesFrom" actually refers to "some data values from";
many of its language elements are kind of unnatural and hard to grasp (much less to remember): e.g., an exactly-one-value property constraint cannot be expressed in the definition of a class along with the property declaration, but requires a separate Restriction element (as shown above).
