The Resource Description Framework (RDF), together with its extension RDF Schema, is a logical formalism that allows
formalizing information models in the form of RDF vocabularies consisting of class definitions and property definitions, where both class names and property names are URIs (representing globally unique identifiers);
representing propositional information (in the form of statements about individuals) on the Web, embedded in web pages or in the form of special web data sources.
RDF is the basis of the Semantic Web. It has several syntaxes, including the textual XML-based syntax of RDF/XML and the visual syntax of RDF Graphs.
Consider the Book class defined in the following class diagram:
The corresponding RDF vocabulary, with one class definition and three property definitions, is defined in the following RDF graph:
In an RDF graph, nodes with an elliptic shape represent "resources" (like properties and classes), and arrows represent relationships defined by a property. Each arrow between two nodes represents a statement (also called "triple"). For instance, the rdfs:range arrow between year and xs:int represents the statement that the range of the property year is the XML Schema datatype xs:int, where xs is a namespace prefix for the XML Schema namespace.
Notice that RDF has the predefined meta-classes rdfs:Class and rdf:Property, used to define classes and their properties with the help of the predefined property rdf:type. For instance, the rdf:type arrow between year and rdf:Property represents the statement that year is of type rdf:Property, that is, it is defined to be an RDF property.
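The statements of such a graph can also be written down as plain (subject, property, value) triples. The following Python sketch is only an illustration: it keeps the prefixed names as plain strings, assumes the prefix ex for the Book vocabulary, and uses a hypothetical helper values_of that is not part of any RDF library.

```python
# Each arrow of the RDF graph corresponds to one (subject, property, value) triple.
# Prefixed names are kept as plain strings; the "ex:" prefix is an assumption.
vocabulary = {
    ("ex:Book", "rdf:type", "rdfs:Class"),
    ("ex:isbn", "rdf:type", "rdf:Property"),
    ("ex:isbn", "rdfs:domain", "ex:Book"),
    ("ex:isbn", "rdfs:range", "xs:string"),
    ("ex:title", "rdf:type", "rdf:Property"),
    ("ex:title", "rdfs:domain", "ex:Book"),
    ("ex:title", "rdfs:range", "xs:string"),
    ("ex:year", "rdf:type", "rdf:Property"),
    ("ex:year", "rdfs:domain", "ex:Book"),
    ("ex:year", "rdfs:range", "xs:int"),
}


def values_of(subject, prop):
    """Return all values stated for a subject and a property (hypothetical helper)."""
    return sorted(v for (s, p, v) in vocabulary if s == subject and p == prop)


# All resources that are stated to be of type rdf:Property.
properties = sorted(
    s for (s, p, v) in vocabulary if p == "rdf:type" and v == "rdf:Property"
)

print(values_of("ex:year", "rdfs:range"))
print(properties)
```

Since each statement is an independent triple, the domain of a property is just one more statement about that property and is not implied by its position inside a class definition.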
RDF graphs are a formalism for theoretical purposes. They can be used for illustrating simple examples. Unlike UML class diagrams, however, they are not useful for visually expressing realistic vocabularies, because they quickly become convoluted and visually cluttered.
The domain of a property has to be defined explicitly in an RDF
vocabulary (with an rdfs:domain
property statement), as
opposed to a UML class diagram where it is defined implicitly. While it
is natural to define properties in the context of a class, as in UML,
RDF allows defining properties independently of any class.
The RDF/XML syntax allows publishing an RDF vocabulary on the Web.
For instance, the simple Book vocabulary defined in the RDF graph above can be represented by the following RDF/XML document:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:ex="http://example.org/ex1#"
         xml:base="http://example.org/ex1">
  <rdfs:Class rdf:ID="Book"/>
  <rdf:Property rdf:ID="isbn">
    <rdfs:domain rdf:resource="#Book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </rdf:Property>
  <rdf:Property rdf:ID="title">
    <rdfs:domain rdf:resource="#Book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </rdf:Property>
  <rdf:Property rdf:ID="year">
    <rdfs:domain rdf:resource="#Book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>
  </rdf:Property>
</rdf:RDF>
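Such a vocabulary document can be inspected with any namespace-aware XML parser. The following Python sketch uses only the standard library and lists each property with its domain and range. It is an illustration under assumptions: the embedded document declares the rdfs namespace and an xml:base of http://example.org/ex1, and read_properties is a hypothetical helper. A real application would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed variant of the Book vocabulary (namespace declarations and xml:base are assumptions).
VOCAB_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://example.org/ex1">
  <rdfs:Class rdf:ID="Book"/>
  <rdf:Property rdf:ID="isbn">
    <rdfs:domain rdf:resource="#Book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </rdf:Property>
  <rdf:Property rdf:ID="title">
    <rdfs:domain rdf:resource="#Book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </rdf:Property>
  <rdf:Property rdf:ID="year">
    <rdfs:domain rdf:resource="#Book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>
  </rdf:Property>
</rdf:RDF>"""


def read_properties(xml_text):
    """Map each property ID to its (domain, range) attribute values (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props = {}
    # ElementTree addresses qualified names as "{namespace}local".
    for prop in root.findall(f"{{{RDF}}}Property"):
        name = prop.get(f"{{{RDF}}}ID")
        domain = prop.find(f"{{{RDFS}}}domain").get(f"{{{RDF}}}resource")
        range_ = prop.find(f"{{{RDFS}}}range").get(f"{{{RDF}}}resource")
        props[name] = (domain, range_)
    return props


props = read_properties(VOCAB_XML)
for name, (domain, range_) in props.items():
    print(name, domain, range_)
```

The sketch returns the attribute values as written in the document, so relative values like #Book are not yet resolved against the base URI.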
Notice that the values of the rdf:resource attribute must be URIs. If an attribute value is a fragment identifier like #Book, it represents a relative URI and is resolved into a full URI by appending the fragment identifier to the in-scope base URI, which may be defined with the xml:base attribute.
If an attribute value is an absolute URI like "http://www.w3.org/2001/XMLSchema#string", it contains a full namespace URI (like "http://www.w3.org/2001/XMLSchema"), even if a namespace prefix (like "xsd" or "xs") is defined for it. This is because namespace prefixes can only be used for XML element and attribute names, but not for attribute values, which unfortunately makes RDF/XML hard to read for human users.
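The resolution of attribute values against a base URI can be tried out with Python's standard urljoin function. This is a minimal sketch, assuming the base URI http://example.org/ex1 (as it could be set with xml:base); the helper name resolve is hypothetical.

```python
from urllib.parse import urljoin

# Assumed in-scope base URI, e.g. declared with xml:base on the rdf:RDF element.
BASE = "http://example.org/ex1"


def resolve(value, base=BASE):
    """Resolve an rdf:resource attribute value against the base URI (hypothetical helper)."""
    return urljoin(base, value)


# A fragment identifier is appended to the base URI.
print(resolve("#Book"))
# An absolute URI is left unchanged.
print(resolve("http://www.w3.org/2001/XMLSchema#string"))
```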
Notice that the RDF formalization of our simple UML class model above has several shortcomings:
It does not express the constraints that all three properties are mandatory and single-valued, which they are by default in UML.
It does not express the constraints that the ISBN property, as a standard identifier (or primary key) attribute, is mandatory and unique.
We show how to solve these two issues with the greater expressivity of OWL below.
The propositional information items, or fact statements, expressible with RDF are
classification statements like "ex:Book is an rdfs:Class" or "urn:isbn:006251587X is an ex:Book", and
property statements of the sort "the ex:isbn property value of urn:isbn:006251587X is '006251587X'".
Consequently, for a UML object definition like
we obtain several RDF fact statements:
the classification statement
<rdf:Description rdf:about="urn:isbn:006251587X">
  <rdf:type rdf:resource="http://example.org/ex1#Book"/>
</rdf:Description>
which can alternatively be expressed in a more concise way as
<ex:Book rdf:about="urn:isbn:006251587X"/>
the three property statements
<rdf:Description rdf:about="urn:isbn:006251587X">
  <ex:isbn>006251587X</ex:isbn>
  <ex:title>Weaving the Web</ex:title>
  <ex:year>2000</ex:year>
</rdf:Description>
The classification statement and the property statements can also be merged into one rdf:Description element:
<rdf:Description rdf:about="urn:isbn:006251587X">
  <rdf:type rdf:resource="http://example.org/ex1#Book"/>
  <ex:isbn>006251587X</ex:isbn>
  <ex:title>Weaving the Web</ex:title>
  <ex:year>2000</ex:year>
</rdf:Description>
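To see that one rdf:Description element is just a compact way of writing several statements, the following Python sketch turns each child element into a (subject, property, value) triple. It is an illustration only: it handles just the simple pattern used here, the function name description_triples is hypothetical, and the namespace URI http://example.org/ex1# for the ex prefix is an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The ex namespace URI (ending with "#") is an assumption made for this sketch.
FACTS_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ex1#">
  <rdf:Description rdf:about="urn:isbn:006251587X">
    <rdf:type rdf:resource="http://example.org/ex1#Book"/>
    <ex:isbn>006251587X</ex:isbn>
    <ex:title>Weaving the Web</ex:title>
    <ex:year>2000</ex:year>
  </rdf:Description>
</rdf:RDF>"""


def description_triples(xml_text):
    """Return one triple per child element of each rdf:Description (hypothetical helper)."""
    triples = []
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for child in desc:
            # ElementTree writes qualified names as "{namespace}local".
            namespace, local = child.tag[1:].split("}")
            predicate = namespace + local
            # The value is either a resource URI or the literal text content.
            resource = child.get(f"{{{RDF}}}resource")
            value = resource if resource is not None else (child.text or "").strip()
            triples.append((subject, predicate, value))
    return triples


triples = description_triples(FACTS_XML)
for triple in triples:
    print(triple)
```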
There are many use cases for machine-readable data (e.g., about people, events, products, etc.) embedded in web documents. For instance, search engines like Google can use such structured data for providing more meaningful search results.
Structured data, or meta-data,
can be embedded in a web document by either adding a JSON-LD script
element containing it, or by annotating the document's content, e.g.,
the HTML elements of a web page, with RDFa.
Very limited annotation approaches, called "microformats" (proposed around 2005), are the historic predecessors of the general annotation language RDFa, which is derived from RDF. Some microformats, like hCard and hCalendar, are still being used today, but they are increasingly being replaced by one of the two general formats RDFa and JSON-LD.
The main author of HTML5, Ian Hickson, has proposed an alternative general annotation language, called microdata, with the goal of simplifying RDFa and remedying its usability issues (in particular, by dropping its use of XML namespaces). Despite the (rather unfortunate) choice of using different names for the same annotation concepts (like "itemprop" instead of "property"), Hickson's microdata proposal succeeded in showing
how to get essentially the same annotation functionality at lower usability costs, and
how to integrate annotations with the DOM.
Since Hickson ended his collaboration with the W3C, the microdata proposal has not obtained official W3C status, and web browsers have discontinued their support for it. However, it triggered a W3C proposal to use the RDFa Lite subset of RDFa, which "can be applied to most simple to moderate structured data markup tasks, without burdening the authors with additional complexities".
We present a simple example for using structured data in a web page. Consider the following HTML fragment:
<p> My name is Carly Rae Jepsen. Call me maybe at 1-800-2437715. </p>
For this content, we may want to code the information that
the available information is about an entity of type Person, which has been defined as a class by the search engine standard vocabulary schema.org;
the name of the person is "Carly Rae Jepsen";
the telephone number of the person is "1-800-2437715".
Using the RDFa attributes typeof, vocab and property, we can code this information by adding the following annotations to the HTML content:
<p typeof="Person" vocab="http://schema.org/">
  My name is <span property="name">Carly Rae Jepsen</span>.
  Call me maybe at <span property="telephone">1-800-2437715</span>.
</p>
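How a consumer could read such annotations can be sketched with Python's standard HTMLParser. The class below is a deliberately simplified illustration and not a conforming RDFa processor: it assumes a single annotated entity and flat, non-nested property elements, and the class name RdfaLiteSketch is hypothetical.

```python
from html.parser import HTMLParser


class RdfaLiteSketch(HTMLParser):
    """Collects vocab, typeof and the text of elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.vocab = None
        self.typeof = None
        self.values = {}
        self._current = None  # name of the property whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "vocab" in attrs:
            self.vocab = attrs["vocab"]
        if "typeof" in attrs:
            self.typeof = attrs["typeof"]
        if "property" in attrs:
            self._current = attrs["property"]
            self.values[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.values[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any end tag closes the current property element.
        self._current = None


HTML = (
    '<p typeof="Person" vocab="http://schema.org/">My name is '
    '<span property="name">Carly Rae Jepsen</span>. Call me maybe at '
    '<span property="telephone">1-800-2437715</span>.</p>'
)

parser = RdfaLiteSketch()
parser.feed(HTML)
parser.close()

# The class URI results from combining the vocabulary URI with the type name.
print(parser.vocab + parser.typeof)
print(parser.values)
```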
Using JSON-LD, as recommended by Google, we need to add a script element of type "application/ld+json" containing the meta-data:
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "Carly Rae Jepsen",
  "telephone": "1-800-2437715"
}
</script>
The propositional information expressed with RDFa annotations and JSON-LD corresponds to the following RDF/XML code:
<rdf:Description xmlns:schema="http://schema.org/">
  <rdf:type rdf:resource="http://schema.org/Person"/>
  <schema:name>Carly Rae Jepsen</schema:name>
  <schema:telephone>1-800-2437715</schema:telephone>
</rdf:Description>
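The correspondence between the JSON-LD meta-data and RDF statements can be made explicit with a few lines of Python. This is a rough sketch under strong simplifying assumptions: it treats the "@context" value as a plain vocabulary prefix instead of retrieving and applying the context document as a real JSON-LD processor would, and both the function name jsonld_to_triples and the blank node label _:person are hypothetical.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

JSON_LD = """{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "Carly Rae Jepsen",
  "telephone": "1-800-2437715"
}"""


def jsonld_to_triples(text, subject="_:person"):
    """Expand a flat JSON-LD object into triples (simplified, hypothetical helper)."""
    data = json.loads(text)
    # Simplification: use the context URL as the prefix for all terms.
    vocab = data["@context"].rstrip("/") + "/"
    triples = [(subject, RDF_TYPE, vocab + data["@type"])]
    for key, value in data.items():
        if not key.startswith("@"):  # skip JSON-LD keywords
            triples.append((subject, vocab + key, value))
    return triples


person_triples = jsonld_to_triples(JSON_LD)
for triple in person_triples:
    print(triple)
```

Since the annotated person has no URI of its own, the sketch uses a placeholder subject, which corresponds to an rdf:Description element without an rdf:about attribute.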
OWL extends RDF by adding many additional language elements for expressing constraints, equalities and derived classes and properties in the context of defining vocabularies. Facts are expressed as in RDF (e.g., with rdf:Description).
OWL provides its own predefined language elements for defining classes and properties:
The predefined class owl:Class is a subclass of rdfs:Class.
The predefined class owl:DatatypeProperty is a subclass of rdf:Property. It classifies attributes. Therefore, the values of an owl:DatatypeProperty are data literals.
The predefined class owl:ObjectProperty is a subclass of rdf:Property. It classifies reference properties corresponding to unidirectional binary associations. Since the values of a reference property are object references, the values of an owl:ObjectProperty are object references in the form of resource URIs.
We show, with the help of just one example, that an OWL vocabulary can represent a class diagram more faithfully than the corresponding RDF vocabulary, because it allows expressing certain constraints.
Consider the standard identifier attribute isbn defined in the Book class. In an RDF vocabulary, this attribute is defined in the following way:
<rdf:Property rdf:ID="isbn">
  <rdfs:domain rdf:resource="#Book"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</rdf:Property>
There are two issues with this RDF definition of an attribute:
It doesn't make it explicit that the property defined is an attribute, and not a reference property. This can only be inferred by finding out that the range class is a datatype, and not an object type.
It does not constrain the attribute to have exactly one value, as implied by the defaults of UML class diagram semantics.
Using OWL, we can remedy these shortcomings of RDF. The following OWL property definition makes it explicit that the property http://example.org/ex1#isbn is an attribute, while the added OWL restriction defines an "exactly one" cardinality constraint for it:
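<owl:DatatypeProperty rdf:ID="isbn">
  <rdfs:domain rdf:resource="#Book"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</owl:DatatypeProperty>
<owl:Class rdf:about="#Book">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#isbn"/>
      <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>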
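What an "exactly one" constraint demands of the facts about a book can be illustrated with a small check over triples. The Python sketch below is an assumption-laden illustration: it validates the facts in a closed-world manner, as a database would, whereas an OWL reasoner works under open-world semantics and would not treat a missing value as an error. The function name has_exactly_one and the record ex:book2 are hypothetical.

```python
def has_exactly_one(facts, subject, prop):
    """Check that exactly one value is stated for the subject and property."""
    count = sum(1 for (s, p, _) in facts if s == subject and p == prop)
    return count == 1


facts = [
    ("urn:isbn:006251587X", "ex:isbn", "006251587X"),
    ("urn:isbn:006251587X", "ex:title", "Weaving the Web"),
    ("urn:isbn:006251587X", "ex:year", "2000"),
    # Hypothetical record without an ISBN value.
    ("ex:book2", "ex:title", "Untitled draft"),
]

print(has_exactly_one(facts, "urn:isbn:006251587X", "ex:isbn"))
print(has_exactly_one(facts, "ex:book2", "ex:isbn"))
```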
Since the ISBN attribute of the Book class has been designated as the standard identifier attribute in the UML class diagram above, we should define a uniqueness constraint for it. We can do this by including an owl:hasKey element within the class definition:
<owl:Class rdf:ID="Book">
  <owl:hasKey rdf:parseType="Collection">
    <owl:DatatypeProperty rdf:about="#isbn"/>
  </owl:hasKey>
</owl:Class>
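The intent of such a key definition, read as a uniqueness constraint, can be sketched as a check that no two different books share the same ISBN value. The following Python code is only an illustration with hypothetical names (key_violations, ex:copy1, ex:copy2). Note that an OWL reasoner would, as far as the key semantics go, rather infer that two named individuals with the same key value are the same individual than report a violation.

```python
from collections import defaultdict


def key_violations(facts, key_prop):
    """Return the key values that are shared by more than one subject."""
    holders = defaultdict(set)
    for s, p, v in facts:
        if p == key_prop:
            holders[v].add(s)
    return {v: sorted(subjects) for v, subjects in holders.items() if len(subjects) > 1}


book_facts = [
    ("urn:isbn:006251587X", "ex:isbn", "006251587X"),
    # Hypothetical records that share one ISBN value.
    ("ex:copy1", "ex:isbn", "0000000001"),
    ("ex:copy2", "ex:isbn", "0000000001"),
]

print(key_violations(book_facts, "ex:isbn"))
```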
Both RDF and OWL have many usability issues. OWL in particular is so difficult to use that most potential users will be discouraged by it.
Because OWL was created by a community that is more concerned with formal logic than with information modeling and is not familiar with the concepts and terminology established in information modeling, they have introduced many new unfamiliar terms for concepts that had already been established and named in information modeling. They have even introduced duplicate names within OWL: an attribute is in most places called "data property", but in some places it is called "datatype property" (specifically in OWL/RDF).
For historical reasons, RDF comes with a strange jargon. Its "subject"-"predicate"-"object" terminology is especially awkward.
For historical reasons, RDF comes with two different XML namespaces, typically in the form of the two namespace prefixes "rdf" and "rdfs". The history of a language should not be imposed on its syntax. Users shouldn't have to bother about which prefix to use.
RDF uses the uncommon term "IRI" (as an abbreviation of "Internationalized Resource Identifier"), following the unfortunate naming history from "URL" via "URI" to "IRI", while the WHATWG's URL Living Standard has returned to the term "URL".
For practical purposes, RDF is incomplete:
it does not make an explicit syntactic distinction between attributes (having a datatype as range) and reference properties (having an object type as range);
it does not allow expressing simple class definitions, which include mandatory value and single-value constraints, in an RDF vocabulary.
OWL is needed for getting these fundamental features.
OWL uses an uncommon terminology: e.g., "data property" instead of attribute, "restriction" instead of constraint;
some of its elements have confusing names: e.g., "ObjectIntersectionOf" does not denote an intersection of objects, but of object types, and "DataSomeValuesFrom" actually refers to "some data values from";
many of its language elements are kind of unnatural and hard to grasp (much less to remember): e.g., an exactly-one-value property constraint cannot be expressed in the definition of a class along with the property declaration, but requires a separate Restriction element (as shown above).