7. Subtyping and Inheritance in Computational Languages

Subtyping and inheritance have been supported in Object-Oriented Programming (OOP), in database languages (such as SQL99), in the XML schema definition language XML Schema, and in other computational languages, in various ways and to different degrees. At its core, subtyping in computational languages is about defining type hierarchies and the inheritance of features: properties, constraints and methods in OOP; table columns and constraints in SQL99; elements, attributes and constraints in XML Schema.

In general, it is desirable to have support for multiple classification and multiple inheritance in type hierarchies. Both language features are closely related and are considered to be advanced features, which may not be needed in many applications or can be dealt with by using workarounds.

Multiple classification means that an object has more than one direct type. This is mainly the case when an object plays multiple roles at the same time, and therefore directly instantiates multiple classes defining these roles.

Multiple inheritance is typically also related to role classes. For instance, a student assistant is a person playing both the role of a student and the role of an academic staff member, so a corresponding OOP class StudentAssistant inherits from both role classes Student and AcademicStaffMember. In a similar way, in our example model above, an AmphibianVehicle inherits from both role classes LandVehicle and WaterVehicle.

7.1. Subtyping and inheritance in OOP

The minimum level of support for subtyping in OOP, as provided, for instance, by Java and C#, allows defining inheritance of properties and methods in single-inheritance hierarchies, which can be inspected with the help of an is-instance-of predicate that allows testing if a class is the direct or an indirect type of an object. In addition, it is desirable to be able to inspect inheritance hierarchies with the help of

  1. a predefined instance-level property for retrieving the direct type of an object (or its direct types, if multiple classification is allowed);

  2. a predefined type-level property for retrieving the direct supertype of a type (or its direct supertypes, if multiple inheritance is allowed).

A special case of an OOP language is JavaScript, which did originally not have an explicit language element for defining classes, but only for defining constructor functions. Due to its dynamic programming features, JavaScript allows using various code patterns for implementing classes, subtyping and inheritance. In modern JavaScript, starting from ES2015, defining a superclass and a subclass is straightforward. First, we define a base class, Person, with two properties, firstName and lastName:

class Person {
  constructor (first, last) {
    // assign base class properties
    this.firstName = first; 
    this.lastName = last; 
  }
}

Then, we define a subclass, Student, with one additional property, studentNo:

class Student extends Person {
  constructor (first, last, studNo) {
    // invoke constructor of superclass
    super( first, last);
    // assign additional properties
    this.studentNo = studNo;
  }
}

Notice how the constructor of the superclass is invoked with super( first, last) for assigning the superclass properties.

7.2. Subtyping and inheritance with XML Schema

In XML Schema, a subtype can be defined by extending or by restricting an existing complex type. While extending a complex type means extending its intension by adding elements or attributes, restricting a complex type means restricting its extension by adding constraints.

We can define a complex type Person and a subtype Student by extending Person in the following way:

<xs:complexType name="Person">
  <xs:attribute name="firstName" type="xs:string" />
  <xs:attribute name="lastName" type="xs:string" />
  <xs:attribute name="gender" type="GenderValue" />
</xs:complexType>

<xs:complexType name="Student">
  <xs:extension base="Person">
    <xs:attribute name="studentNo" type="xs:string" />
  </xs:extension>
</xs:complexType>

We can define a subtype FemalePerson by restricting Person in the following way:

<xs:complexType name="FemalePerson">
  <xs:restriction base="Person">
    <xs:attribute name="firstName" type="xs:string" />
    <xs:attribute name="lastName" type="xs:string" />
    <xs:attribute name="gender" type="GenderValue" 
        use="fixed" value="f" />
  </xs:restriction>
</xs:complexType>

Notice that by fixing the value of the gender attribute to "f", we define a constraint that is only satisfied by the female instances of Person.

7.3. Subtyping and inheritance with OWL

In the Web Ontology Language OWL, property definitions are separated from class definitions and properties are not single-valued, but multi-valued by default. Consequently, standard properties need to be declared as functional. Thus, we obtain the following code for expressing that Person is a class having the property name:

<owl:Class rdf:ID="Person"/>
<owl:DatatypeProperty rdf:ID="name">
    <rdfs:domain rdf:resource="#Person"/>
    <rdfs:range rdf:resource="xsd:string"/>
    <rdf:type rdf:resource="owl:FunctionalProperty"/>
</owl:DatatypeProperty>

OWL allows stating that a class is a subclass of another class in the following way:

<owl:Class rdf:ID="Student">
    <rdfs:subClassOf rdf:resource="#Person"/>
</owl:Class>
<owl:DatatypeProperty rdf:ID="studentNo">
    <rdfs:domain rdf:resource="#Student"/>
    <rdfs:range rdf:resource="xsd:string"/>
    <rdf:type rdf:resource="owl:FunctionalProperty"/>
</owl:DatatypeProperty>

For better usability, OWL should allow to define the properties of a class within a class definition, using the case of functional properties as the default case.

7.4. Representing class hierarchies with SQL database tables

A standard DBMS stores information (objects) in the rows of tables, which have been conceived as set-theoretic relations in classical relational database systems. The relational database language SQL is used for defining, populating, updating and querying such databases. But there are also simpler data storage techniques that allow to store data in the form of table rows, but do not support SQL. In particular, key-value storage systems, such as JavaScript's Local Storage API, allow storing a serialization of a JS entity table (a map of entity records) as the string value associated with the table name as a key.

While in the classical version of SQL (SQL92) there is no support for subtyping and inheritance, this has been changed in SQL99. However, the subtyping-related language elements of SQL99 have only been implemented in some DBMS, for instance in the open source DBMS PostgreSQL. As a consequence, for making a design model that can be implemented with various frameworks using various SQL DBMSs (including weaker technologies such as MySQL and SQLite), we cannot use the SQL99 features for subtyping, but have to model inheritance hierarchies in database design models by means of plain tables and foreign key dependencies. This mapping from class hierarchies to relational tables (and back) is the business of Object-Relational-Mapping frameworks such as JPA Providers (like Hibernate), Microsoft's Entity Framework, or the Active Record approach of the Rails framework.

There are essentially three alternative approaches how to represent a class hierarchy with database tables:

  1. Single Table Inheritance (STI) is the simplest approach, where the entire class hierarchy is represented by a single table, containing columns for all attributes of the root class and of all its subclasses, and named after the name of the root class.

  2. Table per Class Inheritance (TCI) is an approach, where each class of the hierarchy is represented by a corresponding table containing also columns for inherited properties, thus repeating the columns of the tables that represent its superclasses.

  3. Joined Tables Inheritance (JTI) is a more logical approach, where each segment subclass is represented by a corresponding table (subtable) connected to the table representing its superclass (supertable) via its primary key referencing the primary key of the supertable, such that the (inherited) properties of the superclass are not represented as columns in subtables.

Notice that the STI approach is closely related to the Class Hierarchy Merge design pattern discussed in Section 6 above. Whenever this design pattern has already been applied in the design model, or the design model has already been re-factored according to this design pattern, the class hierarchies concerned (their subclasses) have been eliminated in the design, and consequently also in the data model to be coded in the form of class definitions in the app's model layer, so there is no need anymore to map class hierarchies to single tables. Otherwise, the design model contains a class hierarchy that is implemented with a corresponding class hierarchy in the app's model layer, which would be mapped to database tables with the help of the STI approach.

We illustrate the use of these approaches with the help of two simple examples. The first example is the Book class hierarchy, which is shown in Figure 12.1 above. The second example is the class hierarchy of the Person roles Employee, Manager and Author, shown in the class diagram in Figure 12.8 below.

Figure 12.8. An information design model with a Person roles hierarchy
An information design model with a Person roles hierarchy

7.4.1. Single Table Inheritance

Consider the single-level class hierarchy shown in Figure 12.1 above, which is an incomplete disjoint segmentation of the class Book, as the design for the model classes of an MVC app. In such a case, whenever we have a model class hierarchy with only one level (or only a few levels) of subtyping and each subtype has only a few additional properties, it's preferable to use STI, so we model a single table containing columns for all attributes such that the columns representing additional attributes of segment subclasses ("segment attributes") are optional, as shown in the SQL table model in Figure 12.9 below.

Figure 12.9. An SQL table model with a single table representing the Book class hierarchy
An SQL table model with a single table representing the Book class hierarchy

It is a common approach to add a special discriminator column for representing the category of each row corresponding to the subclass instantiated by the represented object. Such a column would normally be string-valued, but constrained to one of the names of the subclasses. If the DBMS supports enumerations, it could also be enumeration-valued. We use the name category for the discriminator column, which, in the case of our Book class hierarchy example, has a frozen value constraint since the textbook-biography segmentation is rigid.

Based on the category of a book, we have to enforce that if and only if it is "TextBook", its attribute subjectArea has a value, and if and only if it is "Biography", its attribute about has a value. This implied constraint is expressed in the invariant box attached to the Book table class in the class diagram above, where the logical operator keyword "IFF" represents the logical equivalence operator "if and only if". It needs to be implemented in the database, e.g., with an SQL table CHECK clause or with SQL triggers.

When the given segmentation is disjoint, a single-valued enumeration attribute category is used for representing the information to which subclass an instance belongs. Otherwise, if it is non-disjoint, a multi-valued enumeration attribute categories is used for representing the information to which subclasses an instance belongs. Such an attribute can be implemented in SQL by defining a string-valued column for representing a set of enumeration codes or labels as corresponding string concatenations.

Consider the class hierarchy shown in Figure 12.8 above. With only three additional attributes defined in the subclasses Employee, Manager and Author, this class hierarchy can again be mapped with the STI approach, as shown in the SQL table model Figure 12.10 below.

Figure 12.10. An STI table model representing the Person roles hierarchy
An STI table model representing the Person roles hierarchy

Notice that now the discriminator column categories is multi-valued, since the segmentation of Person is not disjoint, but overlapping, implying that a Person object may belong to several categories. Notice also that, since a role segmentation (like Employee, Manager, Author) is not rigid, the discriminator column categories does not have a frozen value constraint.

An example of an admissible population for this model is the following:

people
person_idnamecategoriesbiographyemp_nodepartment
1001Harry WagnerAuthor, EmployeeBorn in Boston, MA, in 1956, ...21035
1002Peter BossManager23107Sales
1003Tom Daniels
1077Immanuel KantAuthorImmanuel Kant (1724-1804) was a German philosopher ...

Notice that the Person table contains four different types of people:

  1. A person, Harry Wagner, who is both an author (with a biography) and an employee (with an employee number).

  2. A person, Peter Boss, who is a manager (a special type of employee), managing the Sales department.

  3. A person, Tom Daniels, who is neither an author nor an employee.

  4. A person, Immanuel Kant, who is an author (with a biography).

Pros of the STI approach: It leads to a faithful representation of the subtype relationships expressed in the original class hierarchy; in particular, any row representing a subclass instance (an employee, manager or author) also represents a superclass instance (a person).

Cons: (1) In the case of a multi-level class hierarchy where the subclasses have little in common, the STI approach does not lead to a good representation. (2) The structure of the given class hierarchy in terms of its elements (classes) is only implicitly preserved.

7.4.2. Table per Class Inheritance

In a more realistic model, the subclasses of Person shown in Figure 12.8 above would have many more attributes, so the STI approach would be no longer feasible. In the TCI approach we get the SQL table model shown in Figure 12.11 below. A TCI model represents each concrete class of the class hierarchy as a table, such that each segment subclass is represented by a table that also contains columns for inherited properties, thus repeating the columns of the table that represents the superclass.

Figure 12.11. A TCI table model representing the Person roles hierarchy
A TCI table model representing the Person roles hierarchy

A TCI table model can be derived from the information design model by performing the following steps:

  1. Replacing the standard ID property modifier {id} in all classes with {pkey} for indicating that the standard ID property is a primary key.

  2. Replacing the singular (capitalized) class names (Person, Author, etc.) with pluralized lowercase table names (people, authors, etc.), and replacing camel case property names (personId and empNo) with lowercase underscore-separated names for columns (person_id and emp_no).

  3. Adding a «table» stereotype to all class rectangles.

  4. Replacing the platform-independent datatype names with SQL datatype names.

  5. Dropping all generalization/inheritance arrows and adding all columns of supertables (such as person_id and name from people) to their subtables (authors and employees).

Each table would only be populated with rows corresponding to the direct instances of the represented class. An example of an admissible population for this model is the following:

people
personIdname
1003Tom Daniels
authors
person_idnamebiography
1001Harry WagnerBorn in Boston, MA, in 1956, ...
1077Immanuel KantImmanuel Kant (1724-1804) was a German philosopher ...
employees
person_idnameemp_no
1001Harry Wagner21035
managers
person_idnameemp_nodepartment
1002Peter Boss23107Sales

Pros of the TCI approach: (1) The structure of the given class hierarchy in terms of its elements (classes) is explicitly preserved. (2) When the segmentations of the given class hierarchy are disjoint, TCI leads to memory-efficient non-redundant storage.

Cons: (1) The TCI approach does not yield a faithful representation of the subtype relationships expressed in the original class hierarchy. In particular, for any row representing a subclass instance (an employee, manager or author) there is no information that it represents a superclass instance (a person). Thus, the TCI database schema does not inform about the represented subtype relationships; rather, this meta-information, which is kept in the app's class model, is de-coupled from the database. (2) The TCI approach requires repeating column definitions, which is a form of schema redundancy. (3) The TCI approach may imply data redundancy whenever the segment subclasses overlap. In our example, authors can also be employees, so for any person in the overlap, we would need to duplicate the data storage for all columns representing properties of the superclass (in our example, this only concerns the property name).

7.4.3. Joined Tables Inheritance

For avoiding the data redundancy problem of TCI in the case of overlapping segmentations, we could take the JTI approach as exemplified in the SQL table model shown in Figure 12.12 below. This model connects tables representing subclasses (subtables) to tables representing their superclasses (supertables) by defining their primary key column(s) to be at the same time a foreign key referencing their supertable's primary key. Notice that foreign keys are visualized in the form of UML dependency arrows stereotyped with «fkey» and annotated at their source table side with the name of the foreign key column.

Figure 12.12. A JTI table model representing the Person roles hierarchy
A JTI table model representing the Person roles hierarchy

An example of an admissible population for this model is the following:

people
person_idname
1001Harry Wagner
1002Peter Boss
1003Tom Daniels
1077Immanuel Kant
authors
person_idbiography
1001Born in Boston, MA, in 1956, ...
1077Immanuel Kant (1724-1804) was a German philosopher ...
employees
person_idemp_no
100121035
100223107
managers
person_iddepartment
1002Sales

Pros of the JTI approach: (1) Subtyping relationships and the structure of class hierarchies are explicitly preserved. (2) Data redundancy in the case of overlapping segmentations is avoided.

Cons: (1) The main disadvantage of the JTI approach is that for querying a subclass, join queries (for joining the segregated entity data) are required, which may create performance issues.