4.4. Specialization and Generalization

The concept of a subtype, or subclass, is fundamental in natural language, linguistics, mathematics, and informatics. For instance, we say that a bird is an animal, or that the class of all birds is a subclass of the class of all animals. In linguistics, the noun "bird" is a hyponym of the noun "animal".

An entity type may be specialized by subtypes (for instance, Bird is specialized by Parrot) or generalized by supertypes (for instance, Bird and Mammal are generalized by Animal). Specialization and generalization are two sides of the same coin.

A subtype inherits all features of its supertypes, that is, their attributes, associations and constraints. Inherited features are not rendered again for the subtype in the class diagram, so the reader of the diagram has to understand that all features of a supertype also apply to its subtypes.

When an object type has more than one direct supertype, we have a case of multiple inheritance. Multiple inheritance is common in conceptual modeling, but it is prohibited for classes in many object-oriented programming languages, such as Java and C#, which only allow class hierarchies with a unique direct superclass for each class.
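The following Python code is a minimal sketch of both points, not a definitive implementation. Python is used only because it permits multiple inheritance. The classes Animal, Bird and Parrot follow the examples above, while the attribute wingspan_cm and the classes Pet and PetParrot are hypothetical names introduced for illustration.

```python
class Animal:
    def __init__(self, name):
        self.name = name  # defined once, on the supertype


class Bird(Animal):
    """Bird specializes Animal and inherits its name attribute."""

    def __init__(self, name, wingspan_cm):
        super().__init__(name)
        self.wingspan_cm = wingspan_cm  # hypothetical additional attribute


class Parrot(Bird):
    """Parrot specializes Bird; Animal is an indirect supertype."""


class Pet:
    """Hypothetical second supertype, used only to show multiple inheritance."""

    owner = None


class PetParrot(Parrot, Pet):
    """A class with two direct supertypes: Parrot and Pet."""


polly = PetParrot("Polly", 45)
polly.owner = "Ann"

# Features of all direct and indirect supertypes apply to the instance.
print(polly.name, polly.wingspan_cm, polly.owner)
print(isinstance(polly, Animal), isinstance(polly, Pet))
```

In Java or C#, a class like PetParrot could not extend both Parrot and Pet, so one of the two supertypes would typically have to be modeled as an interface instead.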

Introducing Subtypes by Specialization

A new entity type may be introduced by specialization whenever it represents a special case of another entity type. We illustrate this by extending our example model shown in Figure 4-1 with episodes of TV series and biographies as special cases of movies. This means that TV episodes and biographies also have a title, a release date and a director, but in addition they have further attributes, such as series name, season number and episode number for TV episodes, and an about attribute for biographies. Consequently, we introduce the entity types TV episodes and biographies by specializing the entity type movies, that is, as subtypes of movies.

Figure 4-12. The entity types TV episodes and biographies specialize the entity type movies, thus inheriting its attributes.
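As a rough sketch of this specialization in code, the following Python dataclasses mirror the attributes named above. The class and field names, the field types, and the representation of the director as a plain string are assumptions made for illustration; the sample values are invented.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Movie:
    title: str
    release_date: date
    director: str  # assumption: the director is simplified to a name


@dataclass
class TVEpisode(Movie):
    # Only the additional attributes are declared on the subtype.
    series_name: str
    season_no: int
    episode_no: int


@dataclass
class Biography(Movie):
    about: str


# Invented sample data, for illustration only.
episode = TVEpisode("Example Episode", date(2020, 1, 1), "Jane Doe",
                    "Example Series", 1, 1)

# The subtype's fields start with the inherited ones.
episode_fields = [f.name for f in fields(TVEpisode)]
print(episode_fields)
```

Because title, release_date and director are declared only on Movie, a later change to these attributes needs to be made in one place.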

Introducing Supertypes by Generalization

Consider the following model, which associates a director and actors with a movie.

Figure 4-13. The entity type "movies" is associated with the entity types "directors" and "actors".

Notice that the entity types actors and directors share a number of attributes, because both actors and directors are people, and being an actor or a director is a role played by people. We may therefore generalize these two entity types by adding a joint supertype people, with the two attributes name and birth date that are shared by them, as shown in the following diagram.

Figure 4-14. The entity types "directors" and "actors" have been generalized by the entity type "people".

When generalizing two or more entity types, we move those features that are shared by them to the newly added supertype where they get "centralized". In the case of actors and directors, this set of shared features includes name and birth date. In general, shared features may include attributes, associations and constraints.
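The result of such a generalization might look as follows in Python. This is only a sketch under stated assumptions: the class names are singular forms of the entity type names, the field types are guesses, and the helper function age_on is hypothetical, added to show that code written against the supertype works for both subtypes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    # Shared features, moved up from Actor and Director.
    name: str
    birth_date: date


@dataclass
class Director(Person):
    """Inherits name and birth_date from Person."""


@dataclass
class Actor(Person):
    """Inherits name and birth_date from Person."""


def age_on(person: Person, day: date) -> int:
    """Hypothetical helper: age in full years on the given day."""
    had_birthday = (day.month, day.day) >= (
        person.birth_date.month, person.birth_date.day)
    return day.year - person.birth_date.year - (0 if had_birthday else 1)


# Invented sample data, for illustration only.
actor = Actor("A. Example", date(1980, 6, 15))
director = Director("D. Example", date(1970, 1, 1))
print(age_on(actor, date(2020, 6, 14)), age_on(director, date(2020, 6, 14)))
```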

Segmentations and Type Hierarchies

When an entity type is specialized by several subtypes that form a logical group, this is called a segmentation. A segmentation is complete if the union of all subclass extensions is equal to the extension of the superclass (or, in other words, if all instances of the superclass instantiate some subclass). A segmentation is disjoint if all subclasses are pairwise disjoint (or, in other words, if no instance of the superclass instantiates more than one subclass). Otherwise, it is called overlapping. A complete and disjoint segmentation is a partition.

By default, a segmentation for which no constraint keyword is shown in the class diagram is incomplete and overlapping. This is the case for the segmentation of people into directors and actors shown in Figure 4-14 above. Notice that this segmentation is overlapping because the director of a movie may also be an actor playing a certain role in that movie.

The segmentation of movies into TV episodes and biographies shown in Figure 4-12 above is disjoint, but incomplete. Consequently, the shared generalization arrow has to be annotated with the constraint keyword "disjoint" enclosed in curly braces, as shown in Figure 4-15.

Figure 4-15. The segmentation of movies into TV episodes and biographies is disjoint, but incomplete.

An example of a disjoint and complete segmentation, or partition, is the segmentation of customers into private customers and corporate customers shown in Figure 4-16.

Figure 4-16. The segmentation of customers into private customers and corporate customers is a partition.
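The definitions of complete, disjoint and partition given above can be checked mechanically when class extensions are represented as sets of instance identifiers. The following Python sketch does this for the three example segmentations. The function names and the identifiers in the sample extensions are hypothetical, and the sketch assumes that every subclass extension is a subset of the superclass extension.

```python
from itertools import combinations


def is_complete(superclass_ext, subclass_exts):
    """True if the union of all subclass extensions equals the superclass extension."""
    return set().union(*subclass_exts) == set(superclass_ext)


def is_disjoint(subclass_exts):
    """True if no instance belongs to more than one subclass extension."""
    return all(set(a).isdisjoint(b) for a, b in combinations(subclass_exts, 2))


def is_partition(superclass_ext, subclass_exts):
    """A partition is a segmentation that is both complete and disjoint."""
    return is_complete(superclass_ext, subclass_exts) and is_disjoint(subclass_exts)


# Invented extensions, for illustration only.
people = {"p1", "p2", "p3"}
directors, actors = {"p1", "p2"}, {"p2"}          # p2 is both, p3 is neither

movies = {"m1", "m2", "m3"}
tv_episodes, biographies = {"m1"}, {"m2"}         # m3 is neither

customers = {"c1", "c2"}
private, corporate = {"c1"}, {"c2"}

print(is_complete(people, [directors, actors]), is_disjoint([directors, actors]))
print(is_complete(movies, [tv_episodes, biographies]),
      is_disjoint([tv_episodes, biographies]))
print(is_partition(customers, [private, corporate]))
```

With these sample extensions, the segmentation of people is incomplete and overlapping, the segmentation of movies is disjoint but incomplete, and the segmentation of customers is a partition.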

In general, segment subtypes may be further segmented, thus resulting in a type hierarchy (or class hierarchy). An example of such a type hierarchy is shown in Figure 4-17.

Figure 4-17. A hierarchy of vehicle types with three segmentations.