Review of Dean Allemang and James Hendler: Semantic Web for the Working Ontologist
Posted on September 28, 2016
In 2001, Tim Berners-Lee, creator of the world wide web, wrote a Scientific American article, The Semantic Web, in which he outlined a vision of a web accessible to intelligent agents that would be able to consume, and act on, information stored in a machine-readable form. The idea is that hypertext is very useful to humans, but not so much to applications that lack the linguistic and cognitive abilities of human consumers of the web. Anyone with a passing familiarity with the web will understand the usefulness of linked data to human users. What is less obvious is how computer programs can assign meaning to data found on the web in a useful way. More generally, any application processing text (regardless of the delivery mechanism) can benefit from similar technology.
In the years since then, efforts toward building a semantic web have focused on associating metadata with web resources, and on creating standard vocabularies known as ontologies, with a structure rich enough that it is possible to reason about them in a useful way. In Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, 2nd ed., Morgan Kaufmann, 2011, Dean Allemang and James Hendler undertake the task of describing the core technologies succinctly, and in a way that will be useful both to ontologists (those who build and maintain vocabularies) and to programmers and other technologically inclined readers. In this sense, the book fills an important niche. Other books on the semantic web are either too abstract or focus too much on technologies such as RDF/XML and RDFa, and tend to be oriented toward content authors.
The book proceeds logically, beginning with the foundational technology of the Resource Description Framework (RDF). Conceptually, RDF is quite simple, presenting information in the form of a set of triples, each having the form Subject, Predicate, Object (e.g., William Shakespeare wrote Hamlet). Each term in the triple is identified by a URI (or it may be anonymous when no more specific identifier is needed). This means terms are identified unambiguously, but not necessarily uniquely. To borrow an example from the book: Pluto may be known to professional and amateur astronomers, the IAU, and astrologers. Each group may have different things to say about Pluto (“Is it a planet, a dwarf planet, or a small solar system body?”) and, indeed, their own concept of Pluto. Fortunately, confusion can be avoided if different groups use different URIs to refer to their concept of Pluto. This is really an application of namespaces.
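To make the triple idea concrete, here is a minimal sketch in plain Python (no RDF library), treating a graph as a set of (subject, predicate, object) tuples. The example.org namespaces and the property names are my own inventions for illustration; they do not come from the book or from any published vocabulary.

```python
# Hypothetical namespaces: each community gets its own URI prefix.
IAU = "http://example.org/iau/"
ASTROLOGY = "http://example.org/astrology/"
VOCAB = "http://example.org/vocab/"

# A graph is just a set of (subject, predicate, object) triples.
graph = {
    (IAU + "Pluto", VOCAB + "classifiedAs", IAU + "DwarfPlanet"),
    (IAU + "Pluto", VOCAB + "orbits", IAU + "Sun"),
    (ASTROLOGY + "Pluto", VOCAB + "rules", ASTROLOGY + "Scorpio"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# The two Plutos have different URIs, so their statements never mix.
iau_facts = match(graph, s=IAU + "Pluto")
astrology_facts = match(graph, s=ASTROLOGY + "Pluto")
print(len(iau_facts), len(astrology_facts))
```

Because the astronomers' Pluto and the astrologers' Pluto are different URIs, a query about one never returns statements about the other, which is the namespace point in miniature.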
Now, perhaps you are wondering what comes next. Now that we can make statements in the form of triples, what can we do with them? As it turns out, RDF has very little to say here: it is just a language for ascribing properties to resources. If you look up the standards, you will find various ways of representing RDF, such as RDF/XML and the Turtle syntax (used in this book because it represents triples in a very direct way), or embedded syntaxes such as RDFa. Many books on RDF spend a great deal of time discussing these various syntactic representations of RDF without providing much insight into what it is, and why it’s useful. That’s why I like this book. It’s a good place to look if you want to know why it’s a useful technology, and what you can do with it.
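As a rough illustration of how directly Turtle mirrors triples, the sketch below writes each triple as one statement: three URIs in angle brackets followed by a period. It is deliberately limited and assumes every term is a URI; it does not handle literals, anonymous nodes, prefixes, or character escaping, and the example.org names are hypothetical.

```python
def to_turtle(triples):
    """Serialize URI-only triples, one '<s> <p> <o> .' statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples = {
    (EX + "Shakespeare", EX + "wrote", EX + "Hamlet"),
    (EX + "Shakespeare", EX + "wrote", EX + "Macbeth"),
}

turtle_text = to_turtle(triples)
print(turtle_text)
```

Real Turtle adds conveniences such as prefix declarations and shorthand for repeated subjects, but the one-statement-per-triple core is what makes it easy to read.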
In fact, to be able to take full advantage of RDF, you need a language that allows you to speak about classes, and to say things like “Hamlet is a play.” This is where RDF Schema (or RDFS) comes in. What RDFS does not allow you to do is constrain categories or express richer relationships between them. But it does take an important step by providing language you can use to express such statements. We can name plays and poetry as categories, and even say that both are kinds of literary work, but we cannot say, for example, that nothing is both a play and a poem. To take the next step, we need some version of the Web Ontology Language (known as OWL). As is well known, once we step into the domain of predicate logic, we quickly run into questions of decidability and complexity, so there are actually several subsets of OWL, each optimized for a different purpose. The book spends some time discussing these subsets, why they exist, and the tradeoffs between them.
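To give a flavor of what class statements buy you, here is a small sketch of one kind of inference RDFS supports. RDFS does include rdfs:subClassOf for simple hierarchies, and the rule below propagates class membership up such a hierarchy. The rdf:type and rdfs:subClassOf URIs are the standard ones; the example.org classes and the infer_types helper are hypothetical, and a real reasoner implements many more rules than this.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

graph = {
    (EX + "Hamlet", RDF_TYPE, EX + "Play"),
    (EX + "Play", SUBCLASS_OF, EX + "LiteraryWork"),
    (EX + "LiteraryWork", SUBCLASS_OF, EX + "CreativeWork"),
}


def infer_types(graph):
    """Apply one rule to a fixed point:
    (x rdf:type C) and (C rdfs:subClassOf D) imply (x rdf:type D)."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        new = {
            (x, RDF_TYPE, d)
            for (x, p1, c) in inferred if p1 == RDF_TYPE
            for (c2, p2, d) in inferred if p2 == SUBCLASS_OF and c2 == c
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


closure = infer_types(graph)
print((EX + "Hamlet", RDF_TYPE, EX + "CreativeWork") in closure)
```

Nobody stated that Hamlet is a creative work; the statement follows from the class hierarchy. What this style of reasoning cannot do is rule anything out, which is where OWL comes in.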
I highly recommend this book to programmers, analysts, or ontologists who want a good overview of RDF, RDFS and OWL, and who want to understand why they might be useful. There are other books that go into more detail about the syntax of RDF/XML and tools such as triple stores, or that delve more deeply into ontologies, but this book provides surprising depth of coverage and an excellent foundation for other, more specialized books.