The paradox of semantic technologies

An ongoing debate in the humanities is that of methodology: hypothetico-deductive approaches are often rejected, and analytical definitions of the objects of study are usually believed impossible or not desirable. However, suitable alternatives are rarely produced without friction or heated discussion.

It is true that methodologies are taught at universities and considered an important aspect of research by most scholars; however, I have found few occasions where practitioners documented and communicated their methodologies in a systematic way. For this reason, I argue that methodological guidance in the digital humanities is scarce, which makes it very difficult to reproduce interim and final research results, or to communicate work processes to colleagues or the public. This is especially relevant to the separation of descriptive and interpretive processes. By descriptive processes I refer to formalising tasks by which researchers generate data from observed evidence, e.g. by recording finds at an archaeological excavation site. By interpretive processes I refer to deductive, inductive or abductive tasks by which scholars generate new knowledge from existing data through argumentation, sense-making and other cognitive devices. Poor separation of descriptive and interpretive processes in the digital humanities raises the barriers to reuse and makes collaboration harder, especially between individuals of different disciplinary backgrounds.

Most digital humanities scholars are aware that knowledge generation is a crucial issue. To tackle it, they have embraced a fuzzy set of technologies and approaches often named “semantic technologies”, most notably those corresponding to a family of World Wide Web Consortium (W3C) specifications collectively referred to as the “semantic web”. These include the Web Ontology Language (OWL), the Simple Knowledge Organization System (SKOS), and the Resource Description Framework (RDF). Other notable technologies, such as the Text Encoding Initiative (TEI), do not come from the W3C.

In this post, I will challenge the claim that these technologies and their underlying approaches are of significant utility to knowledge generation in the digital humanities, and I will question the significance of the “semantic” qualifier as often employed in the field.

But what is the semantic web? When explaining what the semantic web is, the W3C states that

“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.”

However, while sharing and reusing data across boundaries is a valuable goal, it is hardly related to semantics. Semantics, or meaning, is usually defined as the relationship between a symbol (such as a word or an icon) and the entity in the world that it stands for. For example, the semantics of the word “house” in the sentence “That is my new house”, as uttered by me last weekend when showing my new house to my friends, is the relationship between that word and my house itself. Semantic relationships are not fully provided by the symbols that we use; we need to look at additional elements in the communication, such as the speaker, the listeners, and the context, in order to determine what entities in the world the message is referring to. In the previous example, the meaning of the phrase “my new house” cannot be determined from the words alone; you need to know that I was the speaker in order to identify which house the words refer to. Similarly, the mode of communication (written text, intimate conversation, public address, online chat, etc.) and its social context also shape the meaning of the words we utter; unless you see me point at the house, you won’t be able to determine which house is mine. And, finally, you need to consider the relationships of each symbol to other symbols in the language, which also contribute to a symbol’s semantics.

Semantic technologies, however, are mostly concerned with the symbols and their manipulation, rather than the entities in the world that they stand for. Even more blatantly, they neglect semantically-relevant elements, including the actors involved in the communication, the communication mode, or its social context. In fact, technologies such as RDF, SKOS or OWL represent the symbols that we use and the connections between them, but cannot represent the entities that they refer to, the speaker, or other relevant context information.

This problem has been carefully described by works such as [Uschold 2003], but proponents of the semantic web seem to have paid little attention. Correspondingly, most digital humanities scholars are oblivious to it. The popularity of technologies such as SKOS and RDF, as well as the emphasis on artefacts such as thesauri and controlled vocabularies, reveals that the digital humanities community keeps its focus on organising the ways in which we refer to things, instead of worrying about the things themselves. You may argue that thesauri and controlled vocabularies are very relevant, because agreeing on a common terminology is clearly useful. However, a shared vocabulary is of little help if each of us assigns different meanings to the same terms. And semantic technologies cannot deal with this.
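The point about shared vocabularies can be made concrete with a minimal sketch. It is not tied to any real RDF library: triples are modelled as plain tuples of strings, and every identifier in it (“ex:find-17”, “ex:type”, “ex:house” and so on) is a made-up assumption for illustration, as are the two teams and their differing understandings of the term.

```python
# Hypothetical triples: plain (subject, predicate, object) tuples of strings.
# All identifiers below are invented for illustration only.

# Assumption: team A uses "ex:house" for any roofed domestic structure.
team_a = {
    ("ex:find-17", "ex:type", "ex:house"),
    ("ex:find-17", "ex:period", "ex:iron-age"),
}

# Assumption: team B uses the same term only for stone-built dwellings.
team_b = {
    ("ex:find-42", "ex:type", "ex:house"),
    ("ex:find-42", "ex:period", "ex:iron-age"),
}


def subjects_of_type(triples, type_term):
    """Return, in sorted order, the subjects typed with the given term."""
    return sorted(s for (s, p, o) in triples if p == "ex:type" and o == type_term)


# Merging is pure symbol manipulation: a set union of tuples.
merged = team_a | team_b

# The query treats both finds as the same kind of thing, because the
# symbols match; the differing intended meanings are nowhere in the data.
houses = subjects_of_type(merged, "ex:house")
print(houses)
```

The merge and the query both succeed without complaint. Whether the two finds are “houses” in the same sense is a question that the triples themselves cannot answer, since the differing intentions of the two teams live only in the comments, that is, outside the data.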

In addition, the crucial part that interpretation and knowledge generation processes play in the digital humanities has been argued by works such as [Gonzalez-Perez et al. 2015] or [Kintigh 2006]. A technology toolset that addresses symbol manipulation only, no matter how powerful, cannot account for the genuine semantics of humanities contents, and will only be able to assist with sorting, searching, comparing and, in general, processing data, leaving the challenge of abstraction unaddressed.

Furthermore, a lot of emphasis is being placed on semantic interoperability within the digital humanities. However, I find “semantic interoperability” to be a misnomer, because data can never be made interoperable in a language-philosophical sense. Data elements represent facts in the world, and the semantics of these data is the relationship between data elements and the corresponding facts. Therefore, data alone can never capture semantics; one needs the facts in the world as well as the data in order to operate (or inter-operate) semantically.

To solve this, representational relationships should be introduced in order to manage meaning itself. For example, it is common in current digital humanities practice to record and maintain significant datasets about the observed evidence, and also to produce reports or other documents describing and explaining, in a narrative manner, the outcomes of our interpretations. However, the process of obtaining narrative documents from data seems to happen by magic, since digital humanities practice rarely captures, describes or otherwise documents the relationship between narrative and data. True semantic technologies should allow us to “climb the ladder” of abstraction from data to information, and from there to knowledge, and should describe, and even assist in, the interpretation and knowledge generation process.

In summary, the semantic web actually lacks semantics, especially in its referential sense, and semantic technologies, in the usual sense of this term, cannot help with meaning.

Is there anything that we can do? Is there a way to represent the knowledge in our heads, link it to the relevant data, and help us with the necessary cognitive processes so that new knowledge is generated? I believe so. The key involves using conceptual models which, despite being similar to ontologies in their superficial appearance, have very different goals. I will explain the details in future posts.

References

  1. Gonzalez-Perez, C., & Martín-Rodilla, P. 2015. Integration of Archaeological Datasets through the Gradual Refinement of Models. In F. Giligny, F. Djindjian, L. Costa, P. Moscati & S. Robert (eds.), “21st Century Archaeology: Concepts, Methods and Tools – Proceedings of the 42nd Annual Conference on Computer Applications and Quantitative Methods in Archaeology”, 193-204. Archaeopress.
  2. Kintigh, K. 2006. The Promise and Challenge of Archaeological Data Integration. American Antiquity, 71(3), 567-578.
  3. Uschold, M. 2003. Where Are the Semantics in the Semantic Web? AI Magazine, 24(3), 25-36.