Computer science, software engineering, and humanities

A recent article in Wired (“Hey, Computer Scientists! Stop Hating on the Humanities”, by Emma Pierson) claims that most computer scientists are deeply oblivious to humanities concerns such as ethics or culture. I quite agree. However, I think the issue is a bit more complicated than that. I will explain why but, first, I’ll take a short detour.

I spent a few years working as a researcher at the University of Technology, Sydney. My home there was the Faculty of Information Technologies, which was organised into three departments:

  • Computer Systems. These guys dealt with wires, routers, CPUs and other physical things. They also worked with non-physical things such as operating systems, data structures, algorithms and other low-level software entities. And by “low-level” I mean software that usually runs very close to the hardware and is far removed from the experience of a regular computer user.
  • Software Engineering. These guys were concerned with understanding the world so that relevant problems (how to distribute water across a city, how to manage trans-oceanic container shipments) could be solved through the application of software. They studied how to design and create good-quality software, basically.
  • Information Systems. These guys dealt with organisations, or even society at large, and how computers and related information technologies work within them. Usual themes included e-commerce, social networks, and smart cities.

Continue reading “Computer science, software engineering, and humanities”


Structuring content with semantic technologies

In a recent post, I discussed the principles of modularity and layering in software engineering, and how they affect information (or data) in the digital humanities. Today I am focussing on a related but different aspect: how to structure content in digital humanities, and how conventional semantic technologies lack the necessary features to do it.

But first let’s introduce some concepts. When I say “content”, I am referring to anything that is represented inside a computer system. Data and information are both content. Knowledge is not content since it resides in our heads rather than a computer system, but it can be represented in a software system in terms of information. Many definitions have been offered for data, information and knowledge (the DIKW pyramid is a classic), but I will use very simple ones:

  • Data consist of simple, “naked” quantities or qualities that represent individual properties of something. For example, Age = 38 is a piece of data, as is the more complex p: Person { Name = “Alice”; Age = 38 }.
  • Information is data in a message. In other words, information is data being conveyed from someone to someone else (possibly many people) for a certain purpose. For example, if you ask me “Who’s that person over there?” and I reply with “Oh, that’s Alice”, that’s information.
  • Knowledge is justified true belief, as expressed by the ancient Greeks. Knowledge resides in our heads, but can be represented as information to be communicated.

From these definitions it should be clear that computer systems store and manipulate data which, when transmitted, constitutes the core of information. This information, in turn, may represent existing knowledge and, once assimilated by us, help us create new knowledge or change what we already know.
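
To make these definitions concrete, here is a minimal sketch in Python (my own illustration, using only the standard library; the Person and Message types simply mirror the Alice example above). A Person record is pure data; wrapping it in a message with a sender, a receiver and a purpose is what turns that data into information:

    from dataclasses import dataclass

    @dataclass
    class Person:
        """A piece of data: 'naked' qualities describing an individual."""
        name: str
        age: int

    @dataclass
    class Message:
        """Data becomes information when conveyed to someone for a purpose."""
        sender: str
        receiver: str
        purpose: str
        payload: Person  # the data being conveyed

    alice = Person(name="Alice", age=38)  # data
    answer = Message(sender="me", receiver="you",
                     purpose="identify the person over there",
                     payload=alice)  # information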

Continue reading “Structuring content with semantic technologies”

Ontologies and conceptual models

If you are familiar with the field of digital humanities, you have probably heard or read about ontologies. However, you are less likely to have read or heard about conceptual models. In this post I will describe what these are, how they are related, and how they may be useful in digital humanities.

In philosophy, ontology is the study of being, that is, the branch of philosophy concerned with what is (and isn’t), how we can organise what there is, and what major categories of existence and things there are. In information technology (IT), the word “ontology” has been borrowed to refer to a machine-readable representation of a portion of the world. In the words of [Gruber 1995],

An ontology is a specification of a conceptualization.

First of all, ontologies in IT are about conceptualisations. An ontology represents a specific way of thinking about a part of the world. This conceptualisation is usually shared by a number of people so that the ontology can be useful to all of them. Second, ontologies in IT are specifications; that is, they consist of instructions or declarations targeted at a computer. In fact, IT ontologies arose in the 1990s from work previously carried out in artificial intelligence (AI) during the 1970s and 1980s. The AI community was focused on developing methods to represent the world so that computers could “understand” and “reason” about it. After all, AI was part of computer science, and very much computer-oriented.
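
As a rough illustration of what “a specification of a conceptualization” can look like in practice, here is a minimal sketch using Python and the rdflib library (my choice of tooling, not something prescribed by Gruber; the ex: namespace and the Document/Manuscript classes are invented for the example). It declares, in machine-readable form, a tiny conceptualisation in which manuscripts are a kind of document:

    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    EX = Namespace("http://example.org/ontology#")  # hypothetical namespace

    g = Graph()
    g.bind("ex", EX)

    # The conceptualisation: the categories we commit to...
    g.add((EX.Document, RDF.type, OWL.Class))
    g.add((EX.Manuscript, RDF.type, OWL.Class))
    g.add((EX.Manuscript, RDFS.subClassOf, EX.Document))

    # ...and a relation that may hold between their instances.
    g.add((EX.describes, RDF.type, OWL.ObjectProperty))
    g.add((EX.describes, RDFS.domain, EX.Document))

    print(g.serialize(format="turtle"))

Any other program (or person) that shares this conceptualisation can load the resulting declarations and work with them; that shared commitment is what makes the ontology useful to all.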

Continue reading “Ontologies and conceptual models”

Modularity and layering in semantic technologies

In the previous post, I argued that most semantic technologies (as usually understood in the digital humanities community) are anything but semantic. In this post I turn to engineering concerns. Here I argue that most semantic technologies such as RDF, SKOS and OWL, especially as employed in digital humanities for linked open data, present serious flaws due to the lack of good engineering practices. Engineering is crucial in this regard, because thesauri, repositories and other similar artefacts that are routinely developed within the digital humanities field are information systems and, as such, exhibit behaviours that are well understood in engineering and are subject to engineering principles.

My main point today is that semantic technologies are usually oblivious to the well-known engineering concerns of modularity and layering. Modularity refers to the organisation of a system into parts, named modules, which exhibit high internal cohesion and low external coupling. Modules, in other words, are not just random chunks of a system. First, a module must have a high degree of internal cohesion: the individual elements inside it must be tightly related to each other, leaving no elements disconnected or isolated. Put differently, a module must have a clear and single purpose, and every element in it must be concerned with that purpose. Second, a module must have a low degree of external coupling: the relationships between it and other modules must be few and weak, rather than many and strong. The point is to concentrate connections inside modules rather than across them, which benefits the overall quality of the system. A good introduction to why this is so, and how quality factors such as robustness and extensibility are improved through modularity, is given in the introductory chapters of [Meyer 1997].
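
To see what high cohesion and low coupling look like in practice, here is a small Python sketch of my own (the module layout and function names are invented for illustration). Each “module” has a single, clear purpose, and the second one touches the first only through a narrow, well-defined interface:

    # thesaurus.py -- cohesive module: every element deals with terms
    # and their broader/narrower relationships, nothing else.
    def load_hierarchy(path):
        """Read 'child<TAB>parent' pairs into a dict mapping term to parent."""
        hierarchy = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                child, parent = line.rstrip("\n").split("\t")
                hierarchy[child] = parent
        return hierarchy

    def broader_than(term, other, hierarchy):
        """True if term is an ancestor of other in the hierarchy."""
        parent = hierarchy.get(other)
        while parent is not None:
            if parent == term:
                return True
            parent = hierarchy.get(parent)
        return False

    # search.py -- a second module with its own single purpose; its only
    # coupling to the thesaurus module is the broader_than() call, not
    # the internal representation of the hierarchy.
    def expand_query(term, hierarchy):
        """Expand a query term with every narrower term (query expansion)."""
        return [t for t in hierarchy
                if t == term or broader_than(term, t, hierarchy)]

If the internal representation of the thesaurus changes, only the first module needs to be touched; that containment of change is precisely the quality benefit that modularity is meant to deliver.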

Continue reading “Modularity and layering in semantic technologies”