Co-research and transdisciplinarity

In the previous post, I mentioned the issue of co-research between the humanities and digital technologies only in passing. Today I will go deeper into this issue.

Usually, the digital humanities of the first kind use information technologies as an auxiliary collection of techniques to help solve humanistic problems. This is not unlike archaeology using radiocarbon dating or art history using analytical chemistry. In other words, the research questions that we often see in digital humanities are about humanistic objects of study and of humanistic relevance, rather than being related to information technologies.

This does not constitute a problem by itself, but it leaves much room for improvement in two different but connected aspects. First of all, the humanities benefit from information technologies (IT) by adopting and using them, but information technologies rarely obtain much in exchange. In other words, the relationship between the humanities and IT usually takes the form of a mere service-providing scenario. However, we can envision a situation where IT, while providing support to the humanities, are also inspired and extended by them, thus achieving a symmetrical relationship where the answers to the research questions of one discipline boost the other, and vice versa. This would require the research questions posed to incorporate elements from both the humanities and IT, and the objects of study to be combined in meaningful ways.

Computer science, software engineering, and humanities

A recent article in Wired (“Hey, Computer Scientists! Stop Hating on the Humanities” by Emma Pierson) claims that most computer scientists are deeply oblivious to humanities issues such as ethics or cultural aspects. I quite agree. However, I think the issue is a bit more complicated than this. I will explain how but, first, I’ll take a short detour.

I spent a few years working as a researcher at the University of Technology, Sydney. My home there was the Faculty of Information Technologies, which was organised in three departments:

  • Computer Systems. These guys dealt with wires, routers, CPUs and other physical things. They also worked with non-physical things such as operating systems, data structures, algorithms and other low-level software entities. And by “low-level” I mean software that usually runs very close to the hardware and is far removed from the experience of a regular computer user.
  • Software Engineering. These guys were concerned with understanding the world so that relevant problems (how to distribute water across a city, how to manage trans-oceanic container shipping) could be solved through the application of software. Basically, they studied how to design and create good-quality software.
  • Information Systems. These guys dealt with organisations or even the society at large, and how computers and related information technologies work within them. Usual themes included e-commerce, social networks, or smart cities.

Structuring content with semantic technologies

In a recent post, I discussed the principles of modularity and layering in software engineering, and how they affect information (or data) in the digital humanities. Today I am focussing on a related but different aspect: how to structure content in digital humanities, and how conventional semantic technologies lack the necessary features to do it.

But first let’s introduce some concepts. When I say “content”, I am referring to anything that is represented inside a computer system. Data and information are both content. Knowledge is not content since it resides in our heads rather than a computer system, but it can be represented in a software system in terms of information. Many definitions have been offered for data, information and knowledge (the DIKW pyramid is a classic), but I will use very simple ones:

  • Data consist of simple, “naked” quantities or qualities that represent individual properties of something. For example, Age = 38 is a piece of data, as is the more complex p: Person { Name = “Alice”; Age = 38 }.
  • Information is data in a message. In other words, information is data being conveyed from someone to someone else (possibly many people) for a certain purpose. For example, if you ask me “Who’s that person over there?” and I reply with “Oh, that’s Alice”, that’s information.
  • Knowledge is justified true belief, as expressed by the ancient Greeks. Knowledge resides in our heads, but can be represented as information to be communicated.

From these definitions it should be clear that computer systems store and manipulate data which, when transmitted, constitute the core of information. This information, in turn, may represent existing knowledge, and may be assimilated by us to help us create new knowledge or revise existing knowledge.
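
To make the distinction between data and information a bit more tangible, here is a minimal Python sketch. It is only an illustration under my own assumptions: the Person and Message classes, and all their field names, are hypothetical and not taken from any particular system. The idea is simply that the same piece of data becomes information once it is wrapped in a message with a sender, some recipients and a purpose.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Person:
    """Data: 'naked' properties of something, with no sender or purpose."""
    name: str
    age: int


@dataclass(frozen=True)
class Message:
    """Information: data conveyed from someone to someone else, for a purpose."""
    sender: str
    recipients: Tuple[str, ...]  # possibly many people
    purpose: str
    payload: Any                 # the data being conveyed


# A piece of data, as in the Alice example above.
alice = Person(name="Alice", age=38)

# The same data, now conveyed as the reply to a question (hypothetical values).
reply = Message(
    sender="me",
    recipients=("you",),
    purpose="answer to: who's that person over there?",
    payload=alice.name,
)
```

Note that nothing in this sketch represents knowledge directly; in line with the definitions above, knowledge would be whatever the recipient makes of the message.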

Ontologies and conceptual models

If you are familiar with the field of digital humanities, you have probably heard or read about ontologies. However, you are less likely to have read or heard about conceptual models. In this post I will describe what these are, how they are related, and how they may be useful in digital humanities.

In philosophy, ontology is the study of being, that is, the branch of philosophy concerned with what is (and isn’t), how we can organise what there is, and what major categories of existence and things there are. In information technology (IT), the word “ontology” has been borrowed to refer to a machine-readable representation of a portion of the world. In the words of [Gruber 1995],

An ontology is a specification of a conceptualization.

First of all, ontologies in IT are about conceptualisations. An ontology represents a specific manner of thinking about a part of the world. This conceptualisation is usually shared by a number of people so that the ontology can be useful to all. Second, ontologies in IT are specifications, that is, they consist of instructions or declarations targeted at a computer. In fact, IT ontologies arose in the 1990s from work previously carried out in artificial intelligence (AI) during the 1970s and 1980s. The AI community was focused on developing methods to represent the world so that computers could “understand” and “reason” about it. After all, AI was part of computer science, and very much computer-oriented.
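
As a rough illustration of what “declarations targeted at a computer” can look like, here is a toy Python sketch. It does not use any real ontology language or library; the concept names and the IS_A table are made up for the example. It declares a tiny conceptualisation as a set of is-a statements and lets the computer “reason” over it by following those statements upwards.

```python
from typing import Dict, Optional, Set

# Hypothetical conceptualisation: each concept is declared with its parent.
IS_A: Dict[str, Optional[str]] = {
    "Thing": None,          # root concept, no parent
    "Artefact": "Thing",
    "Vessel": "Artefact",
    "Amphora": "Vessel",
}


def ancestors(concept: str) -> Set[str]:
    """Return every concept reachable by following is-a links upwards."""
    found: Set[str] = set()
    parent = IS_A[concept]
    while parent is not None:
        found.add(parent)
        parent = IS_A[parent]
    return found


def is_a(concept: str, candidate: str) -> bool:
    """True if 'concept' is the same as, or a specialisation of, 'candidate'."""
    return concept == candidate or candidate in ancestors(concept)
```

With these declarations, is_a("Amphora", "Artefact") holds even though nobody stated it explicitly, which is the very modest kind of “reasoning” that the early AI work was after. Real ontology languages are far richer than this, of course.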

Modularity and layering in semantic technologies

In the previous post, I argued that most semantic technologies (as usually understood in the digital humanities community) are anything but semantic. In this post I turn to engineering concerns. Here I argue that most semantic technologies such as RDF, SKOS and OWL, especially as employed in digital humanities for linked open data, present serious flaws due to the lack of good engineering practices. Engineering is crucial in this regard, because thesauri, repositories and other similar artefacts that are routinely developed within the digital humanities field are information systems and, as such, exhibit behaviours that are well understood in engineering and are subject to engineering principles.

My main point today is that semantic technologies are usually oblivious to the well-known engineering concerns of modularity and layering. Modularity refers to the organisation of a system into parts, named modules, which exhibit high internal cohesion and low external coupling. This means that modules are not just random chunks of a system. Rather, a module is a portion of a system that, to start with, must have a high degree of internal cohesion. This means that individual elements inside the module are tightly related to each other, leaving no elements disconnected or isolated. In other words, a module must have a clear and single purpose, and every element in it must be concerned with this purpose. In addition, a module must have a low degree of external coupling. This means that the relationships between a module and other modules must be few and weak, rather than many and strong. This is so in order to concentrate connections inside modules rather than across them, which benefits the overall quality of the system. A good introduction to why this is so, and how quality factors such as robustness and extensibility are improved through modularity, is given in the introductory chapters of [Meyer 1997].
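
Here is a small Python sketch of what high cohesion and low coupling may look like in practice. Everything in it is an assumption of mine for the sake of illustration: the Thesaurus and Catalogue modules, the LabelSource interface and the sample values are hypothetical. Each class has a single purpose, and the only connection between the two is one narrow operation.

```python
from typing import Dict, List, Protocol, Tuple


class LabelSource(Protocol):
    """The only thing the catalogue needs from the outside world."""
    def label_for(self, term_id: str) -> str: ...


class Thesaurus:
    """Module 1: everything in here is about terms and their labels."""
    def __init__(self) -> None:
        self._labels: Dict[str, str] = {}

    def add_term(self, term_id: str, label: str) -> None:
        self._labels[term_id] = label

    def label_for(self, term_id: str) -> str:
        return self._labels[term_id]


class Catalogue:
    """Module 2: everything in here is about records.

    It is coupled to the thesaurus only through LabelSource, so the
    thesaurus can change internally without the catalogue noticing.
    """
    def __init__(self, labels: LabelSource) -> None:
        self._labels = labels
        self._records: List[Tuple[str, str]] = []

    def add_record(self, title: str, term_id: str) -> None:
        self._records.append((title, term_id))

    def describe(self) -> List[str]:
        return [
            f"{title} ({self._labels.label_for(term_id)})"
            for title, term_id in self._records
        ]


thesaurus = Thesaurus()
thesaurus.add_term("t1", "pottery")

catalogue = Catalogue(thesaurus)
catalogue.add_record("Bowl fragment", "t1")
descriptions = catalogue.describe()
```

The point of the design is that the catalogue never reaches into the internals of the thesaurus. If records stored labels directly, or if the catalogue read the private dictionary of the thesaurus, the connections across modules would become many and strong, which is exactly what modularity tries to avoid.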

The paradox of semantic technologies

An ongoing debate in the humanities is that of methodology: hypothetico-deductive approaches are often rejected, and analytical definitions of the objects of study are usually believed impossible or not desirable. However, suitable alternatives are rarely produced without friction or heated discussion.

It is true that methodologies are taught at universities and considered an important aspect of research by most scholars; however, I have not found many occasions where methodologies were documented and communicated in a systematic way by practitioners. For this reason, I argue that methodological guidance in the digital humanities is scarce, which makes reproducibility of interim and final research results, as well as the communication of work processes to colleagues or the public, very difficult. This is especially relevant in relation to the separation of descriptive vs. interpretive processes. By descriptive processes I refer to formalising tasks by which researchers generate data from observed evidence, e.g. by recording finds at an archaeological excavation site. By interpretive processes I refer to deductive, inductive or abductive tasks by which scholars generate new knowledge from existing data and through argumentation, sense-making and other cognitive devices. Such poor separation of descriptive and interpretive processes in the digital humanities leads to bigger reuse barriers and scenarios where collaboration is harder, especially between individuals of different disciplinary backgrounds.

The concept of “digital humanities”

I will start by delving into an issue that has proven to be highly contentious: I don’t think that the digital humanities constitute a new field of enquiry, one that is significantly different to good old humanities as we know them. I may be wrong, or you may disagree with me, so help yourself to the comments section. I will be glad to discuss, and I may well be persuaded to the contrary of what I state below.

In a well-known interview, Kathleen Fitzpatrick [Lopez et al. 2015] states that digital humanities

…[is] bringing the tools and techniques of digital media to bear on traditional humanistic questions. But it’s also bringing humanistic modes of inquiry to bear on digital media.

According to this, digital humanities are about doing humanities by using digital technologies, or about digital technologies. Let’s explore both options.

In the first sense, I would argue that any current scientific research endeavour must necessarily employ digital technologies. In today’s world, you cannot be an active researcher without digital technologies. In fact, we do not hear about “digital biology” or “digital physics”, because the “digital” aspect is taken for granted in these disciplines. Similarly, I would think that you need digital technologies to carry out research in the humanities, and there is no way around it. In this regard, I argue that the “digital” qualifier in “digital humanities” of the first kind does not contribute much to the meaning of the expression.

What I am planning to write about

I have wanted to write about the digital humanities for some time now, and I hope this blog will be the place. I chose a blog because many things I’d like to say are too informal or too opinion-based to be part of a scientific publication. Also, I am interested in hearing comments from other people and seeing how widely my views are shared.

As the blog title implies, I will be writing about the digital humanities but in an unconventional way. This is so because my understanding of what the digital humanities are, what they should be, and how to practise them, is quite different to that of most of my colleagues, as far as I understand. My background is in software engineering and science (biology, in particular), although I’ve been working in the digital humanities for over 25 years, since before the term “digital humanities” was mainstream. This means that I have come to this place from a direction that is very different to that of most other people. In my experience, most people in the digital humanities are humanists (archaeologists, linguists, anthropologists or whatever) who have been attracted towards the digital and, very often, have learnt about it. By contrast, I am a software engineer who has been attracted towards the humanities, and who has (hopefully) learnt about them. This difference will be very evident in my posts.

I am planning to write about what the digital humanities are and what they should be, as well as about whether they make sense at all, to start with. I am also planning to write about specific trends and technologies within the digital humanities, such as TEI, linked open data, thesauri, repositories, databases, metadata, etc. And I am also planning to write about topics that are rarely discussed in the digital humanities community, but which I believe should be, such as conceptual modelling, abstraction, systems design or research management.

Usually, I will write in a quite informal tone, like this. I will try to provide scholarly references when needed, and sometimes I’ll get into the nitty-gritty of things. However, I hope my posts stay accessible to a wide audience, even outside the digital humanities community. Please read on and leave your comments.

Thank you.