Linked Data 2: The Cataloger Strikes Back

This post is a follow-up to my last, “Wherefore art thou, linked data?”

Friend of the blog Becky Yoose alerted me to the existence of other friend of the blog Tom Johnson’s talk from Semantic Web in Libraries, 2015. I suggest you watch it if you can speak linked data or application development:

When I was talking with Tom on Twitter and still in the headspace of the last note, something clicked. While I made the assertion that linked data technologists in general talk past catalogers and that the idea that the graph must usurp the record is invalid, I missed something.

In Tom’s words: “Graphs are mathematical objects.” This may be a large cause of the semantic spat between linked data technologists and library metadataists and catalogers. The RDF graph generated by the series of assertions about a particular thing is a surrogate of the thing we are describing, but not in the same way that a bibliographic record is. The oversimplified thesis I am attempting to put forth: linked data graphs are a human-readable abstraction of computer computation, whereas the bibliographic record is a human-readable abstraction of an intellectual manifestation. At the end of the day, it’s all the same metadata insofar as the abstractions are semantically equivalent in representing the same real object. In an epistemological sense, though, they are serving two different and equally important functions. The ~catalog of the future!~ must incorporate both the graph and the record in order to be a functioning system of human-computer interaction that achieves the catalog’s aims.

The graph doesn’t really help people who are not technologists. This point is absolutely critical for linked data technologists to understand. The graph does not represent human semantics. It represents mathematical equations that can be imbued with human meaning in a very specific predicate logical sense. These mathematical equations are how we work as metadataists and technologists, yes. They are critical to the system, yes. They are not the whole system. Like Tom’s example above, the simplest way to represent this to people who do not work with linked data on a regular basis is the subject-predicate-object construction.

Assertion set 1 of object x, using Dublin Core terms

<x> <dc:title> <Graphs, Records, and Transmission>
<x> <dc:creator> <Shockey, Kyle>
<x> <dc:subject> <Johnson, Tom>
<x> <dc:subject> <linked data>
<x> <dc:date> <2016-03-26>
<x> <dc:type> <blog post>
<x> <dc:description> <a blog post about graphs and records>

This is, of course, a very skeletal description of this very web page [1]. Let’s say I wanted to edit it to be more representative. I come out with the following:

Assertion set 2 of object x

<x> <dc:title> <Graphs, Records, and Transmission>
<x> <dc:creator> <Shockey, Kyle, 1990->
<x> <dc:subject> <linked data>
<x> <dc:subject> <documentation>
<x> <dc:subject> <machine-readable bibliographic records>
<x> <dc:date> <2016-03-26>
<x> <dc:type> <blog post>
<x> <dc:description> <a blog post about graphs and records>
<x> <dc:references> <Johnson, Tom>

Tom makes an important point that pertains to these two graphs and their metaphysical being: they are two different graphs. Unlike the bibliographic record model that MARC uses, which has a single definite master record with a history of edits, the documents above are two completely different records in the linked data world. Having these two separate graphs allows me to make a mathematical comparison of the changes, represented below with ++ and --:

Mathematical comparison between assertion sets 1 and 2

<x> <dc:title> <Graphs, Records, and Transmission>
-- <x> <dc:creator> <Shockey, Kyle>
++ <x> <dc:creator> <Shockey, Kyle, 1990->
-- <x> <dc:subject> <Johnson, Tom>
++ <x> <dc:references> <Johnson, Tom>
<x> <dc:subject> <linked data>
++ <x> <dc:subject> <documentation>
++ <x> <dc:subject> <machine-readable bibliographic records>
<x> <dc:date> <2016-03-26>
<x> <dc:type> <blog post>
<x> <dc:description> <a blog post about graphs and records>
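The comparison above amounts to a pair of set differences. Here is a minimal sketch in Python, assuming each graph is modeled as a plain set of (subject, predicate, object) tuples rather than with a real RDF library. The names `SET_1`, `SET_2`, and `diff_graphs` are hypothetical, and the date is treated as unchanged, as in the comparison.

```python
# Hypothetical model: a graph is a frozenset of (subject, predicate, object)
# tuples. This is plain Python, not a real RDF library.
Triple = tuple[str, str, str]

SET_1: frozenset[Triple] = frozenset({
    ("x", "dc:title", "Graphs, Records, and Transmission"),
    ("x", "dc:creator", "Shockey, Kyle"),
    ("x", "dc:subject", "Johnson, Tom"),
    ("x", "dc:subject", "linked data"),
    ("x", "dc:date", "2016-03-26"),
    ("x", "dc:type", "blog post"),
    ("x", "dc:description", "a blog post about graphs and records"),
})

SET_2: frozenset[Triple] = frozenset({
    ("x", "dc:title", "Graphs, Records, and Transmission"),
    ("x", "dc:creator", "Shockey, Kyle, 1990-"),
    ("x", "dc:subject", "linked data"),
    ("x", "dc:subject", "documentation"),
    ("x", "dc:subject", "machine-readable bibliographic records"),
    ("x", "dc:date", "2016-03-26"),
    ("x", "dc:type", "blog post"),
    ("x", "dc:description", "a blog post about graphs and records"),
    ("x", "dc:references", "Johnson, Tom"),
})


def diff_graphs(old: frozenset[Triple], new: frozenset[Triple]) -> dict[str, frozenset[Triple]]:
    """Compare two graphs as sets: nothing is 'edited', only removed or added."""
    return {
        "removed": old - new,    # the -- lines
        "added": new - old,      # the ++ lines
        "unchanged": old & new,  # the unmarked lines
    }


changes = diff_graphs(SET_1, SET_2)
for triple in sorted(changes["removed"]):
    print("--", triple)
for triple in sorted(changes["added"]):
    print("++", triple)
```

Note that the sketch has no notion of one triple "becoming" another: the change of creator shows up as one removal and one unrelated addition.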

But that is not how representation of an object works epistemologically in the human world. We know that the object is still the same object and that the human meaning of the metadata was edited, not replaced. MARC metadata has a method for version control of a master record [2]: the 040 field.

040 __ IUL $b eng $e rda $c IUL $d IJZ $d EAU $d PUL $d DLC

In human words: Indiana University (IUL) transcribed ($c) Indiana University’s cataloging (blank node, or $a), which was in turn edited ($d) by Indiana Archives of Traditional Music (IJZ), American University (EAU), Princeton University (PUL), and the Library of Congress (DLC). The record is transcribed in the English language ($b eng) and uses the RDA description convention ($e rda).
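The same reading can be sketched in code. This is a minimal illustration, assuming the field is written exactly as displayed above (tag, `__` for blank indicators, and an implicit $a before the first `$`). The functions `parse_040` and `agency_chain` are hypothetical helpers, not part of any MARC library, and a real system would parse the underlying MARC data with a dedicated library instead.

```python
FIELD_040 = "040 __ IUL $b eng $e rda $c IUL $d IJZ $d EAU $d PUL $d DLC"


def parse_040(field: str) -> dict[str, list[str]]:
    """Split a displayed 040 field into {subfield code: [values]}."""
    # Drop the tag ("040") and the indicators ("__"); keep the subfield data.
    _tag, _indicators, data = field.split(None, 2)
    chunks = data.split("$")
    subfields: dict[str, list[str]] = {}
    # Assumption: text before the first "$" is the implicit $a,
    # the original cataloging agency.
    first = chunks[0].strip()
    if first:
        subfields.setdefault("a", []).append(first)
    for chunk in chunks[1:]:
        if not chunk.strip():
            continue  # ignore a stray trailing "$"
        code, value = chunk[0], chunk[1:].strip()
        subfields.setdefault(code, []).append(value)
    return subfields


def agency_chain(subfields: dict[str, list[str]]) -> list[str]:
    """Original cataloging agency ($a) followed by each modifying agency ($d)."""
    return subfields.get("a", []) + subfields.get("d", [])


parsed = parse_040(FIELD_040)
print("Cataloged by:", parsed["a"])
print("Transcribed by:", parsed["c"])
print("Modified by:", parsed["d"])
print("Agency chain:", " > ".join(agency_chain(parsed)))
```

As footnote 2 points out, the field only records who touched the record, so the chain is a list of literals and not a history that could be replayed or reverted.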

Or: The record has existed in 5 iterations, including the current one. That isn’t a statement you can make about graphs. It literally does not compute, with the possible exception of the RDF Source (discussed in the video). This is the disconnect between graphs and records that requires both intellectual models to exist in order to get from computer computation to human meaning-making. The gap between these two abstract models is generally bridged by front-end programming: the discovery layer of the OPAC, etc. [3] A handy chart:

computer computation – representative graph – programming black magic – bibliographic record – human understanding

It’s not so linear or straightforward, but I am attempting to come up with something better through formal research. This explication I’ve put forth also requires catalogers to know something of equal importance: a MARC record does not automatically equal a bibliographic record. MARC is a data standard and a markup language. I think many catalogers who do not necessarily think about their craft in a meta sense have never unpacked the intellectual shortcut that causes “bibliographic record” to equal “MARC record.” If you see the chart above, it fits MARC just as much as it fits linked data. The representative “graph” in this case is your MARC record. The semantic meaning in MARC for a non-specialist is explicated only after the computer computation takes the representative graph (this thing) through the programming black magic to make the thing you see on the OPAC display (i.e. the bibliographic record).
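To make the “programming black magic” step a little more concrete, here is a toy sketch of turning a representative graph into something like a display record. Everything in it is an assumption for illustration: the `LABELS` mapping, the `render_record` function, and the choice of which elements to show are hypothetical, and a real discovery layer is far more involved.

```python
# Hypothetical display labels for a few Dublin Core elements.
# The order of this mapping sets the order of the display.
LABELS = {
    "dc:title": "Title",
    "dc:creator": "Author",
    "dc:subject": "Subject",
    "dc:date": "Date",
}

# The representative graph: a list of (subject, predicate, object) tuples.
GRAPH = [
    ("x", "dc:title", "Graphs, Records, and Transmission"),
    ("x", "dc:creator", "Shockey, Kyle, 1990-"),
    ("x", "dc:subject", "linked data"),
    ("x", "dc:subject", "documentation"),
    ("x", "dc:date", "2016-03-26"),
    ("x", "dc:type", "blog post"),
]


def render_record(triples: list[tuple[str, str, str]], subject: str) -> str:
    """Collect the assertions about one subject into labelled display lines."""
    lines = []
    for predicate, label in LABELS.items():
        values = [o for s, p, o in triples if s == subject and p == predicate]
        if values:
            # Repeated elements are joined on one line, as a display might do.
            lines.append(f"{label}: {'; '.join(values)}")
    return "\n".join(lines)


print(render_record(GRAPH, "x"))
```

The same function could sit in front of parsed MARC fields just as easily, which is the point of the chart: only what feeds the rendering step changes.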

With this chart, I can say again: the only difference between the two with regard to the library catalog is the first (and secondarily the third) node. Building semantics in linked data dramatically shifts the concept of computer programming, but has little to no effect on bibliographic representation except for decentralizing the database. The record and the graph are both useful models for understanding metadata but they represent two completely different things in the process of bibliographic representation automation. With this knowledge, we can build the big tent and get linked data technologists and library catalogers and metadataists to talk to each other without denigrating the other’s work.

  1. I am of course not using all of these elements with proper formality. This is a proof of concept. 

  2. Not a great one, since it only records literals and doesn’t actually have the functionality to revert to a previous form of record. 

  3. This kind of programming is too far out of my depth for me to make any other meaningful statement about this kind of bridging.
