By Richard Light
You may have come across an enthusiast (like me!) who tells you that you should be publishing your museum collection as Linked Data. Your reaction may well have been to shrug, say “I don’t know what it is and I don’t know how to do this”, and get back to cataloguing your collection and recording your collections management work. At this stage in the game, that would probably be a wise choice.
This post tries to explain “what Linked Data is” from a cultural heritage point of view, what the possibilities are, and why it is currently really hard to do.
The Web as a distributed database
We all know how the Web works. You find a page containing information that interests you: this usually involves using a well-known search engine. This initial page of search results contains lots of links to relevant pages, and you simply click on the links that look relevant to go to those pages. On each new page there are more links to follow. If you’re really lucky you can end up going round in circles. This is ‘browsing the Web’. It’s fine as far as it goes, for looking up and reading information, one page at a time.
However, if you want to treat these pages as data (for example, to add background information into an object catalogue record), you will find they are quite limited. You can copy and paste some (or all!) of a web page into one of your records, but you will find that you either end up with annoying HTML markup in your data along with the text, or that the markup disappears and all the text is run together. Either way, you can’t expect to extract data from web pages in a format which is compatible with your collections management system.
Linked Data works in the same way as web pages. The key difference is that each ‘page’ is actually a (sort of) database entry, containing structured data. You can browse from one Linked Data page to another, just as you browse web pages. The Linked Data web is, in effect, a loosely joined-up database that spans the entire Internet.
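To make the "loosely joined-up database" idea concrete, here is a minimal in-memory sketch. Every URL and property name in it is invented for illustration (nothing here comes from a real dataset), and a plain dictionary stands in for the Web: each key plays the part of a Linked Data 'page', and some of its values are links to other pages.

```python
# Hypothetical URLs and property names, invented purely for illustration.
PAGES = {
    "http://example.org/object/1": {
        "title": "Woodcut print",
        "maker": "http://example.org/person/7",
    },
    "http://example.org/person/7": {
        "name": "An example printmaker",
        "born_in": "http://example.org/place/3",
    },
    "http://example.org/place/3": {"name": "An example town"},
}


def follow(start, *properties):
    """Start at one entry and hop along the named properties, returning the final entry."""
    entry = PAGES[start]
    for prop in properties:
        # Each value we follow is itself a URL, so it can be looked up like a page.
        entry = PAGES[entry[prop]]
    return entry


# From an object, to its maker, to the maker's place of birth.
print(follow("http://example.org/object/1", "maker", "born_in")["name"])
```

On the real Web each lookup would be an HTTP request rather than a dictionary access, but the pattern of hopping from one structured entry to the next is the same.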
Using URLs to identify concepts
Linked Data, from our perspective, is something we could use to describe the entities that make up the cultural heritage world. These include people, places, events … and objects. A key feature of the Linked Data approach is that each concept has its own unique identifier. This is a URL, which follows exactly the same rules as the URLs which identify web pages. So this is a Linked Data identifier for a person from the Getty’s ULAN (Union List of Artist Names) thesaurus:
Pop that URL into your browser, and you will see a slightly strange web page, which lists the facts known about this person. The page heading makes it clear that this person is John Gerald Platt – something that isn’t clear from the URL.
So far, not very exciting, but this is where the Linked Data magic comes in. Ask for the same URL in a different way, and you get real data back. I’ll gloss over the exact way you do this [1] and the technical details of the data [2], and give you a sense of how it looks. This is a fragment of the XML version of John Gerald’s data:
This fragment lists the biographical data that is available. The key point is that each biographical statement has its own Linked Data URL, for example http://vocab.getty.edu/ulan/bio/4000231223, which you can look up:
<schema:description>English printmaker and painter, 1892-after 1956</schema:description>
This biographical fragment contains some real data: two dates and a summary description. There are also URLs for John Gerald’s gender and place of birth, which you could track down and extract data from. You’ll notice that these URLs come from different Getty thesauri: the gender URL comes from the AAT (Art and Architecture Thesaurus) and the place of birth from the TGN (Thesaurus of Geographic Names). This is a good way to do Linked Data: use existing frameworks to express the concepts you want to make statements about, rather than inventing new ones.
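For readers who want to see what "asking in a different way" might look like in practice, here is a rough Python sketch using only the standard library. It makes several assumptions that are not stated in this post: that the server honours an Accept header of application/rdf+xml, that the schema: prefix is bound to http://schema.org/, and that the data is wrapped in an rdf:RDF element. The function names and the sample document are hand-written for illustration; only the URL and the schema:description element are taken from the text above.

```python
import urllib.request
import xml.etree.ElementTree as ET

SCHEMA_NS = "http://schema.org/"  # assumed binding for the schema: prefix


def rdf_request(url):
    """Build a request that asks for RDF/XML rather than an HTML page."""
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(url):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(rdf_request(url), timeout=30) as response:
        return response.read().decode("utf-8")


def descriptions(rdf_xml):
    """Return the text of every schema:description element in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    return [el.text for el in root.iter(f"{{{SCHEMA_NS}}}description") if el.text]


# A cut-down, hand-written document wrapping the element quoted above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:schema="http://schema.org/">
  <rdf:Description rdf:about="http://vocab.getty.edu/ulan/bio/4000231223">
    <schema:description>English printmaker and painter, 1892-after 1956</schema:description>
  </rdf:Description>
</rdf:RDF>"""

print(descriptions(SAMPLE))
```

Against a live server you would call descriptions(fetch_rdf(url)) instead of using the sample; whether a given URL answers with RDF/XML depends on how that server is configured.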
The really nice thing about using someone else’s Linked Data URLs in your records is that they give you additional data ‘for free’. For example, if you use a geographical resource like GeoNames [3] you get access to geolocation data for each place, which means you can publish distribution maps full of little pins at the cost of a little programming.
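As a hint of what that "little programming" could involve, the sketch below pulls latitude and longitude out of an RDF/XML document. It assumes the coordinates are expressed with the W3C Basic Geo vocabulary (lat and long elements in the wgs84_pos namespace) and that each place is a direct child of the rdf:RDF root. The sample data, its example.org URLs and its coordinates are all made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # W3C Basic Geo vocabulary


def map_pins(rdf_xml):
    """Return (uri, latitude, longitude) for each resource that has both coordinates."""
    pins = []
    for resource in ET.fromstring(rdf_xml):
        lat = resource.findtext(f"{{{GEO}}}lat")
        lon = resource.findtext(f"{{{GEO}}}long")
        if lat is not None and lon is not None:
            pins.append((resource.get(f"{{{RDF}}}about"), float(lat), float(lon)))
    return pins


# Hand-written sample: hypothetical places with invented coordinates.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <rdf:Description rdf:about="http://example.org/place/1">
    <wgs84_pos:lat>54.45</wgs84_pos:lat>
    <wgs84_pos:long>-3.02</wgs84_pos:long>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/place/2"/>
</rdf:RDF>"""

print(map_pins(SAMPLE))
```

The resulting list of pins is the kind of input a mapping library would need; places without coordinates are simply skipped.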
Publishing your collection as Linked Data
So let’s return to my original suggestion: that you publish information about your collection objects as Linked Data. There are two good reasons to do this: you stake a claim to your own material in the Linked Data world; and you provide an API for others to use when they want to access your data. I’ve had a go at doing this for U.K. museums, and a couple of them have taken up the opportunity [4].
However, as I flagged up at the start, there are also good reasons not to publish your collection as Linked Data. Three which spring to mind: I’ll bet your collections management system lacks any support to help you add Linked Data URLs to your catalogue records; your web publishing software environment lacks any means of using Linked Data to add value to your web presence; and (perhaps most importantly) we currently lack Linked Data frameworks for the concepts we really want to share information about: people, places and events.
I’ll talk about these topics in more detail in a future post: in the meantime I look forward to responding to your comments and questions.
Richard Light is a U.K.-based information scientist and software developer who has been involved in museum information systems for nearly all his career. He helped computerize the Sedgwick Museum, Cambridge back in the days of punched paper tape and mainframes, and then worked on data standards and systems with the Museum Documentation Association (now Collections Trust). Since 1991 he has been an independent cultural heritage consultant, specializing in markup languages and Linked Data. He is the Chair of Free UK Genealogy [5] and is a regular attendee at CIDOC [6] meetings: something all museum documentation folk should do!
[1] it involves the Accept header of the HTTP request
[2] it’s RDF, a directed graph structure: https://www.w3.org/RDF/
[3] e.g. http://sws.geonames.org/7298484/about.rdf
[4] e.g. http://collections.wordsworth.org.uk/Object/WTcoll/id/rdf/GRMDC.C144.9
[5] http://www.freeukgenealogy.org.uk/
[6] http://network.icom.museum/cidoc/