
SemantiKoha

Recently, there has been a fair amount of discussion about RDF/Semantic Web technologies being central to the efforts to replace the MARC family of "standards", including from the US Library of Congress (here and here). However, there does not seem to be much experimenting with these technologies in the context of actual ILSes. I aim to do something about that with the "SemantiKoha" project.

The goal of SemantiKoha is to explore how openly available semantic/linked data can be used in the Koha OPAC to enhance the user experience and aid in discovery. The practical work is at a very early proof-of-concept stage, but there is a live demo (see e.g. the page for Charles Darwin), and the source code is available as well. See especially the README, which explains how the demo is set up.

This page will function as a blog in which I dump ideas as they pop up, link to new developments, etc.

Next generation bibliographic data entry screens

Here's a message I sent to the BIBFRAME email list last night:

On 15 January 2013 19:13, Tom Morris <[log in to unmask]> wrote:
> One good path forward here might be the open source library software
> systems. Someone could prototype the data entry screen of the future
> in a real-live system.

Thanks for bringing that up; I have been thinking along the same lines
myself for some time now. I am involved in the Koha community, and I
have been thinking specifically about adding "semantic capabilities"
to that ILS.

Specifically I have been thinking about:

  • Getting records out of Koha with OAI-PMH and transforming them to
    RDF, using the marc2rdf software [1]
  • Storing the RDF in a triplestore
  • Creating interfaces for enhancing and supplementing the transformed
    data in the triplestore (by describing relationships between the
    records, pulling in data from other sources, etc.)
  • Enhancing the OPAC with the data from the triplestore. (I think this
    step is important - this shouldn't just be about creating "data entry
    screens", but about how we can make the ILS and the OPAC a good
    platform for mediation and a more useful tool both for librarians and
    patrons.)

My hypothesis is that once we start to see all the wonderful and cool
and useful things we can do with the semantic data we will one day
"wake up" and wonder why we ever bothered with MARC. ;-)

I actually have a half-baked demo of the scenario described above
available [2]. Sadly, the interfaces for working with the semantic data
here are command-line scripts... ;-) I do hope to turn this into a
proper project with a proper interface and get it integrated into
Koha, though. The only problem is time/money... Maybe I'll team up
with some adventurous library and apply for a grant or maybe I'll
start a Kickstarter [3] campaign to raise money for it. Or both. Not
because I think I have the perfect idea for what the interfaces should
look like, mind you, but just to get the ball rolling and start the
evolution towards something useful.

The way forward? I think free software can be key, in that it allows
us to experiment and test things in real systems. I think the way to
do it is with "rough consensus and running code", and to iterate and
iterate and iterate, throwing away the bad ideas and holding on to the
good ones. And I think that goes *both* for creating the interfaces
and for figuring out what exactly should replace MARC...

Best regards,
Magnus Enger
libriotech.no

[1] https://github.com/digibib/marc2rdf - this is a project based at
the Oslo public library and they have recently got funding from the
Norwegian national library to develop it further.

[2] http://semantikoha.libriotech.no/ - a couple of examples:
http://semantikoha.libriotech.no/cgi-bin/koha/opac-view.pl?uri=http://esme.priv.bibkat.no/records/id_108
http://semantikoha.libriotech.no/cgi-bin/koha/opac-view.pl?uri=http://data.deichman.no/person/darwin_charles
There is a somewhat old RFC for Linked Data in Koha here, outlining
some more ideas:
http://wiki.koha-community.org/wiki/Linked_Data_RFC
I hope to add some more ideas here in the not too distant future:
http://libriotech.no/blogs/semantikoha/

[3] Well actually not Kickstarter, since that is limited to US and UK
residents, but something similar, at least.
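
To make the "enhancing and supplementing" step from the message above a little more concrete: the kind of statement such an interface might add to the triplestore could be as simple as a single triple linking a transformed record to an external resource. Here is a rough sketch in SPARQL 1.1 Update. The record URI is one of the demo examples, but the graph name and the choice of dct:subject are just illustrative assumptions, not necessarily how a finished interface would do it:

# Hypothetical example: assert that a catalogue record is "about" Charles Darwin,
# reusing a URI already published by the Deichman library's linked data service.
# The graph URI is a placeholder.
PREFIX dct: <http://purl.org/dc/terms/>

INSERT DATA {
  GRAPH <http://example.org/enhancements> {
    <http://esme.priv.bibkat.no/records/id_108>
        dct:subject <http://data.deichman.no/person/darwin_charles> .
  }
}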

To standalone or not to standalone

So this blog did not get off to a flying start (mainly due to a lack of time, of course, which has also kept me from actually working on SemantiKoha), but hey look, here is another post!

One of the things I have been mulling over while I have been unable to actually work on SemantiKoha is the question of whether what I want to do is best done as a standalone "application" or as something tightly integrated into Koha. Here's what I'm thinking:

Background

So, the basic question is "What do I want to do?"

The answer isn't that hard: I want to create a public interface for a library catalogue that is not based on MARC data, but on MARC data transformed into Linked Data/RDF and supplemented/enhanced by new kinds of data in the same format.

The basic layout of the system will have MARC records

  • retrieved from the ILS via OAI-PMH,
  • transformed to Linked Data/RDF,
  • stored in a triplestore, and
  • queried in response to end-user actions (searching and browsing).

(And yes, I do include a step for retrieving MARC records from the ILS. Sure, we could build something that does not involve MARC right now, but would any libraries start using it? I doubt it. I think the only way to move forward is to let libraries and librarians keep their MARC for a bit longer, so we can show them the potential in Linked Data/RDF, and then, when they see how cumbersome and unfit-for-purpose MARC really is, we can also show them that we can throw that part of the system away, and just keep the Linked Data/RDF bits. That's my hypothesis, anyway.)
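
To give an idea of what the querying step might look like in practice, here is a sketch of the kind of SPARQL query the public interface could run when rendering the page for a single record. The record URI is one of the demo examples; the Dublin Core properties are only an assumption about what the marc2rdf mapping produces, and may not match the actual output:

# A sketch of a query behind a record page in the OPAC.
# (Graph names are left out here for brevity.)
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?title ?creator ?subject
WHERE {
  <http://esme.priv.bibkat.no/records/id_108> dct:title ?title .
  OPTIONAL { <http://esme.priv.bibkat.no/records/id_108> dct:creator ?creator }
  OPTIONAL { <http://esme.priv.bibkat.no/records/id_108> dct:subject ?subject }
}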

So what is the best way to do that? Here are the two extremes on the continuum of possible answers to that question: standalone or tightly integrated.

Standalone

This would work similarly to solutions like VuFind, Blacklight and XC.

Pros

  • The project will benefit all kinds of libraries, not just the ones using Koha. This will also mean attracting more developers, not just developers interested in Koha.

Cons

  • To be a full replacement for an existing OPAC, there will need to be integration with the existing ILS for things like
    • real-time availability information for physical items
    • current loans, renewals, etc.

    There will probably have to be plugins for different ILSes or at least for different protocols (Z39.50, SRU, ILS-DI, any system-specific protocols).

Tightly integrated (into Koha)

Pros

  • Information like real-time availability can be fetched directly, rather than through an API
  • No need to worry about re-implementing things like renewals; that is already in place in Koha
  • It would give Koha a "unique selling point" (Yes, I do love Koha and I want it to thrive, and I think more libraries switching to Koha is a good thing)

Cons

  • It would only benefit libraries using Koha, and probably only attract developers interested in Koha

Middle ground

The middle ground would be to integrate the new functionality tightly into Koha, but keep the central parts of it clearly separated from other parts of Koha (e.g. as one or more Perl modules that do not rely on other Koha modules), so that other projects could reuse those parts with a minimum of extra effort.

Conclusion

None, yet. But if you have opinions or advice, I'm all ears (also on Google+ and Twitter)!

LOADing data from GeoNames

Let's say I have a book about Kapiti Island in my collection, and I want to express this aboutness in a Semantic Web way. One source I could relate to would of course be DBpedia, but another interesting one is GeoNames. Here's the drill:

1. Do a search for kapiti in GeoNames.

2. Find Kapiti Island in the result list

3. Click on the red marker for Kapiti Island in the list below the map, and note the "GeoNameId : 2189083".

4. Read about the GeoNames Ontology and figure out that the URI for Kapiti Island should look like this:

http://sws.geonames.org/2189083/

5. Construct and run the appropriate (for Virtuoso) LOAD in the triplestore:

LOAD <http://sws.geonames.org/2189083/> INTO <http://sws.geonames.org/2189083/>

6. Et voilà!

(And keep in mind that GeoNames data is licensed as CC-BY...)
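
Once the GeoNames triples are loaded, they can be queried alongside the catalogue data. Here is a sketch of such a query; it assumes the book record has been linked to the island with dct:subject (a link that would have to be added separately) and that the GeoNames RDF uses gn:name and the WGS84 lat/long properties:

# Find books about Kapiti Island together with the place name and coordinates.
# (Graph names are left out here for brevity.)
PREFIX gn:    <http://www.geonames.org/ontology#>
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dct:   <http://purl.org/dc/terms/>

SELECT ?book ?placename ?lat ?long
WHERE {
  ?book dct:subject <http://sws.geonames.org/2189083/> .
  <http://sws.geonames.org/2189083/> gn:name ?placename ;
                                     wgs84:lat ?lat ;
                                     wgs84:long ?long .
}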

Moving things around

I'm moving SemantiKoha to a Virtuoso triplestore and changing some identifiers, so there will be some weirdness for a while.

Here is the new triplestore: http://data.libriotech.no:8890/sparql/