Recently, there has been a fair amount of discussion about RDF/Semantic Web technologies being central to the efforts to replace the MARC family of "standards", including from the US Library of Congress (here and here). However, there does not seem to be much experimenting with these technologies in the context of actual ILSes. I aim to do something about that with the "SemantiKoha" project.
The goal of SemantiKoha is to explore how openly available semantic/linked data can be used in the Koha OPAC to enhance the user experience and aid in discovery. The practical work is at a very early proof-of-concept stage, but there is a live demo available (see e.g. the page for Charles Darwin) and the source code is also available. See especially the README, which explains how the demo is set up.
This page will function as a blog in which I dump ideas as they pop up, link to new developments etc.
Here's a message I sent to the BIBFRAME email list last night:
On 15 January 2013 19:13, Tom Morris wrote:
> One good path forward here might be the open source library software
> systems. Someone could prototype the data entry screen of the future
> in a real-live system.

Thanks for bringing that up, I have been thinking along the same lines
myself for some time now. I am involved in the Koha community, and I
have been thinking specifically about adding "semantic capabilities"
to that ILS.

Specifically I have been thinking about:
- Getting records out of Koha with OAI-PMH and transforming them to
RDF, using the marc2rdf software [1]
- Storing the RDF in a triplestore
- Creating interfaces for enhancing and supplementing the transformed
data in the triplestore (by describing relationships between the
records, pulling in data from other sources etc etc)
- Enhancing the OPAC with the data from the triplestore. (I think this
step is important - this shouldn't just be about creating "data entry
screens", but about how we can make the ILS and the OPAC a good
platform for mediation and a more useful tool both for librarians and
patrons.)

My hypothesis is that once we start to see all the wonderful and cool
and useful things we can do with the semantic data we will one day
"wake up" and wonder why we ever bothered with MARC. ;-)

I actually have a half-baked demo of the scenario described above
available [2]. Sadly, the only interface for working with the semantic
data there is a set of command line scripts... ;-) I do hope to turn this into a
proper project with a proper interface and get it integrated into
Koha, though. The only problem is time/money... Maybe I'll team up
with some adventurous library and apply for a grant or maybe I'll
start a Kickstarter [3] campaign to raise money for it. Or both. Not
because I think I have the perfect idea for what the interfaces should
look like, mind you, but just to get the ball rolling and start the
evolution towards something useful.

The way forward? I think free software can be key, in that it allows
us to experiment and test things in real systems. I think the way to
do it is with "rough consensus and running code", and to iterate and
iterate and iterate, throwing away the bad ideas and holding on to the
good ones. And I think that goes *both* for creating the interfaces
and for figuring out what exactly should replace MARC...

Best regards,
Magnus Enger
libriotech.no

[1] https://github.com/digibib/marc2rdf - this is a project based at
the Oslo public library and they have recently got funding from the
Norwegian national library to develop it further.

[2] http://semantikoha.libriotech.no/ - a couple of examples:
http://semantikoha.libriotech.no/cgi-bin/koha/opac-view.pl?uri=http://esme.priv.bibkat.no/records/id_108
http://semantikoha.libriotech.no/cgi-bin/koha/opac-view.pl?uri=http://data.deichman.no/person/darwin_charles
There is a somewhat old RFC for Linked Data in Koha here, outlining
some more ideas:
http://wiki.koha-community.org/wiki/Linked_Data_RFC
I hope to add some more ideas here in the not too distant future:
http://libriotech.no/blogs/semantikoha/

[3] Well actually not Kickstarter, since that is limited to US and UK
residents, but something similar, at least.
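To make the first two steps from the email a bit more concrete, here is a minimal Python sketch of harvesting records over OAI-PMH and turning them into RDF. This is not how marc2rdf does it (that is a separate project with its own mapping). It is just an illustration that builds a ListRecords request and emits one title triple per record as N-Triples. The function names, the "marcxml" metadata prefix, the shape of the OAI identifiers and the record URI base are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard namespaces for OAI-PMH responses and MARCXML records.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "marc": "http://www.loc.gov/MARC21/slim",
}
DCT_TITLE = "http://purl.org/dc/terms/title"


def list_records_url(base_url, metadata_prefix="marcxml"):
    """Build an OAI-PMH ListRecords request URL (prefix is an assumption)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def records_to_ntriples(oai_xml, record_base):
    """Turn each harvested MARCXML record into a single title triple.

    record_base is a hypothetical URI prefix; the last colon-separated
    part of the OAI identifier is appended to it.
    """
    root = ET.fromstring(oai_xml)
    triples = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", "", NS)
        local_id = identifier.rsplit(":", 1)[-1]
        title = record.findtext(
            ".//marc:datafield[@tag='245']/marc:subfield[@code='a']", None, NS
        )
        if not local_id or title is None:
            continue  # skip records without an identifier or a 245$a title
        title = title.strip().rstrip(" /:")  # drop trailing ISBD punctuation
        triples.append(
            '<%s%s> <%s> "%s" .'
            % (record_base, local_id, DCT_TITLE, escape_literal(title))
        )
    return triples
```

The resulting lines could then be loaded into a triplestore. A real transformation would of course map far more than the title, which is exactly the job marc2rdf is meant to do.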
So this blog did not get off to a flying start (mainly due to a lack of time, of course, which has also kept me from actually working on SemantiKoha), but hey look, here is another post!
One of the things I have been mulling over while I have been unable to actually work on SemantiKoha is the question of whether what I want to do is best done as a standalone "application" or as something tightly integrated into Koha. Here's what I'm thinking:
So, the basic question is "What do I want to do?"
The answer isn't that hard: I want to create a public interface for a library catalogue that is based not on MARC data as such, but on MARC data transformed into Linked Data/RDF and supplemented/enhanced by new kinds of data in the same format.
The basic layout of the system will have
(And yes, I do include a step for retrieving MARC records from the ILS. Sure, we could build something that does not involve MARC right now, but would any libraries start using it? I doubt it. I think the only way to move forward is to let libraries and librarians keep their MARC for a bit longer, so we can show them the potential in Linked Data/RDF, and then, when they see how cumbersome and unfit-for-purpose MARC really is, we can also show them that we can throw that part of the system away, and just keep the Linked Data/RDF bits. That's my hypothesis, anyway.)
So what is the best way to do that? Here are the two extremes on the continuum of possible answers to that question: standalone or tightly integrated.
This would work similarly to solutions like VuFind, Blacklight and XC.
There will probably have to be plugins for different ILSes or at least for different protocols (Z39.50, SRU, ILS-DI, any system-specific protocols).
The middle ground would be to integrate the new functionality tightly into Koha, but keep the central parts of it clearly separated from other parts of Koha (e.g. as one or more Perl modules that do not rely on other Koha modules), so that other projects could reuse those parts with a minimum of extra effort.
None, yet. But if you have opinions or advice, I'm all ears (also on Google+ and Twitter)!
Let's say I have a book about Kapiti Island in my collection, and I want to express this aboutness in a Semantic Web way. One source I could link to would of course be DBpedia, but another interesting one is GeoNames. Here's the drill:
1. Do a search for kapiti in GeoNames.
2. Find Kapiti Island in the result list.
3. Click on the red marker for Kapiti Island in the list below the map, and note the line "GeoNameId : 2189083".
4. Read about the GeoNames Ontology and figure out that the URI for Kapiti Island should look like this:
http://sws.geonames.org/2189083/
5. Construct and run the appropriate LOAD statement (in the form Virtuoso expects) in the triplestore:
LOAD <http://sws.geonames.org/2189083/> INTO <http://sws.geonames.org/2189083/>
6. Et voila!
(And keep in mind that GeoNames data is licensed as CC-BY...)
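Steps 4 and 5 are easy to script once you have the GeoNameId. Here is a small Python sketch that builds the GeoNames URI and the LOAD statement shown above, plus a triple that connects a book to the place. The function names, the use of dcterms:subject for "aboutness" and the example record URI are my assumptions for the illustration, not something GeoNames or Koha prescribes.

```python
# dcterms:subject is one possible predicate for "aboutness" (an assumption).
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def geonames_uri(geoname_id):
    """Build the URI for a GeoNames feature from its numeric GeoNameId."""
    return "http://sws.geonames.org/%d/" % int(geoname_id)


def load_statement(uri):
    """Build a LOAD statement that stores the data in a graph named after
    the URI it came from, in the same form as the statement used above."""
    return "LOAD <%s> INTO <%s>" % (uri, uri)


def subject_triple(record_uri, place_uri):
    """Build an N-Triples line saying that a record is about a place."""
    return "<%s> <%s> <%s> ." % (record_uri, DCT_SUBJECT, place_uri)


if __name__ == "__main__":
    kapiti = geonames_uri(2189083)
    print(load_statement(kapiti))
    # The record URI below is a made-up example.
    print(subject_triple("http://example.org/records/id_1", kapiti))
```

Using the source URI as the graph name makes it easy to see later where a given set of triples came from, and to reload or drop just that graph.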
I'm moving SemantiKoha to a Virtuoso triplestore and changing some identifiers, so expect some weirdness for a while.
Here is the new triplestore: http://data.libriotech.no:8890/sparql/
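If you want to poke at an endpoint like this from a script, something along these lines should do. It is a rough Python sketch that follows the standard SPARQL protocol (a GET request with a query parameter, asking for JSON results through the Accept header). The function names and the example query are mine, and I am assuming the endpoint is up and holds the graph you ask for.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://data.libriotech.no:8890/sparql/"


def describe_query(subject_uri, graph_uri, limit=10):
    """Build a SELECT query listing predicates and objects for one subject."""
    return "SELECT ?p ?o WHERE { GRAPH <%s> { <%s> ?p ?o } } LIMIT %d" % (
        graph_uri,
        subject_uri,
        int(limit),
    )


def query_url(endpoint, query):
    """Build the GET URL for a SPARQL protocol query request."""
    return endpoint + "?" + urlencode({"query": query})


def bindings(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query and return the flattened rows (needs network access)."""
    request = Request(
        query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return bindings(response.read().decode("utf-8"))
```

For example, `run_query(ENDPOINT, describe_query("http://sws.geonames.org/2189083/", "http://sws.geonames.org/2189083/"))` would list what the store knows about Kapiti Island, provided that graph has been loaded.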