Over the last couple of weeks 3 very interesting reports have drifted through my news feeds on libraries and linked data:
- The Library of Congress has announced plans for pursuing a replacement for MARC and these plans “will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model”.
- The W3C Library Linked Data Incubator Group released their report. This report recommends that librarians experiment more with linked data by releasing data, building on top of linked data sets, engaging with standards bodies and bringing their preservation skills to bear on datasets and vocabularies.
- A CLIR report has been published on a linked data workshop and survey run by Stanford. The purpose of the workshop was to discuss “the prospects for a large scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources.” The report itself is useful for everyone as it contains sections on the value of a linked data approach for library content and talks about potential killer apps linked data could support.
These seem significant to me and I am inclined to believe that they represent a growing interest in linked data in libraries. Naturally I have some observational bias in this area since JISC has been funding a fair bit of work investigating the potential for linked library data.
- The Discovery programme funded 8 projects that made metadata openly available; most of these took a linked data approach. Summaries of lessons from these projects will be available very soon from the Discovery website.
- Andy Powell and Pete Johnston produced a discussion document on a possible metadata approach to support discovery of library, museum and archive content based on linked data. This attracted detailed and passionate discussion from metadata experts.
- The OpenBib project investigated the issues and possibilities offered by open linked data for bibliographic metadata. We have recently funded this team to build on their initial work to show how this approach to bibliographic data can benefit researchers.
- The ArchivesHub are engaged in a project called Linking Lives, which will use linked data to enable researchers to explore the relationships between people and things that are contained in the archives metadata that the ArchivesHub aggregates. This builds on the earlier Locah project.
- Suncat are making their journal bibliographic information available as linked data
There is lots of interesting linked data work happening in the wider world of cultural heritage:
- Europeana is taking a linked data approach
- The British Museum is up to some very interesting stuff as part of the Mellon funded ResearchSpace project
- The British Library is engaged in some exciting experiments with a linked data version of the British National Bibliography
- The BBC Digital Public Space project is making use of RDF data to produce a very exciting aggregation of content with many possibilities
This is just a flavour of some of the developments that I am aware of. There are many more, and I don’t doubt that I’ve missed some of the most interesting ones.
So why are so many organisations putting resources into engaging with linked data? Well the advantages of linked data at a very simple level are:
- It enables us to make links between different items in different collections to enable the development of new interfaces that support new ways of exploring collections.
- It can make aggregation and exploration of very different types of data and resources easier
- It works very well on the web enabling clever people to reuse the data to create new tools for engaging with the resources.
- It breaks down the concept of a record of a resource to allow us to make better use of the fields in the record such as people’s names, place names, dates etc.
- It can potentially reduce duplication of effort if key datasets are shared: you could simply link to a trusted dataset rather than devoting effort to creating that data yourself.
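To make the first and last of these points a little more concrete, here is a minimal sketch of the idea in Python. It is not taken from any of the projects above: the URIs, the property names and the two tiny "collections" are all hypothetical, and real linked data would normally be expressed in RDF with a proper toolkit. The point is only that two collections which point at the same shared identifier for a person can be merged and explored together without either one copying the other's data.

```python
# A minimal sketch, assuming made-up example.org URIs and property names.
# Each statement is a (subject, property, object) triple.

# Hypothetical shared authority URI that both collections link to.
DICKENS = "http://example.org/authority/person/dickens"

# A library catalogue describes a book and links to the shared person URI.
library = {
    ("http://example.org/library/book/1", "title", "Bleak House"),
    ("http://example.org/library/book/1", "creator", DICKENS),
}

# An archive describes a letter and links to the same person URI.
archive = {
    ("http://example.org/archive/letter/42", "title", "Letter to a publisher"),
    ("http://example.org/archive/letter/42", "author", DICKENS),
}

# Aggregation is just a union of statements, because identifiers are global.
merged = library | archive


def linked_to(graph, uri):
    """Return the subjects of every statement whose object is the given URI."""
    return sorted(s for s, _p, o in graph if o == uri)


if __name__ == "__main__":
    for subject in linked_to(merged, DICKENS):
        print(subject)
```

Neither collection had to hold its own record for the person; both simply link to the same identifier, which is what makes the cross-collection query possible.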
However, it is far from certain whether linked data will transform the way libraries work or simply become a tool that is used for some datasets. Many people that I trust still have reservations about linked data as the skills required to model and create linked data are not commonly held by people in most libraries and it is not clear yet that there is an obvious return on the investment required to create and exploit linked data.
My personal opinion is that, judging by the amount of effort that influential organisations are putting into linked data projects, it is not something that is going away soon. It seems likely that linked data will develop into a useful tool for at least some of the metadata or sets of metadata that librarians use. Senior librarians or those interested in personal development will probably need to think about the skills required to engage with this emerging technology.
As part of the JISC Discovery project we will be dedicating effort to making sure that librarians can learn from the projects we fund to investigate linked data. We hope that this will be a useful learning tool for those with an interest in developing their linked data knowledge or skills. This should include high level messages on the value of the approach and detailed lessons on the technical and licensing issues involved. All of our resources will be made available on the Discovery website. We are also planning to provide training on some key topics, so keep your eyes peeled for developments.
If any UK libraries are interested in experimenting in this space or in following the innovations of others, they may want to look at our current funding call which makes money available for UK HE libraries, museums and archives to make metadata openly available. There may be just enough time to put a bid together before the deadline of the 21st of November.
Finally, if you are interested in linked data it is worth watching this blog as my colleague David Flanders is planning some further posts to talk about the possibilities linked data offers for higher education.
The information environment programme 2009-11 (mercifully shortened to inf11) is drawing to a close and we are starting to reflect on what it has achieved.
We chose to manage this programme as one very broad programme rather than a number of smaller programmes and it has included work on:
- Activity data
- Automatic metadata generation
- Infrastructure for resource discovery
- Repositories – enhancement, take up and embedding and improving deposit
- Linked data
- Scholarly communication
- Rapid Innovation
- Library management systems – includes work on a shared ERM system with SCONUL
- Research Information management
- Developer community
This represents a lot of work that has produced some exciting outputs and interesting results. To try and help people see what outputs and results are relevant to them, we have prepared a list of 27 questions that the programme has addressed or started to address. This was put together by Jo Alcock from Evidence Base who are evaluating the programme.
The programme won’t finish until July so we will continue to add to these questions. If you have any suggestions for things to be included, please let me know.
For our next programme of work we will have 4 separate programmes:
- Information and Library Infrastructure
- Research Management
- Digital Infrastructure Directions
We will be blogging more about these programmes soon.
Henry S. Thompson
W3C Technical Architecture Group
1. What is the TAG?
The W3C (the World Wide Web Consortium), founded by Tim Berners-Lee, is responsible for most of the foundational standards which ensure the inter-operability of the technologies which make up the Web, such as HTML, XML, CSS, SVG and MathML. It also strives to protect the interests of all Web users, in areas such as accessibility and internationalisation.
The TAG (Technical Architecture Group) has perhaps the widest ranging remit of any of the groups which do the work of the W3C. Its remit is “to document and build consensus around principles of Web architecture and to interpret and clarify these principles when necessary, to resolve issues involving general Web architecture brought to the TAG, and to help coordinate cross-technology architecture developments inside and outside W3C.”
2. What is my role?
I’ve been an elected member of the TAG since 2005, with support from JISC for travel costs. Although TAG members don’t explicitly represent particular constituencies (they are elected by the W3C membership as a whole), I’ve tried to pay attention to issues of particular relevance outside the United States in general, and to the UK in particular.
3. What has the TAG been up to lately?
Since the publication of Architecture of the World Wide Web, Volume One, the TAG has mostly operated in a more focussed, issue-driven mode. There have been three specific topics under consideration over the last year or so: the future of HTML, the revision of HTTP and a cluster of issues around URIs, including persistence, semantics and conversion between different formats. (There is a full public listing of TAG issues and open action items available online.) On a broader canvas, the TAG has begun working towards a possible new publication on the Architecture of Web Applications. The following subsections look at each of these areas of work in turn.
3.1. The future of HTML
The TAG is not directly involved in the W3C’s work to produce HTML5, the next version of HTML. But it has been actively engaged in monitoring the progress of that work, particularly in areas of relevance to Web Architecture. This has involved the TAG in discussion with the HTML WG and its chairs on such matters as XML compatibility, modularity, (distributed) extensibility, accessibility and approach to language definition. In some of these areas, for example XML compatibility and language definition, TAG intervention seems to have had a significant positive impact, leading to new work and/or substantive revision to the HTML5 spec. In others, the discussions have been less fruitful, at least so far.
3.2. The revision of HTTP
The HTTP working group of the IETF has begun work on the first revision in over 10 years of the specification of the Web’s key transport protocol, HTTP. There has been excellent liaison between the TAG and the working group, with a number of the changes in the draft revision arising directly from TAG input. Modifying a specification of such importance requires great care, and the TAG is helping to provide independent review as the work goes forward.
3.3. Issues around URIs
URIs keep being used in new ways and in new circumstances. A number of issues have arisen or come to the fore recently at the intersection of their use on what one might call the ‘old-fashioned’ Web with their use on the ‘Semantic’ Web, or the Web of Linked Data, as it is now often referred to. These include deep questions about the precise meaning of response codes such as 200, 303 and even our old friend 404; more specific issues, including the appropriate level of commonality for the interpretation of fragment identifiers (that is, the part after the hash (#) in a URI) across all the so-called ‘+xml’ media types; and issues which are almost as much organisational as technical, notably the question of just how many places we need to define the mapping from the kinds of strings identifying web resources that we find in XML or HTML documents to the rather more constrained form that IRIs and/or URIs are mandated to take in HTTP requests.
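For readers who have not met these two notions before, the following Python sketch may help. It is only an illustration using the standard library, not anything the TAG has specified: the function names and example.org addresses are made up, and the percent-encoding shown is a simplification that leaves the host name untouched (internationalised host names need separate handling). It shows a fragment identifier being separated from the rest of a URI, since the fragment is interpreted by the client and is not sent in the HTTP request, and a string containing non-ASCII characters being mapped to the more constrained form used in requests.

```python
from urllib.parse import quote, urldefrag, urlsplit, urlunsplit


def split_fragment(uri):
    """Separate a URI into the part sent to the server and the fragment."""
    without_fragment, fragment = urldefrag(uri)
    return without_fragment, fragment


def iri_to_uri(iri):
    """Percent-encode non-ASCII characters in the path, query and fragment.

    A simplified sketch: characters are encoded as UTF-8 and the host
    name is passed through unchanged.
    """
    parts = urlsplit(iri)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    print(split_fragment("http://example.org/doc.xml#chapter2"))
    print(iri_to_uri("http://example.org/caf\u00e9"))
```

How a fragment such as `chapter2` should then be interpreted depends on the media type of what comes back, which is exactly where the question of commonality across the ‘+xml’ types arises.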
3.4. The Architecture of Web Applications
The growth of ‘Web 2.0’ and the mobile Web has given rise to many new questions about how rich and powerful client-side actors, much more diverse than simple browsers, can and should be governed by the principles of Web Architecture as already understood, and to what extent we need new architectural principles in this area. A lot of the TAG’s current work is focussed on sub-parts of these questions, for example the use of URIs to ‘store’ client-side application state, privacy considerations which arise when device-resident sensors such as GPS expose APIs to web applications, and security models and vulnerabilities. The TAG expects to publish drafts in these and related areas in the new year.
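As a rough illustration of what ‘storing’ client-side state in a URI can mean, here is a small Python sketch. It is an assumption-laden example rather than a description of any TAG draft: the map application, its address and the state keys are all hypothetical. The state is written into the fragment, so that the resulting URI can be bookmarked or shared and the application can later rebuild the same view from it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def state_to_uri(base, state):
    """Encode a flat dictionary of application state into the URI fragment."""
    # Sort the keys so the same state always produces the same URI.
    return base + "#" + urlencode(sorted(state.items()))


def uri_to_state(uri):
    """Recover the state dictionary from the fragment of a URI."""
    fragment = urlsplit(uri).fragment
    return {key: values[0] for key, values in parse_qs(fragment).items()}


if __name__ == "__main__":
    # Hypothetical map viewer remembering where the user was looking.
    uri = state_to_uri("http://example.org/map", {"zoom": "12", "lat": "51.53"})
    print(uri)
    print(uri_to_state(uri))
```

Because the fragment stays on the client, the server never sees this state, which is part of what makes the architectural questions interesting.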
4. Find out more
The TAG conducts almost all its work in public, either in or linked from email sent to www-tag. Browsing the public archives is the best way to catch up on the current state-of-play. Minutes from the TAG’s weekly telcons and quarterly face-to-face meetings are always announced on that list. For a slightly longer-term perspective, the public archives of the public-tag-announce list give regular summary updates.
Finally, input on any topic that is, or that you think ought to be, under consideration by the TAG is always in order on the public mailing list.
Henry’s web page is here:
The JISC Information Environment and eResearch team have been working on a grant funding call which should be released on or around 4th October. There will be a briefing day in London on 11th October; more information to follow about that. Those who keep a close eye on the JISC website and in particular the funding roadmap will be aware of the general direction of the call, but in this post I’ll spell out a little more about it. The call broadly represents a tranche of investment in the technologies, policies and practices that make up the infrastructure to support research and learning. The structure of the call document is still being finalised, but for the purpose of this post we can say that it calls for projects under ten main strands:
Research Information Management refers to administrative data about research (projects, outputs, etc). This is the second round of funding for projects in this area and will focus on interoperability around the CERIF model, simplifying the exchange of research information between and within HE organisations. One aim is to support the community of practice that is emerging in the UK around the use of this standard, and so help universities benefit from shared lessons, skills and experiences. A total of £300,000 may be available for this area of work.
Identity management is an increasingly important role for any organisation providing digital identities, for example to its staff and students. JISC has funded the creation of an online Identity Management Toolkit to enable universities and colleges to access and review their identity management processes and policies. Projects funded under this call will be early adopter pilots whereby universities and colleges deploy the Toolkit, work with its creators and submit case studies to the Toolkit website as part of providing a richer set of resources to support the Toolkit. The projects will also need to share lessons learned with the sector. A total of £200,000 may be available for this area of work.
Identifiers: Universities and colleges create new pages on their public ac.uk websites every day. The management of these websites continues to grow in complexity and size, especially as editorial control is devolved to more departments and institutional staff. The URIs of these websites are a key institutional asset, as a representation of the organisation on the web. This, and the persistence of these identifiers over time, contributes to the trust that can be placed in an organisation’s web presence. The aim of this strand of the call is to start to improve the extent to which identifiers are planned and managed within institutions, and contribute to the technologies and skills required to do that. Projects will be relatively small development activities, resulting in a valid data model for a URI set, plus a corresponding proposed re-organisation of a set of web pages. A total of £70,000 may be available for this area of work.
Infrastructure to Support Resource Discovery: JISC, RLUK and partners have released a vision for infrastructure to support resource discovery and related services in libraries, museums and archives. See http://rdtf.jiscinvolve.org/wp/ for background and context. This vision focuses on the provision of open metadata to support innovative and flexible services for researchers, teachers and students. JISC and partners will be funding work to realise this vision in line with the implementation plan. This area of the call is designed to enable libraries, archives and museums to make open metadata about their collections available in a sustainable way, and to investigate the issues involved in its production. A recent workshop run by UKOLN recommended a series of steps that content providers could take in making their data available, in a model influenced by Tim Berners-Lee’s Linked Data Note:
1. make data available in an open form
2. assign and expose HTTP URIs for everything, and expose useful content at those URIs
3. publish data produced in step 2 as XML
4. expose the semantics of data produced in step 2
The JISC funding call will focus on the later steps in this series. A total of £200,000 may be available for work in this area.
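The four steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the workshop's recommended implementation: the base address, the local identifiers and the two sample records are invented, and the element layout is made up for the example. The only real vocabulary used is Dublin Core Terms, whose namespace is `http://purl.org/dc/terms/`. Serving useful content at each URI (the second half of step 2) needs a web server and is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical base address under which the institution mints identifiers.
BASE = "http://example.org/id/"

# Real namespace for Dublin Core Terms, used here to expose semantics (step 4).
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dcterms", DCTERMS)

# Step 1: data in an open form (invented sample records).
records = [
    {"local_id": "b1001", "title": "Bleak House", "creator": "Charles Dickens"},
    {"local_id": "b1002", "title": "Middlemarch", "creator": "George Eliot"},
]


def mint_uri(local_id):
    """Step 2: assign an HTTP URI to each described item."""
    return BASE + "item/" + local_id


def to_xml(items):
    """Steps 3 and 4: publish as XML, naming fields with shared vocabulary terms."""
    root = ET.Element("collection")
    for rec in items:
        item = ET.SubElement(root, "item", {"uri": mint_uri(rec["local_id"])})
        ET.SubElement(item, "{%s}title" % DCTERMS).text = rec["title"]
        ET.SubElement(item, "{%s}creator" % DCTERMS).text = rec["creator"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(records))
```

Using terms from a widely shared vocabulary, rather than locally invented element names, is one simple way of letting other people's software understand what each field means.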
Activity Data: Activity data is data pertaining to actions that a user has performed against an online resource or service, including simply reading it. Commercial companies such as Amazon and Tesco have made a great success of exploiting their data about customer activities to improve services to customers, manage stock and support decision making. Recent research and projects have suggested that similar opportunities may exist for higher education institutions in managing their research, learning, information and administrative services. JISC intends to fund projects that explore these opportunities by positing a way in which their institution could benefit by exploiting this data, developing tools or approaches to exploit the data in that way, and reporting back on their success. There will be scope for larger projects working across several institutions, and a synthesis project to draw out and communicate lessons for the sector. A total of £600,000 may be available for work in this area.
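As a hedged illustration of the kind of thing that might be done with activity data, the Python sketch below builds simple "people who used this also used" suggestions from a log of (user, item) events. Everything in it is hypothetical: the event log, the identifiers and the function names are invented for the example, and a real service would need to think carefully about anonymisation, scale and the quality of the suggestions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence(events):
    """Count how often each pair of items was used by the same user."""
    items_by_user = defaultdict(set)
    for user, item in events:
        items_by_user[user].add(item)

    pairs = defaultdict(Counter)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs


def also_used(pairs, item, n=3):
    """Return up to n items most often used alongside the given item."""
    return [other for other, _count in pairs[item].most_common(n)]


if __name__ == "__main__":
    # Invented loan log: (anonymised user id, item id).
    log = [
        ("u1", "book-A"), ("u1", "book-B"),
        ("u2", "book-A"), ("u2", "book-B"),
        ("u3", "book-A"), ("u3", "book-C"),
    ]
    print(also_used(co_occurrence(log), "book-A"))
```

Counting co-occurrences per user is about the simplest possible approach; it is shown here only to make the idea of exploiting activity data tangible.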
Digital Preservation: Two specific areas of work in digital preservation will be included in the call. The first aims to enable practitioners within UK universities and colleges to test, validate, critique and demonstrate the use of existing preservation tools in live environments. While the most obvious use case is for information specialists within institutions to use this funding to embark on a scoped preservation exercise involving identified information or datasets, and using an appropriate preservation tool or tools, other use cases are possible. A total of £150,000 may be available for this area of work. The second area of work relates to complex visual digital materials and environments, specifically simulations, visualisations, gaming environments, virtual worlds and digital art. Preservation of these resources and environments is hugely challenging, and the work proposed will examine and record emerging good practice, and make recommendations to those facing those challenges in institutions and to JISC and other national and international bodies as appropriate. A total of £120,000 may be available for this area of work.
Geo-spatial: Location is a fundamental concept that underpins analysis within research and learning. Because of this, geospatial tools and data can form a core component of research, teaching and learning in almost any discipline. The aim of the geospatial programme area is to increase the use of geospatial tools, infrastructure and information for learners, teachers and researchers; to enhance tools and services and related practice as well as identifying future requirements. Collectively the projects should help improve take-up in the longer term and where relevant they should support the transfer of geospatial skills to disciplines that are not primarily geospatial. A total of £700,000 may be available for this area of work.
Sustainability of at risk resources: Working in a dynamic sector in which organisations are set up, merged, re-organised and closed, it is easy for important digital resources to lose their curating body. Changes of ownership are difficult and valuable lessons could be learnt by applying best practice and sharing experiences. JISC has funded the development of advice in this area. One-off small grants may be made available to support projects in relocating valuable “at risk” resources, in such a way that experience is captured and shared, and feeds into JISC’s future work. A total of £100,000 may be available for this area of work.
Digital Repositories: Significant investment by universities and colleges, and by national bodies such as JISC, means that there is now a solid foundation for UK repository infrastructure that consists of institutional repositories, subject repositories, software, tools, skills and shared services. As a part of this growth and development, pockets of excellence and good practice have formed at institutions. This final area of the call aims to improve institutional services that rely on the repository, by enabling the lessons and benefits from the most successful of repository applications, tools and techniques to be realised across a range of universities and colleges. So, small projects will be funded that enable institutions to take repository applications, tools and techniques from elsewhere and deploy them locally, thereby developing sustainable service improvements, building skills, and sharing the practice back with the sector. A total of £180,000 may be available for this area of work.
This is a broad call for proposals, demonstrating the breadth of vision JISC has for infrastructure to support education and research. It has been developed to achieve a balance between exploring the potential of new areas, and building on existing areas of strength to benefit the whole sector.
The draft call text is currently being reviewed by domain experts, who are offering feedback to JISC. Since the text is draft at the moment, and we expect it to change as a result of this feedback, no further information about the call is available at this time. The expert reviewers are: Kevin Ashley, Steve Bailey, Neil Chue Hong, Anna Clements, Liam Earney, Michael Fraser, Marieke Guy, David Harrison, Mark Hedges, Gareth Johnson, David Kay, William Kilbride, Gareth Knight, Mike Mertens, Paul Miller, William Nixon, John Paschoud, Dave Pattern, Andy Powell, Cal Racey, Rosemary Russell, Colin Smith, Owen Stephens, Graham Stone, David Thomas, Paul Walk, and the members of the JISC Geospatial Working Group. We are very grateful for their help advising us on various areas of this grant funding call.
London, King’s Cross, The Hub. Friday last I (David F. Flanders) received an email from Her Majesty’s Office of the Public Sector inviting me to attend an event at 7.30am on a Monday morning (ugh – a lovely morning though) called “Building Britain’s Digital Future”. I thought it would have something to do with the jiscEXPO (LinkedData) programme we currently have a Call out for, but I had no idea it would be this significant! Upon arriving I realised it must be some announcement given the camera crews, and so I assumed we would be in for some announcements about money. I was not disappointed, as the following money-related “initiatives” were announced by none other than the Prime Minister himself: