Category Archives: digital curation

New Digital Infrastructure funding call available now

What better way to welcome the freshly rebranded Digital Infrastructure team blog than to announce a new funding call that spans nearly all the activities the team is involved in?

The call is available now from the JISC site and the deadline for submissions is 12 noon on Monday 21st of November.

The call seeks projects in the following areas:

  • Resource Discovery – up to 10 projects to implement the Resource Discovery Taskforce vision by funding higher education libraries, archives and museums to make open metadata about their collections available in a sustainable way. Funding of up to £250,000 is available for this work.
  • Enhancing the Sustainability of Digital Collections – up to 10 projects to investigate and measure how effectively action can be taken to increase the prospects of sustainability for specified digital resources. Funding of up to £500,000 is available for this work.
  • Research Information Management – 3 projects to explore the feasibility and pilot the delivery of a national shared service for reporting research information from research organisations to funders and other sector agencies; to increase the availability of validated evidence of research impact for research organisations, funders and policy bodies; and to formally evaluate JISC-funded activities in the Research Information Management programme and gather robust evidence of any benefits accruing to the sector from these activities. Funding of up to £450,000 is available for this work.
  • Research Tools – 5 to 10 projects on exploiting technologies and infrastructure in the research process as well as innovating and extending the boundaries to determine the future demands of research on infrastructures. Funding of up to £350,000 is available for this work.
  • Applications of the Linking You Toolkit – up to 10 projects investigating the implementation and improvement of the ‘Linking You Toolkit’ to demonstrate the benefits that management of institutional URLs can bring to students, researchers, lecturers and other university staff. Funding of up to £140,000 is available for this work.
  • Access and Identity Management – 5 to 10 projects investigating the embedding of Access and Identity Management outputs and technological solutions within institutions. Funding of up to £200,000 is available for this work.

As always, JISC programme managers are keen to speak to prospective bidders, to talk ideas through and to clarify the finer points of the call document. We have set aside 2 specific days for these conversations, the 26th and 27th of October, so if you are considering a bid, please do get in touch to arrange a conversation. If those days aren’t good for you then my teammates and I will be happy to arrange alternative times.

This is always an exciting process for JISC staff, as we get to hear lots of great ideas, so I’m really looking forward to seeing what you clever people come up with this time.


Repositories and micro-services

Why do repositories quickly become so complex? One answer is simply scope creep: repositories have roles in dissemination, research information management and curation and, facing these three ways, it is inevitable that the demands placed upon them mushroom. Without wanting to open up any arguments around SOA or RESTful approaches, another answer is to go back to Cliff Lynch’s 2003 description of the institutional repository as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members”. This approach seems to be having a revival.

The California Digital Library (CDL) is charged with providing an environment that enables those at the University of California to curate digital assets effectively. Rather than adopting a single solution, they have pioneered an approach based on “micro-services”, in which the fundamental curatorial functions are disaggregated and provided by a managed and defined set of discrete services. They claim that this increases the flexibility of the environment and its ability to exploit changing technologies, and enables it to develop sufficient complexity to deal with evolving demands without becoming baroque. The approach has also been adopted at Northwestern University and Penn State in the US, and was of considerable interest at the recent Scholarly Infrastructure Technical Summit (SITS) meeting.

It’s an approach followed in several current projects, including Hydra. The discussion at the SITS meeting seemed to focus in part on the degree to which such micro-services can be standalone, as some of the CDL services can be, or whether they require certain assumptions to be made about the environment in which they will be used, as in Hydra (which assumes Fedora). In reporting on the SITS meeting, Dave Challis notes that “I’m not convinced the specs for these are well defined enough for general purpose use yet”. There may be useful lessons from initiatives such as the e-Framework about the circumstances in which such definitions are feasible.
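Since much of this discussion stays fairly abstract, here is a minimal, hypothetical sketch (in Python) of what a standalone curation micro-service might look like: a fixity-checking function with a narrow contract and no knowledge of the repository around it. To be clear, the function names and the manifest format here are my own inventions for illustration; they are not CDL’s actual interfaces.

```python
# A hypothetical sketch of a standalone curation micro-service: fixity
# checking. Names and manifest format are invented for illustration;
# they are not CDL's actual interfaces.
import hashlib
from pathlib import Path
from typing import Dict, List


def fixity(path: Path, algorithm: str = "sha256") -> str:
    """Compute a checksum for a single file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths whose current checksum no longer
    matches the value recorded in the manifest."""
    return [name for name, expected in manifest.items()
            if fixity(root / name) != expected]

# Example usage (paths and checksums are placeholders):
#   manifest = {"thesis.pdf": "9f86d08...", "data.csv": "41c7a8..."}
#   damaged = audit(manifest, Path("/var/repository/objects"))
```

The point of the pattern is that the service knows nothing about the repository, database or user interface around it, so it can be replaced or redeployed independently – which, as I read it, is the flexibility claim being made for the approach.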

Relatedly, perhaps, it was interesting to hear Chuck Humphrey (Head of the Data Library, University of Alberta) speak at the recent SPARC Repositories conference, describing the approach taken in Canada, whereby a distributed OAIS environment is being established based on discrete services deployed across the country. Previous JISC work, such as the Sherpa DP and PRESERV projects, explored some of the options a few years ago, and those lessons may be worth revisiting in the light of the micro-services discussions.

There is probably some further learning to be done about what constitutes a viable, usable and sustainable micro-service and, with real examples out there now to use, there is a chance that people’s experiences of providing and using them will be shared.

Halfway through the US Blue Ribbon Meeting

Greetings from the Blue Ribbon meeting again.

Some interesting angles are emerging from a variety of participants. We heard from Thomas Kalil this morning, who works as a policy staffer at the White House Office of Science and Technology Policy. (Is he perhaps what they call a ‘policy wonk’? … I don’t know. I should have paid closer attention to The West Wing.) Anyway – he talked about how the preservation community might be able to get this whole area of digital sustainability onto the Presidential radar. What we shouldn’t do is present this as a problem that needs to be tackled in ten different ways, at different levels, by diverse stakeholders. We need to realise that the White House is in the business of saying ‘no’ most of the time, so we need to make it as easy as possible for them to say ‘yes’. We need to be realistic about what we’re asking for and we need to be very directed about who in the Presidential office we approach to champion the cause.

What we probably should do is get a message to the President that person x at trusted organisation y has a low-cost/high-benefit measure that the President really needs to hear about, one which fits with his broad agenda and has the backing of many thought leaders across the expert community.

I’m assuming we can transpose that to Downing Street.

 Right … we’re back in session. More later.


The Economics of Sustaining Digital Information

I’m in Washington to attend the US Symposium of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access.

Symposium programme –

This is the end of a two-year process of enquiry and analysis in which 14 experts have held a series of meetings to consider what the economic implications are, and what economic frameworks are required, to ensure that all of the expensively acquired knowledge we commit to digital formats remains available for as long as we need it … or, in some cases, even longer – given that there is a great deal of information around that subsequent generations might find useful, even though we have no clear idea what we should do with it!

The panel of experts is mostly drawn from the US, although the UK has been represented by Paul Ayris (UCL) and Chris Rusbridge (DCC). The final report is now available at:

The report focuses on four types of information – scholarly discourse, research data, commercially owned cultural content, and collectively produced web content – and uses these categories to frame recommendations for a range of different stakeholders. These are presented as a number of bullet-pointed lists and tables which can (and no doubt will) be extracted from the report by way of summarising some of the detail contained in the main body of the text.

For those of us not au fait with the language of economics, hearing digital materials referred to as ‘non-rival depreciable durable assets’ makes you stop and think for a moment … but as the concepts are explained and the principles become clear, this well-written report starts to give you a slightly new take on the long-term management of digital resources.


Data Management Policy – An Interview with Paul Taylor

Dr. Paul Taylor works at the University of Melbourne and has just finished a two-week secondment in the UK with the JISC-funded EIDCSR (Embedding Institutional Data Curation Services in Research) project based in Oxford. This is an approximate transcript of a quick five-minute interview between Paul and Neil Grindley (JISC Information Environment Programme Manager).

Hi Paul, thanks for sparing the time out of a very busy schedule … what role do you have in the EIDCSR project?

Thanks Neil … I’m here to help them come up with a draft policy for the management of research data and records. It’s something we’ve had in place at the University of Melbourne since 2005, and we’ve just completed a revision of the policy which we hope will make it a little more useful for researchers.

Tell us a little bit more about how that policy has been developed at the University of Melbourne and the reactions to it from researchers and data managers.

As I said, we’ve had a policy in place since 2005, and early this year we were asked to work out how compliant we were with it, on the basis that if you have a policy and no-one pays any attention to it, it’s probably not much use keeping it there! Not surprisingly, we found out that most people weren’t compliant and also didn’t really know that the policy was there. We’re hoping that was the reason they weren’t compliant, rather than any sort of animosity against policies in general – but that’s still to be determined.

We reviewed the policy for two reasons. Firstly, to try and make it of more use to researchers (there are limits to that, because when you are writing a policy to go across the institution it has to contain really high-level principles about the management of research data; if you get too specific you rule large populations out, and then people pay even less attention to it than they did before). Secondly, to get some attention and a bit of refocus on the data management area. There is a lot happening at the university at the moment in terms of the services it intends to provide for its researchers, along with some other changes in the Australian environment. We’re hoping to lock the high-level principles away in policy documentation and focus on keeping the guidance, information and support materials up to date and relevant for researchers.

The sustainability of keeping that guidance and information for researchers up to date is a real issue. Capturing their feedback and working it back into future iterations of those materials (and ultimately the policy documentation) is a desirable outcome, but also a big challenge, isn’t it?

Yes, it is.

How do you think that the policy that you’ve developed in Melbourne transposes to the University of Oxford?

That’s a good question … one of the things we’ve learnt from the 2005 version of the policy is that it’s not enough to have the central policy on its own. There needs to be some kind of localisation of the policies, so with this new version we’ll be asking faculties to come up with their own enhancements so that it makes more sense to their researchers, and then probably get departments to do the same thing. I’d imagine the same sort of system could work at Oxford, but it would be a little more complex given the number of people that would need to be involved in coming up with these localised versions of the policy. The hope is that there will be a trickle-down effect from the high-level policies, which will have a practical influence on the way that researchers go about managing data.

In the meetings I’ve had since I’ve been here, there have been some excellent examples of data managers and data management researchers (I guess you’d call them) working closely (one-on-one) with researchers, and they have come up with some excellent and novel solutions. I think the more that can happen – a sort of resourcing at the coalface – the more likely it is that high-level principles will trickle down to meet some of the very local, one-on-one, researcher-based developments. At that stage, perhaps there would be a general improvement in the management of research data across the institution.

One of the things I’ve heard a lot from people is the need for it to be a federated system. A lot of the departmental research groups have come up with their own systems for managing their research data. Anything new that is provided centrally by the university has to try and complement those processes rather than take them over. Taking them over wouldn’t work well here (in Oxford), and it wouldn’t work in Melbourne; it would tend to antagonise people rather than improve the situation.

Yes … that principle of embedding existing processes and workflows into broader policy initiatives is an important concept for institutions grappling with these kinds of issues at the moment. Thanks very much, Paul.


University of Melbourne – Policy on the Management of Research Data and Records (2005)

Review of Policy on the Management of Research Data and Records (2009)

EIDCSR Project (Embedding Institutional Data Curation Services in Research)


EC Digital Libraries and Digital Preservation Call

I went to a meeting in Peter Mandelson’s basement the other day, otherwise known as the Department for Business, Innovation and Skills, just next to Westminster Abbey. Lord Mandelson (if you go up about 17 levels of management) is my boss, so it’s good to know where the orders are coming from.

Anyway … the meeting was a briefing day and a chance for the European Commission to explain a bit about the priorities and procedures that people should think about if they want to apply for funding for projects in the Digital Libraries and Preservation area (formally referred to as FP7 ICT Call 6). The presentations are now available online at

The headline issues that I took away from the meeting were …

The whole funding decision-making process takes nearly a year and is extremely competitive. If you are a small organisation that is simply looking for money … it probably isn’t for you! The Commission will be evaluating proposals according to 3 main criteria:

1. Are they proposing something that is useful and is technically robust?

2. Will they be able to achieve their objectives?

3. What impact will the work have?

They are looking for effective collaborations. Consortia must have a lead partner and at least 3 other partners. How many partners there are, and where they come from, is – contrary to popular belief – not that important! There have been rumours in the past that people needed to hook up with Eastern European partners, or Southern European partners, in order to get funding. This is a fallacy. You just need to demonstrate that your consortium will be effective. In fact, once you have your core group of at least 4 EU partners, additional partners (with appropriate expertise) can come from anywhere in the world.

It is not generally the job of a research organisation to know about marketing and exploiting the products that are created as part of a research programme. Join up with an organisation that knows about this stuff! It’s important to get it right, and sustainability is EXTREMELY important.

Think hard about what sort of project suits your proposal … The models on offer are:

IPs = Integrating Projects. Large scale (€6–12m … sometimes more): R&D work on concepts, methods, tools and systems, often with many partners. Advancing the state of the art – producing solutions that are within 3–5 years of being marketable.

STREPs = Specific Targeted Research Projects (generally €2–4m). Focusing on more specific research problems, with outputs that might be 5–7 years away from being marketable solutions.

NoEs = Networks of Excellence. Advancing knowledge and bridging technological domains.

CAs = Co-ordinating Actions. Helping to ensure synergy between EC-funded work.

SAs = Supporting Actions. Helping to maximise the effectiveness and impact of EC-funded work.

Total funding available for this call: €69m

IPs and STREPs = €56m

NoEs and CAs/SAs = €13m

It strikes me that a lot of people will be thinking hard and talking to each other over the next 12 months to really try and grapple with some of the hard problems in the Digital Preservation area, and that is going to have a marvellous impact on the number and quality of proposals that might end up flowing towards JISC. I’m not saying we’ll mop up failed EC proposals! … I’m simply saying this has to be good for generally raising our whole collective game in the relevant areas of research and development.

iPres 2009 – Preservation Infrastructure Track

I’m in San Francisco at iPres, sitting in the preservation infrastructure track.

Stephen Abrams (CDL) is telling us about curation micro-services. Lots of clear categorisation of the types of services that institutions might require. Currently talking about storage requirements: provide for safety through redundancy, meaning through context, and utility through service. He’s rattling through too fast to capture the detail.

Q. How do the CDL services compare with iRODS?

A. iRODS is all part of one controlled environment; CDL micro-services can run as small, discrete functions.

Pam Armstrong and Johanna Smith from Library and Archives Canada.

They have a Trusted Digital Repository project that is running from 2008 to 2010, and are showing a value management framework. The first concern is ‘significance’: they are looking at government records and are trying to determine which records are important even before they arrive at the archive. Talking about a filtering process. They are also trying to deal with web 2.0 issues and are working on some guidelines.

They have established a records management task force with a high level of government support. A directive on recordkeeping is linked to a management accountability framework: if departments are found wanting in their records management function, they are denied the right to delete records. Good stick. There are functional requirements for EDRMS based on ISO standards, and there is a proposed shared EDRMS service for government information in Canada. They have built open source software, eRTA, for records managers, and have been working on a metadata core set. They are using MODS and MARC, and the information is discoverable by the public. They have got to their summary already … my, oh my – these talks are quick!

The lessons learnt include the usefulness of the mandatory instrument that has consequences (see above).

Q. Do you accept all formats?

A. No, they have a set of acceptable formats. They can’t do all formats.

Q. How implemented is all of this?

A. The implementation is uneven, and the instances across government are inconsistent. They have got lots of work to do to bring the legacy information into line.

Robert Sharpe – Tessella

Representing the PLANETS consortium. The title is “Are you Ready? Assessment of readiness of organisations for Digital Preservation”. (I’m interested in this talk – wondering how it matches up with the JISC-funded AIDA project.) They ran a survey to establish whether people were ready to use digital preservation solutions. The target group for PLANETS is national libraries and archives, of which there are 96 in Europe, though they also invited any other interested parties to contribute. They got 206 responses, 70% of them from Europe. Respondents were a diverse community representing a range of roles:

  • 15% worked in digital preservation
  • 16% in general preservation
  • 22% in curation
  • 16% in IT
  • plus directors, researchers, data managers, etc.

  • 93% were aware of DP challenges
  • 17% had not considered solutions
  • 52% did not have preservation policies

Respondents were three times more likely to have a DP budget if they had a DP policy in place. The majority had budgets for capital activities, but DP is still not really embedded in the institutions that responded. What needs to be preserved? Stuff in file systems = 77% … with many other categories going down a long tail. National libraries feel they have almost no control over the formats they have to accept; national archives, however, claim high levels of control.

80% of organisations say they have less than 100TB to store in 2009, but they think that by 2019, 70% of organisations will have more than 100TB and 42% will have more than 1PB. 85% have a solution or are working on one, and they are generally expecting ‘plug and play’ components – that’s the trend.

What functionality is important? The single most important function was that the repository must maintain the authenticity, reliability and integrity of records; 17 different functions were cited, the least important being ‘checks for duplicate items’. Very little agreement on which standards should be used (surprise, surprise!). Of the 13 standards on Robert’s chart, PREMIS sat in the middle in terms of who is using it already.

Summary …

Excellent start on getting DP message out
More work needed on policies and budgets
Wide range of types of digital info from range of sources
Significant quantities of data to preserve
Component-based solutions required
Best practice not yet clear
Early adopters are busy and planning to do more

Q. We are doing a good job with early adopters, but what about the wider community? The success factor will be general users engaging with digital preservation.

A. Yes

Q. For the standards you showed, the figures for people not even having heard of them are high!

A. Yes.

End of session

Data in Nature

I finally got around to looking at the article on data that appeared on the Nature website last week.

Very nice to see JISC mentioned so positively in the editorial. It mentions the Digital Curation Centre by name, which is obviously one of the key pieces of support and infrastructure that JISC is funding to ensure that UK universities and colleges have access to advice and guidance on handling and managing research and other types of data.

Some other resources they didn’t have space to mention … The DCC (in collaboration with the Research Information Network) runs the Research Data Management Forum, a series of meetings that has brought a number of practitioners, funders and other stakeholders together to examine and discuss the issues facing data managers and curators.

There is a mailing list available that is geared towards this community.

There is a recent report (Nov 2008) that looks at the Benefits of Curating and Sharing Research Data.

Another report (Jan 2009) looks at various national infrastructures enabling the sharing of data.

Earlier reports are available too, including one looking at the skills, roles and career structures required to support data scientists.

All of these build on a report from 2007 authored by Liz Lyon, “Dealing with Data”.

The JISC Research Data Management programme is now in full swing and is in the process of starting 8 new major projects that will examine various aspects of Data Management Infrastructure. These projects will be supported by the DCC and by other initiatives that will progress specific areas of complementary work (e.g. Tools).