The JISC Preservation of Web Resources Workshop (PoWR)
The first JISC-PoWR workshop took place on Friday (27th June 2008) at Senate House Library, University of London, and was attended by over 30 people from a wide range of professional groupings, including the Web management and Records Management communities. The workshop was entitled ‘Preservation of Web Resources: Making a Start’ and considered how delegates could begin including Web resources in their preservation strategy. There was much interest in the case study presented by the University of Bath, which illustrated the differing perspectives held by the Web management and Records Management communities. Bringing these communities together is something the project is seeking to address. The main presentations are now available for download:
http://jiscpowr.jiscinvolve.org/2008/06/30/workshop-1-resources-available/
Posted by: Neil Grindley
Research data curation
Last year, following the Digital Curation Conference in Washington DC, JISC and the Andrew W. Mellon Foundation hosted an international workshop to discuss what the international priorities should be for research and development work supporting academic research data curation. It’s taken a while for the notes to become available, for which I apologise, but here they are:
Priorities for research data curation workshop 2007
(I realise this is a PDF file, which won’t please everyone, but converting it shrank the file size by over an order of magnitude compared with the MS Word version.)
The starting point for the workshop was a recognition that, while research data is organised largely by (sub)discipline, infrastructure is often developed and funded nationally, or even around institutions. Some way is needed to reconcile the two. I have to confess that, on the day, I wasn’t sure we’d made a lot of progress, but in drafting the notes I changed my mind somewhat. Certainly, Peter Murray-Rust seemed to identify the academic department infrastructure as a key point where intervention could serve both that department and the wider goal of data curation and sharing. The photos of flip chart diagrams are perhaps not easy to read or understand, but they suggest a distinctive place for libraries and repositories.
Greg Crane’s Perseus project anticipated some of the topics that were covered later, notably how to design an infrastructure that is sustainable and yet adaptive; there are a few ideas on this in the notes. There are also a few ideas about how the problem space might be broken down so that an international approach can be taken, though this remains difficult. With luck and effort, JISC’s and other UK ‘data’ work will join up with that in the US (e.g. the NSF DataNet programme), Australia (Australian National Data Service) and elsewhere, and these notes will help us do that.
Many thanks to the workshop participants, listed at the end of the notes.
ReStore workshop
I attended a very interesting workshop for the ReStore project last week. The project is run by Southampton’s ESRC National Centre for Research Methods and is investigating the use of a repository to host and maintain orphan web resources.
The problem that the project is addressing is that research projects produce very useful web resources, but when the project funding stops, maintenance of those resources often stops too. The resources then start to decay, broken links multiply and their usefulness deteriorates quickly.
ReStore aims to address this problem by accepting suitable resources after a review process and then hosting and curating the sites with a mixture of automated and manual processes.
The project is funded by ESRC and aims to produce a prototype repository that curates a few web resources that have been produced by other ESRC projects.
The workshop was chiefly concerned with introducing the project and discussing some of the major issues such as technical challenges, IPR and sustainability. The presentations from the day can be downloaded from the project website: http://www.ncrm.ac.uk/restore/slides/. These include some mockups of the proposed system and an overview of the proposed review and curation process.
The project’s work on development of a long-term strategy for ESRC in sustaining on-line resources will be very relevant to JISC.
The technical challenges of hosting a range of resources that may each use different software and hardware are significant. In the short term it may be better to use Amazon Web Services or a similar service to host the sites and so avoid a large hardware bill.
The costs of preserving research data
There’s a new report on the JISC website, authored by Neil Beagrie, Julia Chruszcz and Brian Lavoie. It looks at how much it costs to preserve research data and, perhaps as importantly, how institutions and others could calculate this. There are several reasons why this report is likely to have an impact: looking after research data is potentially costly, and yet it is important that, as a community, we make reasonable decisions about what should be preserved and how. Perhaps unsurprisingly (at least for those who already do this for a living), ingest appears to be the largest cost in the curation lifecycle. At the same time, the evidence shows that correcting badly ingested data later is even more costly, so the figures probably suggest that investing in ingest carries a positive cost/benefit. There is potential for developing the report’s methodology into a tool, and possibly for some join-up with the Data Audit Framework.
Repositories and Preservation Programme Synthesis
We are proposing to undertake a synthesis of the Repositories and Preservation Programme that will support action. This means that the outputs need to be targeted at decision makers, with additional information for those who will have to implement the decisions.
We have taken as a starting point the idea that decision makers are most likely to take note of what we are saying if repositories or preservation address problems that they are already worried about, and that many of these will stem from government, funding council or similar policies which they have to implement.
We have identified policies, decision makers who are concerned with them and ways in which we think that repositories or preservation can help.
We are aware that there will be other policies out there that we should be considering, that there may be other ways in which repositories or preservation could help and there may be other people we need to address.
We would very much welcome comments and thoughts on our thinking so that we can take it forward and start the synthesis.
Please comment either by posting comments or by email to Tom Franklin who is leading on this (tom@franklin-consulting.co.uk).
Research
The Research Excellence Framework is of concern to many at the moment, including senior managers, research managers, researchers and librarians. We believe that institutional repositories are likely to make collecting the relevant information easier and cheaper, and to support whatever metrics are eventually selected. It is also possible that open access repositories will lead to research being found more easily and therefore cited more widely, which also supports increased recognition of research.
Funding mandates from funding bodies such as the research councils and the Wellcome Trust can be addressed not only through the use of required repositories (such as UK PubMed Central) but also through suitable institutional repositories that support features such as embargo periods.
Community and business engagement requires that information is made accessible to those who might make effective use of it. Institutional repositories may assist here.
Teaching and learning
Cost reduction may be achieved through better sharing of learning materials, including learning objects. This will be of interest both to managers and to the teachers who will then need to implement and make use of repositories, but contributors will also have to think about using appropriate standards. Integration with the VLE would also make the most current version of materials easily accessible.
Quality assurance of courses, especially franchised courses (for instance, between a university and FE colleges), is of concern to senior managers and teachers, and could be supported by using repositories to make learning resources available across the group.
Many institutions and their managers are concerned with retaining control over the IPR of their learning materials; institutional repositories for learning objects offer one way of controlling access effectively.
Information services and libraries
All managers and staff are concerned with meeting their legal and contractual requirements, including self-deposit / open access and being able to enforce embargoes. Institutional repositories can help with these issues.
Help wanted
Are these the most important drivers?
Are there other drivers that we should consider?
Have we correctly identified the key audiences who can help to identify these things?
Posted by: Tom Franklin
Significant Properties
Around 150 people came to an event at the British Library a few weeks ago (April 7th) to discuss current research into the ‘Significant Properties of Digital Objects’. Significant properties are essential attributes of a digital object which affect its appearance, behaviour, quality and usability, and which must be preserved over time for the digital object to remain accessible and meaningful.
This was an event organised by JISC, the Digital Preservation Coalition (DPC) and the PLANETS project which is based at the British Library.
Frances Boyle (DPC) has written some useful notes on this event which are on the following JISC web page:
http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigpropswrkshp.aspx
By way of an addendum to those notes …
Towards the end of the discussion session at the workshop, I invited delegates to consider what the ‘next steps’ should be in terms of commissioning new work in this area. One question to consider is this: should JISC fund more studies in a similar vein to the four featured at the workshop (on vector images, moving images, software and e-learning objects, i.e. continue investigating digital objects according to their type), or should JISC try to exploit the value of the work already done by commissioning preservation / curation software tools that incorporate and use the concepts associated with significant properties?
I think it would be fair to say that delegates expressed support for both options but also gave the impression that there was still some groundwork to be done in really defining how the concept translates from the theoretical to the practical, and what it means to different users (or ‘designated communities’). I took the opportunity of repeating those questions at a meeting of the JISC Repositories and Preservation Advisory Group last week and received some further useful input. Points were made about the potential role that significant properties work could and should play in the quality assessment of preservation-related work … and also in the authentication of digital objects - two important and under-researched areas. The point was also made that whilst the four JISC studies had all made important contributions to this research area (and in usefully different ways) they had all struggled with the concept and it might be fair to say that we still didn’t know whether information managers had anything yet that was of practical value to them in their work.
So … the next step is to organise a hands-on technical workshop where a small invited group of participants consider a range of files, define their significant properties, and then use that information to undertake practical preservation work. This doesn’t rule out more studies, or tools development, or a discipline-based approach, or indeed other ways of taking this work forward. It will simply provide a more practical foundation for further work.
Research data and the JISC IE
We’re hoping to present some themed web pages on the innovation work being funded under the JISC Information Environment area, including one on research data. I thought I’d use this blog to preview and pilot that page. I’m not sure if that’s an acceptable use of a blog, but I’m sure I’ll find out.
The aim of the JISC IE work on data is to promote and enable new ways of finding, using and sharing research data. Because there are huge variations in what ‘data’ is, and in disciplinary cultures and practices around it, there is likely to be a ‘mixed economy’ of infrastructure and services to support its management.
There has been a large number of reports on data recently, some of which are helpfully listed in a recent presentation by Michael Jubb of the Research Information Network. Three key documents are the report from the then Office of Science and Innovation on ‘e-infrastructure’, which set out a high-level vision; a set of principles for data stewardship developed by the Research Information Network; and the ‘Dealing with Data’ report from JISC/UKOLN, which made practical recommendations.
In terms of current practice, two projects promise to paint a clear picture from different perspectives. A study of ‘data publication’ practice among researchers has been funded by JISC, the Research Information Network and the Natural Environment Research Council. A different project, SCARP, is exploring disciplinary attitudes and approaches to data deposit, sharing and re-use, curation and preservation.
JISC and the Engineering and Physical Sciences Research Council have jointly funded the Digital Curation Centre (DCC), which is a centre both of innovation and of guidance. Members of the DCC are developing a Data Audit Framework, which will enable universities to assess what data is being held on their computer systems, and who is responsible for it. The Data Audit Framework will be piloted in a number of universities in 2008.
There is a suspicion that the sector lacks sufficient skilled people to manage research data effectively. A report is due shortly that will review the position and make recommendations on how this might be addressed. The DCC will run a summer school this year to begin to address this issue. Of course, investment will only follow if a business case can be made, and a part of making that case is assessing the costs of preserving data. A methodology is being developed that will enable estimates to be made, though of course without assessing the benefits of keeping data, it is only half the story.
The UK is fortunate to have both the UK Data Archive (co-funded by JISC and the Economic and Social Research Council) and the data centres supported by the Natural Environment Research Council. These services offer expert advice and infrastructure for data management. A feasibility study is underway into the possibility of a UK Research Data Service as a collaboration between some UK universities, to fill in some of the gaps between such data centres. In addition, the DISC-UK Datashare project is looking at how UK higher education can increase its capacity to curate and share research data.
Finally, it is worth noting that JISC also funds work under the heading of ‘e-Research’, which is also focused on research data, including grid and semantic enabling of datasets.
Host your own programme meeting
JISC regularly holds meetings for the people involved in the projects funded under a particular programme. These programme meetings are popular. Well, some parts are: the networking parts are popular; the parts where we discuss JISC objectives and reporting are less so.
Because the Repositories and Preservation Programme is very large (circa 80 projects) and addresses a variety of themes, it is very difficult for JISC to design programme meetings that meet the networking and sharing needs of all the project staff while still achieving those pesky JISC objectives.
To help with this difficulty we decided to offer extra funding to project staff to enable them to host their own programme meetings. These meetings would be free from JISC interference (unless desired) and would be based around themes chosen by the project staff. The only restriction we placed on the meetings was that they had to include case studies and networking.
We got a healthy response to this idea and as a result the following meetings will be funded:
20th May 2008 - Differences between research repositories and repositories for learning and teaching purposes - DRAW project (University of Worcester).
June 2008 - From VLE to Repository: How do we do it? - CURVE project (Coventry University).
June 2008 - Digital Curation and Preservation Projects Forum – Placing Ourselves in the Bigger Picture - The preservation strand of the Repositories and Preservation Programme.
September 2008 - The Impact of Organisational Culture on Repository Growth and Development - Embed project (Cranfield University).
November 2008 - Advocacy issues in populating institutional repositories - BURP project (University of Bradford).
November or December 2008 - Demonstrating and exploiting repository value - NECTAR (University of Northampton) and WRAP (University of Warwick) projects.
All dates may be subject to change.
The meetings are primarily for staff working on projects funded under the Repositories and Preservation Programme; however, some people from outside may be invited, and any spare capacity will be opened up to the wider community.
Further details will be blogged in due course.
Posted by: Andy McGregor
The Research Data Management Forum
This week I went to an early meeting of the Research Data Management Forum, co-sponsored by the Digital Curation Centre and the Research Information Network. The management and curation of research data is both a hot topic and a major challenge, not always a happy combination. This meeting of the Forum was open to anyone, and a diverse group attended, including several people directly involved in managing research data and several more, like me, who have an interest in supporting that work. In many ways the challenge of managing the digital data deluge is beyond the capacity of a single forum, and at times the list of unanswered questions prompted by the discussion threatened to sink the enthusiasm of even the keenest curator. There seems to be so much that needs doing. However, the main message I took away was the urgent need for more and better evidence: What are the benefits of curating and sharing research data? What are the benefits of having people in UK higher education skilled in data management? To whom do these benefits accrue? The evidence may be of a variety of kinds. Certainly, case studies can help show where these benefits arise, and for whom, in particular cases. However, what’s also needed is some serious economic modelling of the kind recently deployed by Professors Newbery and Bently and Rufus Pollock in their report on ‘Models of Public Sector Information Provision’ and, in a different context, by John Houghton and his colleagues on ‘Research Communication Costs in Australia’.
To supplement this message, and assuming the evidence shows that data curation and sharing is beneficial to UK higher education, to the UK more widely, and to research in general, the question arises: what’s in it for researchers? In many disciplines data sharing is not common, and this can be for good reasons. The forthcoming report on ‘data publication’, commissioned by the Research Information Network, Natural Environment Research Council and JISC, will document the picture in some detail. A missing piece in the puzzle is the full citation of specific datasets, which is uncommon. Should this become common, metrics could be derived from aggregated citations to indicate the extent to which datasets were referenced, and academic credit (and therefore incentive) could follow. The difficulty lies less in principle (for example, datasets from the UK Data Archive should already be cited whenever used, and can be cited easily) than in practice: it just rarely happens. JISC is funding several projects that might help, of which CLADDIER, StoreLink and OJIMS are the most obvious examples, but this is not necessarily something that a JISC project alone can address.
Posted by: Neil Jacobs
Now THAT is what I call preservation …
For those interested in the e-journal archiving field … Portico (long-term archiving specialists) have signed a deal to place an offline copy of their 6 million articles in the e-Depot archive (long-term archiving specialists) based at the Koninklijke Bibliotheek, the National Library of the Netherlands (the KB).
http://www.portico.org/news/022708.html
Portico are clearly taking this preservation business VERY seriously indeed! … and good for them.
Thanks to Steve Hitchcock for spotting this.
Posted by: Neil Grindley