The costs of preserving research data
There’s a new report on the JISC website, authored by Neil Beagrie, Julia Chruszcz and Brian Lavoie. It looks at how much it costs to preserve research data and, perhaps as importantly, how institutions and others could calculate this. There are lots of reasons why this report is likely to have an impact - looking after research data is potentially costly, and yet it is important that - as a community - we make reasonable decisions about what should be preserved and how. Perhaps unsurprisingly (at least for those who already do this for a living), it seems the cost of ingesting the data forms the largest cost in the curation lifecycle, but at the same time the evidence shows that correcting badly ingested data later is even more costly, so the figures probably suggest that there is a positive cost/benefit calculation here. There is potential for developing the methodology here into a tool, and there could also be potential for some join-up with the Data Audit Framework.
Repositories and Preservation Programme Synthesis
We are proposing to undertake a synthesis of the repositories and preservation programme which will support action. This means that the outputs need to be targeted at decision makers with additional information for those that will have to implement the decisions.
We have taken as a starting point the idea that decision makers are most likely to take note of what we are saying if repositories or preservation address problems that they are already worried about, and that many of these will stem from government, funding council or similar policies which they have to implement.
We have identified policies, decision makers who are concerned with them and ways in which we think that repositories or preservation can help.
We are aware that there will be other policies out there that we should be considering, that there may be other ways in which repositories or preservation could help and there may be other people we need to address.
We would very much welcome comments and thoughts on our thinking so that we can take it forward and start the synthesis.
Please comment either by posting comments or by email to Tom Franklin who is leading on this (tom@franklin-consulting.co.uk).
Research
The Research Excellence Framework is of concern to many at the moment including senior managers, research managers, researchers and librarians. We believe that it is likely that institutional repositories will make collection of the relevant information easier and cheaper and will support whatever metrics are likely to be selected. It is also possible that open access repositories will lead to research being found more easily and therefore cited more widely. This also supports increasing research recognition.
Funding mandates from funding bodies such as research councils and Wellcome can be addressed through the use of required repositories (such as UK Pubmed Central), but also through the use of suitable institutional repositories that support things like embargo periods.
Community and business engagement requires that information is made accessible to those that might make effective use of it. Institutional repositories may assist here.
Teaching and learning
Cost reduction may be achieved through better sharing of learning materials, including learning objects. This will be of interest to both managers and teachers, who then need to implement and make use of repositories, but contributors will also have to think about using appropriate standards. Integration with the VLE would also enable the most current version of materials to be easily accessible.
Quality assurance of courses, especially franchised courses (for instance between a university and FE colleges), is of concern to senior managers and teachers and could be supported by making learning resources available across the group through use of repositories.
Many institutions and their managers are concerned with retaining control over the IPR of their learning materials; institutional repositories for learning objects offer one way of controlling access effectively.
Information services and libraries
All managers and staff are concerned with meeting their legal and contractual requirements, including self-deposit / open access and being able to enforce embargoes. Institutional repositories can help with these issues.
Help wanted
Are these the most important drivers?
Are there other drivers that we should consider?
Have we correctly identified the key audiences who can help to identify these things?
Posted by: Tom Franklin
Click streams - Library Management Systems
I’ve been meaning to do a short post about the recent library systems study that JISC commissioned with SCONUL so people know about it. So here it is. I’ve been reminded of it as I’m at the Eduserv Symposium today and Ken Chad who worked on the study asked a question related to it.
The Eduserv Symposium is focusing on disruptive technologies and what the impact might be on the organisation, which in our case means universities and colleges. As Andy Powell pointed out in his introduction, there is also disruption for related service providers such as Eduserv (and for that matter JISC). So one question is how the academic/education sector should respond to the ‘disruptive’ technologies (for that read web 2.0 / service provision on the network, e.g. Google and Amazon services). Ken Chad mentioned the opportunity that the sector has in terms of the data known about users; for example click streams. The library management systems study (that Ken worked on with Sero Consulting) sees this as an opportunity for academic libraries to make their services more relevant to users. Of course there are delicate issues surrounding the use of click streams, not least privacy, as Larry Johnston, NMC, pointed out in response to Ken’s question at the Eduserv Symposium.
The report covers far more ground than click streams; it is a horizon scan of what is happening in the UK academic sector in terms of LMS provision and what the requirements might be in the changing context in which libraries now find themselves.
http://www.jisc.ac.uk/whatwedo/programmes/resourcediscovery/libraryms.aspx
Significant Properties
Around 150 people came to an event at the British Library a few weeks ago (April 7th) to discuss current research into the ‘Significant Properties of Digital Objects’. Significant properties are essential attributes of a digital object which affect its appearance, behaviour, quality and usability, and which must be preserved over time for the digital object to remain accessible and meaningful.
This was an event organised by JISC, the Digital Preservation Coalition (DPC) and the PLANETS project which is based at the British Library.
Frances Boyle (DPC) has written some useful notes on this event which are on the following JISC web page:
http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigpropswrkshp.aspx
By way of an addendum to those notes …
Towards the end of the discussion session (at the workshop) I invited delegates to consider what the ‘next steps’ should be in terms of commissioning new work in this area. One question to consider is: should JISC be seeking to fund more studies in a similar vein to the four featured at the workshop (on vector images, moving images, software and e-learning objects – i.e. continue investigating digital objects according to their type) or should JISC be looking to try and exploit the value of the work already done and commission some preservation / curation software tools that incorporate and use the concepts associated with significant properties?
I think it would be fair to say that delegates expressed support for both options but also gave the impression that there was still some groundwork to be done in really defining how the concept translates from the theoretical to the practical, and what it means to different users (or ‘designated communities’). I took the opportunity of repeating those questions at a meeting of the JISC Repositories and Preservation Advisory Group last week and received some further useful input. Points were made about the potential role that significant properties work could and should play in the quality assessment of preservation-related work … and also in the authentication of digital objects - two important and under-researched areas. The point was also made that whilst the four JISC studies had all made important contributions to this research area (and in usefully different ways) they had all struggled with the concept and it might be fair to say that we still didn’t know whether information managers had anything yet that was of practical value to them in their work.
So … the next step is to organise a hands-on technical workshop where a small invited group of participants consider a range of files, define their significant properties, and then use that information to undertake practical preservation work. This doesn’t rule out more studies, or tools development, or a discipline-based approach, or indeed other ways of taking this work forward. It will simply provide a more practical foundation for further work.
Repositories Support Project (RSP) Workshop
I attended a very useful workshop last week which was run by the Repositories Support Project (RSP). About 50 people were there representing around 30 organisations and there were presentations on the following initiatives:
JULIET
RoMEO
OpenDOAR
ROAR
The Depot
JORUM
EThOS
OAISter and BASE
Intute Repository Search
There was also time for some discussion and this highlighted a few issues that might be worth flagging up.
Bill Hubbard (RSP) commented on the lack of success in the U.S. of the Open Access Mandate at the National Institutes of Health (see the Open Access News article for background: http://www.earlham.edu/~peters/fos/2007/12/oa-mandate-at-nih-now-law.html). There appears to be only a 5% compliance rate at the moment so that obviously hasn’t worked! Bill made the point that this clearly reinforces the notion that the most important factor in improving repository deposit rates is not telling people ‘they must’, but ensuring that deposit is an integral part of the scholarly workflow.
(Obviously it’s not all about quantity: the material in these repositories has to be high quality, and JISC is commissioning some work that will investigate techniques to help determine the quality of that deposited material.)
Another point Bill made … It’s worth remembering that the amount of research that should be going into repositories is very substantial. 6 out of 7 UK Research Councils have an archiving policy, and 36 out of 38 Russell Group/1994 universities (which account for more than 80% of HE sector research done in the UK) have repositories.
It was good to see some of the repository stats reporting tools that are available in ROAR (Registry of Open Access Repositories - http://roar.eprints.org/)
One of the repository manager participants at the event said that she recently had a conversation with an academic who was much more impressed with the information about repositories that he could see in OpenDOAR and ROAR than he was with the idea that his own institution had a fully operational and well stocked DSpace repository. We talked about the quality of advocacy materials that were available for repository managers to ‘sell’ their systems and wondered if more could be done.
Some other issues/comments from participants …
* JULIET & RoMEO were very useful resources. More should be done to develop APIs for both of these so that information could be embedded into institutional repository (IR) interfaces.
* The diversity of information and resources for HE IR managers was confusing. There should be a ‘one stop shop’.
* SWORD looks really interesting. Multiple deposit could improve the versioning problem where 4 different authors of a single paper are all putting separate (and potentially different) copies into their IRs.
* Is Intute more important for librarians than academics?
* The focus on colour-coded Open Access types (Green/White/Gold etc.) is confusing and unhelpful.
* Perhaps when talking about copyright issues, there should be more information about what IS possible rather than what isn’t. Copyright is not an issue that a lot of people want to engage with and some clear enabling advice would be good.
* IR managers on the whole had not started to grapple with preservation issues in a methodical way
These are just some of the notes I jotted down and the RSP will be reporting on the workshop in detail. But a very useful session - highly recommended for anyone in the repository field - particularly those who are fairly new to the area.
forthcoming events - http://www.rsp.ac.uk/events/
The top concerns of researchers
What do researchers care about? It’s probably uncontentious to say that they care about access, cost, copyright and quality. There’s a report published last month from the JISC Scholarly Communications Group that goes into a bit more detail:
http://www.jisc.ac.uk/media/documents/aboutus/workinggroups/topconcernsreport.doc
There are perhaps few surprises - the concerns might be paraphrased as ‘lack of access’, ‘some funding arrangements inhibit access’, ‘copyright is confusing’ and ‘new types of quality assurance are untested’. One key tool that should help address several, but not all, of these concerns is a licence to publish. There’s a JISC-SURF one here, but there are certainly others that do much the same thing - i.e. help authors retain rights they may need to use and share their papers. It’ll be interesting to see how it gets taken up.
Metadata for stuff in repositories …
I just wanted to highlight some metadata application profile work that is underway as part of the information environment repository programme. Having attended the birds of a feather session (coordinated by Rosemary Russell, UKOLN and Julie Allinson, University of York) about this at OR08 I finally got to see what JISC had funded. Today at the JISC Repositories and Preservation Advisory Group we discussed some of the work and I guess it made me think it was worth making a few more people aware of it. JISC has funded the development of:
metadata application profiles based on Dublin Core for:
Scholarly works
Geo-spatial data/information
Images
Multi-media
And we’ve also funded some work to assess what might be done in terms of application profiles for the following:
Learning objects
Scientific data
A little bit of context…
After using OAI-PMH across repositories in the JISC Focus on Access to Institutional Repositories (FAIR) programme, the experience was that Dublin Core was often not rich enough to be very useful to end user applications. The requirement for both metadata and full text indexing was a specific recommendation of the FAIR ePrints UK harvest and search project. After other work confirmed this, the response was to seek to add to basic DC by developing an application profile. The scholarly works application profile (SWAP) was developed by Julie Allinson (at the time at UKOLN, now at the University of York) and Andy Powell (Eduserv Foundation). SWAP aims to help support richer search functions and also to support full text indexing, and as I understand it another benefit is navigation between different versions. It is based on the Functional Requirements for Bibliographic Records (FRBR) model, which uses the following entities: work, expression, manifestation and item.
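To make the four FRBR entities a little more concrete, here is a minimal sketch of how they nest and why that helps with moving between versions. It is only an illustration under my own assumptions: the class and field names (`version_label`, `media_format`, `location`) and the example URL are hypothetical, and it is not the SWAP model itself, which defines its own entities and properties.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single concrete copy, e.g. one file held at one location."""
    location: str  # hypothetical field name


@dataclass
class Manifestation:
    """One embodiment of an expression, e.g. a PDF rendering."""
    media_format: str  # hypothetical field name
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """One realisation of a work, e.g. a particular version of a paper."""
    version_label: str  # hypothetical field name
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract intellectual creation that all versions share."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def versions(self) -> List[str]:
        # Navigating between versions is a walk over the expressions
        # that hang off the same work.
        return [e.version_label for e in self.expressions]


def example_work() -> Work:
    """Build a small illustrative work with two versions."""
    preprint = Expression(
        "author's preprint",
        [Manifestation("application/pdf",
                       [Item("http://repository.example.org/1/preprint.pdf")])],
    )
    published = Expression("published version")
    return Work("An example paper", [preprint, published])
```

Because both versions hang off the same work, a search interface could group them together rather than listing them as unrelated records, which is the kind of navigation a flat Dublin Core record struggles to express.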
You can read more about SWAP here:
http://www.ariadne.ac.uk/issue50/allinson-et-al/
SWAP, although based on a FRBR type model, was kept quite simple. It seems that when creating SWAP some hard lines were drawn to avoid too much complexity and from the feedback I have heard it seems to have addressed requirements. It was certainly good to hear from one of the attendees at the OR08 meeting that SWAP was “exactly what they required”. Mick Eadie (Visual Arts Data Service, University College for the Creative Arts) also described the images AP at the OR08 meeting, and it seems to have tried to keep a simple approach too. A draft of the images AP is now out for comment. See:
http://www.ukoln.ac.uk/repositories/digirep/index/Images_Application_Profile
Of course to get the real benefit of these application profiles the implementation of them has to be made as easy as possible and we need to encourage take-up. Working with repository software providers to support the APs is one thing that might be possible and the teams supporting the work intend to do this. SWAP has been implemented at Warwick University as they customised EPrints software to support it.
If you really want to help, or know someone that can, a job advert is currently out for a related metadata advocacy post at UKOLN: http://www.ukoln.ac.uk/vacancies/08H127A/job-ad/
Note that SWAP is the most mature of the APs; the other areas are in initial draft and are still being developed.
Here are some related links:
SWAP:
http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
The geo spatial work that James Reid, EDINA (University of Edinburgh) is leading on is currently out for comment:
http://www.ukoln.ac.uk/repositories/digirep/images/e/ef/Geospatial_Application_Profile.doc
Work done by Phil Barker, CETIS, (Heriot-Watt University) on the learning material application profile is here:
http://www.icbl.hw.ac.uk/lmap/domainModel.draft1.html
It is probably worth mentioning that previously some work has been done for learning materials/objects. See information on RLLOMAP: http://www.intute.ac.uk/publications/rdn-ltsn/ap/
and: http://standards-catalogue.ukoln.ac.uk/index/UK_LOM_Core
RLLOMAP seems to have a similar aim to the current work in that it was to support the exchange of metadata using OAI-PMH and UK LOM did build on this.
Not surprisingly the multi-media application profile is a tough one and drafts are not yet available as far as I know. But I do know via Pete Johnston (Eduserv Foundation) that there are some early results being reviewed! Gayle Calverley is leading the work in this area.
There is also the DCMI Scholarly Communications Community where discussion should take place about the application profiles once the work picks up as a whole (coordination and outreach is currently being planned):
http://dublincore.org/groups/scholar/
Jorum to move to Open Access
Jorum has recently been awarded £2.4m by JIIE to do what so many people have said needs doing: it is going open access! The new service (“JorumOpen”) will operate under a Creative Commons License and will not require user registration to access and download its content. Users that have already contributed content through existing licences will be contacted to ask if they wish to sign a new open access licence or continue to store their content under the same terms in a parallel service (“JorumPrivilege”).
The new services (collectively known as “Jorum2”) will start being rolled out this Autumn. There will be a range of added value services - such as a development bay to explore integration with VLEs and to allow users to experiment with learning object reuse - as well as continued R&D. The full press release is attached.
www.jorum.ac.uk
Open Repositories 2008
Before arrival at the recent Open Repositories 2008 conference, I was telling myself that this would be a dynamic, busy and vibrant conference, attended by a technically ambitious and knowledgeable community, and that it would obviously be a great opportunity for me to engage in constant blog activity (reading and writing). As it turned out, the preconceptions I had about the conference were exactly right. The aspirations I had about my own activities in the blogosphere, however, turned out to be more a case of ‘amplified expectations’ rather than the ‘amplified conference’ that Lorcan Dempsey has referred to (http://orweblog.oclc.org/archives/001404.html).
From the more comfortable perspective of two weeks after the energetic and meeting-packed week down in Southampton (that made it impossible to get near a blog!) it’s possible to look back and consider a few of the more prominent features of the conference.
One principal item was the role that OAI-ORE (Open Archives Initiative Object Reuse and Exchange, http://www.openarchives.org/ore/) may have in describing the structure and semantics of aggregations of web objects, thereby making those objects available to a variety of applications. Though still in beta (or perhaps even alpha) at the time of the conference, this data model was used in the development of the winning prototype of the ‘Repository Challenge’ competition (http://or08.ecs.soton.ac.uk/developers.html), a JISC/CRIG sponsored event that was an important and characteristic feature of the conference.
Tim Brody (University of Southampton) along with fellow team members, Ben O’Steen (University of Oxford) and Dave Tarrant (University of Southampton) developed the winning application which was called ‘Mining the ORE’. Tim Brody describes it as …
‘A practical approach to copying complex objects between repositories. Every eprint in a repository is exposed as an ORE aggregation (Object Reuse and Exchange). Each ORE aggregation of an eprint links together all the files and associated metadata. This aggregation of files had one resource that was marked as conforming to simple Dublin Core and this was used as the basis of the metadata interoperability. When ingested into a new repository each resource in the ORE aggregation is retrieved and stored. The simple Dublin Core is used to index the new eprint for the purposes of search and discovery, otherwise all of the component resources are simply shown to the user. We implemented exemplar ORE interfaces for both EPrints and Fedora, enabling the transfer of complex objects between the two system implementations.’
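The ingest step Tim describes can be sketched roughly as follows. This is not the team's code and it does not use the EPrints or Fedora interfaces; it is a small in-memory illustration under my own assumptions. The names `Resource`, `Aggregation`, `TargetRepository` and the `SIMPLE_DC` marker are all hypothetical, and fetching each resource over HTTP is left out (each resource simply carries its content).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical marker for "this resource conforms to simple Dublin Core".
SIMPLE_DC = "simple-dublin-core"


@dataclass
class Resource:
    """One component of an eprint: a file or a metadata record."""
    uri: str
    content: bytes
    conforms_to: Optional[str] = None


@dataclass
class Aggregation:
    """All the resources that together make up one eprint."""
    uri: str
    resources: List[Resource]


class TargetRepository:
    """Stand-in for the repository receiving the copied eprint."""

    def __init__(self) -> None:
        self.stored: Dict[str, bytes] = {}  # every component, keyed by URI
        self.index: Dict[str, bytes] = {}   # Dublin Core used for discovery

    def ingest(self, aggregation: Aggregation) -> None:
        for resource in aggregation.resources:
            # Every resource in the aggregation is stored as-is.
            self.stored[resource.uri] = resource.content
            # Only the simple Dublin Core resource is used for indexing.
            if resource.conforms_to == SIMPLE_DC:
                self.index[aggregation.uri] = resource.content
```

The point of the design is that the receiving repository only has to understand one metadata format in order to index the eprint; everything else in the aggregation is carried across untouched.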
19 teams entered the ‘Repository Challenge’ and in total over 40 developers were involved in creating the rapid prototypes. Five prototypes were shortlisted by an international panel of judges and the winner was then selected by a balloted vote from the conference delegates at the OR08 awards dinner. This type of developmental process is a new departure in terms of JISC-funded initiatives but has proved to be potentially of great benefit in terms of providing candidate service-usage models (SUMs) for submission to the e-Framework, and other forms of documentation including training materials and case studies. It would be very interesting to hear views and opinions about the value of this form of rapid prototyping exercise. Anyone interested should contact David Flanders at the Common Repositories Interface Group (CRIG) http://www.ukoln.ac.uk/repositories/digirep/index/CRIG. David was the driving force behind the Repository Challenge at OR08 and its success was entirely to do with his energy and determination.
Returning to the mainstream sessions of the conference, Peter Murray Rust gave the first keynote speech and urged delegates to be wary of the ubiquitous use of the PDF format to capture the complexity of scientific information. This reluctance to accept what has become the de facto deposit standard clearly rang bells with some delegates (http://scilib.typepad.com/science_library_pad/2008/04/or08—the-pres.html).
One of the challenges tackled by many presenters was how to ease the burden of deposit and how to incorporate web 2.0 interfaces and techniques into repository design and workflow. The automation of metadata tagging and the design of batch ingest procedures were also variously discussed.
All the papers are being made available in the OR08 repository (http://pubs.or08.ecs.soton.ac.uk/) and this will give some idea of the complexity of the main part of the conference. What it won’t describe is the amount of peripheral but important activity that happened around these presentations, encompassing: Fedora, EPrints and DSpace group meetings; a repository manager forum; a developer barcamp; an international meeting about Global Registries; a EurOpen Scholar day addressing issues about Open Access … not to mention gatherings and briefings put together by commercial participants such as Microsoft, who introduced the research data repository platform that they have been developing.
Perhaps the very busiest part of the conference was the one that I almost completely missed. If Owen Stephens’ experience of the conference was anything to go by (and this was someone who wasn’t even at the conference), then all the ‘amplification’ that was going on was perhaps a bit too much! http://ukwebfocus.wordpress.com/2008/04/08/micro-blogging-at-events/#comment-64627.
The ‘chattering classes’ is obviously a thing of the past. Now we have the ‘twittering’ classes.
Case studies galore
As part of the Repositories Support Project’s session for repository managers at the splendid Open Repositories 08, the conference organisers collected a load of case histories from repository managers in the US and Europe.
The case histories have been made available on the Repositories Support Project’s website. They cover a variety of different repositories in a variety of different settings. Some of them are short and some are long but they are all an interesting read.
As far as I can see these are useful in a number of ways:
- A great way of seeing how similar problems have been handled in different institutions.
- A guide for dealing with repository problems for new repository managers.
- A collection of the current concerns of repository managers.
- An evaluation resource: they represent a picture of where we are now with repositories. Looking back on these a couple of years into the future could prove to be very interesting in terms of what has been achieved and what still needs to be achieved.
- A research resource - if I were an information science student I would be grabbing this resource with both hands. I imagine that common themes, successful strategies, lessons learned etc. would make a good dissertation topic.
Posted by: Andy McGregor