Harvesting usage data?

I was talking with a researcher the other day who said that, despite his institution mandating deposit of research papers in his institutional repository, he didn’t comply - preferring to deposit in an international subject repository. Naturally, I asked him ‘why?’. He said that it was because he wanted each of his papers to be in one, and only one, place on the web, so that he could get accurate download statistics for it. Obviously, we’re aware in the JISC IE team of the various arguments on this topic, and we’ve funded a piece of work to look at the practical ways in which subject and institutional repositories might work together, which could address this issue among others. We’ve also funded various projects on repository statistics, such as ‘Interoperable Repository Statistics’ (which has developed a tool that repository managers can use to analyse and share statistics) and an ongoing small piece of work on harmonising article-level usage data formats. There is also MESUR and other projects in this space.

However, in the real world, copies of some research papers are likely to be at various places on the web, and we wondered whether a tool could be built that used fuzzy matching to identify copies that were probably the same paper, some means of querying the servers on which they sat to get download data, and a reliable way of then aggregating that data into some acceptable statistics. Is that an important use case? Is it feasible to build something that addresses it?
What’s the relationship (if any) with name authority services (see the JISC pilot Names project) or persistent identifiers (see the JISC Resourcing Identifier Interoperability for Repositories - RIDIR demonstrator)?
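
To make the idea a little more concrete, here is a minimal Python sketch of the matching and aggregating steps only. It is an illustration under stated assumptions, not a description of any existing JISC tool: the function names, the 0.9 similarity threshold and the sample figures are all hypothetical, titles are compared with the standard library's `difflib`, and the step of actually querying each server for download data is replaced by a hard-coded list.

```python
import re
from difflib import SequenceMatcher


def normalise(title):
    """Lower-case a title and collapse punctuation and whitespace before comparing."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def same_paper(a, b, threshold=0.9):
    """Guess whether two titles refer to the same paper.

    The threshold is an assumed value; a real tool would need to tune it
    and would probably compare authors and dates as well.
    """
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold


def aggregate_downloads(records, threshold=0.9):
    """Group (title, repository, downloads) records by fuzzy title match
    and sum the download counts within each group."""
    groups = []
    for title, repo, downloads in records:
        for group in groups:
            if same_paper(title, group["title"], threshold):
                group["copies"].append(repo)
                group["downloads"] += downloads
                break
        else:
            # No existing group matched, so start a new one.
            groups.append({"title": title, "copies": [repo], "downloads": downloads})
    return groups


# Hypothetical figures standing in for whatever each server would report.
records = [
    ("Harvesting Usage Data from Repositories", "institutional", 120),
    ("Harvesting usage data from repositories.", "subject", 340),
    ("A Different Paper Entirely", "subject", 15),
]

for group in aggregate_downloads(records):
    print(group["title"], group["copies"], group["downloads"])
```

Title similarity alone will produce false matches (and miss retitled versions), which is one reason the question about persistent identifiers and name authority services matters.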

Bringing repositories to the attention of university senior managers

There are two new JISC briefing papers on repositories. One is concerned with the benefits of managing and sharing learning objects, the other with managing and sharing research outputs.

JISC and UUK are sending these papers to senior managers in universities next week. The papers should arrive on desks on Monday 16th of June. With any luck, the briefing papers will pique some interest in repositories or at least make sure the concept is familiar to senior managers.

This may represent an opportunity for capitalising on this familiarity or interest with further advocacy directed at senior managers about repository services, policies or projects.

The recipients are likely to be:

Plus some of:

The briefing papers can be found on the JISC website:
Learning objects: http://www.jisc.ac.uk/publications/publications/elearningrepositoriesbpv1.aspx
Research: http://www.jisc.ac.uk/publications/publications/researchrepositoriesbpv1.aspx

Open Standards

I recently attended two completely separate but thematically related events on the nature of openness within digital technology. The first of these was a lecture by Jonathan Zittrain entitled ‘The Future of the Internet And How to Stop It’ - organised by the Oxford Internet Institute. His central contention was that we are increasingly seeing corporations designing technology that cannot easily be manipulated by its users to allow them to do new and unanticipated things. The phrase he uses for such prescriptive technology is ‘non-generative’, one example of this being (in Zittrain’s opinion) the Apple iPhone. (You can read more about this at http://futureoftheinternet.org/)

The second event took place in the Hague a couple of weeks ago and was convened by an organisation which calls itself Digistan (http://www.digistan.org/). This group is also concerned about the degrees of openness apparent in the digital realm and has placed a clear statement of intent on their website in the form of ‘The Hague Declaration’. (http://www.digistan.org/hague-declaration:en)

This declaration calls on governments to:
1. Procure only information technology that implements free and open standards
2. Deliver e-government services based exclusively on free and open standards
3. Use only free and open digital standards in their own activities

Strong stuff … and interesting, particularly when you consider that a representative of the Netherlands government was at the meeting and handing out copies of a booklet entitled ‘The Netherlands in Open Connection: An action plan for the use of Open Standards and Open Source Software in the public and semi-public sector’ (http://appz.ez.nl/publicaties/pdfs/07ET15.pdf).

It’s got me thinking about where JISC stands in relation to all this. I had another look at the JISC standards catalogue which is currently hosted by UKOLN (http://standards-catalogue.ukoln.ac.uk/index/Standards_Approach). It states:

“Despite the acknowledged importance of open standards, it was also recognised that the selection and use of open standards is not always easy. There is an awareness that not all open standards gain widespread acceptance and that adoption of open standards before they have proven their reliability and gained widespread acceptance can be costly.”

So there you go, we’re firmly on the fence! But then again, we do state that we have a policy of asking projects to either use open standards or justify why they aren’t, which sounds exactly like the way the man from the Dutch government was talking at the start of the meeting. “Comply or Explain” was his approach. So perhaps we aren’t so far away after all. One thing that certainly emerged from this meeting for me was that it would probably be helpful to have some kind of framework for determining how open a standard actually is. Perhaps something for inclusion into the next phase of development for the standards catalogue?

ORE@JISC

With the release of the beta OAI-ORE specification this week, I thought it was worth highlighting some of the JISC work in the UK that is contributing to this initiative. Two short projects are looking to experiment with ORE and feed back into its development. The FORESITE project at Liverpool, run by Rob Sanderson, has produced ORE resource map descriptions of the JSTOR collection (1.8 million full text articles), and will also ORE-enable the DSpace repository platform, depositing the JSTOR-ORE collection into DSpace using the SWORD protocol. The Theorem project, based at Cambridge and run by Jim Downing, is looking at etheses, both representing ‘ideal’ born-digital theses as ORE resource maps, and looking at workflows around these. This project is working closely with the Integrated Content Environment (ICE) developed by Peter Sefton at the University of Southern Queensland, Australia, to create an authoring and management environment that produces and handles chemistry theses as born-digital objects, with live links to data, and so on. This work complements an international project led in the UK by Chris Awre, and involving partners from the UK, Netherlands, Germany and Denmark, which is looking to get some international agreement on a complex object format for theses, drawing from the ORE specifications, but building on specifications currently used, such as x-metadiss in Germany. Given the relative simplicity of doctoral theses – they have limited versioning issues for example – and the pressing need in many countries to automate the thesis workflow, it may be that theses become an early ORE adopter.

Using Repositories for Learning and Teaching: Can we find a recipe for success?

I attended a JISC Repositories and Preservation Programme meeting, but for a change I was able to sit back and learn rather than run around stressed as the entire event was designed and organised by DRaW, one of the projects in the start up and enhancement strand of the programme.

This was the first of 6 programme meetings that will be delivered by the projects rather than programme managers and, in my opinion, it was a roaring success.

You can read summaries of the day produced by Nick Sheppard of the Leeds Met Repository project and Julian Beckton of the Lirolem project. The day started with 6 quickfire introductions to the Lirolem, Circle, DRaW, YSJ Digirep and Faroes projects and an overview of the issues with learning and teaching repositories from Andrew Rothery (of the University of Worcester) and Phil Barker (of JISC CETIS). We also had two impromptu introductions to the POCKET and Edspace projects. It was interesting to note that all of these projects were adopting a different approach to the implementation of a repository:

The afternoon session focused on using the experiences of the delegates to try and prepare a list of recommendations for people implementing a learning and teaching repository. The outcomes of this discussion will be turned into a document that can be shared. This will complement the structured guidelines for starting a learning and teaching repository produced by the CD-LOR project.

The slides from the event, the audio recordings and the recommendations will all be available from the DRaW website in due course.

I learned an awful lot at the event and it was really gratifying that the tone of discussions became more optimistic as the day wore on. From my conversations with delegates the day was really useful to them and I am looking forward to the remaining events in this series (see more details in this earlier post). I think that this type of event will complement the more traditional JISC programme meeting in a way that is beneficial to JISC programme objectives and to the projects.

Research data curation

Last year, following the Digital Curation Conference in Washington DC, JISC and the Andrew W. Mellon Foundation hosted an international workshop to discuss and suggest where the international priorities are for research and development work supporting academic research data curation. It’s taken a while for the notes to become available, for which I apologise, but here they are:
Priorities for research data curation workshop 2007

(I realise this is a PDF file, which won’t please everyone, but converting from MS Word shrank the file size by over an order of magnitude)

The starting point for the workshop was a recognition that, while research data orients largely by (sub)discipline, the way in which infrastructure is developed and funded is often oriented nationally, or even around institutions. Some way is needed to square these two. I have to confess that, on the day, I wasn’t sure we’d made a lot of progress, but in drafting the notes I changed my mind somewhat. Certainly, Peter Murray-Rust seemed to identify the academic department infrastructure as a key point where intervention could serve both that department and the wider goal of data curation and sharing. The photos of flip chart diagrams are perhaps not easy to read or understand, but suggest a distinctive place for libraries and repositories.

Greg Crane’s Perseus project anticipated some of the topics that were covered later - notably how to design an infrastructure that is sustainable and yet adaptive - there are a few ideas in the notes. There are also a few ideas about how the problem space might be broken down so that an international approach can be taken, though this remains difficult. With luck and effort, JISC’s and other UK ‘data’ work will join up with that in the US (eg the NSF Datanet programme), Australia (Australian National Data Service), etc, and these notes will help us do that.

Many thanks to the workshop participants, listed at the end of the notes.

ReStore workshop

I attended a very interesting workshop for the ReStore project last week. The project is run by Southampton’s ESRC National Centre for Research Methods and is investigating the use of a repository to host and maintain orphan web resources.

The problem that the project is addressing is that very useful web resources are produced by research projects, but when the project funding stops, the maintenance of the resources often stops too. This means that the resources start to decay, broken links flourish and the usefulness of the resource deteriorates quickly.

ReStore aims to address this problem by accepting suitable resources after a review process and then hosting and curating the sites with a mixture of automated and manual processes.

The project is funded by ESRC and aims to produce a prototype repository that curates a few web resources that have been produced by other ESRC projects.

The workshop was chiefly concerned with introducing the project and discussing some of the major issues such as technical challenges, IPR and sustainability. The presentations from the day can be downloaded from the project website: http://www.ncrm.ac.uk/restore/slides/. These include some mockups of the proposed system and an overview of the proposed review and curation process.

The project’s work on development of a long-term strategy for ESRC in sustaining on-line resources will be very relevant to JISC.

The technical challenges in hosting a range of resources that may all use different software and hardware are significant and it may be better in the short term to use Amazon Web Services or a similar service to host the sites and avoid a large hardware bill.

The costs of preserving research data

There’s a new report on the JISC website, authored by Neil Beagrie, Julia Chruszcz and Brian Lavoie. It looks at how much it costs to preserve research data and, perhaps as importantly, how institutions and others could calculate this. There are lots of reasons why this report is likely to have an impact - looking after research data is potentially costly, and yet it is important that - as a community - we make reasonable decisions about what should be preserved and how. Perhaps unsurprisingly (at least for those who already do this for a living), it seems the cost of ingesting the data forms the largest cost in the curation lifecycle, but at the same time the evidence shows that correcting badly ingested data later is even more costly, so the figures probably suggest that there is a positive cost/benefit calculation here. There is potential for developing the methodology here into a tool, and there could also be potential for some join-up with the Data Audit Framework.

Repositories and Preservation Programme Synthesis

We are proposing to undertake a synthesis of the repositories and preservation programme which will support action. This means that the outputs need to be targeted at decision makers with additional information for those that will have to implement the decisions.

We have taken as a starting point the idea that decision makers are most likely to take note of what we are saying if repositories or preservation address problems that they are already worried about, and that many of these will stem from government, funding council or similar policies which they have to implement.

We have identified policies, decision makers who are concerned with them and ways in which we think that repositories or preservation can help.

We are aware that there will be other policies out there that we should be considering, that there may be other ways in which repositories or preservation could help and there may be other people we need to address.

We would very much welcome comments and thoughts on our thinking so that we can take it forward and start the synthesis.

Please comment either by posting comments or by email to Tom Franklin who is leading on this (tom@franklin-consulting.co.uk).

Research

The Research Excellence Framework is of concern to many at the moment including senior managers, research managers, researchers and librarians. We believe that it is likely that institutional repositories will make collection of the relevant information easier and cheaper and will support whatever metrics are likely to be selected. It is also possible that open access repositories will lead to research being found more easily and therefore cited more widely. This also supports increasing research recognition.

Mandates from funding bodies such as research councils and Wellcome can be addressed through the use of required repositories (such as UK Pubmed Central), but also through the use of suitable institutional repositories that support things like embargo periods.

Community and business engagement requires that information is made accessible to those who might make effective use of it. Institutional repositories may assist here.

Teaching and learning

Cost reduction may be achieved through better sharing of learning materials, including learning objects. This will be of interest to both managers and teachers, who will then need to implement and make use of repositories, but contributors will also have to think about using appropriate standards. Integration with the VLE would also enable the most current version of materials to be easily accessible.

Quality assurance of courses, especially franchised courses (for instance between a university and FE colleges), is of concern to senior managers and teachers and could be supported by making learning resources available across the group through use of repositories.

Many institutions and their managers are concerned with retaining control over the IPR of their learning materials; institutional repositories for learning objects offer one way of controlling access effectively.

Information services and libraries

All managers and staff are concerned with meeting their legal and contractual requirements, including self-deposit / open access and being able to enforce embargoes. Institutional repositories can help with these issues.

Help wanted

Are these the most important drivers?

Are there other drivers that we should consider?

Have we correctly identified the key audiences who can help to identify these things?

Posted by: Tom Franklin

Click streams - Library Management Systems

I’ve been meaning to do a short post about the recent library systems study that JISC commissioned with SCONUL so people know about it. So here it is. I’ve been reminded of it as I’m at the Eduserv Symposium today and Ken Chad who worked on the study asked a question related to it.

The Eduserv Symposium is focusing on disruptive technologies and what the impact might be on the organisation. In our case that means universities and colleges, and as Andy Powell pointed out in his introduction there is also disruption for related service providers such as Eduserv (and for that matter JISC). So one question is how should the academic/education sector respond to the ‘disruptive’ technologies (for that read web 2.0/ service provision on the network e.g. Google and Amazon services). Ken Chad mentioned the opportunity that the sector has in terms of the data known about users, for example click streams. The library management systems study (that Ken worked on with Sero Consulting) sees this as an opportunity for academic libraries to make their services more relevant to users. Of course there are delicate issues surrounding the use of click streams, not least privacy, as Larry Johnston, NMC, pointed out in response to Ken’s question at the Eduserv Symposium.

The report covers far more ground than click streams: it is a horizon scan of what is happening in the UK academic sector in terms of LMS provision and what the requirements might be in the changing context that libraries now find themselves in.

http://www.jisc.ac.uk/whatwedo/programmes/resourcediscovery/libraryms.aspx
