Harvesting usage data?
I was talking with a researcher the other day who said that, despite his institution mandating deposit of research papers in his institutional repository, he didn’t comply, preferring to deposit in an international subject repository. Naturally, I asked him ‘why?’. He said that it was because he wanted each of his papers to be in one, and only one, place on the web, so that he could get accurate download statistics for it. Obviously, we’re aware in the JISC IE team of the various arguments on this topic, and we’ve funded a piece of work to look at the practical ways in which subject and institutional repositories might work together, which could address this issue among others. We’ve also funded various projects on repository statistics, such as ‘Interoperable Repository Statistics’ (which has developed a tool that repository managers can use to analyse and share statistics) and an ongoing small piece of work on harmonising article-level usage data formats. There is also MESUR and other projects in this space.
However, in the real world, copies of some research papers are likely to be at various places on the web, and we wondered whether a tool could be built that used fuzzy matching to identify copies that were probably the same paper, some means of querying the servers on which they sat to get download data, and a reliable way of then aggregating that data into some acceptable statistics. Is that an important use case? Is it feasible to build something that addresses it?
What’s the relationship (if any) with name authority services (see the JISC pilot Names project) or persistent identifiers (see the JISC Resourcing Identifier Interoperability for Repositories - RIDIR demonstrator)?
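To make the matching-and-aggregation idea above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Copy` record, the `normalise`, `same_paper` and `aggregate` helpers and the 0.9 similarity threshold are all hypothetical, and the download counts are assumed to have been fetched from each server already. It compares titles only, using the standard library's `difflib`; a real tool would probably also need authors, dates or identifiers.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
import re


@dataclass
class Copy:
    """One copy of a paper on one server (all fields are illustrative)."""
    server: str
    title: str
    downloads: int


def normalise(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def same_paper(a: str, b: str, threshold: float = 0.9) -> bool:
    """Guess whether two titles refer to the same paper (assumed threshold)."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold


def aggregate(copies: list[Copy], threshold: float = 0.9) -> list[dict]:
    """Group probable copies greedily and sum their download counts."""
    groups: list[dict] = []
    for copy in copies:
        for group in groups:
            # Compare against the first title seen for this group.
            if same_paper(group["title"], copy.title, threshold):
                group["servers"].append(copy.server)
                group["downloads"] += copy.downloads
                break
        else:
            # No existing group matched, so start a new one.
            groups.append(
                {"title": copy.title, "servers": [copy.server], "downloads": copy.downloads}
            )
    return groups
```

Calling `aggregate` on a list of `Copy` records returns one entry per probable paper, with the servers holding it and a combined download figure. Titles that differ only in case or punctuation fall into the same group; whether title similarity alone is reliable enough is exactly the open question.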
Bringing repositories to the attention of university senior managers
There are two new JISC briefing papers on repositories. One is concerned with the benefits of managing and sharing learning objects, the other with managing and sharing research outputs.
JISC and UUK are sending these papers to senior managers in universities next week. The papers should arrive on desks on Monday 16th of June. With any luck, the briefing papers will pique some interest in repositories or at least make sure the concept is familiar to senior managers.
This may represent an opportunity for capitalising on this familiarity or interest with further advocacy directed at senior managers about repository services, policies or projects.
The recipients are likely to be:
- Vice Chancellors,
- DVC Academic,
- DVC Research,
- University Secretary,
- Deans of Schools
Plus some of:
- Records Manager,
- Dean of the Graduate Research School,
- Director of ICT Systems,
- Director of Library & Information Services,
- Director Academic Enterprise,
- Principal Lecturer Pathfinder E-Learning (central post)
The briefing papers can be found on the JISC website:
Learning objects: http://www.jisc.ac.uk/publications/publications/elearningrepositoriesbpv1.aspx
Research: http://www.jisc.ac.uk/publications/publications/researchrepositoriesbpv1.aspx
Repositories and Preservation Programme Synthesis
We are proposing to undertake a synthesis of the repositories and preservation programme which will support action. This means that the outputs need to be targeted at decision makers with additional information for those that will have to implement the decisions.
We have taken as a starting point the idea that decision makers are most likely to take note of what we are saying if repositories or preservation address problems that they are already worried about, and that many of these will stem from government, funding council or similar policies which they have to implement.
We have identified policies, decision makers who are concerned with them and ways in which we think that repositories or preservation can help.
We are aware that there will be other policies out there that we should be considering, that there may be other ways in which repositories or preservation could help and there may be other people we need to address.
We would very much welcome comments and thoughts on our thinking so that we can take it forward and start the synthesis.
Please comment either by posting comments or by email to Tom Franklin who is leading on this (tom@franklin-consulting.co.uk).
Research
The Research Excellence Framework is of concern to many at the moment including senior managers, research managers, researchers and librarians. We believe that it is likely that institutional repositories will make collection of the relevant information easier and cheaper and will support whatever metrics are likely to be selected. It is also possible that open access repositories will lead to research being found more easily and therefore cited more widely. This also supports increasing research recognition.
Funding mandates from funding bodies such as research councils and Wellcome can be addressed not only through the use of required repositories (such as UK Pubmed Central), but also through the use of suitable institutional repositories that support things like embargo periods.
Community and business engagement requires that information is made accessible to those that might make effective use of it. Institutional repositories may assist here.
Teaching and learning
Cost reduction may be achieved through better sharing of learning materials, including learning objects. This will be of interest to both managers and teachers, who will then need to implement and make use of repositories, and contributors will also have to think about using appropriate standards. Integration with the VLE would also enable the most current version of materials to be easily accessible.
Quality assurance of courses, especially franchised courses (for instance between a university and FE colleges), is of concern to senior managers and teachers and could be supported by making learning resources available across the group through use of repositories.
Many institutions and their managers are concerned with retaining control over the IPR of their learning materials; institutional repositories for learning objects offer one way of controlling access effectively.
Information services and libraries
All managers and staff are concerned with meeting their legal and contractual requirements, including self-deposit / open access, and being able to enforce embargoes. Institutional repositories can help with these issues.
Help wanted
Are these the most important drivers?
Are there other drivers that we should consider?
Have we correctly identified the key audiences who can help to identify these things?
Posted by: Tom Franklin
The top concerns of researchers
What do researchers care about? It’s probably uncontentious to say that they care about access, cost, copyright and quality. There’s a report published last month from the JISC Scholarly Communications Group that goes into a bit more detail:
http://www.jisc.ac.uk/media/documents/aboutus/workinggroups/topconcernsreport.doc
There are perhaps few surprises - the concerns might be paraphrased as ‘lack of access’, ‘some funding arrangements inhibit access’, ‘copyright is confusing’ and ‘new types of quality assurance are untested’. One key tool that should help address several, but not all, of these concerns is a licence to publish. There’s a JISC-SURF one here, but there are certainly others that do much the same thing - ie, help authors retain rights they may need to use and share their papers. It’ll be interesting to see how it gets taken up.
Jorum to move to Open Access
Jorum has recently been awarded £2.4m by JIIE to do what so many people have said needs doing: it is going open access! The new service (“JorumOpen”) will operate under a Creative Commons License and will not require user registration to access and download its content. Users that have already contributed content through existing licences will be contacted to ask if they wish to sign a new open access licence or continue to store their content under the same terms in a parallel service (“JorumPrivilege”).
The new services (collectively known as “Jorum2”) will start being rolled out this Autumn. There will be a range of added value services - such as a development bay to explore integration with VLEs and to allow users to experiment with learning object reuse - as well as continued R&D. The full press release is attached.
www.jorum.ac.uk
Open Repositories 2008
Before arrival at the recent Open Repositories 2008 conference, I was telling myself that this would be a dynamic, busy and vibrant conference, attended by a technically ambitious and knowledgeable community, and that it would obviously be a great opportunity for me to engage in constant blog activity (reading and writing). As it turned out, the preconceptions I had about the conference were exactly right. The aspirations I had about my own activities in the blogosphere, however, turned out to be more a case of ‘amplified expectations’ rather than the ‘amplified conference’ that Lorcan Dempsey has referred to (http://orweblog.oclc.org/archives/001404.html).
From the more comfortable perspective of two weeks after the energetic and meeting-packed week down in Southampton (that made it impossible to get near a blog!) it’s possible to look back and consider a few of the more prominent features of the conference.
One principal item was the role that OAI-ORE (Open Archives Initiative Protocol – Object Reuse and Exchange, http://www.openarchives.org/ore/) may have in describing the structure and semantics of aggregations of web objects, thereby making those objects available to a variety of applications. Though still in beta (or perhaps even alpha) by the time of the conference, this data model was used in the development of the winning prototype of the ‘Repository Challenge’ competition (http://or08.ecs.soton.ac.uk/developers.html) - a JISC/CRIG sponsored event that was an important and characteristic feature of the conference.
Tim Brody (University of Southampton) along with fellow team members, Ben O’Steen (University of Oxford) and Dave Tarrant (University of Southampton) developed the winning application which was called ‘Mining the ORE’. Tim Brody describes it as …
‘A practical approach to copying complex objects between repositories. Every eprint in a repository is exposed as an ORE aggregation (Object Reuse and Exchange). Each ORE aggregation of an eprint links together all the files and associated metadata. This aggregation of files had one resource that was marked as conforming to simple Dublin Core and this was used as the basis of the metadata interoperability. When ingested into a new repository each resource in the ORE aggregation is retrieved and stored. The simple Dublin Core is used to index the new eprint for the purposes of search and discovery, otherwise all of the component resources are simply shown to the user. We implemented exemplar ORE interfaces for both EPrints and Fedora, enabling the transfer of complex objects between the two system implementations.’
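For readers who find code easier to follow than prose, the ingest step in that description might look roughly like the Python sketch below. This is not the team's code and it does not parse any real ORE serialisation: the `Resource`, `Aggregation` and `Repository` classes and the `DC_FORMAT` marker are hypothetical names invented for illustration, and copying bytes in memory stands in for retrieving each resource over the web.

```python
from dataclasses import dataclass, field

# Hypothetical marker for "conforms to simple Dublin Core"; not an ORE term.
DC_FORMAT = "simple-dublin-core"


@dataclass
class Resource:
    """One file or metadata record inside an aggregation (illustrative)."""
    uri: str
    content: bytes
    conforms_to: str = ""


@dataclass
class Aggregation:
    """All the resources that together make up one eprint (illustrative)."""
    uri: str
    resources: list[Resource]


@dataclass
class Repository:
    """A toy target repository: stored files plus a search index."""
    stored: dict[str, dict[str, bytes]] = field(default_factory=dict)
    index: dict[str, bytes] = field(default_factory=dict)

    def ingest(self, aggregation: Aggregation) -> None:
        """Store every resource; index the eprint by its Dublin Core record."""
        files: dict[str, bytes] = {}
        for resource in aggregation.resources:
            # Stand-in for fetching the resource from the source repository.
            files[resource.uri] = resource.content
            if resource.conforms_to == DC_FORMAT:
                # Only the simple Dublin Core resource feeds search and discovery.
                self.index[aggregation.uri] = resource.content
        self.stored[aggregation.uri] = files
```

The point of the sketch is the division of labour the quotation describes: every component resource is copied across and kept, while only the one resource marked as simple Dublin Core is used for indexing.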
Nineteen teams entered the ‘Repository Challenge’ and in total over 40 developers were involved in creating the rapid prototypes. Five prototypes were shortlisted by an international panel of judges and the winner was then selected by a balloted vote from the conference delegates at the OR08 awards dinner. This type of developmental process is a new departure in terms of JISC-funded initiatives but has proved to be potentially of great benefit in terms of providing candidate service-usage models (SUMs) for submission to the e-Framework, and other forms of documentation including training materials and case studies. It would be very interesting to hear views and opinions about the value of this form of rapid prototyping exercise. Anyone interested should contact David Flanders at the Common Repositories Interface Group (CRIG) http://www.ukoln.ac.uk/repositories/digirep/index/CRIG. David was the driving force behind the Repository Challenge at OR08 and its success was entirely to do with his energy and determination.
Returning to the mainstream sessions of the conference, Peter Murray Rust gave the first keynote speech and urged delegates to be wary of the ubiquitous use of the pdf format to capture the complexity of scientific information. This reluctance to accept what has become the de facto deposit standard clearly rang bells with some delegates (http://scilib.typepad.com/science_library_pad/2008/04/or08—the-pres.html).
One of the challenges tackled by many presenters was how to ease the burden of deposit and how to incorporate web 2.0 interfaces and techniques into repository design and workflow. The automation of metadata tagging and the design of batch ingest procedures were also variously discussed.
All the papers are being made available in the OR08 repository (http://pubs.or08.ecs.soton.ac.uk/) and this will give some idea of the complexity of the main part of the conference. What it won’t describe is the amount of peripheral but important activity that happened around these presentations, encompassing: Fedora, e-Prints and DSpace group meetings; a repository manager forum; a developer barcamp; an international meeting about Global Registries; a EurOpen Scholar day addressing issues about Open Access … not to mention gatherings and briefings put together by commercial participants such as Microsoft, who introduced the research data repository platform that they have been developing.
Perhaps the very busiest part of the conference was the one that I almost completely missed. If Owen Stephens’ experience of the conference was anything to go by (and this was someone who wasn’t even at the conference), then all the ‘amplification’ that was going on was perhaps a bit too much! http://ukwebfocus.wordpress.com/2008/04/08/micro-blogging-at-events/#comment-64627.
The ‘chattering classes’ is obviously a thing of the past. Now we have the ‘twittering’ classes.
The Research Data Management Forum
This week I went to an early meeting of the Research Data Management Forum, co-sponsored by the Digital Curation Centre and the Research Information Network. The management and curation of research data is both a hot topic and a major challenge – not always a happy combination. This meeting of the Forum was open to anyone, and a diverse group attended, including several directly involved in managing research data and several more, like me, who have an interest in supporting that work. In many ways the challenge of managing the digital data deluge is beyond the capacity of a single forum, and at times the list of unanswered questions prompted by the discussion threatened to sink the enthusiasm of even the keenest curator. There seems to be so much that needs doing. However, the main message I took away was the urgent need for more and better evidence: What are the benefits of curating and sharing research data? What are the benefits of having people in UK higher education skilled in data management? To whom do these benefits accrue? The evidence may be of a variety of kinds. Certainly, case studies can help show where these benefits arise and to whom in particular cases. However, what’s also needed is some serious economic modelling of the kind deployed by Professors Newbery and Bently and Rufus Pollock in their recent report on ‘Models of Public Sector Information Provision’ and, in a different context, by John Houghton and his colleagues on ‘Research Communication Costs in Australia’.
To supplement this message, and assuming the evidence shows that data curation and sharing is beneficial to UK higher education, to the UK more widely, and to research in general, the question arises: what’s in it for researchers? In many disciplines data sharing is not common, and this can be for good reasons. The forthcoming report on ‘data publication’, commissioned by the Research Information Network, Natural Environment Research Council and JISC, will document the picture in some detail. A missing piece in the puzzle is the full citation of specific datasets, which is uncommon. Should this become common, metrics could be derived from aggregated citations to indicate the extent to which datasets were referenced, and academic credit (and therefore incentive) could follow. The difficulty is less one of principle (for example, datasets from the UK Data Archive should already be cited whenever used, and easily can be) than of practice; it just rarely happens. JISC is funding several projects that might help – CLADDIER, StoreLink and OJIMS are the most obvious examples, but it’s not necessarily something that a JISC project alone can address.
Posted by: Neil Jacobs
Open Repositories 2008
2008 … is that the date that it’s happening or is that the number of delegates that have decided to come along? This year’s Open Repositories conference (to take place in Southampton 1-4 April) has easily surpassed the number of registrants that signed up for the same conference last year when it took place in Texas. This is interesting for two reasons I think. Firstly, some people have entertained the notion recently that maybe repositories haven’t lived up to their own hype … that they don’t contain enough material to be taken seriously … and that academics will never be persuaded to engage with them in a sustained and serious way. Well, if you want to argue this, where better to do it than amongst more than 400 people from all over the world, almost all of whom are committed to ensuring that repositories are open, interoperable and as effective as possible for everyone involved with research and teaching? Except of course if you do want to do that (and you haven’t registered already) then you’re a bit late … ‘cos registration has closed! What you can do though is look at this blog during the conference, because I’m intending to post while it is going on.
I said there were two interesting things, didn’t I? The other thing is about size. I always thought they liked to do everything big in Texas. Big hats. Big steaks. Big country. I didn’t realise that Southampton had even bigger aspirations. I’ll have to remember to pack my fifteen-gallon hat when I set off down South in two weeks’ time.
Posted by: Neil Grindley
A tool for revealing web conversations and citations for an open access scientific paper
I found this via Jon Udell’s blog and thought it might be useful for those of you who are expecting to host scientific papers in your repository.
It is a tool that could be embedded in a repository splash page to reveal what conversations are occurring on the open web about an open access paper. The tool takes a DOI, URL or PMID and displays:
- blog postings that have discussed the paper (using postgenomic);
- comments that have been made on pubmed;
- people who have bookmarked the paper using connotea;
- how many citations are recorded for the paper on pubmed central and scopus.
The demonstrator is available from Alf Eaton’s Hublog.
It might be worth reading Jon Udell’s blog post for the background that led to Alf Eaton posting this demonstrator.
Now I don’t use a repository on a daily basis so this may have disadvantages that are hidden from me, or be problematic to embed in a repository, but I think that putting this kind of tool on your repository might be a nice selling point for any scientists you are trying to convince to deposit.
I would love to hear from anyone who has tried this out on a live repository.
Posted by: Andy McGregor