Metadata for stuff in repositories …

I just wanted to highlight some metadata application profile work that is underway as part of the information environment repository programme. Having attended the birds of a feather session (coordinated by Rosemary Russell, UKOLN and Julie Allinson, University of York) about this at OR08 I finally got to see what JISC had funded. Today at the JISC Repositories and Preservation Advisory Group we discussed some of the work and I guess it made me think it was worth making a few more people aware of it. JISC has funded the development of:

metadata application profiles based on Dublin Core for:
Scholarly works
Geo-spatial data/information
Images
Multi-media

And we’ve also funded some work to assess what might be done in terms of application profiles for the following:
Learning objects
Scientific data

A little bit of context…
After using OAI-PMH across repositories in the JISC Focus on Access to Institutional Repositories (FAIR) programme, the experience was that Dublin Core was often not rich enough to be very useful to end-user applications. The requirement for both metadata and full-text indexing was a specific recommendation of the FAIR ePrints UK harvest and search project. After other work confirmed this, the response was to add to basic DC by developing an application profile. The scholarly works application profile (SWAP) was developed by Julie Allinson (then at UKOLN, now at the University of York) and Andy Powell (Eduserv Foundation). SWAP aims to support richer search functions and full-text indexing, and as I understand it another benefit is navigation between different versions. It is based on the Functional Requirements for Bibliographic Records (FRBR) model, which uses the following entities: work, expression, manifestation and item.
You can read more about SWAP here:
http://www.ariadne.ac.uk/issue50/allinson-et-al/
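
For anyone who finds it easier to think in code, here is a minimal sketch of how the four FRBR entities named above nest, and why that nesting helps with navigating between versions. It is only an illustration: the class names, field names and example URLs are all made up for this post and are not taken from the SWAP specification itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single retrievable copy, e.g. a file held at one URL."""
    url: str


@dataclass
class Manifestation:
    """One format of an expression, e.g. a PDF rendering."""
    media_type: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """One version of a work, e.g. the accepted manuscript."""
    version: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract scholarly work that all versions share."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def copies_of_version(work: Work, version: str) -> List[str]:
    """Return the URLs of every item that carries the given version."""
    return [
        item.url
        for expression in work.expressions
        if expression.version == version
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


# Hypothetical example: one paper, two versions, held in two places.
paper = Work(
    title="An example paper",
    expressions=[
        Expression(
            version="accepted manuscript",
            manifestations=[
                Manifestation(
                    media_type="application/pdf",
                    items=[Item("http://repository.example.org/1/paper.pdf")],
                )
            ],
        ),
        Expression(
            version="published version",
            manifestations=[
                Manifestation(
                    media_type="text/html",
                    items=[Item("http://journal.example.org/paper.html")],
                )
            ],
        ),
    ],
)
```

Because every copy hangs off a version, and every version hangs off the same work, a search service can group results by work and still offer the reader a choice of version, which is hard to do with a flat Dublin Core record.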

SWAP, although based on a FRBR-type model, was kept quite simple. It seems that when creating SWAP some hard lines were drawn to avoid too much complexity, and from the feedback I have heard it seems to have addressed requirements. It was certainly good to hear from one of the attendees at the OR08 meeting that SWAP was “exactly what they required”. Mick Eadie (Visual Arts Data Service, University College for the Creative Arts) also described the images AP at the OR08 meeting, and it seems to have tried to keep a simple approach too. A draft of the images AP is now out for comment. See:

http://www.ukoln.ac.uk/repositories/digirep/index/Images_Application_Profile

Of course, to get the real benefit of these application profiles, implementing them has to be made as easy as possible and we need to encourage take-up. Working with repository software providers to support the APs is one thing that might be possible, and the teams supporting the work intend to do this. SWAP has been implemented at the University of Warwick, which customised its EPrints software to support it.

If you really want to help or know someone that can :-) a job advert is currently out for a related metadata advocacy post at UKOLN: http://www.ukoln.ac.uk/vacancies/08H127A/job-ad/

Note that SWAP is the most mature of the APs; the other areas are in initial draft and are still being developed.

Here are some related links:

SWAP:

http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile

The geospatial work that James Reid of EDINA (University of Edinburgh) is leading is currently out for comment:

http://www.ukoln.ac.uk/repositories/digirep/images/e/ef/Geospatial_Application_Profile.doc

Work done by Phil Barker, CETIS, (Heriot-Watt University) on the learning material application profile is here:

http://www.icbl.hw.ac.uk/lmap/domainModel.draft1.html

It is probably worth mentioning that previously some work has been done for learning materials/objects. See information on RLLOMAP: http://www.intute.ac.uk/publications/rdn-ltsn/ap/
and: http://standards-catalogue.ukoln.ac.uk/index/UK_LOM_Core
RLLOMAP seems to have a similar aim to the current work in that it was to support the exchange of metadata using OAI-PMH and UK LOM did build on this.
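
For readers who have not met OAI-PMH, the kind of metadata exchange mentioned above boils down to an HTTP request plus some XML parsing. The sketch below is a rough illustration only: the repository address is invented, the sample response is hand-written, and the helper names are my own. The `verb` and `metadataPrefix` request parameters and the Dublin Core namespace are the standard ones.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Namespace of the simple Dublin Core elements.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request for a (hypothetical) OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def titles(response_xml: str) -> List[str]:
    """Pull every dc:title out of an OAI-PMH response body."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iterfind(".//dc:title", NS)]


# Hand-written sample response, trimmed to the parts used here.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example learning object</dc:title>
          <dc:creator>A. N. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

An application profile sits on top of this: the harvesting mechanics stay the same, but a different `metadataPrefix` would be requested and a richer set of elements would come back.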

Not surprisingly the multi-media application profile is a tough one and drafts are not yet available as far as I know. But I do know via Pete Johnston (Eduserv Foundation) that there are some early results being reviewed! Gayle Calverley is leading the work in this area.

There is also the DCMI Scholarly Communications Community where discussion should take place about the application profiles once the work picks up as a whole (coordination and outreach is currently being planned):
http://dublincore.org/groups/scholar/

Jorum to move to Open Access

Jorum has recently been awarded £2.4m by JIIE to do what so many people have said needs doing: it is going open access! The new service (“JorumOpen”) will operate under a Creative Commons licence and will not require user registration to access and download its content. Users who have already contributed content through existing licences will be contacted to ask if they wish to sign a new open access licence or continue to store their content under the same terms in a parallel service (“JorumPrivilege”).

The new services (collectively known as “Jorum2”) will start being rolled out this autumn. There will be a range of added-value services, such as a development bay to explore integration with VLEs and to allow users to experiment with learning object reuse, as well as continued R&D. The full press release is attached.
www.jorum.ac.uk

Open Repositories 2008

Before arrival at the recent Open Repositories 2008 conference, I was telling myself that this would be a dynamic, busy and vibrant conference, attended by a technically ambitious and knowledgeable community, and that it would obviously be a great opportunity for me to engage in constant blog activity (reading and writing). As it turned out, the preconceptions I had about the conference were exactly right. The aspirations I had about my own activities in the blogosphere, however, turned out to be more a case of ‘amplified expectations’ rather than the ‘amplified conference’ that Lorcan Dempsey has referred to (http://orweblog.oclc.org/archives/001404.html).

From the more comfortable perspective of two weeks after the energetic and meeting-packed week down in Southampton (that made it impossible to get near a blog!) it’s possible to look back and consider a few of the more prominent features of the conference.

One principal item was the role that OAI-ORE (Open Archives Initiative Object Reuse and Exchange, http://www.openarchives.org/ore/) may have in describing the structure and semantics of aggregations of web objects, thereby making those objects available to a variety of applications. Though still in beta (or perhaps even alpha) at the time of the conference, this data model was used in the development of the winning prototype of the ‘Repository Challenge’ competition (http://or08.ecs.soton.ac.uk/developers.html), a JISC/CRIG-sponsored event that was an important and characteristic feature of the conference.

Tim Brody (University of Southampton) along with fellow team members, Ben O’Steen (University of Oxford) and Dave Tarrant (University of Southampton) developed the winning application which was called ‘Mining the ORE’. Tim Brody describes it as …
 
‘A practical approach to copying complex objects between repositories. Every eprint in a repository is exposed as an ORE aggregation (Object Reuse and Exchange). Each ORE aggregation of an eprint links together all the files and associated metadata. This aggregation of files had one resource that was marked as conforming to simple Dublin Core and this was used as the basis of the metadata interoperability. When ingested into a new repository each resource in the ORE aggregation is retrieved and stored. The simple Dublin Core is used to index the new eprint for the purposes of search and discovery, otherwise all of the component resources are simply shown to the user. We implemented exemplar ORE interfaces for both EPrints and Fedora, enabling the transfer of complex objects between the two system implementations.’
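
To make that description a little more concrete, here is a minimal sketch of the ingest step as I read it. It is not the team's code: the class names, the `fetch` callable and the use of the oai_dc namespace URI as the "conforms to simple Dublin Core" marker are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumed marker for "this resource is simple Dublin Core".
SIMPLE_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"


@dataclass
class AggregatedResource:
    """One file or metadata record listed in an aggregation."""
    url: str
    conforms_to: Optional[str] = None


@dataclass
class Aggregation:
    """An eprint exposed as a set of linked resources."""
    uri: str
    resources: List[AggregatedResource]


@dataclass
class IngestedEprint:
    """What the receiving repository ends up holding."""
    source: str
    files: Dict[str, bytes]
    index_metadata: Optional[bytes]


def ingest(aggregation: Aggregation, fetch: Callable[[str], bytes]) -> IngestedEprint:
    """Retrieve and store every resource; keep the Dublin Core one for indexing."""
    files: Dict[str, bytes] = {}
    index_metadata: Optional[bytes] = None
    for resource in aggregation.resources:
        body = fetch(resource.url)  # retrieval is injected, so no network is needed here
        files[resource.url] = body
        if resource.conforms_to == SIMPLE_DC and index_metadata is None:
            index_metadata = body
    return IngestedEprint(aggregation.uri, files, index_metadata)
```

Passing `fetch` in as a parameter is just a convenience for trying the idea out with canned data; a real ingest would retrieve each resource over HTTP and would have to handle failures part-way through.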

19 teams entered the ‘Repository Challenge’ and in total over 40 developers were involved in creating the rapid prototypes. Five prototypes were shortlisted by an international panel of judges and the winner was then selected by a ballot of the conference delegates at the OR08 awards dinner. This type of developmental process is a new departure for JISC-funded initiatives but has proved to be potentially of great benefit in providing candidate service-usage models (SUMs) for submission to the e-Framework, and other forms of documentation including training materials and case studies. It would be very interesting to hear views and opinions about the value of this form of rapid prototyping exercise. Anyone interested should contact David Flanders at the Common Repository Interfaces Group (CRIG): http://www.ukoln.ac.uk/repositories/digirep/index/CRIG (David was the driving force behind the Repository Challenge at OR08 and its success was entirely down to his energy and determination).

Returning to the mainstream sessions of the conference, Peter Murray Rust gave the first keynote speech and urged delegates to be wary of the ubiquitous use of the PDF format to capture the complexity of scientific information. This reluctance to accept what has become the de facto deposit standard clearly rang bells with some delegates (http://scilib.typepad.com/science_library_pad/2008/04/or08—the-pres.html).

One of the challenges tackled by many presenters was how to ease the burden of deposit and how to incorporate web 2.0 interfaces and techniques into repository design and workflow. The automation of metadata tagging and the design of batch ingest procedures were also variously discussed.

All the papers are being made available in the OR08 repository (http://pubs.or08.ecs.soton.ac.uk/) and this will give some idea of the complexity of the main part of the conference. What it won’t describe is the amount of peripheral but important activity that happened around these presentations, encompassing: Fedora, EPrints and DSpace group meetings; a repository manager forum; a developer barcamp; an international meeting about Global Registries; a EurOpen Scholar day addressing issues about Open Access … not to mention gatherings and briefings put together by commercial participants such as Microsoft, who introduced the research data repository platform that they have been developing.

Perhaps the very busiest part of the conference was the one that I almost completely missed. If Owen Stephens’ experience of the conference was anything to go by (and this was someone who wasn’t even at the conference), then all the ‘amplification’ that was going on was perhaps a bit too much! http://ukwebfocus.wordpress.com/2008/04/08/micro-blogging-at-events/#comment-64627.

The ‘chattering classes’ is obviously a thing of the past. Now we have the ‘twittering’ classes.

Case studies galore

As part of the Repositories Support Project’s session for Repository managers at the splendid Open Repositories 08, the conference organisers collected a load of case histories from repository managers in the US and Europe.

The case histories have been made available on the Repositories Support Project’s website. They cover a variety of different repositories in a variety of different settings. Some of them are short and some are long but they are all an interesting read.

As far as I can see these are useful in a number of ways:

Posted by: Andy McGregor

Is this an effective development community?

The information environment, and repositories in particular, were highlighted by Sir Ron Cooke (JISC chair), in his opening keynote at the JISC conference. (See the online conference proceedings.)

He described the vision of a national e-infrastructure supporting the “body of knowledge” at the centre. He told delegates that “[his] nightmare is the challenge of the super-abundance of digital data” and stressed the importance of positioning our repositories very carefully in this landscape of abundant information. From a seemingly different perspective, the closing keynote by Angela Beesley described the work of the Wikimedia Foundation, which includes Wikipedia but also other interesting projects I had not heard of before. Their vision is of open access, of making as much knowledge as possible available to the world. Their solution is less about infrastructure and more about mass, scalable workflows. Her answer to “can you trust user-generated content?” was a refreshingly firm “no. but you can trust the process”.

So how do we develop a layer of scholarly information (for research, learning and teaching) where individuals can find, use and share trusted information, supported by an agile infrastructure provided by institutions, publicly funded shared services, commercial services and Wikipedia? It’s a heady mix. I took heed of Ron’s warning that “it’s often easier to have the vision than to have the stamina to battle against institutional inertia or even resistance”.

I think that’s the key challenge for us now, in the world of digital libraries and e-infrastructure. How do we ensure that we’re building firm foundations instead of castles in the sky? How do we avoid going down routes that are technically interesting but offer no tangible benefits to staff and students in institutions?

An important part of the answer is in how we, as a development community, work together to make sure we’re doing the right sorts of things in the right way in the right order. This was the focus of the Rapid Community Building session I went to in the afternoon. The Users and Innovation Development Model marries up the requirements analysis process with the development process to encourage constant sense-checking and quality assurance. We need this on a grand scale if we’re to continue developing in the right direction. The Emerge project is about sharing ideas to support this virtuous cycle and the overall impression I had was of creative chaos! Not everyone wants to work in the Web 2.0 way. But perhaps if every cluster of developers has an enthusiastic communicator then the community will get more of the benefits sooner.

I’ll finish with a quote and a question.

Quote, with thanks to George Roberts in the community building session:
“Much of what works is already there” Cooperrider and Srivastva (1987)

Question … Is it true? How do we review what works? How do we address the gaps? The IE team really wants to hear from projects how we can improve the development cycle, from identifying useful projects through to embedding outputs. What sorts of things can we all do to make this process work better?

Research data and the JISC IE

We’re hoping to present some themed web pages on the innovation work being funded under the JISC Information Environment area, including one on research data. I thought I’d use this blog to offer a preview / pilot of that page. I’m not sure if that’s an acceptable use of a blog, but I’m sure I’ll find out.

The aim of the JISC IE work on data is to promote and enable new ways of finding, using and sharing research data. Because there are huge variations in what ‘data’ is, and in disciplinary cultures and practices around it, there is likely to be a ‘mixed economy’ of infrastructure and services to support its management.

There have been a large number of reports on data recently, some of which are helpfully listed in a recent presentation by Michael Jubb of the Research Information Network. Three key documents are the report from the then Office of Science and Innovation on ‘e-infrastructure’, which set out a high-level vision; a set of principles for data stewardship developed by the Research Information Network; and the ‘Dealing with Data’ report from JISC/UKOLN, which made practical recommendations.

In terms of current practice, two projects promise to paint a clear picture from different perspectives. A study of ‘data publication’ practice among researchers has been funded by JISC, the Research Information Network and the Natural Environment Research Council. A different project, SCARP, is exploring disciplinary attitudes and approaches to data deposit, sharing and re-use, curation and preservation.

JISC and the Engineering and Physical Sciences Research Council have jointly funded the Digital Curation Centre (DCC), which is a centre both of innovation and of guidance. Members of the DCC are developing a Data Audit Framework, which will enable universities to assess what data is being held on their computer systems, and who is responsible for it. The Data Audit Framework will be piloted in a number of universities in 2008.

There is a suspicion that the sector lacks sufficient skilled people to manage research data effectively. A report is due shortly that will review the position and make recommendations on how this might be addressed. The DCC will run a summer school this year to begin to address this issue. Of course, investment will only follow if a business case can be made, and a part of making that case is assessing the costs of preserving data. A methodology is being developed that will enable estimates to be made, though of course without assessing the benefits of keeping data, it is only half the story.

The UK is fortunate to have both the UK Data Archive (co-funded by JISC and the Economic and Social Research Council) and the data centres supported by the Natural Environment Research Council. These services offer expert advice and infrastructure for data management. A feasibility study is underway into the possibility of a UK Research Data Service as a collaboration between some UK universities, to fill in some of the gaps between such data centres. In addition, the DISC-UK Datashare project is looking at how UK higher education can increase its capacity to curate and share research data.

Finally, it is worth noting that JISC also funds work under the heading of ‘e-Research’, which is also focused on research data, including grid and semantic enabling of datasets.

Open Repositories 2008, and Web Science

Having missed most of the presentations at the Open Repositories conference in Southampton this week, my reflections on the event have been prompted more by the Southampton-MIT collaboration described by Wendy Hall and Nigel Shadbolt before the conference dinner, the Web Science research centre. Initiatives such as this (see also, for example, the Oxford Internet Institute), with their focus on an interdisciplinary understanding of the web as a ‘first class’ research object, are particularly timely. I was struck by a potential parallel: an Australian colleague had earlier told me how researchers and technologists there are working to create an interdisciplinary research data resource on breast cancer; on the one hand there are similarities (the alliance of researchers and technologists brought to bear on a research topic), but there are differences. It could be argued (though not all would agree) that the web is a product of social interaction in a way that breast cancer is not. That is, technology (including the infrastructure underpinning science) is ‘social relations made concrete’. Incidentally, those social relations include interventions by those studying or evaluating science practice, so that ‘web science’ is a reflexive undertaking in a way that the study of breast cancer is not (again, not all would agree, see the ‘Science Wars’ entry on Wikipedia).

Some examples from the conference: Johan Bollen presented the outstanding and topical LANL MESUR work on metrics. Fifteen years ago, Steve Woolgar* alerted us to the social, as well as the academic, reasons for the persistence of citation metrics as a tool in research evaluation. This mash-up of social and academic relations is likely now to be embedded in a technical infrastructure for research, so that it is important that the social aspects are well understood before that infrastructure is ‘fixed’. For example, should we be concerned about the potential of this infrastructure for surveillance?

One of the most successful parts of the conference was the ‘Repository Challenge’ (and I don’t just say that because JISC sponsored it!). Some 19 teams of developers competed in building potentially useful tools from existing services and components. It’s perhaps telling that many of those shortlisted focused on ingest, getting material more easily into repositories. In particular, the aim seemed to be to make ingest invisible. What does this say about the relations between the repository community and scientists?

Finally, I was struck by the number of times within a single evening that the conversation turned to the key role played by seemingly rather prosaic aspects of university organisation. Two examples: (i) Talking with a developer who wanted to use Amazon’s S3 storage services, the almost insurmountable obstacle was the difficulty in getting access to an institutional credit card (the only means of payment). (ii) Talking with a scientist-informatician, it became clear that the move in the UK to a single pay spine for all those working in a university does not mean that the boundaries around traditional academic disciplines are any less rigid – he wants to employ people with both science and informatics skills, but they have no comfortable home within the set of university roles as currently defined.

The range of these concerns, from research evaluation policy to whether or not a developer can use the departmental credit card, shows that ‘web science’ (as a practice and a topic for research) operates on a broad front, not all of which is especially elevated. If we’re to appreciate the ways these interconnect, then the need for some insight from disciplines such as anthropology seems obvious.

*Woolgar, S. (1991) Beyond the citation debate: towards a sociology of measurement technologies and their use in science policy. Science and Public Policy, 18(5), 319-326.

Posted by: Neil Jacobs

Host your own programme meeting

JISC regularly holds meetings for the people involved in the projects funded under a particular programme. These programme meetings are popular. Well, some parts are: the networking parts are popular; the parts where we discuss JISC objectives and reporting are less so.

Because the Repositories and Preservation Programme is very large (circa 80 projects) and addresses a variety of themes, it is very difficult for JISC to design programme meetings that meet the networking and sharing needs of all the project staff while still achieving those pesky JISC objectives.

To help with this difficulty we decided to offer extra funding to project staff to enable them to host their own programme meetings. These meetings would be free from JISC interference (unless desired) and would be based around themes chosen by the project staff. The only restriction we placed on the meetings was that they had to include case studies and networking.

We got a healthy response to this idea and as a result the following meetings will be funded:

20th May 2008 - Differences between research repositories and repositories for learning and teaching purposes - DRAW project (University of Worcester).

June 2008 - From VLE to Repository: How do we do it? - CURVE project (Coventry University).

June 2008 - Digital Curation and Preservation Projects Forum – Placing Ourselves in the Bigger Picture - The preservation strand of the Repositories and Preservation Programme.

September 2008 - The Impact of Organisational Culture on Repository Growth and Development - Embed project (Cranfield University).

November 2008 - Advocacy issues in populating institutional repositories - BURP project (University of Bradford).

November or December 2008 - Demonstrating and exploiting repository value - NECTAR (University of Northampton) and WRAP (University of Warwick) projects.

All dates may be subject to change.

The meetings are primarily for staff working on projects funded under the Repositories and Preservation Programme; however, some people from outside may be invited and any spare capacity will be opened up to the wider community.

Further details will be blogged in due course.

Posted by: Andy McGregor

1,924 collections added to the Information Environment Service Registry

The Information Environment Service Registry is now richer by nearly 2,000 resources. These resources are collections of content that are hosted in the UK, relevant to UK higher education and free at point of use. The information about these collections was collected by the recent Digital Repositories and Archives Inventory.

The inventory completed in October and discovered 1,924 collections. Phase 2 of the inventory is due to complete in June 2008 and is expected to push the total number of collections up to approximately 3,000. The collections gathered from phase 2 will be added to the Information Environment Service Registry sometime after June 2008.

Users will benefit from this content being added to the Information Environment Service Registry as the resources can be easily discovered by portals and other applications. Similarly the collection owners will benefit as this represents another mechanism for discovering their resource.

Posted by: Andy McGregor

The Repository Challenge at OR08


The judging of the Repository Challenge is in full swing and our august panel of judges are being treated to fast and furious demonstrations from eighteen different teams made up of delegates at the Open Repositories 2008 conference. This competition has been organised by the Common Repository Interfaces Group and $5000 prize money is at stake for the best demonstration of some capacity to create a new and useful item of functionality that will work across at least two different repository platforms. Entrants have five minutes to put their idea over to our panel of five judges and then have to face a further five minutes of questioning. This is tough …! But all the participants are doing a great job of communicating their hard work, some of which has been created in hotel rooms and at various refectory tables around the campus of Southampton University over the last two days. The prize will be awarded at the conference dinner this evening.

Posted by: Neil Grindley
