Why do repositories quickly become so complex? One answer is simply scope creep: repositories play roles in dissemination, research information management and curation, and, facing these three ways at once, it is inevitable that the demands placed upon them mushroom. Without wanting to reopen any arguments around SOA or RESTful approaches, one answer is to go back to Cliff Lynch’s 2003 description of the institutional repository as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members”. This approach seems to be enjoying a revival.
The California Digital Library (CDL) is charged with providing an environment that enables those at the University of California to curate digital assets effectively. Rather than adopting a single solution, they have pioneered an approach based on “micro-services”, in which the fundamental curatorial functions are disaggregated and provided by a managed and defined set of discrete services. They claim that this increases the flexibility of the environment and its ability to exploit changing technologies, and enables it to develop sufficient complexity to deal with evolving demands without becoming baroque. The approach has also been adopted at Northwestern University and Penn State in the US, and was a topic of considerable interest at the recent Scholarly Infrastructure Technical Summit (SITS) meeting.
It’s an approach followed in several current projects, including Hydra. The discussion at the SITS meeting seemed to focus in part on the degree to which such micro-services can be standalone (as some of the CDL ones appear to be) or require certain assumptions about the environment in which they will be used (as in Hydra, which assumes Fedora). In reporting on the SITS meeting, Dave Challis notes that “I’m not convinced the specs for these are well defined enough for general purpose use yet”. There may be useful lessons from initiatives such as the e-Framework on the circumstances in which such definitions are feasible.
Relatedly, perhaps, it was interesting to hear Chuck Humphrey (Head of the Data Library, University of Alberta) describe, at the recent SPARC Repositories conference, the approach taken in Canada, whereby a distributed OAIS environment is being established based on discrete services deployed across the country. Previous JISC work such as the Sherpa DP and PRESERV projects explored some of the options a few years ago, and those lessons may be worth revisiting in the light of the micro-services discussions.
There is probably some further learning to be done about what constitutes a viable, usable and sustainable micro-service and, with real examples out there now to use, there is a chance that people’s experiences of providing and using them will be shared.
Neil, Thanks for a challenging post. Microservices? If we were to look at this from the perspective of a highly successful domain elsewhere, we might call these apps? That’s what EPrints is doing with its forthcoming Bazaar http://eprintsnews.blogspot.com/2010/09/theres-repository-app-for-that-eprints.html. This is intended to be a store where repositories can find and install apps in a single click.
To pick up your wider point on curation and preservation, you can find preservation tools (or apps), and bundled tools – such as FITS http://hul.harvard.edu/ois/digpres/tools.html, or the EPrints preservation apps (which will be in the Bazaar) – that combine preservation functions into a single controllable resource, just as is anticipated with microservices. In the case of the EPrints apps, the key is not just the combination of tools, but the presentation and the target user. In this case, we wanted to put the interface in the repository for use by repository managers and administrators. Of course, those repository managers may or may not be trained archivists.
So whereas in the Preserv project, mentioned in your post, we investigated the prospects for distributed preservation services, what these recent developments reveal is a focus on tools, and what these tools seek to do is effectively to incorporate and deliver specialist expertise directly to the user rather than through a service provider.
In the KeepIt project we designed a course on preservation tools for repository managers http://blogs.ecs.soton.ac.uk/keepit/tag/keepit-course/. Remarkably, you can find a tool to support any part of a repository preservation strategy and workflow (not all of these are necessarily implementable as microservices), and btw, about 70% of these tools are from JISC projects. We have to recognise that there is now a diverse range of repositories and repository content, all of which need supporting as well.
Here’s the crunch. Microservices or not, this still leaves a lot for the general repository manager to do in terms of preservation. When faced with the challenge of doing digital preservation, what will institutional repository managers choose to do? http://blogs.ecs.soton.ac.uk/keepit/2010/11/30/keepit-preservation-exemplar-repositories-the-final-countdown/ What we found with our exemplar repositories in KeepIt is that they will choose different tools pragmatically and strategically, depending on the type of repository or the stage of development reached, scope, institutional (and perhaps national) context, and technical platform (repository software). Or, to put it another way, it will typically be piecemeal, and building their preferred solutions will take longer than the repositories would like.
So microservices (or apps) look likely to be a valuable development, but for institutional repositories it is important that the functionality is delivered through appropriate interfaces for non-specialist users. Wherever possible that functionality might usefully be bundled to save time in implementing a more complete repository solution.
Neil, I’ve been interested in the CDL micro-services approach for some time. I have great respect for the CDL team; the DCC (Research) hosted John Kunze for a couple of months a few years ago, and I’ve visited CDL since then.
My slight concern is: why this particular set of micro-services? The Pairtree service, for example, sparked this concern. Why should a service that creates a tree structure of directories count as a fundamental micro-service?
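For readers who haven’t met it, the core of the Pairtree convention is simple: a (suitably escaped) object identifier is split into successive two-character pairs, each pair becoming one directory level, so storage paths can be derived from identifiers alone. A minimal Python sketch of that core idea, omitting the full spec’s character-escaping rules:

```python
def pairtree_path(identifier: str) -> str:
    """Map an identifier to a Pairtree-style directory path.

    The identifier is split into successive 2-character pairs,
    each becoming one directory level; an odd trailing character
    forms a final 1-character directory. (The character escaping
    required by the full Pairtree spec is omitted here.)
    """
    pairs = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return "/".join(pairs)

# pairtree_path("abcd")  -> "ab/cd"
# pairtree_path("abcde") -> "ab/cd/e"
```

The attraction of the convention is that the mapping is deterministic and reversible, so any tool that knows the rule can locate an object on disk without a database lookup; whether that makes it a *fundamental* service, rather than a useful convention, is exactly the question.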
Putting this another way, where is the reductionist analysis that leads to this particular set of micro-services, as opposed to any other? I’ve asked a couple of times but have yet to see more than sophisticated descriptions of the services themselves.
Is this something that JISC could address, perhaps with CNI and/or IMLS/NSF?
Thanks to Steve and Chris for your replies. Taking the following points from them (though I know these miss out a lot of what you said):
Steve- Repository managers need usable tools rather than raw microservices. Toolsets are emerging, and the EPrints apps bazaar is an example of this kind of approach.
Chris- How are the services factored to give a set that are likely to be widely useful?
Seems to me there is a relation between these two points, which is that the answer to the second may be found in the ease with which widely-used toolsets are developed. That is perhaps over-simplistic?
The question I wanted to ask, which I think speaks to both Steve’s and Chris’s comments, is something like “what is the assumed repository/service environment in which such-and-such a microservice is viable and makes sense?” The EPrints apps clearly assume EPrints; Hydra assumes Fedora… CDL assumes ?? (and “??” here is probably the key to answering Chris’s question).
My guess is that this is largely an empirical question – we’ll find out by seeing which microservices get used? Once again, metrics data comes centre-stage perhaps.
Pingback: Microservices in (and beyond) Research Information Management | The Indo-European