The Magic Word is Repcloud

It is rare that the venue is discussed more on Twitter than the content of the event, but at the Repositories and the Cloud (#repcloud) event on the 23rd of February it was a close-run thing. The event took place at the rather amazing Magic Circle Headquarters in Euston, which gave the speakers and audience plenty of scope for magic-related puns!

The focus at Repcloud was on how the leading Digital Repository vendors (represented here by DuraSpace, EPrints and Microsoft Zentity) were integrating ‘cloud’ capabilities into their offerings to the community, and on how ready for and interested in those options the community was.

The three speakers in the morning covered their strategies for ‘cloud’ services, most of them in beta or still upcoming, and introduced us to some interesting projects and ideas.

Eduserv have already released the videos and presentations online, as well as some ‘vox pop’ videos and a link to the Twitter archive on Twapperkeeper, so I will just pick out a few highlights.

DuraSpace, via its DuraCloud pilot projects, is pushing ahead in this area. The work they are doing seems particularly focused on [multi]media objects rather than traditional papers, and they are using the ‘cloud’ not only for storage but also for large computational jobs such as file-type conversion (TIFF to JPEG2000) and video encoding and streaming. They did acknowledge, though, that bandwidth to and from the ‘cloud’ remains unreliable and costly.

Alex Wade from Microsoft gave something of a whistle-stop tour of the offerings of Microsoft Research and the MS ‘cloud’ service Azure. Particular points of interest were the relationship Microsoft has formed with the NSF in the US around offering ‘cloud’ capability; the Research Information Centre, developed in collaboration with the British Library, which includes SWORD-based push-to-repository capability; and Entity Cube, which maps relationships between people (and places) visually by querying the open web.

The final talk of the morning was from Les Carr of EPrints. The focus here was on the changes in EPrints 3.2 that allow for a ‘hybrid storage solution’, mixing local and ‘cloud’ storage as most appropriate and using the new Storage Controller to write rules that decide on the fly what to do with uploaded content, ensuring the optimal performance of the repository. Something I found particularly interesting (and wrote about elsewhere) was the idea of EPrints offering a ‘Blogger’-like service that would allow users to fire up their own repository on demand. I thought this sounded like a very clever idea and was surprised it wasn’t discussed further.
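To make the Storage Controller idea a little more concrete, here is a minimal sketch in Python of what rule-based routing of uploads might look like. It is not EPrints code (EPrints is written in Perl, and the actual rule syntax was not something I noted down); the `Upload` class, the rules and the backend names "local" and "cloud" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Upload:
    """A hypothetical description of an uploaded file."""
    filename: str
    mime_type: str
    size_bytes: int


# A rule pairs a test on the upload with the name of a storage backend.
Rule = Tuple[Callable[[Upload], bool], str]

# Illustrative rules only: video and anything over 100 MB goes to the cloud.
RULES: List[Rule] = [
    (lambda u: u.mime_type.startswith("video/"), "cloud"),
    (lambda u: u.size_bytes > 100 * 1024 * 1024, "cloud"),
]

DEFAULT_BACKEND = "local"


def choose_backend(upload: Upload,
                   rules: List[Rule] = RULES,
                   default: str = DEFAULT_BACKEND) -> str:
    """Return the backend named by the first matching rule, else the default."""
    for matches, backend in rules:
        if matches(upload):
            return backend
    return default
```

With these made-up rules a small PDF stays on local disk while a video file is sent to the cloud backend. The first matching rule wins, so the order of the rules matters.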

The afternoon opened with Terry Harmer from the Belfast e-Science Centre, who gave a hugely interesting talk from the perspective of someone who has been using ‘cloud’ capabilities as the backbone of their operations since before the term ‘cloud’ became popular. What was particularly interesting here (to me at least) was how they had used this ‘cloud’ based strategy to circumvent the limitations and bureaucracy at their host institution, and as a result were much more flexible in the options they could offer.

The rest of the afternoon was split into separate technical and policy sessions (though judging by the discussion afterwards there was more than a little crossover).

Geo-location and legal issues (particularly IP) were raised in both sessions. SLAs were also discussed in both, with the differences from one provider to another identified as a bit of a problem.

Issues around the true costs of using ‘cloud’ services also came up in both sessions – with the issue of bandwidth costs coming up again in the Technical session.

The day ended with Andy Powell and Rachel Bruce closing things by saying both JISC and Eduserv would continue the conversation to see what the next steps might be for work in this area.

I found it a useful day, and it seemed to be a topic that many of the attendees were thinking about and moving towards to some extent or other.

[As I attended the Technical session the following are my notes from the Q&A in that session]

The first discussion in the Q&A session was raised by Paul Browning around the issue of deposit from the desktop and whether WebDAV was still a relevant option (not sure how much this had to do with Cloud, but it was obviously a topic of interest).

Both Brad and David discussed the issues [problems] with using WebDAV in a Repository environment, and in particular the difficulty of mapping the WebDAV files/folders model to the Repository storage model. EPrints had used WebDAV to create some read-only functionality to assist in batch processing but had abandoned work on any read/write functionality (as had DuraSpace). Ian Boston from Cambridge did step in and defend WebDAV, pointing out that the protocol was sound in and of itself; the problem was with the implementation of the clients created to deal with it. Nonetheless the issues were seen as too much of a hurdle for it to be used in a production environment.

Getting back to Paul’s original question, the ‘Author Add’ tool for Word was mentioned, which allows deposit to Zentity, EPrints and DSpace from within Word itself.

The discussion then branched off to a ‘pull’ model for deposit based on a Live Mesh / Dropbox model. This would entail nominating items as ‘suitable for deposit’, with the system then pulling them in for deposit automatically.

Next the topic moved on to whether it would be possible to pull just a small piece of a larger item out of Cloud storage. The example given was that ‘web archive .war’ files are often massive, and it would be useful to be able to identify a smaller part of that archive as it is rare that you need the entire thing.

This morphed into a discussion around the ability to unwrap ‘packages’ on the fly by offering computation services closely coupled with the Cloud storage. While it may be possible to offer common services to support the major package types, for the most part this would be a custom development.
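As a rough illustration of the ‘unwrap on the fly’ idea, the sketch below pulls a single member out of a package without unpacking the rest. It assumes the package is a ZIP file, purely because Python's standard library handles that format (the session did not settle on particular package formats), and the helper names and file contents are made up.

```python
import io
import zipfile


def build_example_package() -> bytes:
    """Create a small in-memory ZIP standing in for a much larger package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("metadata.xml", "<title>Example item</title>")
        package.writestr("data/page1.html", "<html>page one</html>")
        package.writestr("data/page2.html", "<html>page two</html>")
    return buffer.getvalue()


def read_member(package_bytes: bytes, member_name: str) -> bytes:
    """Return one member's contents; the other members are never decompressed."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(member_name)


def list_members(package_bytes: bytes) -> list:
    """List the member names so a client can choose which part it wants."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()
```

In a real service this logic would run next to the stored package, so that only the requested member travels over the network. That is the point of coupling the computation closely with the storage.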

There was a brief but interesting discussion about the Memento project, which uses ‘time based’ navigation to identify and offer access to web content as it was at a time in the past. It describes itself like this:

Remnants of the past Web are available, and there are many efforts ongoing to archive even more Web content. It’s just that the past Web is not as readily accessible as today’s. For example, if you want to see an archived version of http://cnn.com, you can go to the Internet Archive’s Wayback Machine and search for it there. Yes, you can find the CNN front page of 9/11 there.
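For anyone wondering what ‘time based’ navigation might look like in practice, here is a small sketch. Two things in it are my assumptions rather than anything shown on the day: the `Accept-Datetime` header name, which comes from the later published Memento specification (RFC 7089), and the list of snapshot times, which is invented for illustration. No network request is made.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, List


def accept_datetime_header(when: datetime) -> Dict[str, str]:
    """Build the request header a client would send to ask for a past version.

    `when` must be timezone-aware; it is converted to GMT for the header.
    """
    as_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


def closest_snapshot(snapshots: List[datetime], when: datetime) -> datetime:
    """Pick the archived snapshot whose timestamp is nearest the requested time."""
    return min(snapshots, key=lambda taken: abs(taken - when))


# Hypothetical snapshot times for one archived page.
SNAPSHOTS = [
    datetime(2001, 9, 10, 8, 0, tzinfo=timezone.utc),
    datetime(2001, 9, 11, 14, 30, tzinfo=timezone.utc),
    datetime(2001, 9, 13, 9, 0, tzinfo=timezone.utc),
]

requested = datetime(2001, 9, 11, 12, 0, tzinfo=timezone.utc)
headers = accept_datetime_header(requested)
best = closest_snapshot(SNAPSHOTS, requested)
```

The client says which moment it is interested in, and the archive side answers with whatever it holds closest to that moment.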

Finally there was a discussion around Authentication/Authorisation/Access issues as they relate to Cloud offerings. Brad was open in admitting that DuraCloud offers very basic security ‘out of the box’ and encourages its pilot projects (and future clients) to make their own decisions on what level of security they need. The aim is to not give a false sense of security.

There was a discussion around the issue of consistent naming of events from one platform to another to make the development of plug-ins more straightforward (currently each plug-in would require 3 wrappers). While it was agreed this would be great it was also seen as being hugely difficult for the time being – that said there was a commitment to open up more via APIs and exposing as many of the ‘primitives’ as possible.

