Research Data Sharing without barriers…get involved?

Research Data Alliance (RDA) – second plenary meeting – Washington DC – 16th – 18th September.

Many readers of this blog will know about the Research Data Alliance already, but there will, I guess, also be a lot of people who don’t. I am using this post as an introduction to the RDA, having this week been to Washington DC to attend the organisation’s second plenary meeting.

With all of the interest and some urgency around research data publishing, management and re-use at government level, university level and disciplinary level, and with research being global, there is a need to join the data up through shared practices, standards, policies and infrastructure. That’s where the RDA comes in.

The RDA builds on initiatives such as DataONE in the US; the initiatives that take place in many European member states, such as the Jisc research data activity, which have collectively informed the EC’s direction on research data infrastructure as part of the forthcoming Horizon 2020; and the Australian National Data Service. It has been formed to address the ‘joining-up’ challenges and to build a global community that can contribute to shared practice and, ultimately, a more sustainable way to build the infrastructure and the intersections required to support data-driven research and innovation.

The founding members from funding-type agencies are the US National Science Foundation (working also with Chris Greer from NIST), the European Commission and the Australian National Data Service (ANDS). Over the past year these partners have carefully consulted and built a community that is global and encourages bottom-up sharing and agreement. I have been to some prior gatherings, had discussions with Ross Wilkinson from ANDS, Carlos Morais-Pires from the EC, Juan Bicarregui from STFC and others, and witnessed their planning and progress. In Europe, engagement is overseen by RDA Europe; Norman Wiseman from Jisc is on the Strategic Forum that oversees this on behalf of the Knowledge Exchange (KE), which does a lot of work on research data. It’s a big ask: forming a structure that can collaboratively take on the research data challenge. And I have to say the meeting this week in Washington demonstrated pretty impressive progress.

In short, over the past year a set of working groups and interest groups has been formed to work collectively on key issues. The initiative was formally launched at the first plenary meeting, held from the 18th to the 20th of March 2013 in Gothenburg, Sweden, where groups started to form their case statements for work. Washington was really the first time that the groups were there face to face to develop their work, and they were able to show early outcomes and to form firmer priorities and plans.

So what are they (we) working on? It’s a long list (see https://rd-alliance.org/working-and-interest-groups.html for the current list). Some of the areas that the groups are tackling: metadata and a metadata standards directory; legal interoperability; data citation; a community capability model; persistent identifiers; practical policy; data foundation and terminology; big data and analytics; and more, including interest groups that cover some disciplinary areas such as agriculture, and history and ethnography.

This Alliance is still forming, but from what I experienced in Washington it certainly has a lot of potential and should be an essential vehicle for research data interoperability. In Washington this week, following the group discussions, there was a plenary update from all of the groups highlighting their priorities (given in the grand setting of the US National Academy of Sciences), and Mark Parsons, RDA/US Managing Director, facilitated a discussion on the scope and ways of working. It was a really useful discussion, and one where I think there was consensus that RDA isn’t a standards body but more of a clearing house for best practice, standards and approaches. So if you’re interested, join up! I think it is an important initiative that will help to address the organisational, social and technical infrastructure required for real research data sharing. Jisc and the Digital Curation Centre (DCC) are engaged in the initiative and will continue to be so. We will tie in UK activities as best we can, so that we can learn from others and also feed in the lessons and emerging practice from the UK, and so get to that utopia … a global research data infrastructure (note: there are many UK participants already).

We will continue to give updates on progress to try and keep people in the loop. But if it is your bag – go ahead and join in the discussions. Currently there are 800 members from over 50 countries, and I can say from having been there this week it’s an impressive crowd…

Yes it is early days – but it’s important and thus far very positive. Looking forward to seeing more progress – I think there will be!

The value and impact of the British Atmospheric Data Centre (BADC)

Jisc[i], in partnership with NERC[ii], has commissioned work to examine the value and impact of the British Atmospheric Data Centre (BADC). Charles Beagrie Ltd, the Centre for Strategic Economic Studies at Victoria University, and the British Atmospheric Data Centre are pleased to announce key findings ahead of the forthcoming publication of the study’s results. The study will be available for download on 30th September at: http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx

Key findings:

The study shows the benefits of integrating qualitative approaches exploring user perceptions and non-economic dimensions of value with quantitative economic approaches to measuring the value and impacts of research data services.

The measurable economic benefits of BADC substantially exceed its operational costs. A very significant increase in research efficiency was reported by users as a result of their using BADC data and services, estimated to be worth at least £10 million per annum.

The value of the increase in return on investment in data resulting from the additional use facilitated by the BADC was estimated to be between £11 million and £34 million over thirty years (net present value) from one year’s investment – effectively, a 4-fold to 12-fold return on investment in the BADC service.

The qualitative analysis also shows strong support for the BADC, with many users and depositors aware of the value of the services for them personally and for the wider user community.

For example, the user survey showed that 81% of the academic users who responded reported that BADC was very or extremely important for their academic research, and 53% of respondents reported that it would have a major or severe impact on their work if they could not access BADC data and services.

Surveyed depositors cited having the data preserved for the long term and its dissemination being targeted to the academic community as the most beneficial aspects of depositing data with the BADC, both rated as a high or very high benefit by around 76% of respondents.
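
As a rough back-of-envelope sketch (not taken from the study itself), the return multiples and net present values quoted above can be related with simple division. The function names are hypothetical, and the implied one-year investment is only an inference from the rounded figures in these findings, not a number reported here.

```python
# Illustrative arithmetic only: the inputs are the rounded figures quoted in
# the key findings; the implied investment is an inference, not a reported value.


def implied_investment(benefit_npv_gbp_m: float, return_multiple: float) -> float:
    """Investment (in £ million) implied by a benefit and its return multiple."""
    return benefit_npv_gbp_m / return_multiple


def return_multiple(benefit_npv_gbp_m: float, investment_gbp_m: float) -> float:
    """Return on investment expressed as a simple multiple (benefit / investment)."""
    return benefit_npv_gbp_m / investment_gbp_m


# Lower and upper ends of the quoted range: £11m at 4-fold, £34m at 12-fold.
low_investment = implied_investment(11.0, 4.0)
high_investment = implied_investment(34.0, 12.0)

if __name__ == "__main__":
    print(
        "Implied one-year investment: "
        f"£{low_investment:.2f}m to £{high_investment:.2f}m"
    )
```

Both ends of the quoted range point to a one-year investment of a little under £3 million, which suggests the two multiples are consistent with each other.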

The study engaged the expertise of Neil Beagrie of Charles Beagrie Ltd and Professor John Houghton of Victoria University, to examine indicators of the value of digital collections and services provided by the BADC.

The findings of this study are relevant to the community attending the conferences below, hence the announcement.

13th EMS Annual Meeting & 11th European Conference on Applications of Meteorology (ECAM) | 09 – 13 September 2013 | Reading, United Kingdom
http://www.ems2013.net/

2013 European Space Agency Living Planet Symposium
http://www.livingplanet2013.org/

The British Atmospheric Data Centre (BADC)
The BADC, based at the STFC Rutherford Appleton Laboratory in the UK, is the Natural Environment Research Council’s (NERC) Designated Data Centre for the Atmospheric Sciences. Its role is to assist UK atmospheric researchers to locate, access, and interpret atmospheric data and to ensure the long-term integrity of atmospheric data produced by NERC projects. There is also considerable interest from the international research community in BADC data holdings.


[i] http://www.jisc.ac.uk/
[ii] http://www.nerc.ac.uk/

Performance and Measurement in Libraries

In his article in the New York Times, Robert Crease wrote:

We look away from what we are measuring, and why we are measuring, and fixate on the measuring itself.

For libraries, so used to collecting, managing and analysing various sets of data and metrics, this is a critical point.

It is also a sentiment that kicked off the 10th Northumbria conference on Performance Measurement in Libraries held in York earlier this week.

Elliott Shore from ARL (Association of Research Libraries) spoke about the need for libraries to take heed of this advice: to focus on the ‘fit’ of what we’re measuring.

This fit, as Shore calls it, has been evolving over the past 10 years as the role and presence of the library has changed. The digital environment and the changing technologies and expectations of users mean that what was once important to measure and capture may no longer have the same urgency.

This focus on what should be measured – and how it impacts on the role and shape of the library – was developed in a great talk by Margie Jantti of the University of Wollongong in Australia.

Margie talked about the constant flow of information and data that her staff (relationship managers) get from researchers and academic staff, which is used to tailor services and focus resources on priority services. This has seen the library develop expertise in publication support for researchers.

The large knowledge base of data the library collects on its users enables it to punch far above its weight, helping develop a fast, agile and world-class library team.

Finally, one thing that emerged from a majority of the presentations during the conference was the increasing recognition that data and metrics from inside or about the library are no longer enough. The field from which the data and metrics are harvested is growing and reaching further beyond the library: into the teaching and learning space, through to research, registry and student services, and beyond.

The idea that library performance and measurement requires only data from the library – or from within its immediate vicinity – is no longer tenable.

So, it was against this background that the Library Analytics and Metrics Project (LAMP) presented at the conference.

We provided some of the background to the project (where it has come from and the work that has led us to this point) and provided an overview of the work so far and how you can get involved and follow the progress of the project.

For me, what’s really interesting is that LAMP has the potential to bring in data from across the institution (and beyond) to help inform decision making and how and where resources are allocated. It also takes away the burden of collecting the data and provides the space for libraries to act on the data, and to think strategically about what they want to, and should, be measuring and analysing.

The conference was also useful in bringing to my attention LibQual, and the potential for LAMP to work with that data too (although this may be something for further down the development pipeline).

You can find a link to our presentation here. At the end are some ways that you and your library can get involved – so do feel free to get in touch.


Observing the Web

Like a lot of people, when I think about it, or when I’m reminded about it, I understand that the Web is a place where someone is always watching what you do. I understand that … but then I think, well … the Web is such a huge beast; such a vast ocean; such a giant metropolis where the comings and goings of individuals are insignificant. How and why would anyone notice what I’m looking at and which links I’m clicking on?

Then up pops Tom Barnett from Switch Concepts Ltd. at a meeting yesterday to tell us that ‘Google has a file the size of an encyclopedia on everyone in this room.’

Hmmm … that’s not a particularly comfortable idea for someone to put in your head. I start to feel a vague sense of paranoia creeping through my mind.

And then I think, c’mon Neil, pull yourself together! Google really doesn’t care who you are. They just want to put things in your line of sight that are more rather than less likely to get you to open your wallet and part with your wages!!

Such were the thoughts that were buzzing around my head yesterday at an event organised by the Web Science Trust (http://webscience.org).

The meeting was entitled ‘Observing the Web’ and the purpose was to highlight some of the work that the Web Science Trust and their partners and collaborators are doing to build a global network of Web Observatories providing an open analytics environment to drive new forms of Web research. We went round the room doing introductions and Dame Wendy Hall ended up branding us a ‘motley crew’. Academics, industry players, not-for-profits, technologists, funders, charities, a lawyer. (Quite a respectable looking motley crew in the very smart surroundings of the Royal Society I might add). But ‘motley crew’ felt about right for a topic and a collaborative, academic, open activity that is still exploring the territory and testing new ground. Presumably in contrast to the well-resourced, sophisticated and highly developed (but opaque) methods employed by the corporate observers of the Web (Facebook, Amazon, Google, Microsoft, Yahoo etc.).

The point of all of this ‘observing’ is not to try and take account of every little bit of data and content on the web, but rather to understand what the aggregated use of the Web can tell us; how trends and fashions and changes of behaviour in relation to the Web might illuminate aspects of our society and culture, both now and for future students and researchers.

This was all of great interest to Jisc. We are currently working with the British Library, the Oxford Internet Institute and the Institute of Historical Research on an initiative that aligns very well with the notion of the Web Observatory.

The Big Data project (http://www.oii.ox.ac.uk/research/projects/?id=88) and the AADDA project (http://www.history.ac.uk/projects/digital/AADDA) are both using a copy of the Internet Archive’s collection of UK domain websites, collected over the period 1996-2010, to examine new ways to engage with the web at domain level and to develop new forms of research that leverage the scale of the web. As the name of the Oxford project says … it’s all about using ‘Big Data’.

This was work that emerged from influential JISC-funded reports commissioned in 2010:

Researcher Engagement with Web Archives
http://www.jisc.ac.uk/whatwedo/programmes/preservation/researcherengagementwithWA.aspx

As we heard at the meeting, the academic observatory is a very different proposition to the corporate observatory and comes with enormous challenges, including: interoperability (how do we link observatories?); access (aside from Twitter, which of the big corporates will let us use their data?); privacy (will people feel spied upon?); and sustainability (what is the business model?).

A fascinating meeting and a big topic. There will be more discussion in early May at the ACM Web Science Meeting in Paris.

http://www.websci13.org/registration/

 

 

Data-Driven Library Infrastructure: UKSG 2012 Presentation

Below is a copy of the plenary presentation I gave to the UKSG conference 2012. I have also included a much reduced transcript of the talk to provide some context to the slides.

My presentation was about looking at library services and systems from a data-centric point of view. Specifically, it was about the potential that library data has for the creation of new services and improved systems.

This isn’t a radically new vision; indeed, the idea of being data-driven seems all-pervasive at the moment (data-driven journalism etc). Rather, it is a way to refocus, or possibly to re-align, our thinking so that what may appear to be problems at present are viewed as new opportunities.


There is also a video of the presentation available:

I began my presentation with a video. The video was made by University of Lincoln students without formal permission from the university and uploaded to YouTube.

Incredibly it got over 2.1 million hits. Is this the most watched University recruitment film ever!? Even more impressive is that it will be watched by exactly the demographic Lincoln would want to appeal to – young people. Lincoln recognised the potential of the film, and officially branded it and it is now a part of their advertising.

So, I think the film highlights nicely the three main themes of my presentation:

  1. Situating services and infrastructure within the wider ecosystem (this might be institution; community; society etc) – allow innovation to flourish anywhere, and ensure you’re in a position to take advantage of it;
  2. Redistributing effort – focus on the services that have an impact for users, ensure you have the talent to recognise those emergent opportunities and embrace them;
  3. Covering all eventualities: Future proofing – become agile and more entrepreneurial. The barriers for students creating the video were incredibly low: a Flip cam and YouTube. Barriers to students using library data should be low too.

Why is this so important to libraries…? Well, I think there are three compelling reasons why libraries should take a data-centric approach to their systems and services.

1. Ecosystem

Taking a data-centric approach enables the library to affect the entire ecosystem that it inhabits.

Focusing on the data forces us to think about the other sources of important data within the institution: the repository, VLEs, student records etc. The wider data ecosystem becomes evident, and the potential of the data underpinning those systems can be realised.

A really good example of this is the Discovery work that’s currently being undertaken by JISC, RLUK and Mimas at the University of Manchester. Discovery’s aim is to provide a ‘metadata ecology’ for UK education and research – and it does this by focusing on open and accessible data.

When you start to think like this you realise there is incredibly rich and important metadata describing content outside of HE libraries: in museums, galleries and archives. Researchers and students want this content as much as anything in the library – so why wouldn’t you include that?

What was largely hidden or difficult to find becomes visible. The possibilities of those large, cross-sector discovery tools can be realised, as well as those small, un-thought-of possibilities for individual researchers to create their own unique discovery tools, searching a corpus of data they have curated and that is specific to their research.

What happens, suddenly, is that the data ecosystem starts to mingle with the human ecosystems libraries are inevitably a part of. The free flow of data provides the fertile ground for new ideas and services to grow – innovation is allowed to flourish everywhere on campus, not just within the confines of the traditional walls of the library.

Libraries and their institutions need to ensure an environment where this flourishing of innovation can happen,  and that there are the right skills and people to recognise those opportunities, and help develop further the ideas and prototypes.

2. Effort

If you think of shared services only as a way to reduce effort then you lose the ability to respond to and build on the opportunities that may emerge from your fertile data ecosystem. It’s not about reduction of effort, but redistribution of effort.

The aim is to reduce the effort on those chore jobs – admin, back office functions – critical, but not what the user sees as having an impact. This refocus allows you to redistribute effort to the core services you provide.

Shared services such as JUSP and Raptor are great examples of how you can stop doing the administration, and use those services to provide you with the data to make changes.  

These shared services also demonstrate the way data begets data: the way data is used produces more data, that enables better understanding of how the data is used and how it can be improved.

Data seems to demand iterative thinking: to constantly revisit, act, think, revisit, providing a virtuous circle of data, action, data.

Another example is Knowledge Base+ (KB+): a shared academic knowledge base for electronic resources, and a great example of how you enable libraries to do something once and share it with everyone so they don’t have to repeat it locally. KB+ also recognises that the innovative services built on top of data do not necessarily have to be undertaken by the project itself, but can emerge from the community as well as third parties and commercial suppliers.

One interesting aspect of KB+ is that its focus on data means that the use cases for it can develop over time. While the focus at the moment is on e-resources and their management, it may enable innovations around inter-library sharing of content, collection management etc.

Indeed, the use cases can quite happily shift between individual institutions: from those that envisage it as an ERM to those who see it as a backup for their local holdings, helping facilitate easy movement between external systems.

As librarians we’re very aware that the past needs protection from the future, but we need to recognise that the future needs to be protected from the past. I don’t know what will be needed in a few months’ time, but it probably won’t be the thing I think it will be based on my past experiences.

3. Eventualities

So, this leads me on to my third E: how a data-driven approach ensures that services and systems are future-proofed.

This is about libraries being able to be, at a fundamental level, more entrepreneurial. I want to pick up on an earlier point about the iterative imperatives of data: data – action – data: the process is one similar to how a small start up company might work.

There are some wonderful examples of libraries playing with this kind of innovative approach.  At the University of Huddersfield they have taken the generally unappealing library circulation data and turned it into a game for students:  Lemon Tree.

The library experience is gamified, and in a way that engages students and enhances their experience.

Supporting this kind of thinking are technical infrastructures like the JISC Elevator: A platform for new ideas to be posted, and for members of the community to vote on them and then for JISC to potentially fund.

This is about agility, and moving quickly.

This is again where the redistribution of effort is so essential – the negotiation between shared above-campus services and local capabilities is incredibly important. The negotiation between what is shared and what is kept local defines the institution, and how effective it can be in meeting the rapidly changing needs of its students and researchers.

As libraries begin to understand and curate their own data effectively, it begins to demonstrate the library’s potential role in an increasingly data-driven academic environment.

As data management and curation become the next big problem for institutions, libraries can position themselves as the experts.

Shared Academic Knowledge Base (KB+) – Library Directors event

Yesterday the shared academic knowledge base (KB+) briefing day for approximately 60 library directors and senior managers took place at the Wellcome Trust in London.

The project, known as KB+, is developing a shared community service that will improve the quality, accuracy, coverage and availability of data for the management, selection, licensing, negotiation, review and access of electronic resources for UK HE.

The aims of the day were to:

  • Provide an update on the progress of the shared academic knowledge base project;
  • Surface and share some of the questions, concerns and ideas participants may have about the project and the management of electronic resources in general;
  • Let participants know what will be happening next with the project and how they can get involved if they would like.

The day began with Ben Showers (JISC Programme Manager) providing some context to the work and situating the project within the wider subscriptions management landscape. The presentation can be found here: Shared Academic Knowledge Base: Context and Landscape

Liam Earney, the project lead from JISC Collections, then went on to outline the vision and approach that the project will be adopting, as well as providing the participants with an idea of what will happen next and how, if institutions wish, they can get involved.
Liam’s presentation can be found here:  shared academic knowledge base: Approach and Vision

The meeting engendered a large amount of discussion about the project, with participants freely sharing concerns, ideas and possible solutions to some of the issues that surfaced.

Extensive notes were taken from the Q&A sessions to help inform the project, but instead of repeating verbatim the questions and answers I have tried to highlight some of the themes that emerged during the meeting below.

Themes

A number of themes emerged during the day and, while this is not an exhaustive list, these are some of the recurring or critical issues that were surfaced:

Transformation of current practice

It was acknowledged that this project was potentially transformative; it has the potential to change what might be termed the bread and butter of library work.  Therefore its impact on the community, and how it works, could be significant.

This means that the community, from senior managers to practitioners and beyond, will be keenly interested in the developments, and the project will need to build trust and facilitate the involvement of the whole library community. This brings me on to another of the day’s themes:

Communication

This was a theme that seemed to surface at regular intervals during the day. There was a clear message that the project needs to be able to communicate regularly with the library community on both progress and developments as they take place.  This might manifest itself in a newsletter such as that employed by the Discovery programme, or utilising existing communication channels from JISC, JISC Collections and other sector bodies (or indeed a combination).

The combination of communications channels is also important given the range of stakeholders interested in the developments, from commercial vendors and publishers to librarians in the UK and internationally.

Under this theme there were issues surrounding how the communication channels would allow for more interactivity than might otherwise be usual in a JISC funded project given both the high profile nature of the project, as well as the need for ongoing community engagement in the work.

Engagement

Closely related to communications was the topic of engagement.

Specifically a lot emerged on how the community, especially ERM librarians and similar, could be engaged in the project in a useful and meaningful way.

In his presentation Liam made it clear that the project hopes to ‘recruit’ a number of embedded librarians, with the project paying for a proportion of their time to work on it. Participants stressed, however, that the expectations attached to any involvement would need to be spelled out, from the skill levels and expertise of the person through to the length of time they might be involved.

Clarity on these issues would be key to maintain sector engagement.

It was also suggested there might be the need for something like an ‘advocacy pack’ so that library directors had the arguments to convince senior staff of the benefits of engaging with the project.

An interesting sub-theme within engagement was the power of institutions themselves to put pressure on the commercial companies and organisations they work with, both to work with the project and to implement the standards and recommendations the project might make.

The message was clearly that this was a partnership.

Collaboration and leveraging other work

It was expressed a number of times how important it will be for the project to leverage this work and funding with other initiatives and projects that can help the KB+ project deliver its outputs.

It was acknowledged how much work is currently taking place in this area: national projects such as KBART and TERMs; JISC-funded projects such as the journal usage statistics portal and e-journal archiving work, including Peprs and the entitlement registry; and international projects such as the Open Library Environment.

This helped reinforce the project’s own ambitions of engaging with, and where possible working with, these complementary projects and initiatives.

Sharing problems

This is a shared service, but it will be important that when issues are surfaced by an individual institution, or indeed a problem is resolved by someone, that the whole community can be made aware of this.

What tends to happen now is a problem will be reported to a supplier and that problem is then normally resolved, but no one other than the originating institution knows about this.

Further points of discussion

There were a number of other points of discussion including:

  • The potential conflict between aiming for quality of data and ensuring its timeliness.  It’s essential that quality doesn’t impact on the ability of libraries to deliver services to users as and when they want them.
  • Print subscriptions: The briefing day concentrated largely on electronic resources, but it was clear that participants wanted to see print incorporated in the work.  The project is taking a unified approach so this won’t be an issue, although electronic will remain the focus for much of the work.
  • The Identifiers elephant! It was clear that participants also felt that the issue of how the project deals with identifiers (be those institutional, journal title etc) will be critical.
  • Decision making and workflows will be two potential aspects of the final service, but it is important to recognise that a focus on the decision making components that the service will deliver could help strengthen potential business models, and demonstrate real value to institutions.

As the event demonstrated, there will be a lot more work going on over the next few months to get the project into a position where it can successfully transfer to a service.

In the meantime, this won’t be the last you’ll hear of the project, with plans already in place to start communicating and engaging with the community over this important shared service.

If you would like to find out more about the project, or have any questions, then please feel free to contact Liam Earney at JISC Collections.

Show us something cool

Recently the library, museum and archive world has taken to experimenting with open data with a vengeance. It seems an interesting new dataset is released under an open licence most weeks.

There are many motivations behind these data releases but one of the major ones is the hope that someone else will think of something cool to do with the data (to mangle a Rufus Pollock quote).

Well, all you someone elses are in luck. The JISC Discovery programme and the DevCSI project are running a competition to see what clever people can do with this open data.

The rules of the competition are laid out in detail on the Discovery site but in essence all that’s needed to enter the competition is to develop something using one of 10 recommended datasets. You can use other datasets too but you have to do it in conjunction with one or more of the 10 datasets listed on the Discovery site.

I’m probably revealing my nerdy librarian hand here but the 10 datasets are really rich and exciting:

  • There is library data from the British Library, Cambridge and Lincoln
  • There is archives data from the National Archives and the Archives Hub
  • Museum data from the Tyne and Wear Museums collections
  • English Heritage places data
  • Circulation data from a few UK university libraries
  • The musicnet codex
  • And search data from the OpenURL router service

Details on all of these are listed on the Discovery site.

There are 13 prizes to be won so there is every incentive to enter even if you are somehow able to resist the siren call of all that exciting data!

The competition is open now and closes on the 1st of August.

Exploiting institutional activity data

JISC is currently funding a range of projects that investigate how data stored in institutional systems can be mined to gain insights into the way that university services are operating, and how those insights can be used to improve the services. These projects are spread across two programmes, the activity data programme and the business intelligence programme. There are a few other projects working in similar areas spread across other programmes.

Last week we took the opportunity to bring most of these projects together to discuss their various approaches and to think about what else JISC can do to help universities make the most of their activity data.

We started the event with lightning talks from each project attending. I suspect faithfully listing all of those projects here would make for a gruelling reading experience, so instead I’ll group them by the broad motivations the projects are pursuing. Some projects fall under more than one category. I have linked to the presentations the projects gave; where I do not have the slides, I have linked to the project website.

Some projects are reusing data about user behaviour to provide new or enhanced user experiences
RISE, Salt, AEIOU

Some are mining data to gain insights into behaviour of people or systems in the institution to allow better resource allocation and intervention at crucial periods
LIDP, Supporting institutional decision making, Bringing corporate data to life, Lumis, IN-GRiD, Retain, Student Engagement Traffic Lighting

Others are visualising data to explore its meaning
LIDP, AGtivity, Bringing corporate data to life

A few are thinking about how various silos of data can be brought together to allow them to be mined for insight
UCIAD, Supporting institutional decision making, Bringing corporate data to life

A couple are looking at national services to help institutions explore data more easily or to enable reuse of existing data in novel ways
Using openURL router data, JUSP

The projects cover a vast range of areas from libraries to student management to environmental monitoring. Despite this breadth there are some common issues. These are the issues that jumped out at me on the day:

  • Not all institutions have people with the technical skills and the statistical skills required to manipulate and analyse data.
  • A lot of these datasets are large, and that brings up issues of how to store and manipulate the data and how to decide what to retain.
  • Institutions might need to take an institution-wide strategic approach to deciding what data should be collected, how it should be exploited and by whom. There also need to be long-term strategic approaches to data exploitation in departments like libraries.
  • Working across different silos of activity data is a problem we are only just beginning to face.
  • Language is an issue. Throughout this blog post I have used the term activity data, but this is a generic term; we need to be clearer about what type of data we are talking about.

To help ensure that others can benefit from the lessons the projects learn, there are synthesis projects for the activity data programme and the business intelligence programme. The purpose of the synthesis projects is to gather information on key issues and turn it into advice and guidance that anyone in the HE sector can use to inform the way they do things at their institution. Infonet are producing a business intelligence infokit for their synthesis project. The University of Manchester and Sero Consulting are producing the activity data synthesis. You can read the activity data synthesis blog, which talks about progress so far, and can also see a mindmap that describes the areas their final website will cover.

We ended the day with a discussion of the possible ways that JISC could look to address some of these issues. We produced a long list of very good ideas. We also had a go at prioritising which were the most pressing or valuable. Our top 6 ideas were:

  1. Developing guidance for institutions on taking a strategic approach to exploiting activity data
  2. Addressing the need for new skills for exploiting activity data both from a technical perspective and from a statistical skills perspective
  3. Establish clear definitions for terms used – this could include a simple glossary and use of examples to illustrate the terms
  4. Developing a culture of exploiting data in institutions
  5. Exploring what’s involved in ensuring data is easy to reuse
  6. Study behaviour and how it relates to usage patterns

JISC won’t be able to address all of these issues straight away, so my colleague Myles Danson and I will have to decide which we focus on. Comments and advice would be very welcome indeed!

After the meeting, Mathieu from the UCIAD project wrote a very interesting blog post about the need to take a user-centred approach to activity data, so that’s something we’ll need to consider too.

Gaining business intelligence from user activity data

Business intelligence resulting from ‘user activity data’ could help universities to manage resources more efficiently, budget more effectively, make smarter purchasing decisions, improve their services and demonstrate impact. The likes of Amazon and Tesco use activity data to make business decisions and also to provide recommender services on top of this data. They know a lot about their customers’ purchasing habits. How does this translate to the academic environment? Are we able to utilise this usage data for our benefit? This is a very timely and exciting area of investigation for the sector.

We invite you to attend a workshop exploring the potential of this data which is derived from services such as library systems, virtual learning environments and student registries.

As well as informing people of the potential of this data, we are looking to generate ideas and use cases to help plan for future work in this area.  So please come ready to participate – I promise it will be an interesting day!

This workshop is suitable for senior managers and practitioners working in libraries, teaching and research. It will be chaired by Professor David Baker, Deputy Chair of JISC, with contributions from practitioners who have practical experience of using user activity data in higher education.

Date: 14 July 2010 

Venue: The Hatton, 51-53 Hatton Garden, London, EC1N 8HN

For more information and registration go to: http://www.jisc.ac.uk/events/2010/07/businessintelligence.aspx

Data Management Policy – An Interview with Paul Taylor

Dr. Paul Taylor works at the University of Melbourne and has just finished a two-week secondment in the UK with the JISC-funded EIDCSR (Embedding Institutional Data Curation Services in Research) project based in Oxford. This is an approximate transcript of a quick five-minute interview between Paul and Neil Grindley (JISC Information Environment Programme Manager).

NG
Hi Paul, thanks for sparing the time out of a very busy schedule … what role do you have in the EIDCSR project?

PT
Thanks Neil … I’m here to help them come up with a draft policy for the management of research data and records. It’s something we’ve had in place at the University of Melbourne since 2005 and we’ve just completed a revision of the policy to hopefully help make it a little more useful for researchers.

NG
Tell us a little bit more about how that policy has been developed at the University of Melbourne and the reactions to it from researchers and data managers.

PT
As I said, we’ve had policy in place since 2005 and early this year we were asked to work out how compliant we were with it, on the basis that if you have a policy and no-one pays any attention to it, it’s probably not much use keeping it there! Not surprisingly, we found out that most people weren’t compliant and also didn’t really know that the policy was there. We’re hoping that was the reason that they weren’t compliant rather than any sort of animosity against policies in general, but that’s still to be determined.

We reviewed the policy for two reasons: firstly to try and make it of more use to researchers (… there are limits to that because when you are writing a policy to go across the institution, it has to contain really high level principles about the management of research data. If you get too specific you rule large populations out and then people pay even less attention to it than they did before). Secondly, it’s to get some attention and a bit of refocus on the data management area. There are a lot of things happening at the university at the moment in terms of the services that the university intends to provide for its researchers and some other changes in the Australian environment. We’re hoping to lock the high-level principles away in policy documentation and focus on keeping the guidance, information and support materials up to date and relevant for researchers.

NG
The sustainability of keeping that guidance and information for researchers up to date is a real issue. Capturing their feedback and working it back into future iterations of those materials (and ultimately the policy documentation) is a desirable outcome but also a big challenge isn’t it?

PT
Yes, it is.

NG
How do you think that the policy that you’ve developed in Melbourne transposes to the University of Oxford?

PT
That’s a good question … one of the things that we’ve learnt from the 2005 version of the policy is that it’s not enough to have the central policy on its own. There needs to be some kind of localisation of the policies and so with this new version of our policy we’ll be asking faculties to come up with their own enhancements so that it makes more sense to their researchers, and then probably get departments to do the same thing. I’d imagine the same sort of system could work at Oxford but it would be a little more complex with the number of people that would need to be involved in coming up with these localised versions of the policy. The hope is that there will be a trickle down effect from the high-level policies which have a practical influence on the way that researchers go about managing data.

In the meetings that I’ve had since I’ve been here, there have been some excellent examples of data managers and data management researchers (I guess you’d call them) who are working closely (one-on-one) with researchers who have come up with some excellent and novel solutions. I think the more that that can happen – a sort of resourcing at the coal face – then the more likelihood there is of high level principles trickling down to meet some of the very local one-on-one researcher-based developments. At that stage, perhaps there would be a general improvement in the management of research data across the institution.

One of the things I’ve heard a lot from people is the need for it to be a federated system. A lot of the departmental research groups have come up with their own systems for managing their own research data. Anything new that is provided centrally from the university has to try and complement those processes rather than take them over. That wouldn’t work well here (in Oxford) and it wouldn’t work in Melbourne. It would tend to antagonise people rather than improve the situation.

NG
Yes … that principle of embedding existing processes and workflows into broader policy initiatives is an important concept for institutions grappling with these kinds of issues at the moment. Thanks very much Paul.

PT
Thanks

University of Melbourne – Policy on the Management of Research Data and Records (2005)
http://www.unimelb.edu.au/records/research.html

Review of Policy on the Management of Research Data and Records (2009)
http://research.unimelb.edu.au/integrity/conduct/data/review

EIDCSR Project (Embedding Institutional Data Curation Services in Research)
http://eidcsr.oucs.ox.ac.uk/

#res3
