Data Management Policy - An Interview with Paul Taylor
Dr. Paul Taylor works at the University of Melbourne and has just finished a two-week secondment in the UK with the JISC-funded EIDCSR (Embedding Institutional Data Curation Services in Research) project based in Oxford. This is an approximate transcript of a quick five-minute interview between Paul and Neil Grindley (JISC Information Environment Programme Manager).
NG
Hi Paul, thanks for sparing the time out of a very busy schedule … what role do you have in the EIDCSR project?
PT
Thanks Neil … I’m here to help them come up with a draft policy for the management of research data and records. It’s something we’ve had in place at the University of Melbourne since 2005 and we’ve just completed a revision of the policy to hopefully help make it a little more useful for researchers.
NG
Tell us a little bit more about how that policy has been developed at the University of Melbourne and the reactions to it from researchers and data managers.
PT
As I said, we’ve had a policy in place since 2005 and early this year we were asked to work out how compliant we were with it, on the basis that if you have a policy and no-one pays any attention to it, it’s probably not much use keeping it there! Not surprisingly, we found out that most people weren’t compliant and also didn’t really know that the policy was there. We’re hoping that was the reason that they weren’t compliant rather than any sort of animosity against policies in general - but that’s still to be determined.
We reviewed the policy for two reasons: firstly to try and make it of more use to researchers (… there are limits to that because when you are writing a policy to go across the institution, it has to contain really high-level principles about the management of research data. If you get too specific you rule large populations out and then people pay even less attention to it than they did before). Secondly, it’s to get some attention and a bit of refocus on the data management area. There are a lot of things happening at the university at the moment in terms of the services that the university intends to provide for its researchers and some other changes in the Australian environment. We’re hoping to lock the high-level principles away in policy documentation and focus on keeping the guidance, information and support materials up to date and relevant for researchers.
NG
The sustainability of keeping that guidance and information for researchers up to date is a real issue. Capturing their feedback and working it back into future iterations of those materials (and ultimately the policy documentation) is a desirable outcome but also a big challenge isn’t it?
PT
Yes, it is.
NG
How do you think that the policy that you’ve developed in Melbourne transposes to the University of Oxford?
PT
That’s a good question … one of the things that we’ve learnt from the 2005 version of the policy is that it’s not enough to have the central policy on its own. There needs to be some kind of localisation of the policies and so with this new version of our policy we’ll be asking faculties to come up with their own enhancements so that it makes more sense to their researchers, and then probably get departments to do the same thing. I’d imagine the same sort of system could work at Oxford but it would be a little more complex with the number of people that would need to be involved in coming up with these localised versions of the policy. The hope is that there will be a trickle-down effect from the high-level policies which has a practical influence on the way that researchers go about managing data.
In the meetings that I’ve had since I’ve been here, there have been some excellent examples of data managers and data management researchers (I guess you’d call them) who are working closely (one-on-one) with researchers who have come up with some excellent and novel solutions. I think the more that that can happen - a sort of resourcing at the coal face - then the more likelihood there is of high level principles trickling down to meet some of the very local one-on-one researcher-based developments. At that stage, perhaps there would be a general improvement in the management of research data across the institution.
One of the things I’ve heard a lot from people is the need for it to be a federated system. A lot of the departmental research groups have come up with their own systems for managing their own research data. Anything new that is provided centrally from the university has to try and complement those processes rather than take them over. That wouldn’t work well here (in Oxford) and it wouldn’t work in Melbourne. It would tend to antagonise people rather than improve the situation.
NG
Yes … that principle of embedding existing processes and workflows into broader policy initiatives is an important concept for institutions grappling with these kinds of issues at the moment. Thanks very much Paul.
PT
Thanks
University of Melbourne - Policy on the Management of Research Data and Records (2005)
http://www.unimelb.edu.au/records/research.html
Review of Policy on the Management of Research Data and Records (2009)
http://research.unimelb.edu.au/integrity/conduct/data/review
EIDCSR Project (Embedding Institutional Data Curation Services in Research)
http://eidcsr.oucs.ox.ac.uk/
Data in Nature
Finally got around to looking at the article on data that appeared on the Nature website last week.
http://www.nature.com/nature/journal/v461/n7261/full/461145a.html
Very nice to see JISC mentioned so positively in the editorial. They mention the Digital Curation Centre by name which is obviously one of the key pieces of support and infrastructure that JISC is funding to ensure that UK universities and colleges have access to advice and guidance in the handling and managing of research and other types of data.
Some other resources they didn’t have space to mention … The DCC (in collaboration with the Research Information Network) runs the Research Data Manager’s Forum. This is a series of meetings that have brought a number of practitioners, funders and other stakeholders together to examine and discuss the issues facing data managers and curators.
http://data-forum.blogspot.com/
There is a mailing list available that is geared towards this community.
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=RESEARCH-DATAMAN
There is a recent report (Nov 2008) that looks at the Benefits of Curating and Sharing Research Data.
http://www.jisc.ac.uk/publications/documents/databenefitsfinalreport.aspx
Another report (Jan 2009) looks at various national infrastructures enabling the sharing of data.
http://www.jisc.ac.uk/whatwedo/programmes/preservation/nationaldata.aspx
Earlier reports are available … one looking at the skills, roles and career structures that are required to support data scientists
http://www.jisc.ac.uk/publications/documents/dataskillscareersfinalreport.aspx
All of which build on a report from 2007 authored by Liz Lyon, “Dealing with Data”.
http://www.jisc.ac.uk/publications/documents/dealingwithdatareportfinal.aspx
The JISC Research Data Management programme is now in full swing and is in the process of starting 8 new major projects that will examine various aspects of Data Management Infrastructure. These projects will be supported by the DCC and other initiatives that will progress specific areas of complementary work (e.g. Tools).
http://researchdata.jiscinvolve.org/
“… to engage or not engage…” the choice for libraries.
A couple of weeks ago I attended the RLUK conference, their first conference and one that everyone there seemed to enjoy. Unfortunately I only made it for the last day for a slot where a panel of funders, policy bodies and service providers, including JISC, said a few words about priorities and partnership with others.
I did get to hear Lynne Brindley speak. She covered a lot of ground and most of what she said chimed with JISC priorities; albeit coming from a different set of organisational boundaries. Anyway I thought I’d just jot down what Lynne said as I think the issues she raised are well worth recounting here. I might’ve misinterpreted some things, especially since it was a while ago now but on the whole I think I’ve captured the main points.
In general she was referring to the fact that, in the complex digital environment, offering services that remain relevant and take advantage of what Lynne called “mass creativity” can be difficult. But she said the choice for libraries is “to engage or not engage”. Unsurprisingly the message was to engage.
A summary of issues she raised:
• Developing digital information services does incur a cost. A lot of innovative projects have been developed but we have not yet fully tackled sustainability.
• Libraries should support innovative scholarship. We’re now in a complex world where the web is a platform of “mass creativity” but offers real opportunities for innovative scholarship. She referred to some examples where digitisation and making digital resources available have led to new knowledge.
• Libraries need to move well beyond the critical role they play in licensing and recognise that things like document supply are not as relevant as they once were.
• “Life beyond the document”: how should libraries respond to this?
• The research data question and the skills gap – we have data librarians but not enough of them; traditionally libraries are more orientated towards humanities.
• Masses of information of different types – blogs, email etc. – are all important to scholarship; they are the ephemeral information of today. What are we doing about versions of works or notes and annotations? Think of authorship and how authors’ notes, which enhance research, are kept.
• Many people use information in different ways (skim reading etc.), so should delivery be different? Does it matter that people use information differently? Does information literacy matter? Should libraries be helping to equip people with the skills to make the right judgments?
• The researchers of the future (and quite a few researching now) come from the born digital age and will use information differently, so what is information literacy?
• Web archiving: the web is a huge resource that must be accessible into the future for research; the legal issues are a problem but hopefully legal deposit will make a difference.
• The value of the library can sometimes be summarised as: authenticity, authority and long-term use – what about authority v amateur?
• Digital preservation is very important – this has been seen as important at policy and government levels but now it is getting into the public consciousness - this is when libraries start to have real success with these issues. Just tell someone that all those photos will not be accessible and they can relate to it.
• She ended on intellectual property (IP) and referred to the EU Green Paper on Copyright and how IP deserved attention and organisations, such as academic libraries, needed to take action so any risk of locking information down further was mitigated. She emphasised that without reasonable copyright exceptions there is a risk to democratic society.
A lot of these issues are being addressed by libraries and organisations like the British Library and JISC; for example, we’re responding to the EU Green Paper on Copyright in the Knowledge Economy. But despite that, all of the issues require further debate and change.
JISC is about to launch a collaborative initiative with SCONUL, RLUK, The British Library and RIN that builds on our Libraries of the Future campaign and that will seek to further understand and shape the position of libraries into the future. Watch this space … it should be announced shortly.
Research data curation
Back last year, following the Digital Curation Conference in Washington DC, JISC and the Andrew W. Mellon Foundation hosted an international workshop to discuss and suggest where the international priorities are for research and development work supporting academic research data curation. It’s taken a while for the notes to become available, for which I apologise, but here they are:
Priorities for research data curation workshop 2007
(I realise this is a PDF file, which won’t please everyone, but it shrank the file size by over an order of magnitude compared with MS Word.)
The starting point for the workshop was a recognition that, while research data orients largely by (sub)discipline, the way in which infrastructure is developed and funded is often oriented nationally, or even around institutions. Some way is needed to square these two. I have to confess that, on the day, I wasn’t sure we’d made a lot of progress, but in drafting the notes I changed my mind somewhat. Certainly, Peter Murray-Rust seemed to identify the academic department infrastructure as a key point where intervention could serve both that department and the wider goal of data curation and sharing. The photos of flip chart diagrams are perhaps not easy to read or understand, but suggest a distinctive place for libraries and repositories.
Greg Crane’s Perseus project anticipated some of the topics that were covered later - notably how to design an infrastructure that is sustainable and yet adaptive - there are a few ideas in the notes. There are also a few ideas about how the problem space might be broken down so that an international approach can be taken, though this remains difficult. With luck and effort, JISC’s and other UK ‘data’ work will join up with that in the US (e.g. the NSF Datanet programme), Australia (Australian National Data Service), etc., and these notes will help us do that.
Many thanks to the workshop participants, listed at the end of the notes.
The costs of preserving research data
There’s a new report on the JISC website, authored by Neil Beagrie, Julia Chruszcz and Brian Lavoie. It looks at how much it costs to preserve research data and, perhaps as importantly, how institutions and others could calculate this. There are lots of reasons why this report is likely to have an impact - looking after research data is potentially costly, and yet it is important that - as a community - we make reasonable decisions about what should be preserved and how. Perhaps unsurprisingly (at least for those who already do this for a living), it seems the cost of ingesting the data forms the largest cost in the curation lifecycle, but at the same time the evidence shows that correcting badly ingested data later is even more costly, so the figures probably suggest that there is a positive cost/benefit calculation here. There is potential for developing the methodology here into a tool, and there could also be potential for some join-up with the Data Audit Framework.
Research data and the JISC IE
We’re hoping to present some themed web pages on the innovation work being funded under the JISC Information Environment area, including one on research data. I thought I’d use this blog to offer a preview / pilot of that page. I’m not sure if that’s an acceptable use of a blog, but I’m sure I’ll find out.
The aim of the JISC IE work on data is to promote and enable new ways of finding, using and sharing research data. Because there are huge variations in what ‘data’ is, and in disciplinary cultures and practices around it, there is likely to be a ‘mixed economy’ of infrastructure and services to support its management.
There has been a large number of reports on data recently, some of which are helpfully listed in a recent presentation by Michael Jubb of the Research Information Network. Three key documents are the report from the then Office of Science and Innovation on ‘e-infrastructure’, which set out a high-level vision, a set of principles for data stewardship developed by the Research Information Network, and the ‘Dealing with Data’ report from JISC/UKOLN, which made practical recommendations.
In terms of current practice, two projects promise to paint a clear picture from different perspectives. A study of ‘data publication’ practice among researchers has been funded by JISC, the Research Information Network and the Natural Environment Research Council. A different project, SCARP, is exploring disciplinary attitudes and approaches to data deposit, sharing and re-use, curation and preservation.
JISC and the Engineering and Physical Sciences Research Council have jointly funded the Digital Curation Centre (DCC), which is a centre both of innovation and of guidance. Members of the DCC are developing a Data Audit Framework, which will enable universities to assess what data is being held on their computer systems, and who is responsible for it. The Data Audit Framework will be piloted in a number of universities in 2008.
There is a suspicion that the sector lacks sufficient skilled people to manage research data effectively. A report is due shortly that will review the position and make recommendations on how this might be addressed. The DCC will run a summer school this year to begin to address this issue. Of course, investment will only follow if a business case can be made, and a part of making that case is assessing the costs of preserving data. A methodology is being developed that will enable estimates to be made, though of course without assessing the benefits of keeping data, it is only half the story.
The UK is fortunate to have both the UK Data Archive (co-funded by JISC and the Economic and Social Research Council) and the data centres supported by the Natural Environment Research Council. These services offer expert advice and infrastructure for data management. A feasibility study is underway into the possibility of a UK Research Data Service as a collaboration between some UK universities, to fill in some of the gaps between such data centres. In addition, the DISC-UK Datashare project is looking at how UK higher education can increase its capacity to curate and share research data.
Finally, it is worth noting that JISC also funds work under the heading of ‘e-Research’, which is also focused on research data, including grid and semantic enabling of datasets.