Data Management Policy - An Interview with Paul Taylor
Dr. Paul Taylor works at the University of Melbourne and has just finished a two-week secondment in the UK with the JISC-funded EIDCSR (Embedding Institutional Data Curation Services in Research) project based in Oxford. This is an approximate transcript of a quick five-minute interview between Paul and Neil Grindley (JISC Information Environment Programme Manager).
NG
Hi Paul, thanks for sparing the time out of a very busy schedule … what role do you have in the EIDCSR project?
PT
Thanks Neil … I’m here to help them come up with a draft policy for the management of research data and records. It’s something we’ve had in place at the University of Melbourne since 2005 and we’ve just completed a revision of the policy to hopefully help make it a little more useful for researchers.
NG
Tell us a little bit more about how that policy has been developed at the University of Melbourne and the reactions to it from researchers and data managers.
PT
As I said, we’ve had a policy in place since 2005 and early this year we were asked to work out how compliant we were with it, on the basis that if you have a policy and no-one pays any attention to it, it’s probably not much use keeping it there! Not surprisingly, we found out that most people weren’t compliant and also didn’t really know that the policy was there. We’re hoping that was the reason that they weren’t compliant rather than any sort of animosity against policies in general - but that’s still to be determined.
We reviewed the policy for two reasons: firstly to try and make it of more use to researchers (… there are limits to that because when you are writing a policy to go across the institution, it has to contain really high-level principles about the management of research data. If you get too specific you rule large populations out and then people pay even less attention to it than they did before). Secondly, it’s to get some attention and a bit of refocus on the data management area. There are a lot of things happening at the university at the moment in terms of the services that the university intends to provide for its researchers and some other changes in the Australian environment. We’re hoping to lock the high-level principles away in policy documentation and focus on keeping the guidance, information and support materials up to date and relevant for researchers.
NG
The sustainability of keeping that guidance and information for researchers up to date is a real issue. Capturing their feedback and working it back into future iterations of those materials (and ultimately the policy documentation) is a desirable outcome but also a big challenge isn’t it?
PT
Yes, it is.
NG
How do you think that the policy that you’ve developed in Melbourne transposes to the University of Oxford?
PT
That’s a good question … one of the things that we’ve learnt from the 2005 version of the policy is that it’s not enough to have the central policy on its own. There needs to be some kind of localisation of the policies and so with this new version of our policy we’ll be asking faculties to come up with their own enhancements so that it makes more sense to their researchers, and then probably get departments to do the same thing. I’d imagine the same sort of system could work at Oxford but it would be a little more complex with the number of people that would need to be involved in coming up with these localised versions of the policy. The hope is that there will be a trickle-down effect from the high-level policies that has a practical influence on the way that researchers go about managing data.
In the meetings that I’ve had since I’ve been here, there have been some excellent examples of data managers and data management researchers (I guess you’d call them) who are working closely (one-on-one) with researchers and who have come up with some excellent and novel solutions. I think the more that that can happen - a sort of resourcing at the coal face - the more likely it is that high-level principles will trickle down to meet some of the very local one-on-one researcher-based developments. At that stage, perhaps there would be a general improvement in the management of research data across the institution.
One of the things I’ve heard a lot from people is the need for it to be a federated system. A lot of the departmental research groups have come up with their own systems for managing their own research data. Anything new that is provided centrally from the university has to try and complement those processes rather than take them over. That wouldn’t work well here (in Oxford) and it wouldn’t work in Melbourne. It would tend to antagonise people rather than improve the situation.
NG
Yes … that principle of embedding existing processes and workflows into broader policy initiatives is an important concept for institutions grappling with these kinds of issues at the moment. Thanks very much Paul.
PT
Thanks
University of Melbourne - Policy on the Management of Research Data and Records (2005)
http://www.unimelb.edu.au/records/research.html
Review of Policy on the Management of Research Data and Records (2009)
http://research.unimelb.edu.au/integrity/conduct/data/review
EIDCSR Project (Embedding Institutional Data Curation Services in Research)
http://eidcsr.oucs.ox.ac.uk/
EC Digital Libraries and Digital Preservation Call
I went to a meeting in Peter Mandelson’s basement the other day, otherwise known as the Department for Business, Innovation and Skills just next to Westminster Abbey. Lord Mandelson (if you go up about 17 levels of management) is my boss, so it’s good to know where the orders are coming from.
Anyway … the meeting was a briefing day and a chance for the European Commission to explain a bit about the priorities and procedures that people should think about if they want to apply for funding for projects in the Digital Libraries and Preservation area (formally referred to as FP7 ICT Call 6). The presentations are now available online at http://bit.ly/3oPGFe.
The headline issues that I took away from the meeting were …
The whole funding decision-making process takes nearly a year and is extremely competitive. If you are a small organisation that is simply looking for money … it probably isn’t for you! The Commission will be evaluating proposals according to 3 main criteria:
1. Are they proposing something that is useful and is technically robust?
2. Will they be able to achieve their objectives?
3. What impact will the work have?
They are looking for effective collaborations. Consortiums must have a lead and at least 3 other partners. How many partners and where they come from is - contrary to popular belief - not that important! There have been rumours in the past that people needed to hook up with Eastern European partners, or Southern European partners, in order to get funding. This is a fallacy. You just need to demonstrate that your consortium will be effective. In fact, once you have your core group of at least 4 EU partners, additional partners (with appropriate expertise) can come from anywhere in the world.
It is not generally the job of a research organisation to know about marketing and exploiting products that are created as part of a research programme. Join up with an organisation that knows about this stuff! It’s important to get it right and sustainability is EXTREMELY important.
Think hard about what sort of project suits your proposal … The models on offer are:
IPs = Integrating Projects. Large scale (Euros 6-12m … sometimes more). R&D work, concepts, methods, tools, systems, often many partners. Advancing the state of the art - producing solutions that are within 3-5 years of being marketable.
STREPs = Small to Medium Targeted Research Projects. (Generally Euros 2-4m). Focusing on more specific research problems with outputs that might be 5-7 years away from being marketable solutions.
NoEs = Networks of Excellence. Advancing knowledge and bridging technological domains.
CAs = Co-ordinating Actions. Helping to ensure synergy between EC funded work.
SAs = Supporting Actions. Helping to maximise the effectiveness and impact of EC funded work.
Total funding available for this call - Euros 69m
IPs and STREPs = Euros 56m
NoEs and CAs/SAs = Euros 13m
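For anyone who wants to see how that pot divides up, here is a quick back-of-the-envelope sketch in Python. It only uses the three figures quoted above; the variable and function names are my own and purely illustrative, and the percentages are simple arithmetic rather than anything the Commission presented.

```python
# Figures as quoted at the briefing, in millions of euros.
CALL_TOTAL_M = 69
allocations_m = {
    "IPs and STREPs": 56,
    "NoEs and CAs/SAs": 13,
}


def funding_shares(allocations, total):
    """Return each allocation as a percentage of the total, rounded to 1 decimal place."""
    return {name: round(100 * amount / total, 1) for name, amount in allocations.items()}


# Sanity check: the two strands should add up to the headline total.
assert sum(allocations_m.values()) == CALL_TOTAL_M

shares = funding_shares(allocations_m, CALL_TOTAL_M)
for name, pct in shares.items():
    print(f"{name}: EUR {allocations_m[name]}m ({pct}% of the call)")
```

On these numbers, roughly four fifths of the money is earmarked for the IP and STREP strands.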
Strikes me that a lot of people will be thinking hard and talking to other people over the next 12 months to really try and grapple with some of the hard problems in the Digital Preservation area and that is going to have a marvellous impact on the amount and quality of proposals that might end up flowing towards JISC. I’m not saying we’ll mop up failed EC proposals!! … I’m simply saying this has to be good for generally raising our whole collective game in the relevant areas of research and development.
iPres 2009 - Preservation Infrastructure Track
In San Francisco at iPres sitting in the preservation infrastructure track.
Stephen Abrams (CDL) is telling us about micro-curation services. Lots of clear categorisation of types of services that institutions might require. Currently talking about storage requirements. Provide for safety through redundancy, meaning through context, utility through service. Rattling through too fast to capture detail.
Q. How do CDL services compare with iRODS?
A. iRODS is all part of one controlled environment. CDL micro-services can run as small discrete functions.
Pam Armstrong and Johanna Smith from Library and Archives Canada.
They have a Trusted Digital Repository project that is running from 2008-2010. They are showing a value management framework. The first concern is ‘significance’. They are looking at government records and are trying to determine which records are important even before they arrive at the archive. Talking about a filtering process. Trying to deal with web 2.0 issues and are working on some guidelines.
They have established a records management task force with a high level of government support. A directive on recordkeeping is linked to a management accountability framework. If departments are found to be wanting with their records management function, they are denied the right to delete records. Good stick. There are functional requirements for EDRMS based on ISO. There is a proposed shared service for EDRMS for government info in Canada. They have built open source software eRTA for records managers. They have been working on a metadata core set. They are using MODS and MARC and the info is discoverable by the public. They have got to their summary already … my oh my - these talks are quick!
The lessons learnt include the usefulness of the mandatory instrument that has consequences (see above).
Q. Do you accept all formats?
A. No, they have acceptable formats. Can’t do all formats.
Q. How implemented is all of this?
A. The implementation is uneven. All the instances across govt are implemented inconsistently. They have got lots of work to do to bring the legacy information into line.
Robert Sharpe - Tessella
Representing the PLANETS consortium. Title is “Are you Ready? Assessment of readiness of organisations for Digital Preservation”. (I’m interested in this talk. Wondering how this matches up with the JISC-funded AIDA project). They did a survey to establish whether people were ready to use Digital Preservation solutions. The target group for PLANETS is national libraries and archives. There are 96 of these in Europe. They also invited any other interested parties to contribute. They got 206 responses. 70% of responses came from Europe. They were a diverse community representing a range of roles.
15% digital preservation
16% in general preservation
22% curation
16% IT
also directors, researchers, data managers etc …
93% aware of DP challenges.
17% had not considered solutions.
52% did not have preservation policies.
They were 3 times more likely to have a DP budget if they had a DP policy in place. The majority had budgets to do capital activities. DP is still not really embedded in the institutions that responded. What needs to be preserved? Stuff in file systems = 77% … many other categories going down to a long tail. National Libraries feel they have almost no control of the formats they have to accept. National Archives however claim high levels of control.
80% of organisations say they have less than 100TB to store in 2009. They think that by 2019, 70% of orgs will have more than 100TB and 42% will have more than 1PB. 85% have a solution or are working on one. They are generally expecting ‘plug and play’ components. That’s the trend and what people are expecting.
What functionality is important? Single most important function was that the repository must maintain authenticity, reliability and integrity of records. 17 different functions cited. Least important function is ‘checks for duplicate items’. Very little agreement on which standards should be used! (surprise surprise!) Of 13 standards on Robert’s chart, PREMIS is in the middle in terms of who is using it already.
Summary …
Excellent start on getting DP message out
More work needed on policies and budgets
Wide range of types of digital info from range of sources
Significant quantities of data to preserve
Component-based solutions required
Best practice not yet clear
Early adopters are busy and planning to do more
Q. We are doing a good job with early adopters but what about the wider community? The success factor will be general users engaging with Digital Preservation.
A. Yes
Q. The standards you showed, the figures are high for people not even having heard of them!
A. Yes.
End of session
Data in Nature
Finally got around to looking at the article on data that appeared on the Nature website last week.
http://www.nature.com/nature/journal/v461/n7261/full/461145a.html
Very nice to see JISC mentioned so positively in the editorial. They mention the Digital Curation Centre by name which is obviously one of the key pieces of support and infrastructure that JISC is funding to ensure that UK universities and colleges have access to advice and guidance in the handling and managing of research and other types of data.
Some other resources they didn’t have space to mention … The DCC (in collaboration with the Research Information Network) runs the Research Data Manager’s Forum. This is a series of meetings that have brought a number of practitioners, funders and other stakeholders together to examine and discuss the issues facing data managers and curators.
http://data-forum.blogspot.com/
There is a mailing list available that is geared towards this community
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=RESEARCH-DATAMAN
There is a recent report (Nov 2008) that looks at the Benefits of Curating and Sharing Research Data.
http://www.jisc.ac.uk/publications/documents/databenefitsfinalreport.aspx
Another report (Jan 2009) looks at various national infrastructures enabling the sharing of data.
http://www.jisc.ac.uk/whatwedo/programmes/preservation/nationaldata.aspx
Earlier reports are available … one looking at the skills, roles and career structures that are required to support data scientists
http://www.jisc.ac.uk/publications/documents/dataskillscareersfinalreport.aspx
All of which build on a report from 2007 authored by Liz Lyon, “Dealing with Data”.
http://www.jisc.ac.uk/publications/documents/dealingwithdatareportfinal.aspx
The JISC Research Data Management programme is now in full swing and is in the process of starting 8 new major projects that will examine various aspects of Data Management Infrastructure. These projects will be supported by the DCC and other initiatives that will progress specific areas of complementary work (e.g. tools).
http://researchdata.jiscinvolve.org/