Like a lot of people, when I think about it, or when I’m reminded about it, I understand that the Web is a place where someone is always watching what you do. I understand that … but then I think, well … the Web is such a huge beast; such a vast ocean; such a giant metropolis where the comings and goings of individuals are insignificant. How and why would anyone notice what I’m looking at and which links I’m clicking on?
Then up pops Tom Barnett from Switch Concepts Ltd. at a meeting yesterday to tell us that ‘Google has a file the size of an encyclopedia on everyone in this room.’
Hmmm … that’s not a particularly comfortable idea for someone to put in your head. I start to feel a vague sense of paranoia creeping through my mind.
And then I think, c’mon Neil, pull yourself together! Google really doesn’t care who you are. They just want to put things in your line of sight that are more rather than less likely to get you to open your wallet and part with your wages!!
Such were the thoughts that were buzzing around my head yesterday at an event organised by the Web Science Trust (http://webscience.org).
The meeting was entitled ‘Observing the Web’ and the purpose was to highlight some of the work that the Web Science Trust and their partners and collaborators are doing to build a global network of Web Observatories providing an open analytics environment to drive new forms of Web research. We went round the room doing introductions and Dame Wendy Hall ended up branding us a ‘motley crew’. Academics, industry players, not-for-profits, technologists, funders, charities, a lawyer. (Quite a respectable looking motley crew in the very smart surroundings of the Royal Society I might add). But ‘motley crew’ felt about right for a topic and a collaborative, academic, open activity that is still exploring the territory and testing new ground. Presumably in contrast to the well-resourced, sophisticated and highly developed (but opaque) methods employed by the corporate observers of the Web (Facebook, Amazon, Google, Microsoft, Yahoo etc.).
The point of all of this ‘observing’ is not to try and take account of every little bit of data and content on the web, but rather to understand what the aggregated use of the Web can tell us; how trends and fashions and changes of behaviour in relation to the Web might illuminate aspects of our society and culture, both now and for future students and researchers.
This was all of great interest to Jisc. We are currently working with the British Library, the Oxford Internet Institute and the Institute of Historical Research on an initiative that aligns very well with the notion of the Web Observatory.
The Big Data project (http://www.oii.ox.ac.uk/research/projects/?id=88) and the AADDA project (http://www.history.ac.uk/projects/digital/AADDA) are both using a copy of the Internet Archive’s collection of UK domain websites collected over the period 1996-2010, to examine new ways to engage with the web at domain level, and develop new forms of research that leverage the scale of the web. As the name of the Oxford project says … it’s all about using ‘Big Data’.
This was work that emerged from influential JISC-funded reports commissioned in 2010:
Researcher Engagement with Web Archives
As we heard at the meeting, the academic observatory is a very different proposition to the corporate observatory and comes with enormous challenges including: interoperability (how do we link observatories?); access (aside from Twitter, which of the big corporates will let us use their data?); privacy (will people feel spied upon?); and sustainability (what is the business model?).
A fascinating meeting and a big topic. There will be more discussion in early May at the ACM Web Science Meeting in Paris.
Below is a copy of the plenary presentation I gave to the UKSG conference 2012. I have also included a much reduced transcript of the talk to provide some context to the slides.
My presentation was about looking at library services and systems from a data-centric point of view. Specifically, it was about the potential that library data has for the creation of new services and improved systems.
This isn’t a radically new vision – indeed the idea of being data-driven seems all-pervasive at the moment (data-driven journalism etc). Rather it is a way to refocus, or possibly to re-align, our thinking so that what may appear to be problems at present are viewed as new opportunities.
There is also a video of the presentation available.
I began my presentation with a video. The video was made by University of Lincoln students without formal permission from the university and uploaded to YouTube.
So, I think the film highlights nicely the three main themes of my presentation:
- Situating services and infrastructure within the wider ecosystem (this might be institution; community; society etc) – allow innovation to flourish anywhere, and ensure you’re in a position to take advantage of it;
- Redistributing effort – focus on the services that have an impact for users, ensure you have the talent to recognise those emergent opportunities and embrace them;
- Covering all eventualities: Future proofing – become agile and more entrepreneurial. The barriers for students creating the video were incredibly low: a Flip cam and YouTube. Barriers to students using library data should be low too.
Taking a data-centric approach enables libraries to affect the entire ecosystem that they inhabit.
Focusing on the data forces us to think about the other sources of important data within the institution: the Repository, VLEs, student records etc. The wider data ecosystem becomes evident, and the potential of the data underpinning those systems can be realised.
A really good example of this is the Discovery work that’s currently being undertaken by JISC and RLUK and Mimas at the Uni of Manchester. Discovery’s aim is to provide a ‘metadata ecology’ for UK education and research – and it does this by focusing on open and accessible data.
What happens, suddenly, is the data ecosystem starts to mingle with the human ecosystems libraries are inevitably a part of. The free flow of data provides the fertile ground for new ideas and services to grow – innovation is allowed to flourish everywhere on campus – not just within the confines of the traditional walls of the library.
Libraries and their institutions need to ensure an environment where this flourishing of innovation can happen, and that there are the right skills and people to recognise those opportunities, and help develop further the ideas and prototypes.
Yesterday saw the shared academic knowledge base (KB+) briefing day for approx. 60 library directors and senior managers take place in London, at the Wellcome Trust.
The project, known as KB+, is developing a shared community service that will improve the quality, accuracy, coverage and availability of data for the management, selection, licensing, negotiation, review and access of electronic resources for UK HE.
The aims of the day were to:
- Provide an update on the progress of the shared academic knowledge base project;
- Surface and share some of the questions, concerns and ideas participants may have about the project and the management of electronic resources in general;
- Let participants know what will be happening next with the project and how they can get involved if they would like.
The day began with Ben Showers (JISC Programme Manager) providing some context to the work and situating the project within the wider subscriptions management landscape. The presentation can be found here: Shared Academic Knowledge Base: Context and Landscape
The meeting engendered a large amount of discussion about the project, with participants freely sharing concerns, ideas and possible solutions to some of the issues that surfaced.
Extensive notes were taken from the Q&A sessions to help inform the project, but instead of repeating verbatim the questions and answers I have tried to highlight some of the themes that emerged during the meeting below.
A number of themes emerged during the day and, while this is not an exhaustive list, these are some of the recurring or critical issues that were surfaced:
Transformation of current practice
It was acknowledged that this project was potentially transformative; it has the potential to change what might be termed the bread and butter of library work. Therefore its impact on the community, and how it works, could be significant.
This means that the community, from senior managers to practitioners and beyond, will be keenly interested in the developments, and the project will need to build trust and facilitate the involvement of the whole library community. Which brings me on to another of the day’s themes: communication.
Communication was a theme that seemed to surface at regular intervals during the day. There was a clear message that the project needs to be able to communicate regularly with the library community on both progress and developments as they take place. This might manifest itself in a newsletter such as that employed by the Discovery programme, or utilising existing communication channels from JISC, JISC Collections and other sector bodies (or indeed a combination).
The combination of communications channels is also important given the range of stakeholders interested in the developments, from commercial vendors and publishers to librarians in the UK and internationally.
Under this theme there were questions about how the communication channels could allow for more interactivity than is usual in a JISC-funded project, given both the high-profile nature of the project and the need for ongoing community engagement in the work.
Closely related to communications was the topic of engagement.
Specifically, a lot emerged on how the community, especially ERM librarians and similar, could be engaged in the project in a useful and meaningful way.
In his presentation Liam made it clear that the project hopes to ‘recruit’ a number of embedded librarians, with the project paying for a proportion of their time to work on it. Participants stressed, however, that the expectations attached to any involvement would need to be spelled out, from the skill levels and expertise of the person through to the length of time they might be involved.
Clarity on these issues would be key to maintaining sector engagement.
It was also suggested there might be the need for something like an ‘advocacy pack’ so that library directors had the arguments to convince senior staff of the benefits of engaging with the project.
An interesting sub-theme within engagement was the power of the institutions themselves to put pressure on the commercial companies and organisations they work with, both to work with the project and to implement the recommendations and standards it might propose.
The message was clearly that this was a partnership.
Collaboration and leveraging other work
It was expressed a number of times how important it will be for the project to leverage this work and funding with other initiatives and projects that can help the KB+ project deliver its outputs.
It was acknowledged how much work was currently taking place in this area, including national projects such as KBART and TERMs, JISC-funded projects such as the journal usage statistics portal and e-journal archiving work including Peprs and the entitlement registry, and international projects such as the Open Library Environment.
This helped reinforce the project’s own ambitions of engaging with, and where possible working with, these complementary projects and initiatives.
This is a shared service, but it will be important that when issues are surfaced by an individual institution, or indeed a problem is resolved by someone, that the whole community can be made aware of this.
What tends to happen now is a problem will be reported to a supplier and that problem is then normally resolved, but no one other than the originating institution knows about this.
Further points of discussion
There were a number of other points of discussion including:
- The potential conflict between aiming for quality of data and ensuring its timeliness. It’s essential that quality doesn’t impact on the ability of libraries to deliver services to users as and when they want them.
- Print subscriptions: The briefing day concentrated largely on electronic resources, but it was clear that participants wanted to see print incorporated in the work. The project is taking a unified approach so this won’t be an issue, although electronic will remain the focus for much of the work.
- The Identifiers elephant! It was clear that participants also felt that the issue of how the project deals with identifiers (be those institutional, journal title etc) will be critical.
- Decision making and workflows will be two potential aspects of the final service, but it is important to recognise that a focus on the decision making components that the service will deliver could help strengthen potential business models, and demonstrate real value to institutions.
As the event demonstrated, there will be a lot more work going on over the next few months to get the project into a position where it can successfully transfer to a service.
In the meantime, this won’t be the last you’ll hear of the project, with plans already in place to start communicating and engaging with the community over this important shared service.
If you would like to find out more about the project, or have any questions, then please feel free to contact Liam Earney at JISC Collections.
Recently the library, museum and archive world has taken to experimenting with open data with a vengeance. It seems an interesting new dataset is released under an open licence most weeks.
There are many motivations behind these data releases but one of the major ones is the hope that someone else will think of something cool to do with the data (to mangle a Rufus Pollock quote).
The rules of the competition are laid out in detail on the Discovery site but in essence all that’s needed to enter the competition is to develop something using one of 10 recommended datasets. You can use other datasets too but you have to do it in conjunction with one or more of the 10 datasets listed on the Discovery site.
I’m probably revealing my nerdy librarian hand here but the 10 datasets are really rich and exciting:
- There is library data from the British Library, Cambridge and Lincoln;
- There is archives data from the National Archives and the Archives Hub;
- Museum data from the Tyne and Wear Museums collections;
- English Heritage places data;
- Circulation data from a few UK university libraries;
- The musicnet codex;
- And search data from the OpenURL router service.
There are 13 prizes to be won so there is every incentive to enter even if you are somehow able to resist the siren call of all that exciting data!
The competition is open now and closes on the 1st of August.
JISC is currently funding a range of projects that investigate how data stored in institutional systems can be mined to gain insights into the way that university services are operating and use those insights to improve the services. These projects are spread across two programmes, the activity data programme and the business intelligence programme. There are a few other projects working in similar areas spread across other programmes.
Last week we took the opportunity to bring most of these projects together to discuss their various approaches and to think about what else JISC can do to help universities make the most of their activity data.
We started the event with lightning talks from each project attending. I suspect faithfully listing all of those projects here would probably make for a gruelling reading experience. So instead I’ll group them into the broad motivations the projects are pursuing. Some projects fall under more than one category. I have linked to the presentations the projects gave; where I do not have the slides, I have linked to the project website.
Some are mining data to gain insights into behaviour of people or systems in the institution to allow better resource allocation and intervention at crucial periods:
LIDP, Supporting institutional decision making, Bringing corporate data to life, Lumis, IN-GRiD, Retain, Student Engagement Traffic Lighting
The projects cover a vast range of areas from libraries to student management to environmental monitoring. Despite this breadth there are some common issues. These are the issues that jumped out at me on the day:
- Not all institutions have people with the technical skills and the statistical skills required to manipulate and analyse data.
- A lot of these datasets are large, and that brings up issues of how to store and manipulate the data and how to decide what to retain.
- Institutions might need to take an institution-wide strategic approach to deciding what data should be collected, how it should be exploited and by whom. There also need to be long-term strategic approaches to data exploitation in departments like libraries.
- Working across different silos of activity data is a problem we are only just beginning to face.
- Language is an issue. Throughout this blog post I have used the term activity data, but this is a generic term; we need to be clearer about what type of data we are talking about.
To help ensure that others can benefit from the lessons the projects learn there are synthesis projects for the activity data programme and the business intelligence programme. The purpose of the synthesis projects is to gather information on key issues and turn them into advice and guidance that anyone in the HE sector can use to inform the way they do things at their institution. Infonet are producing a business intelligence infokit for their synthesis project. The University of Manchester and Sero Consulting are producing the activity data synthesis. You can read the activity data synthesis blog which talks about progress so far and can also see a mindmap that describes the areas their final website will cover.
We ended the day with a discussion of the possible ways that JISC could look to address some of these issues. We produced a long list of very good ideas. We also had a go at prioritising which were the most pressing or valuable. Our top 6 ideas were:
- Developing guidance for institutions on taking a strategic approach to exploiting activity data
- Addressing the need for new skills for exploiting activity data both from a technical perspective and from a statistical skills perspective
- Establish clear definitions for terms used – this could include a simple glossary and use of examples to illustrate the terms
- Developing a culture of exploiting data in institutions
- Exploring what’s involved in ensuring data is easy to reuse
- Study behaviour and how it relates to usage patterns
JISC won’t be able to address all of these issues straight away, so my colleague Myles Danson and I will have to decide which we focus on. Comments and advice would be very welcome indeed!
After the meeting Mathieu from the UCIAD project wrote a very interesting blog post about the need to take a user-centred approach to activity data, so that’s something we’ll need to consider too.
Business intelligence resulting from ‘user activity data’ could help universities to manage resources more efficiently, budget more effectively, make smarter purchasing decisions, improve their services and demonstrate impact. The likes of Amazon and Tesco use activity data to make business decisions and to also provide recommender services on top of this data. They know a lot about their customers’ purchasing habits. How does this translate to the academic environment? Are we able to utilise this use data for our benefit? This is a very timely and exciting area of investigation for the sector.
We invite you to attend a workshop exploring the potential of this data which is derived from services such as library systems, virtual learning environments and student registries.
As well as informing people of the potential of this data, we are looking to generate ideas and use cases to help plan for future work in this area. So please come ready to participate – I promise it will be an interesting day!
This workshop is suitable for senior managers and practitioners working in libraries, teaching and research. It will be chaired by Professor David Baker, Deputy Chair of JISC, with contributions from practitioners who have practical experience of using user activity data in higher education.
Date: 14 July 2010
Venue: The Hatton, 51-53 Hatton Garden, London, EC1N 8HN
For more information and registration go to: http://www.jisc.ac.uk/events/2010/07/businessintelligence.aspx
Dr. Paul Taylor works at the University of Melbourne and has just finished a two-week secondment in the UK with the JISC-funded EIDCSR (Embedding Institutional Data Curation Services in Research) project based in Oxford. This is an approximate transcript of a quick 5 minute interview between Paul and Neil Grindley (JISC Information Environment Programme Manager).
Hi Paul, thanks for sparing the time out of a very busy schedule … what role do you have in the EIDCSR project?
Thanks Neil … I’m here to help them come up with a draft policy for the management of research data and records. It’s something we’ve had in place at the University of Melbourne since 2005 and we’ve just completed a revision of the policy to hopefully help make it a little more useful for researchers.
Tell us a little bit more about how that policy has been developed at the University of Melbourne and the reactions to it from researchers and data managers.
As I said, we’ve had policy in place since 2005 and early this year we were asked to work out how compliant we were with it, on the basis that if you have a policy and no-one pays any attention to it, it’s probably not much use keeping it there! Not surprisingly, we found out that most people weren’t compliant and also didn’t really know that the policy was there. We’re hoping that was the reason that they weren’t compliant rather than any sort of animosity against policies in general – but that’s still to be determined.
We reviewed the policy for two reasons: firstly to try and make it of more use to researchers (… there’s limits to that because when you are writing a policy to go across the institution, it has to contain really high level principles about the management of research data. If you get too specific you rule large populations out and then people pay even less attention to it than they did before). Secondly, it’s to get some attention and a bit of refocus on the data management area. There are a lot of things happening at the university at the moment in terms of the services that the university intends to provide for its researchers and some other changes in the Australian environment. We’re hoping to lock the high-level principles away in policy documentation and focus on keeping the guidance, information and support materials up to date and relevant for researchers.
The sustainability of keeping that guidance and information for researchers up to date is a real issue. Capturing their feedback and working it back into future iterations of those materials (and ultimately the policy documentation) is a desirable outcome but also a big challenge isn’t it?
Yes, it is.
How do you think that the policy that you’ve developed in Melbourne transposes to the University of Oxford?
That’s a good question … one of the things that we’ve learnt from the 2005 version of the policy is that it’s not enough to have the central policy on its own. There needs to be some kind of localisation of the policies and so with this new version of our policy we’ll be asking faculties to come up with their own enhancements so that it makes more sense to their researchers, and then probably get departments to do the same thing. I’d imagine the same sort of system could work at Oxford but it would be a little more complex with the number of people that would need to be involved in coming up with these localised versions of the policy. The hope is that there will be a trickle-down effect from the high-level policies that has a practical influence on the way that researchers go about managing data.
In the meetings that I’ve had since I’ve been here, there have been some excellent examples of data managers and data management researchers (I guess you’d call them) who are working closely (one-on-one) with researchers who have come up with some excellent and novel solutions. I think the more that that can happen – a sort of resourcing at the coal face – then the more likelihood there is of high level principles trickling down to meet some of the very local one-on-one researcher-based developments. At that stage, perhaps there would be a general improvement in the management of research data across the institution.
One of the things I’ve heard a lot from people is the need for it to be a federated system. A lot of the departmental research groups have come up with their own systems for managing their own research data. Anything new that is provided centrally from the university has to try and complement those processes rather than take them over. That wouldn’t work well here (in Oxford) and it wouldn’t work in Melbourne. It would tend to antagonise people rather than improve the situation.
Yes … that principle of embedding existing processes and workflows into broader policy initiatives is an important concept for institutions grappling with these kinds of issues at the moment. Thanks very much Paul.
University of Melbourne – Policy on the Management of Research Data and Records (2005)
Review of Policy on the Management of Research Data and Records (2009)
EIDCSR Project (Embedding Institutional Data Curation Services in Research)
Finally got around to looking at the article on data that appeared on the Nature website last week.
Very nice to see JISC mentioned so positively in the editorial. They mention the Digital Curation Centre by name which is obviously one of the key pieces of support and infrastructure that JISC is funding to ensure that UK universities and colleges have access to advice and guidance in the handling and managing of research and other types of data.
Some other resources they didn’t have space to mention … The DCC (in collaboration with the Research Information Network) runs the Research Data Manager’s Forum. This is a series of meetings that have brought a number of practitioners, funders and other stakeholders together to examine and discuss the issues facing data managers and curators.
There is a mailing list available that is geared towards this community.
There is a recent report (Nov 2008) that looks at the Benefits of Curating and Sharing Research Data.
Another report (Jan 2009) looks at various national infrastructures enabling the sharing of data.
Earlier reports are available … one looking at the skills, roles and career structures that are required to support data scientists.
All of which build on a report from 2007 authored by Liz Lyon, “Dealing with Data”.
The JISC Research Data Management programme is now in full swing and is in the process of starting 8 new major projects that will examine various aspects of Data Management Infrastructure. These projects will be supported by the DCC and other initiatives that will progress specific areas of complementary work (e.g. Tools).
A couple of weeks ago I attended the RLUK conference, their first conference and one that everyone there seemed to enjoy. Unfortunately I only made it for the last day for a slot where a panel of funders, policy bodies and service providers, including JISC, said a few words about priorities and partnership with others.
I did get to hear Lynne Brindley speak. She covered a lot of ground and most of what she said chimed with JISC priorities; albeit coming from a different set of organisational boundaries. Anyway I thought I’d just jot down what Lynne said as I think the issues she raised are well worth recounting here. I might’ve misinterpreted some things, especially since it was a while ago now but on the whole I think I’ve captured the main points.
In general she was referring to the fact that, in the complex digital environment, offering services that remain relevant and take advantage of what Lynne called “mass creativity” can be difficult. But she said the choice for libraries is “to engage or not engage”. Unsurprisingly the message was to engage.
A summary of issues she raised:
• Developing digital information services does incur a cost. A lot of innovative projects have been developed but we have not yet fully tackled sustainability.
• Libraries should support innovative scholarship. We’re now in a complex world where the web is a platform of “mass creativity” but offers real opportunities for innovative scholarship. She referred to some examples where digitisation and making digital resources available have led to new knowledge.
• Libraries need to move well beyond the critical role they play in licensing and recognise that things like document supply are not as relevant as they once were.
• “Life beyond the document”: how should libraries respond to this?
• The research data question and the skills gap – we have data librarians but not enough of them; traditionally libraries are more orientated towards humanities.
• Masses of information of different types (blogs, email etc) are all important to scholarship; they are the ephemeral information of today. What are we doing about versions of works or notes and annotations? Think of authorship and how notes are kept of authors that enhance research.
• Many people use information in different ways (skim reading etc). Should delivery therefore be different? Does it matter that people use information differently? Does information literacy matter? Should libraries be helping to equip people with the skills to make the right judgments?
• The researchers of the future (and quite a few researching now) come from the born digital age and will use information differently, so what is information literacy?
• Web archiving: the web is a huge resource that must be accessible into the future for research; the legal issues are a problem but hopefully legal deposit will make a difference.
• The value of the library can sometimes be summarised as: authenticity, authority and long-term use – what about authority v amateur?
• Digital preservation is very important – this has been seen as important at policy and government levels but now it is getting into the public consciousness – this is when libraries start to have real success with these issues. Just tell someone that all those photos will not be accessible and they can relate to it.
• She ended on intellectual property (IP) and referred to the EU Green Paper on Copyright and how IP deserved attention and organisations, such as academic libraries, needed to take action so any risk of locking information down further was mitigated. She emphasised that without reasonable copyright exceptions there is a risk to democratic society.
A lot of these issues are being addressed by libraries and organisations like the British Library and JISC; for example, we’re responding to the EU Green Paper on Copyright in the Knowledge Economy. But despite that, all of the issues require further debate and change.
JISC is about to launch a collaborative initiative with SCONUL, RLUK, The British Library and RIN that builds on our Libraries of the Future campaign and that will seek to further understand and shape the position of libraries into the future. Watch this space…it should be announced shortly.
Back last year, following the Digital Curation Conference in Washington DC, JISC and the Andrew W. Mellon Foundation hosted an international workshop to discuss and suggest where the international priorities are for research and development work supporting academic research data curation. It’s taken a while for the notes to become available, for which I apologise, but here they are:
Priorities for research data curation workshop 2007
(I realise this is a PDF file, which won’t please everyone, but it shrank the file size by over an order of magnitude compared with MS Word)
The starting point for the workshop was a recognition that, while research data orients largely by (sub)discipline, the way in which infrastructure is developed and funded is often oriented nationally, or even around institutions. Some way is needed to square these two. I have to confess that, on the day, I wasn’t sure we’d made a lot of progress, but in drafting the notes I changed my mind somewhat. Certainly, Peter Murray-Rust seemed to identify the academic department infrastructure as a key point where intervention could serve both that department and the wider goal of data curation and sharing. The photos of flip chart diagrams are perhaps not easy to read or understand, but suggest a distinctive place for libraries and repositories.
Greg Crane’s Perseus project anticipated some of the topics that were covered later – notably how to design an infrastructure that is sustainable and yet adaptive – there are a few ideas in the notes. There are also a few ideas about how the problem space might be broken down so that an international approach can be taken, though this remains difficult. With luck and effort, JISC’s and other UK ‘data’ work will join up with that in the US (e.g. the NSF Datanet programme), Australia (Australian National Data Service), etc, and these notes will help us do that.
Many thanks to the workshop participants, listed at the end of the notes.