Observing the Web
Like a lot of people, when I think about it, or when I’m reminded about it, I understand that the Web is a place where someone is always watching what you do. I understand that … but then I think, well … the Web is such a huge beast; such a vast ocean; such a giant metropolis where the comings and goings of individuals are insignificant. How and why would anyone notice what I’m looking at and which links I’m clicking on?
Then up pops Tom Barnett from Switch Concepts Ltd. at a meeting yesterday to tell us that ‘Google has a file the size of an encyclopedia on everyone in this room.’
Hmmm … that’s not a particularly comfortable idea for someone to put in your head. I start to feel a vague sense of paranoia creeping through my mind.
And then I think, c’mon Neil, pull yourself together! Google really doesn’t care who you are. They just want to put things in your line of sight that are more rather than less likely to get you to open your wallet and part with your wages!!
Such were the thoughts that were buzzing around my head yesterday at an event organised by the Web Science Trust (http://webscience.org).
The meeting was entitled ‘Observing the Web’ and the purpose was to highlight some of the work that the Web Science Trust and their partners and collaborators are doing to build a global network of Web Observatories providing an open analytics environment to drive new forms of Web research. We went round the room doing introductions and Dame Wendy Hall ended up branding us a ‘motley crew’. Academics, industry players, not-for-profits, technologists, funders, charities, a lawyer. (Quite a respectable looking motley crew in the very smart surroundings of the Royal Society I might add). But ‘motley crew’ felt about right for a topic and a collaborative, academic, open activity that is still exploring the territory and testing new ground. Presumably in contrast to the well-resourced, sophisticated and highly developed (but opaque) methods employed by the corporate observers of the Web (Facebook, Amazon, Google, Microsoft, Yahoo etc.).
The point of all of this ‘observing’ is not to try and take account of every little bit of data and content on the web, but rather to understand what the aggregated use of the Web can tell us; how trends and fashions and changes of behaviour in relation to the Web might illuminate aspects of our society and culture, both now and for future students and researchers.
This was all of great interest to Jisc. We are currently working with the British Library, the Oxford Internet Institute and the Institute of Historical Research on an initiative that aligns very well with the notion of the Web Observatory.
The Big Data project (http://www.oii.ox.ac.uk/research/projects/?id=88)
and
the AADDA project (http://www.history.ac.uk/projects/digital/AADDA)
are both using a copy of the Internet Archive’s collection of UK domain websites collected over the period 1996-2010, to examine new ways to engage with the web at domain level, and develop new forms of research that leverage the scale of the web. As the name of the Oxford project says … it’s all about using ‘Big Data’.
This was work that emerged from influential JISC-funded reports commissioned in 2010 -
Researcher Engagement with Web Archives
http://www.jisc.ac.uk/whatwedo/programmes/preservation/researcherengagementwithWA.aspx
As we heard at the meeting, the academic observatory is a very different proposition to the corporate observatory and comes with enormous challenges including: interoperability (how do we link observatories?); access (aside from Twitter, which of the big corporates will let us use their data?); privacy (will people feel spied upon?); and sustainability (what is the business model?).
A fascinating meeting and a big topic. There will be more discussion in early May at the ACM Web Science Meeting in Paris.
http://www.websci13.org/registration/
Licensing Data as Open Data
One of the findings that has emerged clearly from the UK OER Programme and from the UK Discovery work is that for a healthy content ecosystem, information about the content needs to be available to many different systems, services and users. Appropriately licensing the metadata and feeds is crucial to downstream discovery and use.
The OER IPR Support Project have developed this fabulous animation to introduce the importance of open data licensing in an engaging way.
It was developed out of the UK OER Programme but informed by the work of several other areas including UK Discovery, Managing Research Data, the Strategic Content Alliance, and sharing XCRI course feeds. With thanks to the many people who helped in the storyboarding, scripting and feedback: particularly Phil Barker, Tony Hirst and Martin Hawksey.
You may remember the same OER IPR team produced the Turning a Resource into an Open Educational Resource animation (1,700+ hits and counting). The team is Web2Rights (Naomi Korn, Alex Dawson) and JISC Legal (Jason Miles-Campbell), and the animator is Luke McGowan. The whole animation is (c) HEFCE on behalf of JISC, and licensed under Creative Commons Attribution Share Alike 3.0.
We hope it will have wide usefulness and we very much welcome feedback.
Amber Thomas, JISC
When ideals meet reality
At ALT-C in early September I ran a session with David Kernohan on Openness: learning from our history. The theme of the conference was “a confrontation with reality” so it seemed fitting to explore the trajectories taken by various forms of openness. What follows is just a short thought piece that I contributed to the session about some of the patterns I have observed over the past decade or so.
Curves and Cycles
The first thing to say is that we are all different in our encounters with new approaches, whether they are new technologies like badges or new delivery models like MOOCs. We are each on our own learning curves and change curves, and we meet new ideas and solutions at different points in the hype cycle. That is a lot of variation. So when we meet new ideas, we can respond very differently. My first message is that every response is a real response.
Polarisation
It’s too easy to characterise people as pro- or anti- something. It’s too easy to present things as a debate for or against. But polarisation often masks the real questions, because we don’t hear them properly.
“The use of technology seems to divide people into strong pro- and anti-camps or perhaps utopian and dystopian perspectives” Martin Weller, The Digital Scholar [1]
Dialectics of Open and Free
There is usually a dialectic around open and free: free as in freedom, free as in beer [2]. “Open as in door or open as in heart”: Some courses are open as in door. You can walk in, you can listen for free. Others are open as in heart. You become part of a community, you are accepted and nurtured [3]. I always add: open as in markets?
Branching
To follow on from free as in beer … A great example of branching is the trajectory of the open source movement. There were big debates over “gratis vs libre”, and that gave birth to the umbrella term Free and Libre Open Source Software (FLOSS). By enabling the practices of open source to branch off, and allowing the community to branch off, we saw a diffusion of innovation: towards profit-making business models in some areas, free culture models in others. It’s interesting how GitHub supports the whole spectrum.
This has also been the approach of the UK OER Programme. We have been quite pluralistic about OER, to let people find their own ways. We have certainly had tensions between the marketing/recruitment aspect and the open practice perspective. What’s important to note is that often it’s not just one model that comes to dominate.
Tipping points into the mainstream
We don’t always understand what brings about mainstreaming. We very rarely control it.
Consider a story from open standards: the rise and fall of RSS aggregation. Was it Netvibes or Pageflakes that made the difference? Or Google Reader? At what point did Twitter and Facebook start to dominate the aggregation game and overtake RSS? The OER programme gave each project the freedom to choose its platform. They didn’t choose a standard; they chose a platform. It’s often when open standards are baked in to platforms that we see take-up without conscious decision making.
I’m not sure we always notice: sometimes when mainstreaming happens we don’t recognise it. When did e-learning become part of the fabric of education?
Pace
Finally, change can take a lot longer than we hope. The 10 years since the Budapest Open Access Initiative [4] can feel like geological time. And yet the OA movement has achieved so much. Perhaps we need some time-lapse photography approach to recognising the impact of changes we started back then. So many more people understand OA now. So many more people care.
Change takes longer than you think
Key Messages
- We are all unique in our encounters with new things.
- Polarisation often masks the real questions.
- There is often a dialectic around open and free.
- Often it’s not just one model that comes to dominate.
- Sometimes when mainstreaming happens we don’t recognise it.
- Change can take a lot longer than we hope.
References:
[1] http://www.bloomsburyacademic.com/view/DigitalScholar_9781849666275/chapter-ba-9781849666275-chapter-013.xml
[2] http://www.wired.com/wired/archive/14.09/posts.html?pg=6
[3] http://followersoftheapocalyp.se/open-as-in-door-or-open-as-in-heart-mooc/
[4] http://en.wikipedia.org/wiki/Budapest_Open_Access_Initiative
Data-Driven Library Infrastructure: UKSG 2012 Presentation
Below is a copy of the plenary presentation I gave to the UKSG conference 2012. I have also included a much reduced transcript of the talk to provide some context to the slides.
My presentation was about looking at library services and systems from a data-centric point of view. Specifically, it was about the potential that library data has for the creation of new services and improved systems.
This isn’t a radically new vision – indeed the idea of data-driven is something that seems all pervasive at the moment (data-driven journalism etc). Rather it is a way to refocus, or possibly to re-align our thinking so what may appear problems at the present are viewed as new opportunities.
There is also a video of the presentation available:
I began my presentation with a video. The video was made by University of Lincoln students without formal permission from the university and uploaded to YouTube.
So, I think the film highlights nicely the three main themes of my presentation:
- Situating services and infrastructure within the wider ecosystem (this might be institution; community; society etc) – allow innovation to flourish anywhere, and ensure you’re in a position to take advantage of it;
- Redistributing effort – focus on the services that have an impact for users, ensure you have the talent to recognise those emergent opportunities and embrace them;
- Covering all eventualities: Future proofing – become agile and more entrepreneurial. The barriers for students creating the video were incredibly low: a Flip cam and YouTube. Barriers to students using library data should be low too.
1. Ecosystem
Taking a data-centric approach enables the library to affect the entire ecosystem that it inhabits.
Focusing on the data forces us to think about the other sources of important data within the institution: the repository, VLEs, student records etc. The wider data ecosystem becomes evident, and the potential of the data underpinning those systems can be realised.
A really good example of this is the Discovery work that’s currently being undertaken by JISC, RLUK and Mimas at the University of Manchester. Discovery’s aim is to provide a ‘metadata ecology’ for UK education and research – and it does this by focusing on open and accessible data.
What happens, suddenly, is the data ecosystem starts to mingle with the human ecosystems libraries are inevitably a part of. The free flow of data provides the fertile ground for new ideas and services to grow – Innovation is allowed to flourish everywhere on campus – not just within the confines of the traditional walls of the library.
Libraries and their institutions need to ensure an environment where this flourishing of innovation can happen, and that there are the right skills and people to recognise those opportunities, and help develop further the ideas and prototypes.
2. Effort
3. Eventualities
My Story of O(pen)
Here at JISC we think a lot about openness: what it means, how to support it, where it takes us.
This is my contribution to that thinking. It is very much my individual views, but informed by the work we do at JISC, and by the Open Knowledge Foundation, amongst others.
My open narrative
Open makes things visible.
The everyday sense of “open” is open rather than closed – letting people see what is there, what is happening.
The web enables you to:
- do some of your processes/practices online, visible to others
- share some of your products/outputs online, visible to others
Open makes access easy.
This is where open-as-in-open-access comes in: open without needing to log in, and open without payment.
SO
Open is social.
The “many eyes” principle of sharing open data and the open innovation model encourage others not only to view but to comment, to feed back, to engage. This speeds up the process in hand and improves the quality of the resulting work.
AND
Open makes things usable by others.
Open standards exist to encourage as many developers as possible to adopt them.
This is where open licensing comes in: granting others explicit and generous permissions to use your content.
FURTHERMORE
Open can be a way of working.
Doing open working and openly releasing outputs can make a person feel differently about what they do. Researchers might call this collection of activities open scholarship, technologists might call their activities open development, project teams might call it open innovation. Each of these types of open practice has elements in common and elements specific to the sorts of activities the practice involves.
HOWEVER
Open is not exclusive
Open source can mean both the open development process and the open source software. They are not always found together: open development processes can produce non-open software, and closed development processes can produce open source software.
BUT
Opens are mutually beneficial
There is a virtuous cycle when open process and open products combine. In open scholarship, both creating and using open content and using open ways of working, the content feeds the practice feeds the content.
I’m watching the Openness in Education course with interest and I expect this whole meta open concept to deepen in 2012.
A Diagram of Opens
It’s important to note that this is an abstracted diagram: in my view, open is not a replacement for the way things currently work. There is not ever going to be a total transformation to open. The reality is a mixed economy. Business models matter. Practice models matter.
Open can be good for business, open can be good for practice but it exists in a bigger ecosystem of technologies and behaviours. Good is not enough, it needs to be useful. That’s what JISC and other advocates of openness are working hard to surface.
Ultimately I think open is good because it is a good way of working.
Amber Thomas

My Story of O(pen) by Amber Thomas is licensed under a Creative Commons Attribution 3.0 Unported License.
From infteam.jiscinvolve.org.
Permissions beyond the scope of this license may be available at http://www.jisc.ac.uk/contactus
Activity data and data protection – what am I allowed to do?
JISC has been engaging in scoping the potential for activity/usage data for the HE sector under the Activity Data Programme with 9 projects. There is much potential that activity data brings in terms of business intelligence for various uses such as recommendation services, collections management etc. A very engaging synthesis of the work has been produced plus high level guides for activity data.
However, up to this point the relationship between usage data and data protection has not been explored in depth. A report commissioned from JISC Legal, together with a briefing paper by Naomi Korn and Charles Oppenheim, now explores some issues that are not clear cut, such as the creation and subsequent use of anonymised data which does not contain any personally identifiable information (name, age etc.) but where the mashing up of this data could lead to a user being identified. Services therefore need to be mindful of this, but should not let it block the potential that activity data affords. The papers and accompanying FAQs explore these cases as well as instances where consent has been given to use personally identifiable information, the importance of seeking consent, and how and when consent might be given.
- High level guides for activity data
- Personal Data and Consent Management: A Briefing Paper with FAQs
- Consent Management: Handling Personalisation Data Lawfully (full report)
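To make the re-identification point concrete, here is a minimal sketch of the kind of check a service might run before sharing anonymised activity data. It counts how many records share the same combination of indirect attributes and flags any combination held by fewer than `k` users. The records, field names and the `risky_records` helper are all hypothetical illustrations, not something taken from the JISC Legal report or the briefing paper.

```python
from collections import Counter

def risky_records(records, quasi_identifiers, k=2):
    """Return records whose combination of quasi-identifier values
    is shared by fewer than k records in the dataset."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]

# Hypothetical anonymised activity records, one per user.
# No names are present, but course and year of study remain.
users = [
    {"course": "History", "year": 1, "loans": 12},
    {"course": "History", "year": 1, "loans": 7},
    {"course": "Physics", "year": 3, "loans": 30},
]

# The single third-year Physics student stands out, so combining this
# data with, say, a class list could identify them.
at_risk = risky_records(users, ["course", "year"])
```

A check like this only shows where the risk sits. What to do about it (suppressing, grouping or seeking consent) is the question the papers above address.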
Berlin 9 conference
I am just back from the Berlin 9 conference. The “Berlin” series of conferences is named after the Berlin Declaration on Open Access, and this was the first time the annual conference has been held in North America. It’s very hard to summarise my reactions to the conference: there were so many stories showing how opening up scholarship can lead to real benefits, in health, development, innovation and our quality of life. For example, Cyril Muller from the World Bank described how that organisation has adopted an open approach to the work it funds, and to its own operations, and is encouraging the governments with whom it works to do the same. Laura Czerniewicz from the University of Cape Town showed how open educational resources, configured for SIM-enabled mobile devices, can make a real difference to some quite seriously disadvantaged students. And Elliot Maxwell highlighted some wonderfully elegant research studies, showing clearly how, when scientific findings and resources are made open, it leads to a greater diversity, quality and application of knowledge. Of course, there are implications. Michael Crow of Arizona State University argued that all this requires us to re-think the university as a social technology, and Philip Bourne highlighted some of the challenges we have in moving to a research practice that is native to the digital environment, genuinely reproducible, and that rewards researchers who move in that direction. The overwhelming impression, though, was of a scholarly community now adopting more open approaches, and beginning to see tangible benefits from that. Berlin 10 is on the African continent for the first time. I hope it will bring new voices to be heard in this community.
Shared Academic Knowledge Base (KB+) – Library Directors event
Yesterday saw the shared academic knowledge base (KB+) briefing day for approx. 60 library directors and senior managers take place in London, at the Wellcome Trust.
The project, known as KB+, is developing a shared community service that will improve the quality, accuracy, coverage and availability of data for the management, selection, licensing, negotiation, review and access of electronic resources for UK HE.
The aims of the day were to:
- Provide an update on the progress of the shared academic knowledge base project;
- Surface and share some of the questions, concerns and ideas participants may have about the project and the management of electronic resources in general;
- Let participants know what will be happening next with the project and how they can get involved if they would like.
The day began with Ben Showers (JISC Programme Manager) providing some context to the work and situating the project within the wider subscriptions management landscape. The presentation can be found here: Shared Academic Knowledge Base: Context and Landscape
The meeting engendered a large amount of discussion about the project, with participants freely sharing concerns, ideas and possible solutions to some of the issues that surfaced.
Extensive notes were taken from the Q&A sessions to help inform the project, but instead of repeating verbatim the questions and answers I have tried to highlight some of the themes that emerged during the meeting below.
Themes
A number of themes emerged during the day and, while this is not an exhaustive list, these are some of the recurring or critical issues that were surfaced:
Transformation of current practice
It was acknowledged that this project was potentially transformative; it has the potential to change what might be termed the bread and butter of library work. Therefore its impact on the community, and how it works, could be significant.
This means that the community, from senior managers to practitioners and beyond, will be keenly interested in the developments, and the project will need to build trust and facilitate the involvement of the whole library community. Which brings me on to another of the day’s themes:
Communication
This was a theme that seemed to surface at regular intervals during the day. There was a clear message that the project needs to be able to communicate regularly with the library community on both progress and developments as they take place. This might manifest itself in a newsletter such as that employed by the Discovery programme, or utilising existing communication channels from JISC, JISC Collections and other sector bodies (or indeed a combination).
The combination of communications channels is also important given the range of stakeholders interested in the developments, from commercial vendors and publishers to librarians in the UK and internationally.
Under this theme there were issues surrounding how the communication channels would allow for more interactivity than might otherwise be usual in a JISC funded project given both the high profile nature of the project, as well as the need for ongoing community engagement in the work.
Engagement
Closely related to communications was the topic of engagement.
Specifically a lot emerged on how the community, especially ERM librarians and similar, could be engaged in the project in a useful and meaningful way.
In his presentation Liam Earney made it clear that the project hopes to ‘recruit’ a number of embedded librarians, with the project paying for a proportion of their time to work on it. Participants stressed, however, that the expectations attached to any involvement would need to be spelled out, from the skill levels and expertise of the person through to the length of time they might be involved.
Clarity on these issues would be key to maintain sector engagement.
It was also suggested there might be the need for something like an ‘advocacy pack’ so that library directors had the arguments to convince senior staff of the benefits of engaging with the project.
An interesting sub-theme within engagement was the power of the institutions themselves to help engage with the commercial companies and organisations they work with to put pressure on them to both work with the project as well as implement the recommendations and standards the project might recommend.
The message was clearly that this was a partnership.
Collaboration and leveraging other work
It was expressed a number of times how important it will be for the project to leverage this work and funding with other initiatives and projects that can help the KB+ project deliver its outputs.
It was acknowledged how much work was currently taking place around this area, including national projects such as KBART and TERMs, JISC-funded projects such as the journal usage statistics portal and e-journal archiving work including Peprs and the entitlement registry, as well as international projects such as the Open Library Environment.
This helped reinforce the project’s own ambitions of engaging with, and where possible working with, these complementary projects and initiatives.
Sharing problems
This is a shared service, but it will be important that when issues are surfaced by an individual institution, or indeed a problem is resolved by someone, that the whole community can be made aware of this.
What tends to happen now is a problem will be reported to a supplier and that problem is then normally resolved, but no one other than the originating institution knows about this.
Further points of discussion
There were a number of other points of discussion including:
- The potential conflict between aiming for quality of data and ensuring its timeliness. It’s essential that quality doesn’t impact on the ability of libraries to deliver services to users as and when they want them.
- Print subscriptions: The briefing day concentrated largely on electronic resources, but it was clear that participants wanted to see print incorporated in the work. The project is taking a unified approach so this won’t be an issue, although electronic will remain the focus for much of the work.
- The Identifiers elephant! It was clear that participants also felt that the issue of how the project deals with identifiers (be those institutional, journal title etc) will be critical.
- Decision making and workflows will be two potential aspects of the final service, but it is important to recognise that a focus on the decision making components that the service will deliver could help strengthen potential business models, and demonstrate real value to institutions.
As the event demonstrated, there will be a lot more work going on over the next few months to get the project into a position where it can successfully transfer to a service.
In the meantime, this won’t be the last you’ll hear of the project, with plans already in place to start communicating and engaging with the community over this important shared service.
If you would like to find out more about the project, or have any questions, then please feel free to contact Liam Earney at JISC Collections.
Marketing and other dirty words
I have been thinking a lot recently about how to move beyond the rhetoric of “open equals good” towards identifying where open approaches help us meet key business cases. A notable quote from the Power of Open book launch was that “open isn’t a business model, it’s a part of a business model”. I’m seeing this trend in open educational resources, open access repositories and open innovation. It’s how open source became more mainstream, and we need to be learning from that journey. If we want to see open approaches sustained, we need to get businesslike about how we make the case, however contradictory that might sound.
Earlier this month I spoke at a UKOLN event on metrics and the social web, and the discussion there reinforced the potential of using the web more effectively to underpin our key business goals in further and higher education.
On 26th July I am presenting at the Institutional Web Managers Workshop 2011 and I will be developing this theme further, paying particular attention to the way that web managers can support open access, open educational resources and open social scholarship.
In reflecting on how open access and OER can contribute to the core business cases of universities, I think that activities particularly worthy of more attention include:
- Profiling academic expertise
- Supporting REF impact metrics
- Enhanced research publications
- Cross-linking open content to open course data
- Social media listening tools
- Web analytics and visualisation
My presentation on slideshare: Marketing and other dirty words
Choosing Open Licences
I thought it might be useful to round up some recent JISC-funded resources to support licensing decisions. The emphasis in these tools is on open licences because that’s what I’ve been most involved with, but I’d love to hear of other key resources to help people choose licenses.
To get started, there is an overview of the openness of open licences which explores how open the various licenses are. This is part of the Strategic Content Alliance IPR Toolkit.
Licensing open data: a practical guide is a thorough piece of advice about licences for open data of various kinds. This comes out of the UK Discovery work.
The open bibliographic data guide is a really well structured resource to help libraries and content owners decide whether to make metadata available freely on the web under an open licence. It is from the UK Discovery work that JISC has been funding.
And more broadly:
The IPR and Licensing Learning Module from the Strategic Content Alliance allows you to work through the key concepts at your own pace.
The Creative Commons compatibility wizards are interactive tools to help us understand what open licences can be remixed with each other. This is from the OER IPR support project suite of tools.

I fully recommend spending a few minutes working with this. It really brings home the importance of thinking through how you licence your work and what you want people to be able to do with it. My “take-home points” were:
- CC Zero or CCBY, unsurprisingly, allow for the greatest re-use
- SA (Share Alike) sounds like the most open version of the licence but it means any asset can only be licensed back out as the same, so it is really quite restrictive
- You really need to understand what you can and can’t do with content licensed with the ND (No Derivatives) clause before you decide to use it, either as a provider or a user.
- The NC (Non Commercial) clause is the most highly debated clause at the point of release. This tool shows its implications, but we still need to understand better what it really means in the context of UK HE now and in the future.
- Use cases really are key to providers weighing up the risks of more open licences vs the opportunities they bring. The more stories we have to support those decisions the better.
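For readers who prefer to see the logic written down, the take-home points above can be sketched as a few rules in code. This is my own simplified illustration of how the clauses interact when two works are remixed, not the wizard's actual implementation: it ignores licence versions and jurisdictions, and the function names are made up for the example.

```python
def clauses(licence):
    """Split a licence code such as 'BY-NC-SA' into a set of clauses.
    'CC0' is treated as having no clauses at all."""
    code = licence.upper()
    if code == "CC0":
        return frozenset()
    return frozenset(code.split("-"))

def can_remix(a, b):
    """Rough check of whether works under licences a and b can be
    combined into a single new adaptation."""
    ca, cb = clauses(a), clauses(b)
    # ND forbids adaptations altogether.
    if "ND" in ca or "ND" in cb:
        return False
    for share_alike, other in ((ca, cb), (cb, ca)):
        if "SA" in share_alike:
            # The remix must go out under the Share Alike licence itself,
            # so a second, different Share Alike licence cannot be honoured.
            if "SA" in other and other != share_alike:
                return False
            # An NC input cannot be re-released without the NC clause.
            if "NC" in other and "NC" not in share_alike:
                return False
    return True

print(can_remix("CC0", "BY-SA"))    # CC0 mixes with anything that allows remixing
print(can_remix("BY-SA", "BY-NC"))  # SA and NC pull in different directions
```

Writing it out this way shows why Share Alike is more restrictive than it sounds: it is the only clause that dictates the licence of the result.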
On a related note, the same team has also developed a Risk Management Calculator that helps you assess the risks of reusing existing works of different types in your work, depending on what you know about their licensing and how you want to license your work.
Now moving on to a couple of highlights from the HEA/JISC OER projects:
The IPR for Educational Environments course is aimed at educators, covering all the basics you need to know about IPR.
The MEDEV good practice and risk assessment toolkit steps you through decisions around copyright, consent and data protection.
This seems like a good place to share some clarification following questions I’ve had in the past. Over three years ago JISC Collections developed an open educational licence to accompany their suite of licences. It was designed as a very open licence, to handle some of the complexities and lack of explicit permissions in relation to educational use of Creative Commons licences. For example, it was more specific about institutional liabilities, which was important in the educational context. Since CC licences are now much better understood by publishers, such a licence is less necessary, so we are not actively promoting it.
Reflecting this growth in awareness of open licensing options, Jorum is now going to focus on openly licensed resources.
Finally a plea: don’t forget that as well as deciding which licence to use, you also have to express it. The best way to do that is to embed it. Check out the guidance on embedding licences from the Strategic Content Alliance, and for images have a look at the Xpert Attribution tool. The growth in smart browser-based tools like Open Attribute is going to help end-users understand better what you are allowing them to do with your content, so think now about ensuring your carefully-chosen licence statement reaches your users.
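As an illustration of what embedding can look like on a web page, here is a small sketch that builds an HTML licence statement in which the link carries `rel="license"`, the link type that browser tools can pick up. The `licence_statement` function is a hypothetical helper of my own, not part of the Strategic Content Alliance guidance or any of the tools mentioned above, and the example values are taken from the licence notice on an earlier post.

```python
from html import escape

def licence_statement(title, author, licence_name, licence_url):
    """Build an HTML licence statement with a machine-readable
    rel="license" link. All values are escaped for safe embedding."""
    return (
        f"<p>{escape(title)} by {escape(author)} is licensed under a "
        f'<a rel="license" href="{escape(licence_url)}">'
        f"{escape(licence_name)}</a>.</p>"
    )

snippet = licence_statement(
    "My Story of O(pen)",
    "Amber Thomas",
    "Creative Commons Attribution 3.0 Unported License",
    "https://creativecommons.org/licenses/by/3.0/",
)
print(snippet)
```

The same statement can sit next to the resource wherever it is displayed, so the licence travels with the content rather than living on a separate terms page.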
If you can’t quite see what you want in the list above, there are more resources and tools available through web2rights and the Strategic Content Alliance IPR Toolkit, and of course guidance is available from JISC Legal.
The past few years have seen a step change in the adoption of such licenses, helped greatly by popular web services like Flickr, but we still have much to do in helping to raise awareness of the opportunities offered by open licenses, and to explore the business models around them.
So there we have it: a round up of recent guidance, tools and courses funded via JISC to help you make sound licensing decisions. All developed with FE/HE in mind, but also useful to libraries, galleries, archives and museums, the creative sector and government.
Comments very welcome
Amber Thomas
