Crowdsourcing Citizen Science
Woodcock, J., Greenhill, A., Holmes, K., Graham, G., Cox, J., Oh, E. Y., and Masters, K. (2017) ‘Crowdsourcing Citizen Science: Exploring the Tensions Between Paid Professionals and Users’, Journal of Peer Production, 10.
This paper explores the relationship between paid labour and unpaid users within the Zooniverse, a crowdsourced citizen science platform. The platform brings together a crowd of users to categorise data for use in scientific projects. It was initially established by a small group of academics for a single astronomy project, but has now grown into a multi-project platform that has engaged over 1.3 million users so far. The growth has introduced different dynamics to the platform as it has incorporated a greater number of scientists, developers, links with organisations, and funding arrangements—each bringing additional pressures and complications. The relationships between paid/professional and unpaid/citizen labour have become increasingly complicated with the rapid expansion of the Zooniverse. The paper draws on empirical data from an ongoing research project that has access to both users and paid professionals on the platform. There is the potential through growing peer-to-peer capacity that the boundaries between professional and citizen scientists can become significantly blurred. The findings of the paper, therefore, address important questions about the combinations of paid and unpaid labour, the involvement of a crowd in citizen science, and the contradictions this entails for an online platform. These are considered specifically from the viewpoint of the users and, therefore, form a new contribution to the theoretical understanding of crowdsourcing in practice.
The Zooniverse is a citizen science crowdsourcing platform. It began as a single astronomy project to find a new way to analyse a large dataset of pictures of galaxies, but has since grown to over 50 projects. The projects involve categorising data, often pictures, but this has also expanded to videos and audio. For example, Galaxy Zoo involves categorising pictures of galaxies. In a new iteration of the project, Galaxy Zoo Bar Lengths, the user is presented with a picture of a galaxy and asked “Does this galaxy have a bar?” followed by a yes/no option with further branches of questions (Zooniverse, 2015). Along with this basic task—something that could be compared to the activities on micro-work platforms (see Irani, 2015)— there are two other important factors. The first is the chance of serendipitous discoveries. Unlike micro-work, there is also the possibility to contribute to something beyond the scope of the assigned individual task. This could involve engaging in the process of discovery in a substantive way, or even an individual user discovering something themselves. This potentiality is tied to the ethos of Citizen Science, something that will be discussed in detail later. This involves, either partly or wholly, the involvement of non-professionals in science. However, there are two important divisions to consider when it comes to the role of non-professionals. Firstly, it can just mean the involvement of non-professionals in the processes of data gathering, analysis and interpretation. Secondly, it can mean non-professionals becoming genuinely involved in decision-making processes about science (Lewenstein, 2004: 1). These two divergent positions capture the complexities of citizen science in practice. The Zooniverse provides the opportunity for large numbers of users to engage in the first, while offering the potential for the second. 
It is not necessary to offer the second to achieve the first; however, the use of the term citizen science at least implies an element of the second.
The infrastructure that allows very large numbers of users to participate simultaneously in the Zooniverse is run on Amazon Web Services cloud servers. Amazon (2016) also runs Amazon Mechanical Turk on this service, which provides “access to an on-demand, scalable workforce”. This involves splitting larger tasks into small fragments and then outsourcing them to a pool of digital workers. The way in which the labour input becomes hidden on these kinds of platforms has been described by Trebor Scholz (2015) as “digital black box labor”. It obfuscates a number of issues: how is the labour process organised and who is doing it? How is it managed and controlled? What is it being used for? And, particularly important for this paper, what tensions are present both inside and beyond the platform? Therefore, the paper takes a lead from Karl Marx’s (1976: 279) metaphorical tailing of “Mr. Moneybags” and “the possessor of labour-power” into the “hidden abode of production”. However, in this case, the focus is not on production per se, but on the obscured processes of digital black box labour: the hidden abode of crowdsourcing.
The paper focuses on the following research questions to explore citizen science peer-production on the Zooniverse platform:
1. What is the relationship between paid and unpaid labour?
2. How is the crowd included in the processes of citizen science?
3. What are the tensions that emerge in the practice of crowdsourcing citizen science?
This paper, therefore, seeks to explore the intricacies associated with co-creation and peer production between the citizen scientist and the traditional scientist when both micro-tasking and serendipitous discovery occur on an online citizen science platform. In particular, the study focuses on the “digital black box of labor” (Scholz, 2015) and the hidden aspects of crowdsourcing on a citizen science platform.
The challenge is to focus the analysis on the users in the Zooniverse. This begins with a review of the relevant literature on crowdsourcing, drawing attention to the key conceptual issues shared across platforms. The next part of the paper discusses the threefold methodological approach deployed in the investigation: interviews with both paid workers and users involved in the Zooniverse and an ethnography from the user perspective. Before moving on to examine the empirical findings, the background of the Zooniverse is explored (see Methodology section). This involves an analysis of the platform’s development, covering the influences of research culture and the university, the culture of computer programmers, and the impact of funding. The findings of the research are presented in the following section, focusing on the classifying process and then the user perspective. The key argument of this paper is that by drawing attention to the users it is possible to explore the tensions and contradictions inherent in a citizen science crowdsourcing platform.
The term crowdsourcing was first coined by Howe in 2006. He stated that it was “the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.” While Howe’s definition was useful in its early stages, the breadth of crowdsourcing applications in the profit and not-for-profit arena has now breached this foundational definition (Estelles-Arolas et al., 2012). Kaganer (2013: 23) more recently described crowdsourcing as “a third generation sourcing ecosystem” and argues that new organised online intermediaries offer a pool of virtual workers tempting for any potential buyer. Kaganer’s description is more attuned with a traditional market-based value-chain understanding of labour transactions, albeit in the global marketplace.
Crowdsourcing has the potential to provide a new source of productivity, innovation and knowledge capture (Andriole, 2010; Lindic et al., 2011) with its only downsides being intellectual property (IP) leakage, a lack of trust in the crowd due to its amorphous anonymity (Knudensen and Mortensen, 2011), and a low propensity to deal with highly complex, larger projects in a cost effective way. However, value can be realised when contemporary businesses can cost effectively match different types of crowds with unique organisational needs (Erickson et al., 2012). For example, many contemporary organisations require a wide variety of micro-tasks to be carried out including tasks that need a variety of skills, knowledges, cognitive strategies, experiences, problem-solving abilities and/or a combination of all these on a daily basis. Crowdsourcing allows for a process of “pyramiding” – a similar process to snowball sampling to find expertise – and has been identified as an important way that innovation can be achieved with the “broadcast search” type of crowdsourcing (Poetz and Prügl, 2010). The value creation of crowdsourcing is, therefore, determined by the collaboration within and across an enterprise and its ability to match a micro-task with an individual suitable to carry it out (Chui et al., 2012).
The basic process of labour crowdsourcing is encapsulated in micro-work. For Amazon Mechanical Turk, this involves breaking a larger project down into small discrete tasks, or what Amazon (2016) terms HITs (Human Intelligence Tasks): the individual tasks that workers can complete on the platform. Irani (2015: 720) has drawn attention to the “cultural work” of Amazon Mechanical Turk, highlighting it as “an emblematic case of microwork crowdsourcing”. The interaction, or labour process, is comparable at this point with isolated individuals each adding to a larger overall project. The name of the platform is inspired by the historical Mechanical Turk (or Automaton Chess Player), a hoax from the 18th Century. While “the historical Turk showed off technology to draw attention away from the human laborer, today, Mechanical Turk’s … crowd sorcerers work with coolness and the spectacle of innovation to conceal the worker” (Scholz, 2015). It is in this way that crowdsourcing platforms are similar to a “black box”. This has a dual meaning, referring either to the safety recording device found in transportation or to “a system whose workings are mysterious; we can observe its inputs and outputs, but we cannot tell how one becomes the other” (Pasquale, 2015: 3). Scholz (2015) develops this term into “digital black box labor” to capture this process on crowdsourcing platforms. Scholz (2015) continues to argue that “many researchers have focused on optimizing … these platform ecosystems: trying to make them run more efficiently, more frictionless and with a better understanding of the motivations of the workers.” However, the more pressing task for researchers is to focus on the “building of alternatives, outrage, conflict, and worker organization”.
This micro-tasking is common across for-profit crowdsourcing of labour and not-for-profit approaches. As Smith et al. (2013) have argued, the difference is that citizen scientist users are not motivated by financial gain. They tend to take a sense of pride and enjoyment in their activities on the platform, a factor that can lead to the formation of a dedicated crowd with high quality outputs. This aspect does not mean that citizen scientists can be mobilised simply through their desire to take part in science. There is a risk that these kinds of users may feel their contributions (which are often the result of hard work) are not being appreciated, and are perhaps even being exploited (Shahri et al., 2014), in terms of how the value that is created is appropriated. Unlike the sharing platforms that became common in Web 2.0 (Facebook, Flickr, YouTube and so on), the user-generated content of citizen science represents the possibility of “collaboration”, rather than just “sharing”, drawing on an important distinction made by Hyde et al. (2012: 53). This collaboration takes place across what Latour (2005) terms an “actor-network”, made up of entities with both structure and agency. In a similar vein, Carpentier (2011) has discussed the challenge of understanding what “participation” means in an online context, drawing a distinction between “minimalist” and “maximalist” forms of participation that brings attention to questions of power in these relationships.
In the specific case of citizen science, there has been research building on the idea that users have a “cognitive surplus” (Shirky, 2010) that can be engaged in crowdsourcing projects across different disciplines. For example, Jennet and Cox (2014) and Wald et al. (2015) have outlined different principles and guidelines for successfully organising such projects. Similarly, Causer and Wallace (2012) and Dunn and Hedges (2013) illustrate the potential uses in the humanities. According to Bonney et al. (2009), there can be a lack of specialist knowledge, misclassifications and even resulting errors within the data produced using a citizen science platform. However, Lintott et al. (2011) and Willett et al. (2013) have illustrated that the utmost care is taken to ensure the legitimacy of classifications so that they can be used by the wider scientific community. Other forms of project-specific and/or scientific assessment of citizen science platforms exist. For example, Raddick et al. (2009) partly define successful citizen science projects in terms of the calibration of user contributions – that is, the extent to which appropriately sophisticated algorithms are employed to convert the raw data that is provided by participants into meaningful scientific insight (Wiggins and Crowston, 2011). Other measures of effective project design and resource allocation include the provision of adequate training (Riesch and Potter, 2014), the division of effort between volunteers (Franzoni and Sauermann, 2014) and the extent to which accurate data can be collected at a lower cost (Dai and Weld, 2010). Finally, Cox et al. (2015) have found that the best performing projects tend to be those that are more established, as well as those in the area of astronomy.
This operates in a context of calls to democratise sciences (Guston, 2004) and arguments about the relationship between science and democracy (see, for example, the edited collection by Kleinman, 2000). The rise of citizen science, therefore, has important implications for the “experts-lay divide” (Lidskog, 2008) and the role of expertise (Fuller, 2008). There is also the wider environment in which work and organisations are becoming transformed by digital technology. These processes have been captured in the terms immaterial labour (Lazzarato, 1996) and cognitive labour (Boutang, 2011) as new forms of work emerge. The rise of Web 2.0 platforms, like Facebook and YouTube, has drawn users into a process of “produsage” (Bruns, 2008) in which they both use and produce content. This has also resulted in a blurring of the boundaries between work and play, which Kücklich (2005) identifies with “playbor”.
In practice, combining the two dimensions of citizen science identified by Lewenstein (2004: 1), as discussed earlier, requires reaching a balance. From the perspective of the Zooniverse platform, both motivation and loyalty need to be encouraged. Therefore, most crowdsourcing platforms of this type promote their utilitarian purpose (that the participation of users will further scientific knowledge) in order to facilitate the organisation of the crowd and the users’ collective contributions (Chamberlain et al., 2013). In the context of a not-for-profit system, this is a complex endeavour without recourse to monetary incentives.
While motivation is important, there are other important distinctions to be made across crowdsourcing platforms. Crowdsourcing can be conceptualised into two different types: lead user innovation and micro-work. These can be differentiated in terms of quantitative and qualitative outputs. The quantitative output is associated with microwork platforms, like Amazon Mechanical Turk, while the qualitative output is closer to those involving lead user innovation, found with the competition type model (where an individual user proposes a new solution in response to a call from a company, see, for example, Stinson, 2014). The quantitative model involves limited user outputs. Microwork “relies on dyadic relationships consisting of one buyer, one supplier and a well defined final deliverable” (Kaganer et al., 2013: 25). The larger projects are fractured into small parts and then the costs are subsequently driven down through internal competition. There is no opportunity for collaboration and innovation is not possible, nor is it incentivised. The qualitative model, on the other hand, involves a problem being tendered out to a crowd with the aim of soliciting solutions. It is a process in which expertise can be sourced from within a crowd. The first method to do this is the “broadcast search” with an open call for ideas or solutions (Jeppesen and Lakhani, 2010). The second is the formal organising of outsourced innovation. In this “arbitrator model”, found with crowdSPRING or InnoCentive, organisations can gain “on-demand access to a specialised community of skilled suppliers who can be engaged on a project via a competition or contest” (Kaganer et al., 2013: 26). The quantitative and qualitative models are both driven by outputs, requiring problems that are identified beforehand, but differ in the level of engagement of the crowd, collectively or otherwise.
The Zooniverse combines both as a hybrid model. While the Zooniverse shares the quantitative micro-work features in the analysis of scientific data, it also offers the potential—and it is important to stress that this is a potential—to contribute more broadly as citizen scientists to qualitative scientific discoveries. The main way in which this manifests itself on the platform is through serendipitous scientific discovery. One example of this is the discovery of Hanny’s Voorwerp. Hanny, a Dutch school teacher, discovered this unusual phenomenon while categorising images of galaxies. After it was passed on to the scientists involved in the project, the resulting discovery was named after her (Lintott et al., 2009). There is also considerable excitement about the possibilities of new discoveries on the projects Higgs Hunter, Planet Hunter and Snapshot Supernova. However, there is a formal distinction between the platform and the scientists on the one hand, and the crowd of users on the other. The Zooniverse has an explicit public engagement aim, achieved by allowing users to engage in scientific analysis but also providing outreach and educational opportunities. However, the question of whether this process goes beyond a unidirectional relationship is less clear. This requires an understanding of how members of the heterogeneous public respond with their own ideas and demands. This paper, therefore, seeks to explore the intricacies associated with co-creation and peer production between the citizen scientist and the traditional scientist when both micro-tasking and serendipitous discovery occur on an online citizen science platform. In particular, the study focuses on the “digital black box of labour” and the hidden aspects of crowdsourcing on a citizen science platform.
This paper combines three different empirical sources in order to investigate the Zooniverse. To understand the specificities of the labour process and the relationship between paid labour and users within the platform, qualitative methods were used to gather data that has been analysed using an interpretivist approach (Walsham, 1995). The paper draws on data collected as part of an ongoing collaborative research project with the Zooniverse. The first method is a set of 23 in-depth interviews with paid employees of the Zooniverse. They cover all of the different roles involved on the platform, including both founding members and newer staff. These interviews were arranged directly with the Zooniverse and were conducted in person in Oxford and Chicago. The second empirical component involved 19 interviews with users. These were selected at random from a sample of users who agreed to be interviewed after a survey, again facilitated by the Zooniverse. Given the geographic spread of the users, the interviews were conducted on Skype. Both sets of interviews were transcribed and these were analysed with the NVivo software package.
The third part of the research is an ethnography conducted by a post-doctoral research assistant. Her online practices were informed by Baym’s (2006) approach to conducting online ethnography. This entails participation as a means to understand important aspects of social life, in this case the life of a citizen science contributor. The ethnography of the Zooniverse was carried out over a year, including classifications and engagement in the online community. The researcher kept a diary on a near-daily basis, which, when transcribed, amounted to approximately 80,000 words of thick description. This was accompanied by screenshots that illustrate and document the user perspective across different projects and activities. The ethnography was conducted online on the Zooniverse platform and focused on two projects: Galaxy Zoo (the oldest project) and Penguin Watch (the newest project at the start of the ethnography). The excerpts drawn on in this study are exemplar observations selected to illustrate important findings and, therefore, to support the arguments presented in the theoretical exploration and analysis of this case.
The combination of two different qualitative approaches provides valuable insights into the Zooniverse from different perspectives. This method is experimental, seeking to uncover the user perspective on a platform that entails a “black box” type experience for outsiders. Therefore, the study sought to examine the hidden practices and experiences of crowdsourcing citizen science. It focuses on the experiences that were most reflective of these.
Background to the Zooniverse
This study focuses on the Zooniverse, as it is currently the world’s leading crowdsourcing citizen science website. It involves a large number of users and a range of projects from different disciplines such as astronomy, zoology and history (Banks, 2013). Luczak-Rosch et al. (2014) explain that Zooniverse users may contribute to multiple projects, and the crossover between these projects can be significant.
The Zooniverse platform was initially established by a small group of academics for a single astronomy project, but has now grown into a multi-project platform that has engaged over 1.3 million users so far. The Zooniverse projects are united by two distinct aims. The first is to solve specific scientific problems by serving as a reduction tool for data-intensive (and therefore professional, paid labour-intensive) science, transforming raw user inputs into a “data product” for use in academic research (Fortson et al., 2012). The second is a broader intention to engage in education and outreach activities, whether directly or through the ZooTeach resources. Perhaps the most famous project is Galaxy Zoo, but the platform now involves projects across disciplines as diverse as archaeology and seafloor biology. The rapid growth of the platform has brought with it complications: increasing numbers of scientists and institutions, a larger team of software developers, multiple funding grants, and a larger and more diverse user base. The basic process remains the same across the projects: the platform provides relatively simple categorisation tasks that are completed by a large crowd of users. The analysis provided by users is then used by professional scientists for further research.
The Zooniverse began as a post-doctoral research project at Oxford University and it retains certain characteristics from these origins. A small group of astronomers were searching for a better solution to classifying galaxies, a relatively simple task that needed to be repeated a very large number of times. After developing a system to do it themselves—and realising how long this would take with only their input—they decided to develop a way to outsource this work to a crowd. As a leading person from the Zooniverse explained, “The original organisation model was a loose collaboration. That still exists.” There is now the added pressure to keep the Zooniverse running and viable as a platform, with many of those interviewed directly responsible for, and indeed dependent upon, the success of projects.
The role of the volunteer users on the platform has been crucial to the scientific output. It has been estimated that:
… the perfect graduate student—essentially, a human computer that never eats, sleeps or takes a bathroom break—spending 24 hours a day, seven days a week analyzing Galaxy Zoo’s data would have needed three to five years to match what Galaxy Zoo’s volunteers collectively accomplished in the project’s first six months. (Pinkowski, 2010)
While this notion of a “perfect graduate student” is implausible, it is possible to envisage a single graduate student working eight hours a day taking between nine and 15 years (or more if they took lunch breaks, weekends and holidays off!) to complete what the Galaxy Zoo volunteers took only half a year to do. However, this analogy of the “human computer” is also useful in another way. Many of the categorisation tasks cannot currently be carried out by computer algorithms, yet with recent innovations in machine learning, this may not be an obstacle for much longer. In fact, the ongoing contributions of human users provide a testing ground for new methods to automate the classification of images. Nevertheless, the Zooniverse has made the transition from a small group of researchers to one of the most important citizen science crowdsourcing platforms.
The move from an ad hoc project based out of the astronomy department at Oxford University to one of the leading citizen science platforms was far from straightforward. For example, one of the back-end software developers pointed out that, “Originally, the first GalaxyZoo Project was really not designed, it was just, let’s just try this thing and it worked surprisingly well.” From this starting point, increasing layers of complexity were added as the single project grew into a platform: more scientists and research institutions became involved, software developers were hired, funding was awarded from different organisations, and the team became split between Oxford and Chicago. Three factors have shaped the role of paid labour on the platform and are discussed in turn: the research culture and university origins of the project; the culture of software developers and the influence of open source ethics; and the impact of funding on an organisation that is based in neither the public nor the private sector.
The organisational culture of the Zooniverse is primarily shaped by the institutional experiences and pressures of the university sector. For many of the staff, especially those who founded the Zooniverse, the majority of their experience of work and organisation comes from this context. It is difficult to speak of a general culture in a university, partly because universities are sites of ongoing transformations (Collini, 2012), but also because the university sector itself is heterogeneous. It should, therefore, be noted that Oxford University is exceptional in a number of ways. Firstly, it epitomises an elitist, research intensive institution. This means that there are comparatively high levels of funding and autonomy, along with a ubiquitous brand identity. This provides the freedom to innovate in various ways, backed up by the legitimacy of the institution. In this context, the establishment of the first Zooniverse project and the move to a platform were facilitated by the institution. In many universities, this kind of experimentation might be difficult to justify, but at Oxford University, as a leading person in the Zooniverse explained, “In astronomy at least, if you’re a post-doc in particular, you have quite a lot of freedom to work on whatever.” So this group of post-doctoral researchers, with the help of some volunteer web developers, started work on the platform. The same interviewee continued, “One of the key reasons GalaxyZoo happened was because we didn’t have to ask for permission or it didn’t cost anything. Well, it cost ten quid because it was the domain name.” Therefore, the organisation began with a very light structure and has had to develop along with the demands of new projects, groups of scientists and an expanding crowd of users.
The culture of astronomers has more in common with that of computer programmers than with that of social scientists, easing the collaboration between the two. Therefore, while they constitute separate sets of cultural influences, there are areas of overlap. For example, a website developer on the Zooniverse explained that, “We’ve all got roughly the same background … if I was looking for people to write code I would not look at astrophysicists because astronomers in general, although they do write their own code, they tend to write terrible code (Laughs).” So despite these differences, astronomers and computer programmers nevertheless both have common reference points in quantitative data and coding that shape their work. The use of specific programmes to manage the labour process shapes the flow of work and the way that people interact. For example, one of the front-end software developers explained that people collaborate between Chicago and Oxford: “Working on the same code base but we interact with this website called GitHub where you can propose code changes and review them, edit them, and talk to each other and stuff like that. So the bulk of the interaction happens online through GitHub or [other] websites.” These digitally-enabled labour processes are accompanied by work practices and associated cultural features. One notable dimension is the proliferation of an Open Source Software (OSS) ethos. As the technical projects manager discussed, “I always saw the Zooniverse as a kind of brand of coding rebels … I just believe strongly in not letting money dictate what you do.” Similarly, another web developer described the importance of Citizen Science:
I think it’s important because it makes science accessible again. I think for a lot of people science has become too complicated and academic and they’re kind of almost snobby, “Oh you’re not smart enough, you can’t help us,” and this just turns that on its head. You absolutely can help us and it doesn’t matter what level you are, you could also learn more about science.
This democratic and collaborative element is shared between Citizen Science and the OSS movement. However, like with OSS, there is a question of whether the reality matches up to the intentions of those involved, particularly when external organisations become involved.
The management of the Zooniverse can initially be understood as resulting from the combination of scientists and computer programmers that shaped the environment from which the Zooniverse emerged. It is, therefore, influenced by the institutional and cultural backgrounds of these respective groups but also by the associated pressures. In particular, the pressures of funding have a significant impact on the Zooniverse, not only in how it is organised, but also in limiting what it is able to do. The Zooniverse developed with a range of funding grants from different sources. These initially began as academic grants associated with the scientific projects, but they have grown in scope over time. For example, there have been a number of grants relating to public engagement or platform development. As a leading member of the Zooniverse explained, “Despite my ranting about grants and how that constrains what we do, what you’re really trying to do is spend the first—if we get a grant in, you want at least the second half of that to be beyond what you’ve written in the grant because you need to get the next grant.” So the constraints of grant funding mean that even after securing one source of funding, the process of securing the next one begins quite quickly. “The goal,” they continued, “can’t be to get to the last day of the grant, to have spent all the money on exactly what you said and delivered only what you said because then you’ve got nowhere to go.” Therefore, the pressure of funding is constant and the work of securing funding has to be built into the activities of the Zooniverse.
The classifying process
The tensions between paid and unpaid labour stem from the contradictions involved in citizen science. There is a question of who “owns” the data and outputs, “how” the data can be classified, “what” the classification experience is like and “where” further questions and alternative voices (including dissent) can be heard. These concerns shape the process of scientific discovery, particularly as the analysis is pulled from a collaborative space. The Zooniverse itself did not begin as a platform for citizen science; rather, it started as a novel way to address a particular problem and has grown from there. While it has grown into a popular platform for citizen science, at its core the primary activity remains a transaction. The professional scientists bring their large datasets for categorisation, hoping for an end-product that can be used to further their own research agenda. With this come the pressures for academic performance, mainly in the form of publication outputs within limited timeframes and submitted to refereed journals (a process that, itself, includes various forms of hidden labour in the editorial and reviewing process). Meaningful collaboration with the crowd can, therefore, become secondary in a process orientated towards specific outputs. The potential for new ideas or serendipitous discoveries to be found within the dataset by the crowd transcends the original research intentions. In practice, this results in different levels of engagement between scientists and users: while some projects regularly communicate and involve users, others remain strictly transactional.
User engagement takes two main forms on the Zooniverse. The first is the categorisation activity, which is completed directly from a project-specific web browser page. There are differences between projects, but they share a relatively common format. A picture (or in some cases a video or sound clip) is shown with different options for the user to select. In some projects, this is a very quick process; for example, Sunspotter only takes a few seconds as users are asked to select the more complex picture out of two. There are also projects that take significantly longer, such as the transcription tasks that users undertake with AnnoTate or Ancient Lives. The second form of engagement is on the Talk forum, which provides users with the opportunity to communicate with each other about the images on individual projects.
The crowd of users is by nature a heterogeneous formation. This means that users come from a range of backgrounds, experiences, expertise and motivation. This diverse range of backgrounds is also one of the potential strengths of citizen science. Citizen science, therefore, has the potential to draw on a wide range of insights across disciplines. This dimension of serendipitous discovery is not deliberately organised, instead remaining as an emergent possibility. The user community coalesces on Talk, with threads of discussions for users to communicate with each other. However, these are mainly limited to single projects, rather than encouraging a wider community and potential cross-fertilisation. The two parts, the individual micro-work of classifications and the limited forum for discussion, do not necessarily form an ideal framework for citizen science. Most notably, the issue of democratisation remains complex. While users are encouraged to classify and talk about those classifications, this in itself does not represent a democratising of science; rather, it is a “minimalist” form of participation (Carpentier, 2011). It is certainly true that most people would not have had access to images of distant galaxies or rare animals before, but random access to images from a database is not the same as being handed elements of democratic control (for example, having meaningful input on what will be explored) or even having users’ voices heard in the scientific process.
The user perspective
A wide range of explanations have been given for why users may or may not want to participate in online citizen science projects. For example, Raddick et al. (2010) explain that one of the main motivations for participating in a specific citizen science platform is the opportunity to learn about science through a hands-on experience. Others, such as Mathieson (1991), highlight the issues of accessibility with regard to user participation, taking into account knowledge and access to technological platforms such as computers and the Internet. The citizen science aspect complicates the user experience on the Zooniverse. The users are engaging in a scientific project and, therefore, classifying scientific data, but this does not require the ability to relate their input to what is happening in the overall scientific data analysis. The process of categorising relatively abstract images can easily become disconnected from the overall research project and any potential findings. This was borne out in the user interviews. Only a third of the interviewees were aware of any research outputs from the projects they had participated in, while only three said they had any kind of relationship with the Zooniverse team (with an additional one explaining they had a relationship with the scientists on one project). Only three of the interviewees participated in the Talk forum, and this correlated with those who said they had some sort of relationship with the Zooniverse team. It is perhaps unsurprising that the users who participated in the forum were able to build these relationships; however, it is surprising how many people had not used Talk. In general, users explained this lack of interaction in two ways. The first were those users simply not interested in this aspect, seeing their participation as classifying and having no need to talk to others about it. The second were users who had technical difficulties with the browser-based platform, with the user either unable to connect reliably or deterred from trying again by previous negative experiences.
The users who were interviewed expressed complex motivations for involvement in the Zooniverse. These ranged from personal satisfaction, research and fun to contributing to science, pursuing their own interests and teaching. The interviewees can be broadly divided into two groups. The first had some sort of scientific education and were currently employed in, had retired from, or had wanted to work in science. This meant the Zooniverse provided a valued opportunity to participate (albeit in various ways) in science. A number of these interviewees explained that they also engaged in other kinds of citizen science projects, particularly those involving the crowdsourcing of data capture. In a number of cases, health reasons limited potential participation, something that the online platform allowed users to overcome. Another important subset of these were science teachers. For them, the Zooniverse provided the opportunity to use a live scientific project in the classroom. This involved getting pupils to complete classifications in class and even setting participation as homework. The platform was, therefore, seen as an exciting way to teach about the importance of science in general and the scientific method specifically. With this group of users, the notion of citizen science was very important. This was expressed in a broad understanding of scientific progress as being tied up with social progress more generally, in a kind of enlightenment logic. Within this, there was a discussion of the importance of democratisation; however, it did not extend to what this could mean in practice in the Zooniverse.
The second group of Zooniverse users identified from the interviewees were those who used the platform for fun. These users explained that they were motivated primarily by enjoyment. The low barriers to entry meant that classifications could be done at short notice or even while doing other activities like watching television. These motivations were regularly discussed by users on the Snapshot Serengeti project. There was little understanding of what the classifications would be used for; instead, users enthusiastically discussed their enjoyment of seeing and collecting pictures of different animals. What united the two different groups of users was a common agreement that aspects of gamification (the application of game-like elements) should be adopted by the Zooniverse. Only two of the interviewees were against gamification, with three wanting some aspects, and the remainder in favour. For some of the interviewees gamification would be another way to involve users, expose more people to “science”, and complete the categorisation tasks more quickly. However, the more reticent interviewees expressed concerns about undermining the seriousness of science and the output. Overwhelmingly, the interviewees spoke positively about the potential benefits of measuring users’ contributions and rewarding or motivating this in various ways. This stands in contrast to the Zooniverse, which has decided not to implement such techniques.
Our study revealed a complex and constantly shifting relationship to long-term involvement in scientific data classification via a crowdsourced citizen science platform. The ethnography explores what it is like to categorise data and reveals that the user begins the categorising process with enthusiasm and enjoyment. For example:
My first impression of this project is that it is excellent. I think that it is well thought out and potentially addictive. I love the topic, and although I’m not completely sure why we’re counting penguins I like it. I’m actually genuinely pleased to be coming back and having a go tomorrow.
However, as the auto-ethnography continued, the participant found it difficult to get a sense of their value to the classification process. While participating in the Penguin Watch project, there were two metrics available on the homepage: “640354 Images classified” and “7571 Volunteers participating”. On returning the next day, she:
… compared these numbers to the stats yesterday and I can see that over 10,000 images have been classified since yesterday. This is staggering and makes my contributions seem pitifully tiny. On one hand, this makes me want to get started but, on the other, it also makes me think why bother, does my small/meagre contribution really make much of a difference?
The classification process involves examining a photograph and tagging all of the penguins present in the picture. In one instance, she counted “62 penguins” and felt “proud … to spot so many tiny well-concealed penguins.” However, she continued to ask rhetorically: “I wonder if I have found them all.” Later, the ethnographer started to notice different terrain in the pictures of penguins, suggesting perhaps quite different locations. Yet, “there is no information about where these photos were taken or any background as to when/where or why, at least not immediately.” Thus, a basic question about the details of the scientific project behind the photographs is difficult to answer. Instead, she explains, “I suspect I will have to look around to find these answers myself and this information is not provided as it interferes with the classification process.”
Over time, the excitement and enthusiasm wane and the user battles with more negative experiences associated with the categorising of the data. It is at this time that a positive intervention from the platform could be particularly effective. In lieu of this, users must draw on their own personal motivations to keep going. For example, the ethnographer describes a series of challenging times over her year on the platform:
I classify for a while but there is nothing really compelling me today. Perhaps I am indifferent to their cute charms today for some reason, or maybe because I haven’t seen any especially good pictures today. But I admit I am struggling to hold my interest and my attention wanes to other websites and parts of the internet. I persevere for a little while longer but only because I want to write more in the diary and not necessarily because I want to continue classifying … The first image I come to is obviously a faulty camera. This is seriously off-putting, which does not bode well when I reflect on how my contributions really don’t make much difference … I’m presented with pretty much the same image as before and I must admit that my attention is well and truly waning now. I attempt to mark the minimum of 30 but it’s difficult because they are so small and close together.
It is clear that at these moments interventions could be made to motivate the user to continue with classifications. A particularly successful approach in many other peer-production platforms (those that are voluntary or not-for-profit) is for motivations to be drawn from community support and fellow participants. These kinds of platforms do not use monetary incentives, requiring instead other reasons to be involved. The infrastructural limitations of the Zooniverse hamper the creation of a genuine and sustainable user community. While users can discuss images on Talk, these discussions are limited to particular projects and focused around research-specific topics. Again, from the user perspective, these interactions can be fraught with contradictions. Talk can be a positive experience where the highs of the classifying activities can be shared. Conversely, it can also be a negative place, with moderators ignoring or rebuffing the user for moving away from the set categorising task. Similarly, greater involvement in the scientific projects could improve motivation overall. Instead of feeling like a small cog in a broader research machine, democratic engagement could give users a stake in the completion of the project.
The Zooniverse has undergone a remarkable transition from a group of post-doctoral students to becoming one of the most successful online citizen science platforms. Collectively, the users have contributed to 50 different projects, providing a significant labour input that would have been difficult to achieve by alternative means. However, the relationships involved beyond the front page of the website are difficult to examine, appearing like a “black box” (Scholz, 2015) that obscures the relationships between paid and unpaid labour, professional experts and citizen scientists. This paper has been able to reveal a range of processes taking place on the platform, focusing on the tensions and contradictions that citizen science entails in practice.
The first two research questions, relating to the relationships between paid labour and the inclusion of the crowd in the processes of citizen science, have been addressed throughout this article. The different ways that paid and unpaid labour are included on the platform have been detailed. The core process of user categorisation has similarities with micro-work platforms, with larger projects broken down into small parts that can be easily completed by a single user. There is no need for collective interaction to participate, yet the platform provides opportunities to do this. The hybridity in the crowdsourcing output of the Zooniverse has the potential to combine the strengths of micro-work with the broadcast search approach. Despite a number of examples of serendipitous discovery on the platform, this form of participation remains an emergent possibility. It is not deliberately organised and, like the possibilities of genuine co-creation or peer-production, is at the discretion of individual project teams on the platform.
The motivation of users has been shown to be a combination of scientific engagement and hedonistic enjoyment. While the motivation of users does not change the basic interaction on the platform (whatever the reason for participating, the data is still being categorised), the former raises a number of important questions about the nature of citizen science. The Zooniverse did not begin as a citizen science project; rather, it began as a search for a solution to a large data problem. Although it has grown to include an education and outreach dimension, this transactional relationship remains at the core of the platform. Returning to Lewenstein’s (2004: 1) distinction between the basic involvement of non-professionals and the more substantive democratic inclusion, it is clear that this tension is unresolved on the platform. There is still the potential to find new ways to engage the crowd in the democratic processes of citizen science. The understanding of the role of paid labour begins from the competitive and output-driven scientific context that the Zooniverse operates within. This means that projects need to have secured some kind of funding and be able to demonstrate quantifiable outputs. The heterogeneous and diverse crowd could potentially contribute to the science projects in a variety of ways, but this entails a risk for the professional scientists involved. The need for reliable and large-scale data shapes the interactions that scientists have with the crowd, seeking to gather a finished data product that can be used in research.
The final research question explored the tensions that emerge in practice when organising the crowdsourcing of citizen science. At its core is the radical demand of democratisation that is not currently being fulfilled. The notion that those outside of academia could contribute to and decide on the direction of science (which, it should be remembered, is in general publicly funded) requires those within academia to relinquish at least some element of control. This has the potential to blur the boundaries between professional and amateur or even work and play. If considered in the context of the Open Source Software movement discussed by many of the paid developers on the Zooniverse, there is the potential to envisage a process of scientific peer-production that is quite different to the caricature of the ivory tower. However, this requires a leap of political faith, one that is difficult to make within the confines of concurrent funding bids. Therefore, while the Zooniverse represents an important step forward in how data-rich scientific research can be conducted, the question of how to fulfil the goals of citizen science as more than a motivational device remains open.
Amazon (2016) Amazon Mechanical Turk. Available at: https://www.mturk.com/mturk/welcome.
Baym, N. (2006) “Interpersonal life online”, in L. Lievrouw and S. Livingstone (eds.) The Handbook of New Media: Updated, Student Edition. London: Sage, Pp. 35-54.
Bonney, R., C. Cooper, J. Dickinson, S. Kelley, T. Phillips, K. Rosenberg and J. Shirk (2009) “Citizen Science: A Developing Tool for Expanding Science Knowledge and Scientific Literacy”. Bioscience 59(11): 977-984.
Boutang, Y. M. (2011) Cognitive Capitalism. Cambridge: Polity Press.
Bruns, A. (2008) Blogs, Wikipedia, Second Life, and Beyond: From Production to Produsage. New York: Peter Lang.
Carpentier, N. (2011) “The concept of participation. If they have access and interact, do they really participate?” Communication Management Quarterly 21(6): 13-36.
Causer, T. and V. Wallace (2012) “Building A Volunteer Community: Results and Findings from Transcribe Bentham”. Digital Humanities Quarterly 6(2).
Chamberlain, J., U. Kruschwitz and M. Poesio (2013) “Methods for Engaging and Evaluating Users of Human Computation Systems”, in P. Michelucci (ed.) Handbook of Human Computation. New York: Springer, Pp. 679-694.
Collini, S. (2012) What are Universities for? London: Penguin.
Dai, P. and D. S. Weld (2010) “Decision-theoretic control of crowd-sourced workflows”. Proceedings of the 24th AAAI Conference on Artificial Intelligence, Menlo Park, CA: AAAI Press.
Dunn, S. and M. Hedges (2013) “Crowd-sourcing as a Component of Humanities Research Infrastructures”. International Journal of Humanities and Arts Computing 7(1-2): 147-169.
Fortson, L., K. Masters, R. Nichol, K. D. Borne, E. M. Edmondson, C. Lintott, J. Raddick, K. Schawinski and J. Wallin (2012) “Galaxy Zoo: Morphological classification and citizen science”, in Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, Ashok N. Srivastava (eds.) Advances in Machine Learning and Data Mining for Astronomy. CRC Press / Taylor & Francis Group, Pp. 213-236.
Franzoni, C. and H. Sauermann (2014) “Crowd science: The organization of scientific research in open collaborative projects”. Research Policy 43(1): 1-20.
Fuller, S. (2008) “Science democratized = expertise decommissioned”, in N. Stehr (ed.) Knowledge & Democracy. London: Transaction Publishers.
Guston, D. H. (2004) “Forget Politicizing Science. Let’s Democratize Science!” Issues in Science and Technology 21(1): 25-28.
Hyde, A., M. Linksvayer, kararinka, M. Mandiberg, M. Peirano, S. Tarka, A. Taylor, A. Toner and M. Zer-Aviv (2012) “What is Collaboration Anyway?”, in M. Mandiberg (ed.) The Social Media Reader. New York: New York University Press, Pp. 53-67.
Irani, L. (2015) “The cultural work of microwork”. New Media & Society 17(5): 720-739.
Jennett, C. and A. L. Cox (2014) “Eight Guidelines for Designing Virtual Citizen Science Projects”. Proceedings of HCOMP 2014 workshop Citizen + X:Volunteer-based Crowdsourcing in Science, Public Health and Government, Palo Alto, CA: AAAI Press.
Jeppesen, L. B. and K. R. Lakhani (2010) “Marginality and problem-solving effectiveness in broadcast search”. Organization Science 21(5): 1016-1033.
Kaganer, E., E. Carmel, R. Hirschheim and T. Olsen (2013) “Managing the Human Cloud”. MIT Sloan Management Review 54(2, winter).
Kleinman, D. L. (ed.) (2000) Science, technology, and democracy. New York: State University of New York Press.
Kücklich, J. (2005) “Precarious playbor: Modders and the digital game industry”. The Fibreculture Journal 5. Available at: http://five.fibreculturejournal.org/.
Latour, B. (2005) Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford: Oxford University Press.
Lazzarato, M. (1996) “Immaterial Labour”, in P. Virno and M. Hardt (eds.) Radical Thought in Italy. Minneapolis, MN: University of Minnesota Press.
Lidskog, R. (2008) “Scientised citizens and democratised science. Re‐assessing the expert‐lay divide”. Journal of Risk Research 11(1-2): 69-86.
Lintott, C., K. Schawinski, W. Keel, H. Van Arkel, N. Bennert, E. Edmondson, D. Thomas, D. J. B. Smith, P. D. Herbert, M. J. Jarvis, S. Virani, D. Andreescu, S. P. Bamford, K. Land, P. Murray, R. C. Nichol, M. J. Raddick, A. Slosar, A. Szalay, and J. Vandenberg (2009) “Galaxy zoo: ‘Hanny’s Voorwerp’: A Quasar Light Echo?” MNRAS 399(1): 129-140.
Lewenstein, B. V. (2004) “What does citizen science accomplish?” Working Paper, Cornell University. Available at: https://ecommons.cornell.edu/handle/1813/37362 (accessed on 29 November 2015).
Luczak-Rösch, M., R. Tinati, E. Simperl, M. Van Kleek, N. Shadbolt and R. Simpson (2014) “Why Won’t Aliens Talk to Us? Content and Community Dynamics in Online Citizen Science”. Paper presented at the Eighth International AAAI Conference on Weblogs and Social Media, Ann Arbor, US, 1-4 June 2014.
Mathieson, K. (1991) “Predicting user intentions: Comparing the technology acceptance model with the theory of planned behavior”. Information Systems Research 2(3): 173-191.
Marx, K. (1976) Capital: A Critique of Political Economy: Volume 1. London: Penguin Books.
Pasquale, F. (2015) The Black Box Society: The Secret Algorithms That Control Money and Information. Cambridge, MA: Harvard University Press.
Pinkowski, J. (2010) “How to Classify a Million Galaxies in Three Weeks”. Time, March 28. Available at: http://content.time.com/time/health/article/0,8599,1975296,00.html (accessed on 2 November 2015).
Poetz, M. K. and R. Prügl (2010) “Crossing Domain-Specific Boundaries in Search of Innovation: Exploring the Potential of Pyramiding”. Journal of Product Innovation Management 27: 897-914.
Raddick, M. J., G. Bracey, K. Carney, G. Gyuk, K. Borne, J. Wallin and S. Jacoby (2009) “Citizen science: Status and research directions for the coming decade”. Astro2010: The Astronomy and Astrophysics Decadal Survey, Position Papers, no. 46.
Riesch, H. and C. Potter (2014) “Citizen science as seen by scientists: Methodological, epistemological and ethical dimensions”. Public Understanding of Science 23(1): 107-120.
Scholz, T. (2015) “Think Outside the Boss”. Public Seminar, Available at: http://www.publicseminar.org/2015/04/think-outside-the-boss/ (accessed on 10 November 2015).
Shahri, A., M. Hosseini, K. Phalp, J. Taylor and R. Ali (2014) “Towards a Code of Ethics for Gamification at Enterprise”. Proceedings of 7th IFIP WG 8.1 Working Conference, PoEM 2014, Manchester, UK, 12-13 November 2014, Pp. 235-245.
Shirky, C. (2010) Cognitive Surplus: Creativity and Generosity in a Connected Age. New York: Penguin Press.
Simpson, R., K. Page and D. De Roure (2014) “Zooniverse: Observing the World’s Largest Citizen Science Platform”. Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. Geneva, Switzerland, Pp. 1049-1054.
Stinson, L. (2014) “How GE Plans to Act Like a Startup and Crowdsource Breakthrough Ideas”. Wired, April 11. Available at: http://www.wired.com/2014/04/how-ge-plans-to-act-like-a-startup-and-crowdsource-great-ideas/ (accessed on 23 May 2015).
Wald, D. M., J. Longo and A. R. Dobell (2015) “Design Principles for Engaging and Retaining Virtual Citizen Scientists”. Conservation Biology 30(3): 562-570.
Wiggins, A. and K. Crowston (2011) “From conservation to crowdsourcing: A typology of citizen science”. In Proceedings of the 44th Annual Hawaii International Conference on System Sciences. Koloa, HI: IEEE, Pp. 1-10.
Willett, K. W., C. J. Lintott, S. P. Bamford, K. L. Masters, B. D. Simmons, K. R. V. Casteels, E. M. Edmondson, L. F. Fortson, S. Kaviraj, W. C. Keel, T. Melvin, R. C. Nichol, M. J. Raddick, K. Schawinski, R. J. Simpson, R. A. Skibba, A. M. Smith, and D. Thomas (2013) “Galaxy Zoo 2: Detailed Morphological Classifications for 304122 Galaxies from the Sloan Digital Sky Survey”. MNRAS 435: 2835-2860.
Zooniverse (2015) “Galaxy Zoo Bar Lengths”. Available at: https://www.zooniverse.org/projects/vrooje/galaxy-zoo-bar-lengths/ (accessed on 10 November 2015).