What are the social dimensions of VGI as “big data” in the context of disaster management?
Nov 24th, 2014 by j0p
By Yan Zhou
Start of thinking…
To understand the social dimensions of VGI as “big data”, it is necessary to consider the three main objects involved: the data producer, the data itself and the data analyst. The social dimension of each can then be translated into two more comprehensible questions: how does the process of producing “big data” influence people’s social activities, and, conversely, how are social activities reflected in “big data”?
But first of all, what is “big data”? According to a widely recognized definition, big data is characterized by four Vs: volume (large amounts of data), variety (a range of data types and sources), velocity (speed of data transfer) and value (the process of discovering huge hidden value in large datasets of various types and rapid generation) (Berman 2013; Gantz, Reinsel 2011). Volunteered geographic information (VGI), as we know from previous blog posts, can be regarded as “big data”, a term that has become popular in disaster risk management (DRM).
Producing “big data” as social activity
For experienced mappers, geographic information systems (GIS) have played an increasingly important role in producing VGI since online mapping sites like OpenStreetMap became popular. One example of DRM with online mapping is the “Missing Maps” project. It attracts international and local communities, NGOs and individuals to take part in collective mapping by organizing “mapping parties”. Volunteers from a variety of occupations and countries get together and map for one purpose, which promotes communication and interaction among people with the same interest. GIS is becoming part of the mass media and functions as a tool for sharing and communicating knowledge (Sui, Goodchild 2011).
However, this chance to share and communicate knowledge is not equal for everyone. The level of online GIS use and participation is mostly determined by individual technical skills. To address this problem, the “Missing Maps” project offers mappers two approaches: experienced mappers are advised to edit data with JOSM, while beginners map with iD. Regarding the potential and limits of democratization in such participation, Haklay (2013) introduced a hierarchy of hacking, which describes levels of ability to employ a given system. The first level, “meaning hacking”, describes participants who make no change to the system, like most volunteers in DRM, who just find and collect information from web maps and photos. At the second level, “use hacking”, participants have the technical skills to reuse some of the functionality of GIS and create new knowledge. What can be achieved in DRM at this level is, for instance, mapping previous disaster experiences by tagging positions and adding information to a shared risk map. At the third level, “shallow technical hacking”, systems are reconfigured to provide a new function. This requires the ability to script and to integrate information from multiple systems, as in the Google Maps mashup at Scipionus.com created in response to Hurricane Katrina (Miller 2006). The final level is “deep technical hacking”, in which new systems are created, such as OpenStreetMap, which allows the production of free geographic information that is accessible to anyone and for any purpose. This level requires significant technical knowledge, which is available only to a small technical elite. Yet so far we have only considered people who have access to the Internet; those without it are excluded and marginalized. In general, these people live in poor conditions and are therefore in greater danger when disaster strikes. How to involve them as VGI producers, so that quicker and more direct assistance can be provided, remains a big challenge.
When it comes to the willingness to share “big data”, a notable debate has arisen over how VGI data are collected. Many of us have probably had a similar experience: when installing an app on a smartphone, we are required to agree to provide location data as a condition of the contract; otherwise we have to give up the installation. What is worse, location data continue to be recorded unless people turn off their phones, since even disabling location services did not stop location data from being sent (Sui, Elwood 2013). Are these data still volunteered? On this basis, researchers have distinguished two types of crowdsourced location data according to their origin. Data collected with the contributor’s knowledge, explicit decision and purpose can be classified as volunteered geographic information (VGI), while data collected without the contributor’s knowledge, purpose or notice belong to contributed geographic information (CGI) (Sui, Elwood 2013). Besides the privacy issue, the significance of differentiating VGI and CGI lies more in the possibility of assessing data quality and suitability for analysis in DRM.
Benefits and challenges of “big data”
We believe, or at least have assumed, that the bigger the data, the more we can benefit from them. The study of Hurricane Sandy shows the considerable benefits of geo-referenced “big data” from Twitter (Shelton et al. 2014). At the scale of the whole United States, the distribution of Sandy-related tweets shows a significant concentration in the places most affected by the storm, and the density of data from affected areas corresponds with the financial losses caused by the storm. Furthermore, the analysis within New York City indicated a correlation between wealth and tweeting activity. Places with significant damage but relatively little tweeting draw our attention to the difference in social activity between central locations and peripheral areas. At different scales, different results and discoveries can be drawn from the same data. More of the social knowledge encoded in disaster-related data can be mined from a variety of perspectives.
However, can “big data” truly reflect social activity? Is it possible that an overly “big” volume of data from densely populated areas could lead to misleading conclusions and overreaction in DRM? Bigger is not always better if validity and reliability are ignored (Metaxas et al. 2014). On the other hand, the so-called “big data” seem “small”, not enough to cover all the people affected by a disaster. The diversity and completeness of disaster-related “big data” are also a big challenge for researchers. Future studies of DRM with geo-referenced data should extend beyond simple mapping and tagging and also focus on comparison and combination with other data sources, connecting social and spatial processes such as social networks (Crampton et al. 2013). Moreover, more robust data analysis and synthesis methods for studying spatial dynamics in DRM are needed (Sui, Goodchild 2011), which also reminds us to think about the additional knowledge and skills required in GIScience for DRM.
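The worry that sheer volume from densely populated areas can mislead may be easier to see with a small sketch. The Python snippet below is only an illustration, not a method taken from any of the cited studies: the area names, tweet counts and population figures are entirely made up, and the helper `tweets_per_thousand` is a hypothetical name. It compares ranking areas by raw tweet count with ranking them by tweets per 1,000 residents.

```python
def tweets_per_thousand(tweet_count, population):
    """Return disaster-related tweets per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * tweet_count / population


# Made-up illustrative numbers, not taken from any study.
areas = {
    "dense_centre": {"tweets": 5000, "population": 1_000_000},
    "small_periphery": {"tweets": 300, "population": 20_000},
}

# Normalize each area's tweet count by its population.
rates = {
    name: tweets_per_thousand(a["tweets"], a["population"])
    for name, a in areas.items()
}

# The area that stands out depends on which measure is used.
top_by_raw_count = max(areas, key=lambda name: areas[name]["tweets"])
top_by_rate = max(rates, key=rates.get)

print("Most tweets overall:", top_by_raw_count)
print("Most tweets per 1,000 residents:", top_by_rate)
```

With these invented figures the dense centre dominates the raw counts, while the small peripheral area shows the higher per-capita activity. Population is only one possible baseline; normalizing by overall tweeting activity would be another assumption worth testing.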
Berman, Jules (2013): Principles of Big Data. Preparing, Sharing, and Analyzing Complex Information: Elsevier.
Crampton, J. W.; Graham, M.; Poorthuis, A.; Shelton, T.; Stephens, M.; Wilson, M. W.; Zook, M. (2013): Beyond the geotag: situating ‘big data’ and leveraging the potential of the geoweb. In Cartography and Geographic Information Science 40 (2), pp. 130–139. DOI: 10.1080/15230406.2013.777137.
Gantz, J.; Reinsel, D. (2011): Extracting Value from Chaos, [International Data Corporation (IDC), Framingham, MA, 2011], www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf.
Haklay, Mordechai (2013): Neogeography and the delusion of democratisation. In Environ. Plann. A 45 (1), pp. 55–69. DOI: 10.1068/a45184.
Metaxas, P. T.; Mustafaraj, E.; Gayo-Avello, D. (2014): The Parable of Google Flu: Traps in Big Data Analysis. In Lazer, D., Kennedy, R., King, G., Vespignani, A. (Ed.): Big data. The parable of Google Flu: Traps in big data analysis. 343 volumes: AAAS, pp. 165–171.
Miller, Christopher C. (2006): A Beast in the Field: The Google Maps Mashup as GIS/2. In Cartographica: The International Journal for Geographic Information and Geovisualization 41 (3), pp. 187–199. DOI: 10.3138/J0L0-5301-2262-N779.
Shelton, T.; Poorthuis, A.; Graham, M.; Zook, M. (2014): Mapping the data shadows of Hurricane Sandy: Uncovering the sociospatial dimensions of ‘big data’. In Geoforum 52, pp. 167–179. DOI: 10.1016/j.geoforum.2014.01.006.
Sui, D. Z.; Elwood, S. (Eds.) (2013): Crowdsourcing geographic knowledge. Volunteered Geographic Information (VGI) in theory and practice. Dordrecht, Heidelberg [u.a.]: Springer.
Sui, D.; Goodchild, M. (2011): The convergence of GIS and social media: challenges for GIScience. In International Journal of Geographical Information Science 25 (11), pp. 1737–1748. DOI: 10.1080/13658816.2011.604636.