Here is an interesting article outlining an effort supported by the government of India to compile vital statistics for determining the leading causes of death in the country between 2001 and 2014 (New York Times, Door by Door, India Strives to Know More About Death, May 22, 2014). While the sentiment is admirable, the approach seems byzantine and early 20th century. Government registry officials visit the homes of deceased citizens and, using autopsy forms as a guide, gather verbal information about the circumstances surrounding the cause of death for individuals who died at home without medical supervision. Good old-fashioned epidemiology. Of course, collecting data this way provides a huge opportunity for bias due to errors that affect the data's reliability, including those introduced by language barriers, human recall, and respondents' unwillingness to disclose intimate family details to a stranger representing the national government. It is also time-consuming: results will not be final for at least five years. With great insight, the article's author notes a more modern objection to building data sets of this type, one that might not have been considered even 20 years ago: "A great reservoir of information will be valuable to public health specialists, but will probably bring little to the families who were its subjects."
Contrast conventional vital statistics data collection methods with crowdsourcing approaches. The crowdsourcing concept is characterized by collecting volumes of data, in a systematic manner, at no or very low cost to any single entity. Millions of ordinary people, fueled by passion for the cause, each contribute information about an observation made during the course of their normal activities, at only a micro-cost to themselves. Individual contributions are turned into digital data and amassed into a single, gigantic data set so the statisticians can work their magic. In terms an economist would understand, using crowdsourcing to create a useful data set is like harnessing the antithesis of the forces that lead to the tragedy of the commons.
Another interesting article, also published by the New York Times (Crowdsourcing, for the Birds, August 19, 2013), illustrates two variations of crowdsourcing used by wildlife biologists to monitor bird populations. It provides an articulate description of both efforts, each of which harnesses the power of crowdsourcing techniques to produce data for bird epidemiology research: (1) the Breeding Bird Survey coordinated by the United States Geological Survey (USGS) and (2) the eBird project, a non-profit, global ornithological network of high-tech data collectors. The app-driven eBird project seems more productive, efficient, and appealing, despite criticisms lobbed against it by more conventional Bird Survey proponents, including the usual generic snarks about validity and reliability. The Bird Survey, for its part, is adept at organizing and training volunteer bird-watchers to count birds in a systematic, quasi-controlled manner and to transfer their observations by hand to a paper survey form. It then requires conventional data processing methods to key in and compile data points to produce a final data set. Still, the final data sets from both projects are freely available to the public so that anyone can use the data for their own analysis. Open data! A highlight of the data gathering process for the eBird project is that it can produce a real-time view of bird populations around the world using heat maps, a visually appealing, easy-to-understand depiction of bird species population density, location, and migration over time. Imagine being able to track human health epidemics, diseases, conditions, and health outcomes in response to various interventions using similar methods.
My favorite quote from the article is, "Birds are notoriously hard to count." As if birds are harder to count than any other species? Scientists who count birds must not have any cross-training in the epidemiology of human health. How is it that birds are any harder to count than babies? Adults? People with heart disease? People living in poverty or substandard living conditions? My mind almost short-circuited with thoughts about how crowdsourcing might be applied to compiling data sets for the study of human health and the effects of public policy and other interventions on health outcomes. Crowdsourcing: an untapped resource for producing data that could answer crucial questions in medical care research, particularly useful for very large, heterogeneous populations that were heretofore impossible to study because of the impracticality of producing the requisite data at a nominal cost.