I recently read an article that posed an interesting question: does the wealth of data that we are now able to collect about every aspect of life make the development of explanatory models and theories moot? All theories and models are reductionist representations of complex real life, meant to help us understand how things work. But, says the author of this article, in the age of the Petabyte, explanations of why we do the things we do are becoming less important. More important is the development of tools and methodologies that extract patterns, show statistical correlations, and find relationships in all the data we are collecting.
Typically, the development of a theory or an explanatory model follows these steps:
- Data collection and analysis
- Development of a conceptual model that is able to capture the dominant processes of the phenomenon
- Calibration of the model, through the adjustment of parameters and coefficients, so that it most closely matches observed data
In the past, the first step of data collection and analysis was limited by technology and resources. To figure out what was happening at the watershed scale, for example, experimenters scaled down hydrological processes, collected data in a controlled environment, isolated variables, and determined coefficients that were then scaled back up to describe real watersheds. Because real watersheds are much more complex than laboratory experiments, the laboratory-developed, physical process-based models must be calibrated and adjusted to fit empirical observations. Coefficients and parameters are adjusted until the model outputs hydrographs that closely match observed hydrographs.
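This calibration loop can be sketched very simply. The snippet below uses hypothetical rainfall and flow numbers and a toy one-coefficient model; real calibration involves many parameters and more sophisticated search, but the logic of adjusting until the simulated hydrograph matches the observed one is the same.

```python
# Minimal sketch of model calibration (all numbers hypothetical):
# adjust a single runoff coefficient until modeled flows best match
# observed flows, here by a brute-force sweep over candidate values.

rainfall = [0.0, 5.0, 12.0, 8.0, 3.0, 0.0]   # hypothetical rainfall (mm)
observed = [0.0, 1.6, 3.8, 2.5, 1.0, 0.0]    # hypothetical observed flow

def model(coefficient, rain):
    """A toy 'physical' model: runoff is a fixed fraction of rainfall."""
    return [coefficient * p for p in rain]

def sum_squared_error(simulated, observed):
    """Misfit between the simulated and observed hydrographs."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))

# Calibration: sweep the coefficient and keep the best-fitting value
best_c, best_err = None, float("inf")
for i in range(101):
    c = i / 100.0  # candidates 0.00 .. 1.00
    err = sum_squared_error(model(c, rainfall), observed)
    if err < best_err:
        best_c, best_err = c, err

print(f"calibrated coefficient: {best_c:.2f}, error: {best_err:.4f}")
```

The calibrated coefficient is whatever makes the output curves line up; nothing in the loop asks whether that value is physically meaningful, which is exactly the opacity discussed below.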
Faith in the scientific method of the controlled laboratory experiment, however, has also led to less preliminary data collection and analysis in the practice of hydrological modeling. The mathematical equations developed from scaled-down experiments allow modelers to bypass data collection in the preliminary phases, instead selecting and aggregating small-scale model components based on an a priori understanding of experimental hydrology. The resulting models are often called “mechanistic”, “process-based”, or “physical” models. They are built “bottom up”, starting with mathematical descriptions of physical processes. Empirical data comes into the picture only to calibrate the preset physical model. Ironically, however, scholars have noted that each adjustment in the parameterization phase has the potential to conceptually alter the dominant processes, so that it becomes quite opaque which physical processes are actually driving the model.
The alternative is the “top-down” model. The top-down model instead emphasizes analysis of data at the scale being modeled and then, based on existing knowledge, hypothesizes which model components need to be included. These are sometimes called “conceptual models”, perhaps because this approach allows the opportunity to identify and propose new dominant-process relationships that would be obscured by the parameterization and calibration of bottom-up models. One strand of top-down modelers takes a systems perspective, which originated in the field of cybernetics. In the systems approach, there are no components and no building blocks of the whole. Rather, variables have interactions and feedbacks, linkages and extractable patterns. To systems modelers, more important than internal processes and experimentally based physical theories is solving the optimization problem of relating input to output. Instead of explanatory blocks, there are just connections and correlations.
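The systems view can be illustrated with an ordinary least-squares fit, the simplest possible input-output relation. The data below are hypothetical; the point is that the fitted slope and intercept are the entire “model”, with no physical process inside.

```python
# Minimal sketch of the "systems" view (hypothetical data): relate input
# to output purely by statistical fit, with no physical process inside.

inputs  = [1.0, 2.0, 3.0, 4.0, 5.0]   # some aggregate driver
outputs = [2.1, 3.9, 6.2, 8.1, 9.8]   # an observed response

n = len(inputs)
mean_x = sum(inputs) / n
mean_y = sum(outputs) / n

# Ordinary least squares: the slope and intercept are all the model there is
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
         / sum((x - mean_x) ** 2 for x in inputs))
intercept = mean_y - slope * mean_x

def predict(x):
    """Black box: maps input to output with no explanatory internals."""
    return intercept + slope * x
```

Such a model can predict well within the range of its data, but it offers no account of why the relationship holds, which is the trade-off at issue in the next paragraph.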
The article I referred to at the beginning takes on exactly this systems perspective. The “deluge of data” that can take in more and more variables and correlate them to infer more information about the world– what you would likely want to watch next on Netflix, or what advertisements you would likely click on– is an optimization problem of statistical correlation. There is no advertising theory behind it, no theory about social change. The critique that theories are moot in that setting is completely warranted. When the author states that models are unneeded, he is speaking chiefly of the bottom-up model and somewhat of the top-down model. He is, however, fully embracing the idea of the systems model– the notion that input and output can be related and predicted, but that the explanatory internal processes can be left a black box with no consequence.
But for planners, explanatory models are needed. We need to be able to test the effects of interventions in a quantifiable way, and so it is not enough to have a completely “black box” model that relates input to output. At the same time, increased data collection does change the game. It frees us from having to rely on scaling up the preset theoretical building blocks of the past. Instead, technological advances give us real-time streamflow data, precipitation data, remotely sensed spatial data on topography and impervious surfaces, demographic data, and health data, often with little effort. The myriad interventions that we try must be justified, and justifying them will require the ability to determine their effects using data.
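Testing an intervention quantifiably might look like the following sketch. It borrows the standard rational method (Q = C·i·A: peak flow from runoff coefficient, rainfall intensity, and drainage area); the area, storm, and coefficient values are hypothetical, chosen only to show a before-and-after comparison.

```python
# Sketch of testing an intervention quantifiably (hypothetical numbers),
# using the rational method Q = C * i * A.

def peak_flow(c_runoff, intensity_in_per_hr, area_acres):
    """Rational method: Q (cfs) ~= C * i * A for small watersheds."""
    return c_runoff * intensity_in_per_hr * area_acres

area = 100.0       # drainage area, acres (hypothetical)
intensity = 1.5    # design-storm rainfall intensity, in/hr (hypothetical)

# Intervention: convert some pavement to green infrastructure,
# lowering the area-weighted runoff coefficient.
c_before = 0.70    # heavily impervious
c_after = 0.55     # after the hypothetical intervention

q_before = peak_flow(c_before, intensity, area)
q_after = peak_flow(c_after, intensity, area)
print(f"peak flow reduced by {q_before - q_after:.1f} cfs")
```

A pure black-box fit could not answer this question, because the intervention changes an internal quantity (the runoff coefficient) that the black box never represents.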