Posted by: Michael Atkisson | June 10, 2011

More Is Different: A Big Data Presage from 1972 Physics

Reductionism:

In 1972, Philip Anderson made the claim that More Is Different. I can’t say that I understand all the implications of his example of broken symmetry in many-body physics, but the arguments Anderson made against reductionism have relevance for learning analytics and social science.
Reductionism, as Anderson described it:
“The workings of our minds and bodies, and of all the animate or inanimate matter of which we have any detailed knowledge, are assumed to be controlled by the same set of fundamental laws, which except under certain extreme conditions we feel we know pretty well” (1972, p. 393).

The flaw he spotted:

“The main fallacy in this kind of thinking is that the reductionist hypothesis does not by any means imply a “constructionist” one: The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe…The behavior of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of the properties of a few particles. Instead, at each level of complexity entirely new properties appear, and the understanding of the new behaviors requires research, which I think is as fundamental in its nature as any other… At each stage entirely new laws, concepts, and generalizations are necessary, requiring inspiration and creativity to just as great a degree as in the previous one” (1972, p. 393).
The example he gave concerned the particles that make up the nucleus: though their behavior could be measured and predicted individually, that did not explain how they interacted as a system:

“But it needed no new knowledge of fundamental laws and would have been extremely difficult to derive synthetically from those laws; it was simply an inspiration, based, to be sure, on everyday intuition, which suddenly fitted everything together…When we see such a spectrum, even not so separated, and somewhat imperfect, we recognize that the nucleus is, after all, not macroscopic; it is merely approaching macroscopic behavior. Starting with the fundamental laws and a computer, we would have to do two impossible things – solve a problem with infinitely many bodies, and then apply the result to a finite system – before we synthesized this behavior” (1972, pp. 394-395).

To summarize Philip Anderson’s (1972) remarks: in physics, descriptions and predictions of the behavior of a system’s most fundamental components are not modular like Legos. Identifying the pieces and how they interact in one situation gives no indication of how they will interact as a whole in other situations, so the fundamental rules and conclusions that govern the most basic form cannot be used to describe and predict how the basic forms work together, or work together in other contexts, without testing all possible component combinations in all possible configurations. As analysis moves from the fundamental level to more complex levels of the hierarchy, new ways of observing and predicting behavior are required at each level, especially since that exhaustive approach will almost never be feasible.

Application to big data and learning analytics:

So, what does this mean for big data, social science, and learning analytics? Google has developed formidable methodologies for marketing, translation, and other data-driven tasks, and it is at the forefront of one of the most lucrative markets. According to a recent Economist article: “Data are becoming the new raw material of business: an economic input almost on a par with capital and labor. ‘Every day I wake up and ask, how can I flow data better, manage data better, analyze data better?’ says Rollin Ford, the CIO of Wal-Mart…‘Data exhaust’—the trail of clicks that internet users leave behind from which value can be extracted—is becoming a mainstay of the internet economy” (“Data, data everywhere,” 2010).
But just because much can be known about what certain people or groups of people may prefer or be likely to buy does not mean that the same rules and principles apply when anticipating other types of data-driven decisions for those same people. When analytics methodologies are applied, for example, to learning and behavioral data in a learning environment, far more must often be known about “why” in order to interpret the data than is required to know whether one ad campaign results in more purchases than another.

For example, in Ian Ayres’s (author of Super Crunchers) presentation at Google (2007), he touted the necessity and ease of randomized trial testing of marketing campaigns in virtual environments: “When you do online work, you should always ask yourself, is there a way I can do a randomized test on this to really see whether this initiative works.” He went on to describe a test between two different ads for Monster.com that were meant to get employers to post job openings: “I don’t actually know why, but in some sense you don’t deeply need to know why. It’s so cheap and so quick, its speed and the scale of the impact is very quick when you do these randomized tests…when the cost of testing randomized tests online goes to zero, you can start testing crazier things” (Ayres, 2007).
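The kind of randomized test Ayres describes is simple to sketch. Below is a minimal, hypothetical simulation in Python; the conversion rates, sample sizes, and the two-proportion z-test are illustrative assumptions of mine, not Monster.com’s actual numbers or method. Visitors are randomly split between ad A and ad B, and the z-test asks whether the observed difference in response rates is larger than chance alone would explain:

```python
import math
import random

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is B's conversion rate different from A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Simulate the randomized assignment: each visitor independently sees
# ad A or ad B, and we record whether they "converted" (posted a job).
rng = random.Random(0)
n_a = n_b = 5000
conv_a = sum(rng.random() < 0.030 for _ in range(n_a))  # assumed true rate 3.0%
conv_b = sum(rng.random() < 0.036 for _ in range(n_b))  # assumed true rate 3.6%

z = two_proportion_z(conv_a, n_a, conv_b, n_b)
print(f"A: {conv_a}/{n_a}  B: {conv_b}/{n_b}  z = {z:.2f}")
# |z| > 1.96 would be significant at the 5% level
```

Notice that the test says nothing about *why* one ad outperforms the other — which is exactly Ayres’s point, and exactly the limitation at issue when the method is carried into education.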

The challenge comes when confidence is high and assumptions go unchecked while applying strategies, theories, and technologies to a new area. Ayres went on to describe the data behind the success of direct instruction, a teaching method built around scripted educational lessons that focus on call-and-response interaction with students. He said that randomized trials showed direct instruction produced much better test performance in students, particularly at-risk students. So, he said, the data show that scripts do a better job at teaching.

Questions then arise: Better than what? What is teaching? The bar may be raised for at-risk students by overcoming the inadequate skills of poor teachers, but does it raise the bar for expert teachers? Average students? Advanced students? Will direct instruction raise average test achievement but actually reduce efficacy in other areas (e.g., disincentivize divergent and creative thinking)? How does direct instruction affect student skills longitudinally once the students have moved on to other types of learning environments where they are required to be more self-starting? Without adequately addressing these and other questions about direct instruction, it would be difficult to really know that it was “better.”

The important critical questions that Ayres seemed to miss in his endorsement of direct instruction are an example of the problem Anderson (1972) identified: bringing fundamental principles from one area or level (randomized trials in online marketing) to another (randomized trials in education) and expecting the analytics and research design principles of the former to explain and predict in the latter. This is not a direct example in education of Anderson’s notion that the rules of a sub-system do not indicate the rules of the super-system, but it does underscore the care that must be taken when crossbreeding fields of inquiry. Methodological eclecticism often ends in erroneous conclusions if the differences in assumptions, rules, and objectives between the fields are not addressed.

Returning to the idea of increasing levels of complexity in data, and that rules at one level may not apply at another (more is different): this is particularly evident in education research, where clustering patterns in data often provide opportunities for multi-level analysis. Student-level data often have different predictors than school-level or district-level data, and cross-level interactions can differ from effects that appear within a single level. It would be impossible to predict the interactions at group levels or between levels without the rules and structure of the higher levels of the hierarchy; treated only as one undifferentiated body of student-level data, the effects look entirely different depending on how the data are grouped.
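A small, made-up illustration of why grouping matters: in the Python sketch below (every number is invented for illustration, not drawn from real education data), study hours predict scores positively within every school, yet the school-level averages and the naive pooled regression run in the opposite direction. The student-level rule does not carry up the hierarchy.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

rng = random.Random(1)
all_x, all_y, within_slopes, school_means = [], [], [], []
# Hypothetical data: five schools. Within each school, extra study hours
# raise scores (slope +3), but schools with higher average hours also
# grade harder, so school averages trend the opposite way.
for s in range(5):
    mean_hours, mean_score = 2 + 2 * s, 90 - 8 * s
    xs = [mean_hours + rng.gauss(0, 0.5) for _ in range(40)]
    ys = [mean_score + 3 * (x - mean_hours) + rng.gauss(0, 1) for x in xs]
    within_slopes.append(ols_slope(xs, ys))
    school_means.append((statistics.fmean(xs), statistics.fmean(ys)))
    all_x += xs
    all_y += ys

between = ols_slope(*zip(*school_means))  # slope across school averages
pooled = ols_slope(all_x, all_y)          # naive slope ignoring grouping

print(f"within-school slopes: {[round(b, 1) for b in within_slopes]}")
print(f"between-school slope: {between:.1f}, pooled slope: {pooled:.1f}")
```

Here every within-school slope comes out positive while the between-school and pooled slopes come out negative: a single-level analysis of the pooled data would get the student-level relationship exactly backwards, which is why multi-level models estimate the levels separately.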

More is different, but it also requires different analysis and methodology.

References

Anderson, P. W. (1972). More Is Different. Science, 177(4047), 393 -396. doi:10.1126/science.177.4047.393

Ayres, I. (2007, November 8). Authors@Google: Ian Ayres [Video]. YouTube. Retrieved from http://www.youtube.com/watch?v=5Yml4H2sG4U&feature=player_embedded

Data, data everywhere. (2010, February 25). The Economist, (A special report on managing information). Retrieved from http://www.economist.com/node/15557443?story_id=15557443

