How Does PISA Put the World at Risk (Part 3): Creating Illusory Models of Excellence
Few numbers in the world today command as much power as PISA scores; not even counts of Olympic medals or Nobel Prizes carry comparable weight. It is utterly shocking and embarrassing to see otherwise rational and well-educated people (or at least they should be) in powerful positions believe that scores on three tests reveal the quality of their education systems, the effectiveness of their teachers, the ability of their students, and the future prosperity of their societies.
PISA has become the star-maker in the education universe because of its bold claim to assess “the extent to which students near the end of compulsory education have acquired key knowledge and skills that are essential for full participation in modern societies” [1, p. 15]. Moreover, PISA claims to find educational stars by identifying which education systems better prepare their children for “full participation in modern societies”—as measured by PISA scores. The goal is for educational systems to learn from “the highest-performing and most rapidly improving school systems” [1, p. 15].
These are the questions TES journalist William Stewart posed in a 2013 article: What if there are “serious problems” with the Pisa data? What if the statistical techniques used to compile it are “utterly wrong” and based on a “profound conceptual error”? Suppose the whole idea of being able to accurately rank such diverse education systems is “meaningless”, “madness”?
What if you learned that Pisa’s comparisons are not based on a common test, but on different students answering different questions? And what if switching these questions around leads to huge variations in the all-important Pisa rankings, with the U.K. finishing anywhere between 14th and 30th and Denmark between fifth and 37th? What if these rankings—that so many reputations and billions of pounds depend on, that have so much impact on students and teachers around the world—are in fact “useless”?
The article’s findings are troubling for PISA and should be extremely unsettling to its faithful, say scholars who have independently reached the same conclusions. “As far as they are concerned, the emperor has no clothes,” writes Stewart. Citing numerous publications and conversations with scholars in Denmark, Northern Ireland, and the U.K., as well as with the OECD, he points out major technical flaws in how PISA composes its tests, administers them, and uses statistical techniques to generate country rankings.
Svend Kreiner, a professor of biomedical statistics at the University of Copenhagen, presents a more serious challenge to PISA. He questions the appropriateness of the model PISA uses to produce its country rankings. PISA uses the Rasch model, a widely used psychometric model named after the late Danish mathematician and statistician Georg Rasch. For this model to work properly, certain requirements must be met. But according to Kreiner, who studied under Rasch and has worked with the model for 40 years, PISA’s application does not meet those requirements. In an article published in the academic journal Psychometrika, Kreiner and co-author Karl Bang Christensen show that the Rasch model does not fit PISA’s reading literacy data, and thus the resulting country rankings are not robust: rankings can vary a great deal depending on which subset of test items is used. Denmark, for example, can rank anywhere between 5th and 36th out of 56 countries. “That means that [PISA] comparisons between countries are meaningless,” Kreiner told TES.
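The subset sensitivity Kreiner describes has a simple mechanical explanation. Under the Rasch model, the probability that a student of ability θ answers an item of difficulty b correctly is exp(θ − b) / (1 + exp(θ − b)), and a core requirement is that each item be equally difficult, relative to ability, for every group of test takers. When some items are relatively easier for one country than for another, the choice of items decides the ranking. The toy simulation below (not PISA’s actual scaling procedure; the country labels, sample sizes, and bias magnitudes are invented for illustration) shows two countries with identical ability distributions whose order reverses depending on which half of the test is scored:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two "countries" (A and B) drawn from the SAME
# ability distribution, tested on 20 items. Half the items are
# relatively easier for A, half for B (differential item functioning),
# which violates the Rasch requirement that an item's difficulty be
# the same for all groups. All numbers here are invented.
n_students, n_items = 2000, 20
ability = {"A": rng.normal(0.0, 1.0, n_students),
           "B": rng.normal(0.0, 1.0, n_students)}
difficulty = np.linspace(-1.5, 1.5, n_items)
bias = {"A": np.r_[np.full(10, -0.6), np.full(10, 0.6)],   # items 1-10 easier for A
        "B": np.r_[np.full(10, 0.6), np.full(10, -0.6)]}   # items 11-20 easier for B

def simulate(country):
    """Bernoulli responses from a Rasch model distorted by group-specific bias."""
    theta = ability[country][:, None]                      # shape (students, 1)
    p = 1 / (1 + np.exp(-(theta - (difficulty + bias[country]))))
    return (rng.random(p.shape) < p).astype(int)           # shape (students, items)

resp = {c: simulate(c) for c in ("A", "B")}

# Score each country on two different item subsets: the "ranking" flips.
for label, cols in [("items 1-10 ", slice(0, 10)), ("items 11-20", slice(10, 20))]:
    means = {c: resp[c][:, cols].mean() for c in ("A", "B")}
    leader = max(means, key=means.get)
    print(f"{label}: A={means['A']:.3f}  B={means['B']:.3f}  -> {leader} ranks first")
```

Because both countries share one ability distribution, any stable scaling should rank them as equals; the flip comes entirely from the item-by-country interaction, the kind of model misfit Kreiner and Christensen report.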
Kreiner was neither the first nor the only scholar to raise questions about PISA’s technical flaws. In 2007, nearly 20 researchers from multiple European countries presented their critical analyses in the book PISA According to PISA: Does PISA Keep What It Promises? Independent scholars from all over the world took apart PISA’s methodology, examining how the test was designed; how it sampled, collected, and presented data; and what its outcomes were. The researchers then compared the test’s real-life validity to its claims. Almost all of them “raise[d] serious doubts concerning the theoretical and methodological standards applied within PISA, and particularly to its most prominent by-products, its national league tables or analyses of school systems” [3, p. 10]. Among their conclusions:
- PISA is by design culturally biased and methodologically constrained to a degree which prohibits accurate representations of what actually is achieved in and by schools. Nor is there any proof that what it covers is a valid conceptualization of what every student should know.
- The product of most public value, the national league tables, are based on so many weak links that they should be abandoned right away. If only a few of the methodological issues raised in this volume are on target, the league tables depend on assumptions about the validity and reliability which are unattainable.
- The widely discussed by-products of PISA, such as the analyses of “good schools,” “good instruction” or differences between school systems … go far beyond what a cautious approach to these data allows for. They are more often than not speculative… [3, p. 12-13].
PISA did respond to some of these technical challenges. For example, Andreas Schleicher, PISA’s face to the world, wrote a commentary in TES responding to Kreiner’s charges.
While the dispute over PISA’s technical flaws continues, some argue that even if PISA did everything right technically, it still could not possibly claim to be measuring the quality of entire education systems, let alone their students’ ability to live in the modern world.
“There are very few things you can summarise with a number and yet Pisa claims to be able to capture a country’s entire education system in just three of them,” wrote Dr. Hugh Morrison of Queen’s University Belfast in Northern Ireland. “It can’t be possible. It is madness.” Morrison, a mathematician, does not think the Rasch model should be used at all. He argues that “at the heart of Rasch, and other similar statistical models, lies a fundamental, insoluble mathematical error that renders Pisa rankings ‘valueless’ and means that the programme ‘will never work’” (Stewart, 2013). The problem, according to Morrison, is that PISA violates a central principle of measurement drawn from physicist Niels Bohr’s work: the entity measured cannot be divorced from the measuring instrument. Morrison illustrates his point with an example. Suppose Einstein and a student both produced a perfect score on a test. “Surely to claim that the pupil has the same mathematical ability as Einstein is to communicate ambiguously?” The unambiguous communication would be “Einstein and the pupil have the same mathematical ability relative to this particular [test]… [M]athematical ability, indeed any ability, is not an intrinsic property of the individual; rather, it’s a joint property of the individual and the measuring instrument.” In a nutshell, Morrison’s point is that PISA measures students’ ability to complete the tasks included in the test, not their general ability to understand and succeed.
Even if PISA did measure cognitive abilities as accurately as it claims to, those abilities span only three domains: math, reading, and science. PISA assumes that these skills are universally valuable. In other words, as Svein Sjøberg, a professor of science education at Norway’s University of Oslo, points out, PISA “assumes that the challenges of tomorrow’s world are more or less identical for young people across countries and cultures” and thus promotes a “kind of universal, presumably culture-free, curriculum as decided by the OECD and its experts”. This assumption is mistaken. “Although life in many countries do [sic] have some similar traits, one can hardly assume that the 15-year olds in e.g. Japan, Greece, Mexico and Norway are preparing for the same challenges and need identical life skills and competencies” [4, p. 7].
Even if cognitive skills in math, science, and reading were the most important skills in the universe, they would not—could not—be the only skills an educational system should cultivate. Skills and knowledge in other domains, such as “the humanities, social sciences, foreign languages, history, geography, physical education etc.” [4, p. 3], play a crucial role if citizens of any country are to live a fulfilling life. So do non-cognitive skills: social-emotional skills, curiosity, creativity, resilience, engagement, passion, and a host of other personality traits. In fact, many would argue that talents, skills, knowledge, and creativity in domains outside math, science, and reading are at least as important, perhaps even more important, for living successfully in the new world. Henry Levin, a professor in the economics of education at Teachers College, Columbia University, reviews empirical evidence showing the essential value of non-cognitive skills to work and life in his article More Than Just Test Scores [5].
Thus, even if PISA were methodologically sound, conceptually correct, and properly administered, it would still be a test. As such, its results can indicate only the extent to which 15-year-olds provided responses deemed correct by the test makers. The only unambiguous conclusion to draw from Shanghai’s PISA ranking is that 15-year-old students in Shanghai provided the most responses judged correct in math, reading, and science in 2009 and 2012. Leaping from the highest PISA scores in three subjects to the best education system in the world is too big a jump for any logical person, unless the purpose of education is defined as doing well on PISA.
Since no one, not the Chinese and not even the PISA team (I hope), would define the purpose of education as achieving good PISA scores, holding China up as the world’s model of educational excellence just because some of its 15-year-olds received the highest PISA scores is not only inaccurate but misleading. The excellence is an illusion created by the PISA league tables.
1. OECD, Ready to Learn: Students’ Engagement, Drive and Self-Beliefs. 2013, OECD: Paris.
2. Kreiner, S. and K.B. Christensen, Analyses of Model Fit and Robustness: A New Look at the PISA Scaling Model Underlying Ranking of Countries According to Reading Literacy. Psychometrika, 2013 (June).
3. Hopmann, S.T., G. Brinek, and M. Retzl, eds. PISA zufolge PISA – PISA According to PISA: Does PISA Keep What It Promises? 2007, Lit Verlag: Berlin.
4. Sjøberg, S., PISA: Politics, Fundamental Problems and Intriguing Results. Recherches en Education, 2012. 14.
5. Levin, H.M., More Than Just Test Scores. Prospects: The Quarterly Review of Comparative Education, 2012. 42(3): p. 269-284.