Rolf V. Olsen & Sigrid Blömeke

In an earlier blog post, we established that international large-scale assessments can be regarded as powerful and influential knowledge sources for making claims about the quality of educational systems. Comparisons between countries have over time been one of the dominant ways of interpreting the results of these studies, and this blog post goes into further detail about such between-country analyses.

The World is an Educational Laboratory

“We, the researchers who… decided to cooperate in developing internationally valid evaluation instruments, conceived of the world as one big educational laboratory where a great variety of practices in terms of school structure and curriculum were tried out. We simply wanted to take advantage of the international variability with regard both to the outcomes of the educational systems and the factors which caused differences in those outcomes.” (Torsten Husén, 1973, in the report from FISS, one of the first international large-scale studies in education)

This quote captures the essence of why comparisons with others are perceived to be useful:

  • Comparisons with other systems help by providing cases of what is typical across countries. This provides a normative rhetorical framing for statements like “Norwegian 4th graders read more newspapers than 4th graders do on average in other countries”.
  • They provide exemplary cases of what is possible. That is a rhetorical frame for benchmarking, e.g. in statements like “Compared to the highest performing systems in the world, Norwegian 15-year-olds lag several years of schooling behind on average”.
  • International comparisons provide information about phenomena that are largely invisible or unobservable within one country alone, because there is little or no variation within a single educational system, while the variation between countries can be substantial. One example is school starting age, which is relatively fixed within many countries but varies across countries.

To use a metaphor from photography: the international variation creates contrast and a background that allow the main subject to stand out. Numbers that are initially arbitrary are given meaning through relative comparisons.

Standardization and Quality Management

This “relative” framing has obvious shortcomings, mostly related to the range of participating countries and to its reliance on the assumption that the comparisons are relevant and accurate. Or as so often stated: we have to make sure that we are “comparing oranges with oranges, and not with apples”. International assessments put a great deal of effort into ensuring that comparisons between countries can be made. They have strict rules for sampling schools and students to ensure comparability across countries, the tests and questionnaires are piloted in all countries, translations of the instruments are verified in several steps, and the test and questionnaire items are empirically checked to ensure that they function equally across countries, to mention just a few of the quality checks being done.

Nevertheless, comparisons between countries are never perfectly valid, and they can be quite misleading. As the number and heterogeneity of countries in these studies increase, the challenge of ensuring fair comparisons becomes ever harder. We will return to this and other threats and challenges to interpretations of comparisons across countries in another blog post.