Measure to Improve: Evaluation as a Driver of Change in Education

Evaluation is a central pillar in the development of education systems, and particularly relevant in contexts such as Latin America and the Caribbean, where social and economic inequalities remain a barrier to accessing quality education. In this article, Mercedes Mateo, Head of the Education Division at the IDB, shares insights on the importance of measuring learning outcomes in order to adjust policies, improve teaching, and ensure results and equity in access to relevant, quality education.

“We cannot improve what we cannot measure. What is not improved always degrades.”
— Lord Kelvin

It is difficult to reach a destination if it is unclear where we want to go or whether we are moving in the right direction. If a farmer wants to improve their harvest, they examine the quality of the soil, measure plant growth, and analyse which nutrients are lacking. If an athlete wants to run faster, they need to know their current performance, identify weaknesses, and adjust their training. The same is true in education: we cannot improve learning without measuring it. Educational evaluation is the tool that allows us to observe progress and analyse what works, what doesn’t, and how to move towards more effective and equitable teaching.

Measuring the impact of educational policies and programmes is essential. Without evaluation, we do not know whether we are achieving our intended goals, or whether each student’s learning process is progressing in the right direction. Evaluation matters from both the student’s and the teacher’s perspectives: only by measuring progress can we adapt teaching to the appropriate level and keep pace with each learner.

Evaluation is not merely an administrative exercise, but a key tool for improving education, from the classroom to the design of national policies. It is also essential for making investment more efficient. We know that education systems require more resources: Latin America, for instance, invests per student around one-third of the OECD average.

However, student learning outcomes in the region fall below expectations, even after accounting for these levels of investment. This points to a significant margin for improving the efficiency of educational spending, and it is precisely here that evaluation and impact measurement play a fundamental role. Evidence-based decision-making is indispensable for designing policies and programmes that produce real improvements in learning. The goal is not only to increase investment in education, but also to optimise existing resources to maximise impact, especially in a region facing severe fiscal constraints.

What Do We Evaluate and How?

Once we have established the importance of measurement, we can begin to discuss what we evaluate and how we do it.

For example, we know that formative assessments are essential for students to understand where they stand and which areas of learning need strengthening. These assessments benefit not only the students but also provide teachers with key information to personalise instruction and offer targeted support to each learner. Every student faces different challenges. For this reason, formative assessment is a crucial tool that must be applied consistently at key stages in the learning process. Moreover, with advances in technology, formative evaluation has become a central feature of adaptive learning platforms, which adjust content based on each student’s specific needs.

However, while formative assessments are important, they are not sufficient. We also need summative assessments to measure large-scale performance, identify equity gaps, and begin to understand which aspects of the education system require adjustment. In this context, regional and international comparative assessments, such as PISA (OECD), LLECE (UNESCO), and others, are key instruments that allow us to see how students in one country perform compared to their peers and reflect on the performance of education systems as a whole.

Challenges of Educational Evaluation and Measurement

Where does Latin America stand in terms of evaluation? Fifteen countries in the region participated in the 2019 ERCE regional assessment, which measures learning among primary school students, and fourteen countries took part in the 2022 PISA tests, which assess the performance of 15-year-old students in Reading, Mathematics, and Science. However, participating in international evaluations is not enough. It is vital for countries to develop their own national evaluations.

Between 2021 and 2023, fourteen Latin American countries conducted national assessments at some educational level, allowing them to monitor the educational situation in more detail within their own territories. Of these fourteen countries, only eight implement some form of census-based evaluation. For those who wish to delve deeper, the report The State of Education in Latin America, published by the IDB last year, offers a comprehensive analysis of the current state of national, regional, and international assessments in the region.

Implementing robust evaluation systems is not without its challenges. First, these are costly processes requiring significant technical and institutional capacity. Traditionally, the focus has mainly been on measuring foundational cognitive skills, such as Reading and Mathematics, while overlooking other equally critical dimensions, such as socioemotional skills, whose large-scale measurement remains a challenge for education systems not only in the region but globally. Although technology can certainly help reduce costs and facilitate implementation, it is essential to ensure the conditions needed to maintain the quality of these measurements.

What We Know Works

There are several basic principles that are strongly backed by evidence. For instance, in contexts where teachers face gaps in pedagogy and subject knowledge, scripted lessons have been shown to have a very positive impact on learning. In interventions we have carried out in various countries and highly vulnerable contexts, we have found that scripted teaching models can help reduce learning gaps among students taught by teachers with varying levels of training and experience.

We also know that experiential learning and the use of active methodologies — such as project-based learning and problem-solving — are the pedagogical strategies with the greatest impact on student outcomes. This is because humans learn best through experience. In several countries, the IDB has worked with Ministries of Education on initiatives promoting active learning in science and mathematics through school projects linked to real-world community issues, such as water management or environmental protection.

Experimental evaluations show that these experiences not only strengthen academic skills but also develop students’ critical thinking, creativity, and civic engagement. It is worth noting that for students with greater learning difficulties, these strategies should be complemented by explicit, timely, and tailored instruction. In such cases, the use of appropriate scaffolding is essential to support their progress and ensure that all students move forward in their educational journey.

Another fundamental principle is teaching each student at the appropriate level. Since every person learns at their own pace, it is crucial to adapt instruction to prevent anyone from falling behind. However, many teachers are not fully equipped with the pedagogical strategies needed to implement differentiated instruction effectively. To support them, there are digital platforms that can identify each student’s learning level and deliver personalised content. The IDB is working with several Ministries of Education in the region to implement these platforms, aiming to facilitate differentiated instruction and improve learning outcomes, especially for students facing the greatest challenges.

Another clear piece of evidence is the importance of culturally adapted programmes. One cannot simply replicate the same programme from one context to another. A good example is a bilingual intercultural education programme we implemented in Panama, which received international recognition and has rigorous experimental evidence of its impact. By adapting learning to the community’s reality — in this case, combining traditional mathematics with ethnomathematics — students’ capacity to absorb content is strengthened.

Contrary to what one might believe, we now have a considerable body of evidence on what works to improve learning. Experiential learning, active methodologies, teaching at the right level and differentiated instruction, scripted lessons where teachers have limited training, and culturally adapted programmes are some examples of highly effective interventions. However, much of this evidence comes from small-scale, pilot interventions. In any intervention, not only does the science behind the design matter, but also the science of implementation.

What may work in controlled settings and at small scale may become ineffective when the implementation deviates from the original design. That is why the main challenge is often the capacity to implement at scale. A pilot programme may have high impact in controlled settings, but when expanded nationwide, the quality of implementation may be diluted, and the impact drastically reduced or entirely lost. Therefore, it is essential not only to evaluate before scaling up a policy, but also to continuously monitor its implementation to ensure that outcomes are sustained.

Returning to the starting point, improving education means generating evidence to design policies and implement programmes that lead to real improvements in learning. For this reason, evaluation should not be viewed merely as a diagnostic or accountability process, but rather as an opportunity for change and continuous improvement for both the system and the students. It must go beyond the logic of “testing to classify” and become a constant learning tool that informs more effective public policies. Only then can the region make evaluation a catalyst for system-wide change that is real and sustainable, ensuring that teaching and learning meet the needs of 21st-century students.