Although the education sector was not among the first to embrace AI and data science, today no one doubts the importance, implications, and applications these technologies can have in transforming the sector towards better and more equitable education.
Data and statistics are crucial for ensuring that children and young people can access quality education. As the maxim often attributed to Peter Drucker goes, what cannot be measured cannot be managed.
However, these data, essential for ensuring that socio-educational intervention programs are effective and relevant, pose challenges within social organizations. These challenges are often related to factors such as information collection, the difficulty of finding appropriate indicators, or adapting evaluation methodologies to diverse socio-cultural contexts.
ProFuturo is a large, complex global program that manages data from over 5,000 schools in 45 countries. It is well aware of these challenges and works continuously to improve how it collects, processes, and manages data. Among other things, it collaborates with universities and academic institutions such as the Pontifical University of Salamanca (UPSA), with which it created the Telefónica ProFuturo-UPSA Chair ‘Data Analytics of Educational Projects in Vulnerable Environments’ to promote research and technology development for making better use of the data generated by educational projects like ProFuturo.
Here we focus on the Final Degree Project (TFG) of one of its students, Jorge Carrasco, which examines how applying AI to complex data systems can boost the efficiency of educational organizations. His results show the potential of machine learning methods to improve the management and evaluation of educational programs and projects.
Project Objectives: Maximizing Impact with Predictive Analysis and AI
Since its inception, the ProFuturo program has sought tools to monitor the progress of the educational centers where it operates, as well as the impact generated by the project. To manage and evaluate its program, ProFuturo uses a system that combines a series of Key Performance Indicators (KPIs) and surveys to assign a maturity level to each school on a five-level scale: initial, basic, intermediate, advanced, and transformative. Until now, however, these data had not been subjected to in-depth statistical analysis, nor had their predictive potential been evaluated. That is the main objective of this project: to analyze the data collected by ProFuturo exhaustively and extract information that supports better decision-making and resource management.
To achieve this, the project set the following objectives:
- Design a data integration process that combines data from various sources into a single coherent set and stores it. This process, known by the acronym ETL (Extract, Transform, Load), matters because it cleans and transforms the raw data and loads it into one place, so that relevant information can later be extracted from it.
- Predict with high accuracy the maturity level of the projects implemented by ProFuturo using a selected subset of indicators. ProFuturo considers this objective very relevant because it will allow more efficient and targeted investment planning.
- Identify the indicators most relevant to the evolution and maturity of ProFuturo’s operations. The organization itself proposed this objective in order to avoid investing in areas that do not have the desired impact on the progress of its operations, so that investment can be optimized.
- Identify possible correlations between KPIs, allowing ProFuturo to better manage the Program.
- Design a simple and intuitive software tool that provides ProFuturo with useful visualizations of the experimental results. This objective is relevant to facilitate the possible technology transfer of the TFG to ProFuturo.
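The ETL objective in the list above can be illustrated with a minimal sketch. It is not the pipeline built in the TFG: the two CSV sources, the column names (`school_id`, `kpi`, `value`, `maturity_level`), and the `school_kpis` table are all hypothetical, chosen only to show the extract, transform, and load stages with the Python standard library.

```python
import csv
import io
import sqlite3

# Hypothetical raw exports; the real sources and column names are not described in the article.
KPI_CSV = """school_id,kpi,value
S1,teacher_training, High
S1,device_usage,low
S2,teacher_training,MEDIUM
S2,device_usage,
"""

SURVEY_CSV = """school_id,maturity_level
S1,Advanced
S2,basic
"""


def extract(text):
    """Extract: parse one CSV source into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(kpi_rows, survey_rows):
    """Transform: normalize labels, drop empty readings, join on school_id."""
    maturity = {r["school_id"]: r["maturity_level"].strip().lower() for r in survey_rows}
    cleaned = []
    for row in kpi_rows:
        value = row["value"].strip().lower()
        if not value:  # skip missing measurements
            continue
        cleaned.append((row["school_id"], row["kpi"], value, maturity.get(row["school_id"])))
    return cleaned


def load(rows, conn):
    """Load: store the unified rows in a single table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS school_kpis "
        "(school_id TEXT, kpi TEXT, value TEXT, maturity_level TEXT)"
    )
    conn.executemany("INSERT INTO school_kpis VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(KPI_CSV), extract(SURVEY_CSV)), conn)
stored = conn.execute("SELECT * FROM school_kpis ORDER BY school_id, kpi").fetchall()
print(stored)
```

Keeping the three stages as separate functions means a new data source only needs its own `extract` step, while the cleaning rules and the storage layer stay unchanged.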
An Innovative Project
Over the years, numerous studies have focused on predicting the evolution of KPI values using various techniques such as fuzzy logic, regression-based machine learning methods, and deep learning algorithms. For example, AI models have been used to predict road pavement conditions using a series of indicators, predict the risk of school dropout among students based on indicators collected by schools, predict water condition in a region using waste indicators, or predict the risk of accidents in construction activities using certain indicators.
These projects used AI models for comparatively simple predictions. This work takes on a more complex task: instead of predicting one of only two possible outcomes (such as success or failure), it assigns each case to one of several classes, here the five maturity levels. To achieve this, the data were processed systematically through an ordered series of steps (a pipeline). This method not only improves prediction capabilities but also opens new possibilities for using KPIs to predict project status, something that had not been done before.
The Process
The initial data presented several challenges, including:
- Abundance of qualitative variables: Except for one field, all variables were qualitative, requiring specific methods for their treatment.
- Excessive “noise”: The tables contained much information irrelevant to the research project, necessitating cleaning and preprocessing tasks.
- Data dispersion: Although there were many measurements, they were taken in an unstructured manner, not always measuring the same indicators and omitting many others.
- Nonlinear data: The nature of the data was highly nonlinear, requiring complex AI models to make valid predictions.
- Imbalanced data: When studying the maturity levels of schools, it was observed that most were at the same level, so techniques were applied to keep the model from being biased toward that majority level.
- Correlated indicators: Many indicators were correlated, necessitating the use of feature selection techniques to eliminate redundancy among them.
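Two of the challenges above, qualitative variables and correlated indicators, can be sketched in a few lines. The article does not say which encoding or selection methods the TFG used, so this is only one plausible approach: an ordinal encoding followed by a greedy filter on the Pearson correlation. The indicator names, the `SCALE` mapping, and the 0.9 threshold are assumptions made for illustration.

```python
from math import sqrt

# Hypothetical ordinal scale and indicator names, for illustration only.
SCALE = {"low": 0, "medium": 1, "high": 2}

raw = {
    "teacher_training": ["low", "medium", "high", "high", "low"],
    "training_hours": ["low", "medium", "high", "high", "low"],  # duplicates the one above
    "device_usage": ["high", "low", "medium", "low", "high"],
}


def encode(column):
    """Map qualitative labels to integers so they can be compared numerically."""
    return [SCALE[label] for label in column]


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep an indicator only if it is not strongly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


encoded = {name: encode(col) for name, col in raw.items()}
selected = drop_correlated(encoded)
print(selected)
```

In this toy data, `training_hours` carries exactly the same information as `teacher_training`, so the filter discards it and keeps the other two indicators.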
To address these challenges, the data went through a thorough cleaning and transformation process so that they could be used in predictive AI models such as Support Vector Machine, LASSO, or Random Forest Classifier. Among these, the Random Forest model stood out.
Random Forest is a model that creates many decision trees (hence the name “forest”) that work together to make more accurate predictions. For example, suppose we want to predict whether a student will pass an exam based on various characteristics (study hours, class attendance, etc.). Using Random Forest, we train multiple decision trees with different subsets of data and features. Then, for a new student, each tree will make its prediction (pass or fail), and the final result will be the option that most trees have voted for.
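The voting idea can be made concrete with a deliberately simplified sketch of the pass/fail example. It is not the model used in the project: each "tree" here is a one-split decision stump trained on a single randomly chosen feature, whereas a real Random Forest grows deeper trees and samples features at every split. The toy data and all function names are invented for illustration.

```python
import random
from collections import Counter

# Toy data, invented for illustration: (study_hours, attendance_rate) -> outcome.
X = [(1, 0.4), (2, 0.5), (3, 0.9), (6, 0.6), (7, 0.8), (8, 0.95), (2, 0.3), (9, 0.9)]
y = ["fail", "fail", "fail", "pass", "pass", "pass", "fail", "pass"]


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature by minimizing training errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [lab for r, lab in zip(rows, labels) if r[feature] <= threshold]
        right = [lab for r, lab in zip(rows, labels) if r[feature] > threshold]
        left_label = majority(left)
        right_label = majority(right) if right else left_label
        errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))  # random feature per tree
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Every tree votes; the most voted label wins."""
    return majority([tree(row) for tree in forest])


forest = train_forest(X, y)
print(predict(forest, (8, 0.9)))  # prediction for a new student
```

Because each stump sees a different sample and a different feature, individual trees can be wrong in different ways, and the majority vote tends to cancel those errors out.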
To further improve accuracy, a technique called bootstrapping, which resamples the training data with replacement, was applied to address the class imbalance problem, ensuring that different combinations of data were used to train the model.
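The article does not detail how the resampling was configured, so the following is only a sketch of one common way to use bootstrapping against class imbalance: drawing extra examples, with replacement, from the under-represented classes until every class is as large as the biggest one. The function name `bootstrap_balance` and the example data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_balance(rows, labels, seed=0):
    """Resample each class with replacement until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    balanced_rows, balanced_labels = [], []
    for label, members in by_class.items():
        # Keep every original row and add randomly repeated ones to reach the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            balanced_rows.append(row)
            balanced_labels.append(label)
    return balanced_rows, balanced_labels


# Invented example: most schools sit at the same maturity level.
labels = ["basic"] * 8 + ["intermediate"] * 3 + ["advanced"] * 1
rows = [{"school": i} for i in range(len(labels))]

new_rows, new_labels = bootstrap_balance(rows, labels)
print(Counter(new_labels))
```

Resampling of this kind should be applied only to the training split; if repeated rows leak into the test data, the reported accuracy becomes overly optimistic.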
A feature selection process was then performed, resulting in a model with excellent predictive results (98.2%) while using only seven indicators instead of 25. As the author of this TFG, Jorge Carrasco, states, “these excellent results will allow ProFuturo to automate the evaluation of school maturity levels without the need for manual surveys, saving the organization time and resources.” Additionally, he continues, “by identifying the most relevant KPIs, ProFuturo will be able to focus its efforts on areas that truly impact educational improvement, optimizing resource management and distribution.”
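The specific feature selection method is not named in the article. As an illustration of the general idea, the sketch below uses greedy forward selection: indicators are added one at a time and the process stops as soon as accuracy no longer improves. The scoring model (a leave-one-out nearest-neighbour classifier), the indicator names `kpi_a`, `kpi_b`, `kpi_c`, and the data are all assumptions, not the project's setup.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen features."""
    hits = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def forward_select(rows, labels, candidates):
    """Add one indicator at a time, stopping when accuracy no longer improves."""
    selected, best = [], 0.0
    while True:
        scores = {
            c: loo_accuracy(rows, labels, selected + [c])
            for c in candidates
            if c not in selected
        }
        if not scores:
            break
        top = max(scores, key=scores.get)
        if scores[top] <= best:
            break
        selected.append(top)
        best = scores[top]
    return selected, best


# Invented data: kpi_a separates the three levels, kpi_b and kpi_c are noise.
kpi_a = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 2.0, 2.1, 2.2]
kpi_b = [3, 1, 2, 1, 3, 2, 2, 3, 1]
kpi_c = [0, 1, 0, 1, 0, 1, 1, 0, 1]
rows = [{"kpi_a": a, "kpi_b": b, "kpi_c": c} for a, b, c in zip(kpi_a, kpi_b, kpi_c)]
levels = ["initial"] * 3 + ["basic"] * 3 + ["intermediate"] * 3

selected, best = forward_select(rows, levels, ["kpi_a", "kpi_b", "kpi_c"])
print(selected, best)
```

In this toy case a single indicator is enough to classify every school correctly, so the other two are never added. That is the same trade-off the project describes at larger scale: fewer indicators to collect without giving up predictive quality.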
Implementing this system will not only increase the accuracy of educational project evaluation but also provide a valuable tool for strategic decision-making that other socio-educational organizations could adopt. With cleaner databases and more robust predictive models, these organizations will be better prepared to monitor the progress of their initiatives and adjust their strategies more efficiently. This, in turn, can help reduce the digital divide in disadvantaged communities, so that more students have access to quality education.