More and Better Teachers Thanks to Data Analytics

Teacher training is one of the key determinants of educational quality. But in many vulnerable countries, sustaining that training and preventing dropout remain a challenge. The Telefónica ProFuturo-UPSA Chair has analyzed thousands of records from ProFuturo’s digital platform to understand how teachers are trained and why some interrupt the process. The results point to practical solutions: more effective learning pathways, early detection of dropout, and personalized programs. They show how applied research can improve education where it is most needed.


The quality of an education system depends, to a large extent, on the quality of its teachers. And in the most vulnerable contexts, where resources are limited and inequalities are amplified, supporting the continuous training of teachers is an urgent and complex task. In recent years, the ProFuturo Foundation has made this its raison d’être and works on several fronts to provide teachers with digital tools and training programs that enable them to improve their pedagogical practice and, with it, the learning of millions of children.


One of these fronts is research. In addition to being an educational organization, the foundation investigates how data can help us better understand the educational process. Through the Telefónica ProFuturo-UPSA Chair on Data Analytics of Educational Projects in Vulnerable Contexts, it has promoted a space for applied research in which university students develop projects that respond to real problems in digital education. This chair has become a bridge between university and social action: a place where data analytics, artificial intelligence and computer science are put at the service of educational improvement.

In this article we present the results of two undergraduate theses developed within this framework. The first analyzes more than 250,000 records from ProFuturo’s platform in five Latin American countries to map teachers’ learning pathways using graph mining techniques. The second applies machine learning and deep learning algorithms to predict course dropout, achieving accuracy rates close to 90%.

Their conclusions, in addition to offering academically interesting findings, make it possible to design more effective pathways, anticipate risks and optimize resources. In the following lines we will see how university research can be translated into practical solutions to ensure more and better teachers where they are most needed.

Data Analytics in Education

Every time a teacher accesses a digital course or completes a module (or abandons it halfway through), they leave a trace in the form of data. Multiplied by hundreds of thousands of users and dozens of countries, these traces form a flow of information that, when properly analyzed, can reveal a great deal about how teachers learn and what they need. In this way, the accumulated records become useful knowledge for improving the training experience.

And this is where data analytics comes into play. In recent years, the development of machine learning, deep learning and data mining techniques has opened the door to much more sophisticated analyses of educational processes. For example, it is now possible to detect behavioral patterns, identify risk factors, and design personalized pathways that adapt to the needs of each teacher.

In the case of ProFuturo, this analytical capacity is especially valuable. Its platform, which operates in highly diverse contexts and with thousands of teachers in training simultaneously, is capable of collecting and storing millions of data points from their activities. Anticipating dropout, detecting key courses in learning trajectories or locating bottlenecks are very useful steps to optimize the training offer and increase the real impact in classrooms.

The two undergraduate theses (Trabajos de Fin de Grado, TFG) presented here are good examples of this. Both use data to improve teacher training, but they approach it from different angles: one focuses on the complete learning journey, and the other addresses the risk of losing teachers along the way. Together they provide a complementary picture of what analytics can bring to digital education.


Teachers’ Learning Pathways through Graph Mining

The first thesis analyzes which trajectories teachers follow on the ProFuturo platform and which courses make the difference in their progress. To do this, the author applied a widely used data science technique: graph theory.

In this model, each course is represented as a node and the links between them appear when the same teacher has taken both. The result is a graph: a network in which the size of the nodes reflects how many teachers have completed that course and the thickness of the links indicates how many have followed that same path. By analyzing this network, patterns emerge that a simple list of records would never show: which courses are most central (large nodes connected to many others), which act as “bridges” between pathways (those that appear linking different course communities), and which groups of courses tend to cluster together naturally.
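As a rough illustration of the model just described, the following minimal sketch builds such a graph from a handful of completion records. The record layout, the sample data and the function name `build_course_graph` are assumptions made for this example, not the thesis's actual code or data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical completion records: (teacher_id, course_name).
records = [
    ("t1", "Digital Audio"), ("t1", "Interactive Images"),
    ("t2", "Digital Audio"), ("t2", "Interactive Images"), ("t2", "Gamification"),
    ("t3", "Cooperative Learning"), ("t3", "Gamification"),
]

def build_course_graph(records):
    """Return (node_sizes, edge_weights) for a course co-completion graph."""
    courses_by_teacher = {}
    for teacher, course in records:
        courses_by_teacher.setdefault(teacher, set()).add(course)

    node_sizes = Counter()    # course -> number of teachers who completed it
    edge_weights = Counter()  # (course_a, course_b) -> teachers who took both
    for courses in courses_by_teacher.values():
        node_sizes.update(courses)
        # Sorting gives each undirected pair a single canonical key.
        edge_weights.update(combinations(sorted(courses), 2))
    return node_sizes, edge_weights

node_sizes, edge_weights = build_course_graph(records)
for (course_a, course_b), weight in edge_weights.most_common():
    print(f"{course_a} -- {course_b}: {weight} teacher(s)")
```

Here `node_sizes` plays the role of node size in the visualization and `edge_weights` the role of link thickness.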

The study worked with more than 250,000 records from teachers in Colombia, Brazil, Mexico, Peru and Chile. From there, centrality metrics, including betweenness and closeness, were calculated, and the Girvan-Newman algorithm, which repeatedly removes the links with the highest betweenness until the network splits into groups, was applied to detect communities. These communities are groups of courses that teachers tend to take together more frequently, and they reveal natural learning trajectories beyond the formal organization of the catalog.
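To make the community detection step more concrete, here is a small standard-library sketch of one Girvan-Newman split, together with degree centrality. The six-course graph (labels A to F) is invented for illustration, and the functions are a simplified version of the technique rather than the code used in the thesis; in practice a graph library such as NetworkX would normally provide these routines.

```python
from collections import deque

# Hypothetical course graph: two tight groups joined by one "bridge" link C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def degree_centrality(adj):
    """Fraction of the other nodes each node is directly linked to."""
    return {node: len(neigh) / (len(adj) - 1) for node, neigh in adj.items()}

def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search counting shortest paths (sigma) from source.
        dist, sigma = {source: 0}, {node: 0 for node in adj}
        sigma[source] = 1
        preds, order, queue = {node: [] for node in adj}, [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path counts back from the farthest nodes toward the source.
        delta = {node: 0.0 for node in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score

def components(adj):
    """Connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        result.append(group)
    return result

def girvan_newman_split(adj):
    """Remove highest-betweenness links until one more community appears."""
    adj = {node: set(neigh) for node, neigh in adj.items()}  # work on a copy
    start_count = len(components(adj))
    while len(components(adj)) == start_count:
        score = edge_betweenness(adj)
        if not score:  # no links left to remove
            break
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

communities = girvan_newman_split(adj)
print(sorted(sorted(group) for group in communities))
```

In this toy graph the link between C and D carries every shortest path between the two groups, so it is removed first and the two course communities separate.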

For example, graph analysis showed that teachers do not train by following arbitrary routes, but tend to follow well-defined trajectories. In Colombia, the analysis revealed two prominent communities around Innovative Educational Methodologies. Within them, highly popular itineraries appeared, such as the one on digital skills (from Digital Audio to Interactive Images), taken by dozens of teachers, or the route that combines Cooperative Learning with Gamification and successive levels of innovation.

In Peru, thanks to a larger catalog (81 courses), more varied and combined trajectories were observed, with teachers alternating between technological and pedagogical training. On the other hand, in Chile, the more limited offer (48 courses) constrained the network of trajectories, although coherent sequences were identified such as School Bullying – Neurodidactics – Learning Difficulties. In Brazil and Mexico, consistent itineraries emerged around basic digital skills and active methodologies, with courses like Introduction to Gamification acting as connectors between routes.

The thesis also makes several methodological suggestions to refine future analyses: improving how the threshold values that determine which courses or itineraries are represented are chosen, automating the calculation of network metrics so they can be reused in different contexts, refining the categorization of detected communities, and enriching the graph visualizations so that the results are clearer and more accessible.

Beyond the specific results of the analysis, the findings of this thesis have evident practical applications for the management of teacher training in ProFuturo. For example:

  • Strategic priority for central courses. Nodes at the center of the network—those taken by many teachers and connected with numerous pathways—should be given special attention. This implies keeping them updated, ensuring accessibility, and reinforcing their materials, as they function as structural pillars of the offer.
  • Formalizing recurrent itineraries. Course sequences that appear repeatedly in different countries constitute learning routes consolidated in practice. Formalizing them as recommended itineraries can help better guide new teachers and increase completion rates.
  • Reviewing “peripheral courses.” These are courses that are weakly connected to the rest of the network and remain on the margins of the graph. This does not mean they lack value, but it does invite reflection: do they respond to specific needs? Should they be integrated into broader itineraries, or revised to make them more attractive and relevant?
  • Context adaptation. Comparative analysis across countries shows clear differences: in some cases digital routes predominate, in others pedagogical ones. There is no single model valid for all. Adjusting the offer to each national context can increase both participation and program effectiveness.
  • Basis for future decisions. Beyond immediate results, this type of analysis provides ProFuturo with a permanent tool to monitor the evolution of training, detect changes in patterns and plan catalog expansion based on evidence.

Predicting Dropout with Machine Learning

The second thesis addresses another fundamental challenge of digital teacher training: dropout. Some teachers who start a course interrupt it before completing it. Understanding why this happens and, above all, anticipating it, is key to designing more effective programs.

The thesis uses data from ProFuturo’s Moodle platform and applies machine learning techniques and advanced statistics to identify which teachers are most likely to drop out of a course. If dropout can be predicted early enough, support mechanisms can be activated in time to increase completion rates.

The work began with the cleaning and preparation of data. From there, multiple variables potentially related to dropout were identified and evaluated, such as frequency of access to the platform, number of activities completed, or connection time.

To determine which variables were most relevant in predicting dropout, statistical techniques were used (such as the chi-square test and mutual information), as well as predictive models (such as logistic regression and Random Forest).
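The two statistical measures mentioned here can be sketched in a few lines for categorical variables. The example below is an illustration under stated assumptions: the eight teachers, the binary variables `low_access` and `uses_mobile`, and the `dropped` labels are all invented, and the thesis does not specify how its variables were encoded.

```python
import math
from collections import Counter

def chi_square(feature, label):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(feature)
    joint = Counter(zip(feature, label))
    f_tot, l_tot = Counter(feature), Counter(label)
    stat = 0.0
    for f in f_tot:
        for l in l_tot:
            expected = f_tot[f] * l_tot[l] / n
            stat += (joint[(f, l)] - expected) ** 2 / expected
    return stat

def mutual_information(feature, label):
    """Mutual information between two categorical sequences, in bits."""
    n = len(feature)
    joint = Counter(zip(feature, label))
    f_tot, l_tot = Counter(feature), Counter(label)
    return sum(
        (count / n) * math.log2(count * n / (f_tot[f] * l_tot[l]))
        for (f, l), count in joint.items()
    )

# Hypothetical data for eight teachers (1 = yes, 0 = no).
dropped     = [1, 1, 1, 0, 0, 0, 0, 1]
low_access  = [1, 1, 1, 1, 0, 0, 0, 0]  # logged in rarely
uses_mobile = [1, 0, 1, 0, 1, 0, 1, 0]  # unrelated to dropout in this toy set

features = {"low_access": low_access, "uses_mobile": uses_mobile}
ranking = sorted(features, key=lambda name: chi_square(features[name], dropped),
                 reverse=True)
for name in ranking:
    print(name,
          round(chi_square(features[name], dropped), 3),
          round(mutual_information(features[name], dropped), 3))
```

Higher values of either score suggest a stronger association with dropout, so variables can be ranked and the weakest ones discarded before fitting a predictive model.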

The predictive models applied to the platform data demonstrated that it is possible to anticipate with high accuracy (around 90%) which teachers will interrupt a course. The most striking finding is that what matters is not the simple fact of passing or failing at the end, but the evolution of participation during the course: logins that become less frequent, activities left incomplete, interaction that decreases over time. These early signals explain dropout much better than the final grade. Moreover, the model distinguishes between those who truly drop out and those who are simply delayed, which avoids overestimating the problem.
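The idea that the evolution of participation matters more than the final result can be illustrated with a deliberately small sketch. It summarizes each teacher's weekly logins as a trend (a least-squares slope) and fits a one-variable logistic regression by gradient descent. The weekly login counts, the labels and the choice of a single feature are assumptions for this example; the thesis's models used more variables and its exact features are not detailed in this article.

```python
import math

def login_slope(weekly_logins):
    """Least-squares slope of logins per week (negative = fading activity)."""
    n = len(weekly_logins)
    mean_x = (n - 1) / 2
    mean_y = sum(weekly_logins) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(weekly_logins))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """One-feature logistic regression trained by batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        b -= lr * sum(errors) / len(xs)
    return w, b

# Hypothetical training data: logins in weeks 1-4 and whether the teacher dropped out.
history = [
    ([5, 5, 5, 5], 0), ([3, 4, 5, 6], 0), ([4, 5, 4, 5], 0),
    ([6, 4, 2, 0], 1), ([5, 4, 2, 1], 1), ([6, 5, 3, 1], 1),
]
slopes = [login_slope(logins) for logins, _ in history]
labels = [label for _, label in history]
w, b = fit_logistic(slopes, labels)

def dropout_risk(weekly_logins):
    """Estimated probability of dropout from the login trend alone."""
    return sigmoid(w * login_slope(weekly_logins) + b)

print(round(dropout_risk([7, 5, 2, 0]), 2))  # sharply declining activity
print(round(dropout_risk([3, 4, 5, 6]), 2))  # growing activity
```

A declining trend pushes the estimated risk up well before the course ends, which is what makes early intervention possible.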

The results of this thesis also offer a wide range of practical applications for ProFuturo:

  • Early warning systems. With precise predictive models, automatic alerts can be generated when a teacher shows signs of dropout, allowing timely intervention.
  • Personalized retention strategies. Knowing which teachers are at higher risk makes it easier to allocate resources such as specific tutoring, reminders, or reinforcement in key content.
  • Resource optimization. Predicting dropout helps concentrate efforts where they can have the most impact, instead of applying general measures to the entire population.
  • Improving the training experience. Knowing the factors that most influence dropout makes it possible to redesign courses to be more attractive and accessible.
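As a simple illustration of how alerts and resource allocation could fit together, the sketch below takes predicted risk scores and assigns the limited tutoring slots to the teachers at highest risk, with the rest of the flagged group receiving a reminder. The threshold, the number of slots, the risk values and the function name `plan_interventions` are all hypothetical.

```python
def plan_interventions(risk_by_teacher, alert_threshold=0.7, tutor_slots=2):
    """Split teachers above the alert threshold into tutoring and reminder groups."""
    at_risk = sorted(
        (teacher for teacher, risk in risk_by_teacher.items()
         if risk >= alert_threshold),
        key=lambda teacher: risk_by_teacher[teacher],
        reverse=True,
    )
    # Highest-risk teachers get the scarce tutoring slots; the rest get reminders.
    return {"tutoring": at_risk[:tutor_slots], "reminder": at_risk[tutor_slots:]}

# Hypothetical risk scores produced by a dropout model.
risks = {"t1": 0.92, "t2": 0.35, "t3": 0.78, "t4": 0.85, "t5": 0.10}
plan = plan_interventions(risks)
print(plan)
```

Teachers below the threshold are left alone, which keeps attention focused where it is most likely to change the outcome.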

The Union of Research and Social Action

These two undergraduate theses, presented within the framework of the Telefónica ProFuturo-UPSA Chair, point in the same direction: data are not a by-product of digital education, but a tool to improve it.

But beyond the specific results, the value of these works lies in what they represent: the union between university research and social action, with knowledge placed at the service of a greater objective, which is to improve educational quality where it is most needed.
