Biomedical Image Segmentation Based on Foundation Models Adapted Without Retraining and With Uncertainty Estimation
Authorship
F.G.S.
Master in Artificial Intelligence
Defense date
07.18.2025 09:30
Summary
Two important shortcomings limit the effectiveness of current learning-based solutions for biomedical image segmentation. One major issue is that new segmentation tasks typically demand the training or fine-tuning of new models, a resource-intensive process requiring significant machine learning expertise that is often beyond the reach of medical researchers and clinicians. The second critical limitation is that most existing segmentation methods yield only a single, deterministic segmentation mask, despite the considerable uncertainty often present regarding what constitutes correct segmentation. This uncertainty arises from both inherent data variability (aleatoric) and the model’s own knowledge gaps (epistemic). This work specifically addresses the estimation of these uncertainties in the segmentation process. By understanding and quantifying these uncertainties, we can significantly increase the explainability and interpretability of segmentation models, enabling more confident and informed decision-making in vital medical applications. We propose to develop a generalized method to analyze these different uncertainty types without requiring model retraining.
Direction
Pardo López, Xosé Manuel (Tutorships)
Court
IGLESIAS RODRIGUEZ, ROBERTO (Chairman)
CORRALES RAMON, JUAN ANTONIO (Secretary)
ALONSO MORAL, JOSE MARIA (Member)
Autotracking module for long range aerial objects
Authorship
I.A.M.S.
Master in Artificial Intelligence
Defense date
07.18.2025 10:00
Summary
This thesis evaluates different tracking methods integrated in MMTracking, an open-source library that provides implementations of single-object and multiple-object trackers, exploring the capabilities of each tracker in detail to identify the method that captures the most relevant information about a moving object. The evaluation was performed on well-known benchmarks such as MOTChallenge and OTB2015, which provide diverse conditions and scenarios. The results led to a comprehensive analysis of each method, showing which tracker handles each scenario best. Additionally, this study contributes to the ongoing research on tracking algorithms by providing insights and identifying areas for improvement.
Direction
MUCIENTES MOLINA, MANUEL FELIPE (Tutorships)
Blanco Freire, Lara (Co-tutorships)
Dago Casas, Pablo (Co-tutorships)
Court
IGLESIAS RODRIGUEZ, ROBERTO (Chairman)
CORRALES RAMON, JUAN ANTONIO (Secretary)
ALONSO MORAL, JOSE MARIA (Member)
KIME: Kumite Intelligent Movement Evaluation
Authorship
H.M.C.
Master in Artificial Intelligence
Defense date
07.18.2025 10:30
Summary
This thesis addresses the challenge of objectively analyzing light-contact Karate (Kumite) fights, where rapid and precise techniques must be judged in real time, by harnessing computer vision and deep learning on video recordings alone. Traditional scoring relies on human referees, which introduces subjectivity, potential bias, and limited capacity to process large volumes of footage for athlete scouting and performance evaluation. To overcome these limitations, three interrelated components were developed: First, a data extraction pipeline was devised to locate and segment moments of interest in full-length match videos. By combining scoreboard change detection via a lightweight CNN with manual validation, a curated dataset of scoring and non-scoring events was generated. Second, a workflow was created to distinguish the two fighters, Aka and Ao, through tatami boundary detection, person detection, instance segmentation, and color-based filtering. Object tracking was then applied to reduce computational load while maintaining identity consistency across frames, resulting in a validated classification dataset. Finally, transfer learning strategies were explored for classifying individual frames as scoring or non-scoring actions and assigning the correct athlete and point value. Two approaches were compared: freezing the feature extractor versus fine-tuning the upper layers of a pretrained image classifier. The frozen-backbone model demonstrated superior generalization and achieved a low false-positive rate, an attribute essential for real-world integration into semi-automated judging or analytics systems. Overall, this work demonstrates the feasibility of a non-intrusive, video-only solution for Kumite analysis and lays a foundation for further development toward real-time deployment, enhanced explainability, and expanded tactical insights.
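The frozen-backbone strategy described above can be sketched in a few lines. The sketch below is a simplification, not the thesis implementation: a hypothetical fixed random projection stands in for the pretrained CNN feature extractor, and the trainable head is plain logistic regression fitted on those frozen features.

```python
import numpy as np

def frozen_backbone(x):
    """Stand-in for a pretrained feature extractor: a fixed (never
    updated) random projection followed by a nonlinearity."""
    w = np.random.default_rng(42).normal(size=(x.shape[1], 16))
    return np.tanh(x @ w)

def train_head(feats, labels, lr=0.5, epochs=300):
    """Train only the classification head (logistic regression);
    the backbone weights are never touched."""
    w, b = np.zeros(feats.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid
        grad = p - labels                           # dLoss/dlogit
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

# Toy "frames": two well-separated classes (scoring vs non-scoring).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-2, 0.5, (100, 32)),
               rng.normal(+2, 0.5, (100, 32))])
y = np.repeat([0.0, 1.0], 100)

feats = frozen_backbone(x)          # features come out frozen
w, b = train_head(feats, y)         # only the head is optimized
acc = ((feats @ w + b > 0) == (y == 1)).mean()
```

Freezing the extractor keeps the number of trainable parameters small, which is one plausible reason the frozen-backbone variant generalized better on the limited Kumite dataset.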
Direction
MUCIENTES MOLINA, MANUEL FELIPE (Tutorships)
RODRIGUEZ FERNANDEZ, ISMAEL (Co-tutorships)
Court
IGLESIAS RODRIGUEZ, ROBERTO (Chairman)
CORRALES RAMON, JUAN ANTONIO (Secretary)
ALONSO MORAL, JOSE MARIA (Member)
Development of a Computer Vision-Based Tool for the Automatic Detection of Basketball Shots and Court Position Analysis
Authorship
A.M.R.
Master in Artificial Intelligence
Defense date
07.17.2025 09:30
Summary
This work presents a modular computer vision system for automatic detection of basketball shots and court position analysis using single-camera footage. Aimed at democratizing access to sports analytics and reducing reliance on manual annotation, the tool integrates state-of-the-art object detection (YOLO and RT-DETR), tracking (ByteTrack), and homography-based court mapping to position players. It detects shot attempts, classifies outcomes (made/missed), assigns possession, and generates both annotated videos and structured datasets. Evaluated on real amateur videos, the system demonstrates robust performance across spatial, temporal, and classification metrics. These results highlight its potential as a practical and accessible solution for automated basketball analytics.
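The homography-based court mapping can be illustrated with a short sketch. The corner coordinates below are hypothetical, and in the actual system the homography would be estimated from detected court features (e.g. with OpenCV's `cv2.findHomography`) rather than from four hand-picked points.

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct linear transform for exactly 4 correspondences: solve the
    8 unknowns of H (with h33 fixed to 1) from the projection equations."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(a, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def to_court(H, pt):
    """Project an image point (e.g. a tracked player's feet) onto the
    court plan, in metres."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Hypothetical calibration: four court corners as seen in the image,
# mapped to a 28 m x 15 m court plan.
img_corners = [(120, 430), (1160, 410), (980, 90), (300, 100)]
court_corners = [(0, 0), (28, 0), (28, 15), (0, 15)]
H = homography_from_points(img_corners, court_corners)

player_pos = to_court(H, (640, 260))   # court coordinates of a detection
```

Once H is known, every per-frame player detection from the tracker can be projected to court coordinates with a single matrix-vector product, which is what makes single-camera position analysis cheap.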
Direction
MUCIENTES MOLINA, MANUEL FELIPE (Tutorships)
MALLO ANTELO, JAIME (Co-tutorships)
Court
Taboada González, José Ángel (Chairman)
MERA PEREZ, DAVID (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)
State-of-the-Art Voice Models for Galician Language Using a Small-to-Medium TTS Dataset
Authorship
A.M.S.
Master in Artificial Intelligence
Defense date
07.17.2025 10:00
Summary
Text-to-speech (TTS) synthesis plays a crucial role in human-computer interaction and remains a hot research topic in the speech technology and machine learning communities. With advances in deep learning techniques and increased computing power, deep neural network-based TTS systems have emerged as a powerful alternative to traditional methods. Recently, end-to-end deep learning TTS models have produced impressive natural-sounding and high-quality results. However, extending these models to multiple languages and speakers is challenging, especially for low-to-medium resource languages such as Galician. In our study, we use an open small-to-medium Galician TTS dataset to train different voice models in Galician. We also apply synthetic data generation to address identified shortcomings in the original dataset. We explore state-of-the-art architectures, including training from scratch and transfer learning techniques. The resulting models are validated and compared through subjective and automatic evaluations.
Direction
BUGARIN DIZ, ALBERTO JOSE (Tutorships)
MAGARIÑOS IGLESIAS, MARIA DEL CARMEN (Co-tutorships)
Court
Taboada González, José Ángel (Chairman)
MERA PEREZ, DAVID (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)
Anomaly Detection Using Autoencoder Models in Industrial Environments
Authorship
F.M.S.
Master in Artificial Intelligence
Defense date
07.17.2025 10:30
Summary
The increasing connectivity and automation in Industry 4.0 environments have introduced new challenges for ensuring operational reliability and security. Anomaly detection plays a crucial role in identifying failures and cyberattacks that could compromise production systems. This work investigates the use of autoencoder-based models for unsupervised anomaly detection in both network traffic and sensor data, collected from a simulated cocktail production system. A fully-connected autoencoder is employed to detect deviations in Modbus network flows, while a sequence-to-one LSTM-autoencoder is used to model temporal patterns in multivariate sensor streams. Both models are trained on normal data and evaluated under realistic attack scenarios, including Modbus register tampering and SYN Flood denial-of-service. Experimental results demonstrate that autoencoders can effectively detect anomalies in industrial settings, with LSTM-based models offering improved performance in environments with cyclic behavior.
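The detection logic common to both models, training on normal data only and flagging inputs whose reconstruction error exceeds a threshold calibrated on normal traffic, can be sketched as follows. This is a simplification: a closed-form linear autoencoder built from principal components stands in for the trained networks, and all data here is synthetic.

```python
import numpy as np

def make_linear_ae(x_train, k=2):
    """Closed-form linear 'autoencoder': encode/decode with the top-k
    principal directions of the normal training data."""
    mu = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mu, full_matrices=False)
    w = vt[:k]            # encoder; decoder is w.T in the linear case
    return lambda x: (x - mu) @ w.T @ w + mu

def reconstruction_error(model, x):
    """Per-sample mean squared reconstruction error."""
    return np.mean((x - model(x)) ** 2, axis=1)

# Synthetic "sensor" data: normal operation lives near a 2-D subspace.
rng = np.random.default_rng(0)
normal = (rng.normal(size=(500, 2)) @ rng.normal(size=(2, 8))
          + 0.05 * rng.normal(size=(500, 8)))
ae = make_linear_ae(normal)

# Calibrate the threshold on normal-only errors (unsupervised setting),
# then flag anything above it.
threshold = np.percentile(reconstruction_error(ae, normal), 99)
tampered = np.full((1, 8), 5.0)     # e.g. a tampered register reading
is_anomaly = reconstruction_error(ae, tampered)[0] > threshold
```

The trained fully-connected and LSTM autoencoders replace `make_linear_ae` in the actual pipelines, but the thresholding step works the same way: anything the model cannot reconstruct well was not seen during normal operation.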
Direction
CARIÑENA AMIGO, MARIA PURIFICACION (Tutorships)
Pérez Vilarelle, Laura (Co-tutorships)
Court
Taboada González, José Ángel (Chairman)
MERA PEREZ, DAVID (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)