dc.contributor | Santos Díaz, Alejandro | |
dc.contributor | School of Engineering and Sciences | |
dc.contributor | Soenksen, Luis Ruben | |
dc.contributor | Montesinos Silva, Luis Arturo | |
dc.contributor | Ochoa Ruiz, Gilberto | |
dc.contributor | Tamez Peña, José Gerardo | |
dc.contributor | Campus Monterrey | |
dc.creator | Vela Jarquin, Daniel | |
dc.date.accessioned | 2023-07-17T20:49:08Z | |
dc.date.accessioned | 2023-07-19T19:10:25Z | |
dc.date.available | 2023-07-17T20:49:08Z | |
dc.date.available | 2023-07-19T19:10:25Z | |
dc.date.created | 2023-07-17T20:49:08Z | |
dc.date.issued | 2023-06 | |
dc.identifier | Vela Jarquin, D. (2023). Caption generation with transformer models across multiple medical imaging modalities (Master's thesis). Instituto Tecnológico de Monterrey. | |
dc.identifier | https://hdl.handle.net/11285/651044 | |
dc.identifier | https://orcid.org/0000-0001-5624-8791 | |
dc.identifier | 1154114 | |
dc.identifier | 57215617169 | |
dc.identifier.uri | https://repositorioslatinoamericanos.uchile.cl/handle/2250/7715654 | |
dc.description.abstract | Caption generation is the process of automatically producing text excerpts that describe relevant features of an image. This process is applicable to very diverse domains, including healthcare. The field of medicine is characterized by a vast amount of visual information in the form of X-rays, magnetic resonance images, ultrasound, and CT scans, among others. Descriptive texts generated to represent this kind of visual information can help medical professionals achieve a better understanding of the pathologies and cases presented to them and could ultimately allow them to make more informed decisions. In this work, I explore the use of deep learning to address the problem of caption generation in medicine. I propose a Transformer model architecture for caption generation and evaluate its performance on a dataset comprising medical images that span multiple modalities and represented anatomies.
Deep learning models, particularly encoder-decoder architectures, have shown increasingly favorable results in translating from one information modality to another. Typically, the encoder extracts features from the visual data, and the decoder then uses these features to iteratively generate a natural-language sequence that describes the image. Over the years, various deep learning architectures have been proposed for caption generation. The most popular architectures in recent years have involved recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and, only recently, Transformer-type architectures. The Transformer architecture has shown state-of-the-art performance in many natural language processing tasks such as machine translation, question answering, summarization, and, more recently, caption generation. The use of attention mechanisms allows Transformers to better grasp the meaning of words in a sentence in a particular context. All these characteristics make Transformers well suited for caption generation.
In this thesis, I present the development of a deep learning model based on the Transformer architecture that generates captions for medical images of different modalities and anatomies, with the ultimate goal of helping professionals improve medical diagnosis and treatment. The model is tested on the MedPix online database, a compendium of medical imaging cases, and the results are reported. In summary, this work provides a valuable contribution to the field of automated medical image analysis. | |
dc.language | eng | |
dc.publisher | Instituto Tecnológico y de Estudios Superiores de Monterrey | |
dc.relation | acceptedVersion | |
dc.rights | http://creativecommons.org/licenses/by/4.0 | |
dc.rights | openAccess | |
dc.title | Caption generation with transformer models across multiple medical imaging modalities | |
dc.type | Tesis de Maestría / Master's Thesis | |