dc.contributor   Santos Díaz, Alejandro
dc.contributor   School of Engineering and Sciences
dc.contributor   Soenksen, Luis Ruben
dc.contributor   Montesinos Silva, Luis Arturo
dc.contributor   Ochoa Ruiz, Gilberto
dc.contributor   Tamez Peña, José Gerardo
dc.contributor   Campus Monterrey
dc.contributor   dnbsrp
dc.creator   Vela Jarquin, Daniel
dc.date.accessioned   2023-07-17T20:49:08Z
dc.date.accessioned   2023-07-19T19:10:25Z
dc.date.available   2023-07-17T20:49:08Z
dc.date.available   2023-07-19T19:10:25Z
dc.date.created   2023-07-17T20:49:08Z
dc.date.issued   2023-06
dc.identifier   Vela Jarquin, D. (2023). Caption generation with transformer models across multiple medical imaging modalities (Master's thesis). Instituto Tecnológico de Monterrey.
dc.identifier   https://hdl.handle.net/11285/651044
dc.identifier   https://orcid.org/0000-0001-5624-8791
dc.identifier   1154114
dc.identifier   57215617169
dc.identifier.uri   https://repositorioslatinoamericanos.uchile.cl/handle/2250/7715654
dc.description.abstract   Caption generation is the process of automatically producing text that describes the relevant features of an image. It is applicable to very diverse domains, including healthcare. Medicine is characterized by vast amounts of visual information in the form of X-rays, magnetic resonance images, ultrasound, and CT scans, among others. Descriptive texts generated from this visual information can help medical professionals better understand the pathologies and cases presented to them, and could ultimately allow them to make more informed decisions. In this work, I explore the use of deep learning to address the problem of caption generation in medicine. I propose a Transformer model architecture for caption generation and evaluate its performance on a dataset of medical images spanning multiple modalities and represented anatomies. Deep learning models, particularly encoder-decoder architectures, have shown increasingly favorable results in translating from one information modality to another. Typically, the encoder extracts features from the visual data, and the decoder then uses these features to iteratively generate a natural-language sequence that describes the image. Various deep learning architectures have been proposed for caption generation; the most popular in recent years involved recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, and only recently Transformer-type architectures. The Transformer architecture has shown state-of-the-art performance in many natural language processing tasks, such as machine translation, question answering, summarization, and, more recently, caption generation. Its attention mechanisms allow the Transformer to better capture the meaning of words in a sentence within a particular context. These characteristics make Transformers well suited to caption generation. In this thesis, I present the development of a deep learning model based on the Transformer architecture that generates captions for medical images of different modalities and anatomies, with the ultimate goal of helping professionals improve medical diagnosis and treatment. The model is evaluated on the MedPix online database, a compendium of medical imaging cases, and the results are reported. In summary, this work provides a valuable contribution to the field of automated medical image analysis.
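The encoder-decoder pipeline the abstract describes (an encoder that extracts visual features, a decoder that iteratively generates the caption under a causal attention mask) can be sketched in a few lines of PyTorch. The toy below is a hedged illustration only, not the thesis model: the class name ToyImageCaptioner, the patch size, the vocabulary size, and all layer dimensions are invented for the example.

# Minimal sketch of an encoder-decoder Transformer for image captioning.
# Illustrative toy, NOT the thesis model: patch size, vocabulary size,
# and all dimensions here are assumptions made for the example.
import torch
import torch.nn as nn

class ToyImageCaptioner(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, patch=16,
                 img_size=224, max_len=64):
        super().__init__()
        num_patches = (img_size // patch) ** 2
        # Encoder input: non-overlapping image patches projected to d_model.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.pos_enc = nn.Parameter(torch.zeros(1, num_patches, d_model))
        self.pos_dec = nn.Parameter(torch.zeros(1, max_len, d_model))
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=8,
                                          num_encoder_layers=4,
                                          num_decoder_layers=4,
                                          batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) token ids (teacher forcing).
        feats = self.patch_embed(images).flatten(2).transpose(1, 2)  # (B, N, d)
        feats = feats + self.pos_enc
        T = captions.size(1)
        tgt = self.tok_embed(captions) + self.pos_dec[:, :T]
        # Causal mask: each output position attends only to earlier tokens.
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.transformer(feats, tgt, tgt_mask=causal)
        return self.lm_head(out)  # (B, T, vocab_size) next-token logits

model = ToyImageCaptioner()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])

At inference time a real captioner would generate autoregressively, feeding each predicted token back into the decoder one step at a time; that loop is omitted here, and the sketch shows only the teacher-forced training pass.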
dc.language   eng
dc.publisher   Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation   acceptedVersion
dc.rights   http://creativecommons.org/licenses/by/4.0
dc.rights   openAccess
dc.title   Caption generation with transformer models across multiple medical imaging modalities
dc.type   Master's Thesis