Master's thesis
TAPIR: Transformers for Action, Phase, Instrument, and steps Recognition
Date
2022-06-06
Registered in:
instname:Universidad de los Andes
reponame:Repositorio Institucional Séneca
Author
Verlyck, Mathilde Agathe
Institution
Abstract
Surgical workflow analysis aims to improve the safety, planning, and efficiency of surgical procedures. However, most benchmarks for studying surgical interventions focus on a single task instead of leveraging the intrinsic complementarity among different tasks. In this work, we present a new model for holistic surgical scene understanding. Jointly with the release of the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) dataset by the Biomedical Computer Vision (BCV) group from the Universidad de los Andes, we present Transformers for Action, Phase, Instrument, and steps Recognition (TAPIR) as a solid approach to surgical scene understanding. PSI-AVA includes annotations for both long-term reasoning (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in videos of robot-assisted radical prostatectomies. TAPIR leverages the dataset's multi-level annotations: it uses the representation learned on the instrument detection task to improve its classification capacity. Lastly, our experimental results on both PSI-AVA and other publicly available datasets demonstrate that TAPIR is a stepping stone for future research on this holistic benchmark.