dc.contributor: Fuentes Magdalena, New York University, New York, NY
dc.contributor: Steers Bea, New York University, New York, NY
dc.contributor: Zinemanas Pablo, Universitat Pompeu Fabra, Barcelona, Spain
dc.contributor: Rocamora Martín, Universidad de la República, Facultad de Ingeniería, Uruguay
dc.contributor: Bondi Luca, Bosch Research, Pittsburgh, PA, USA
dc.contributor: Wilkins Julia, New York University, New York, NY
dc.contributor: Shi Qianyi, New York University, New York, NY
dc.contributor: Hou Yao, New York University, New York, NY
dc.contributor: Das Samarjit, Bosch Research, Pittsburgh, PA, USA
dc.contributor: Serra Xavier, Universitat Pompeu Fabra, Barcelona, Spain
dc.contributor: Bello Juan Pablo, New York University, New York, NY
dc.creator: Fuentes, Magdalena
dc.creator: Steers, Bea
dc.creator: Zinemanas, Pablo
dc.creator: Rocamora, Martín
dc.creator: Bondi, Luca
dc.creator: Wilkins, Julia
dc.creator: Shi, Qianyi
dc.creator: Hou, Yao
dc.creator: Das, Samarjit
dc.creator: Serra, Xavier
dc.creator: Bello, Juan Pablo
dc.date.accessioned: 2022-05-03T12:01:35Z
dc.date.accessioned: 2022-10-28T20:21:40Z
dc.date.available: 2022-05-03T12:01:35Z
dc.date.available: 2022-10-28T20:21:40Z
dc.date.created: 2022-05-03T12:01:35Z
dc.date.issued: 2022
dc.identifier: Fuentes, M., Steers, B., Zinemanas, P. et al. Urban sound & sight: Dataset and benchmark for audio-visual urban scene understanding [online]. IN: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23-27 May, pp. 141-145. Piscataway, NJ: IEEE, 2022. DOI 10.1109/ICASSP43922.2022.9747644
dc.identifier: https://ieeexplore.ieee.org/document/9747644
dc.identifier: https://hdl.handle.net/20.500.12008/31397
dc.identifier: 10.1109/ICASSP43922.2022.9747644
dc.identifier.uri: https://repositorioslatinoamericanos.uchile.cl/handle/2250/4985287
dc.description.abstract: Automatic audio-visual urban traffic understanding is a growing area of research with many potential applications of value to industry, academia, and the public sector. Yet the lack of well-curated resources for training and evaluating models hinders research in this area. To address this, we present a curated audio-visual dataset, Urban Sound & Sight (Urbansas), developed for investigating the detection and localization of sounding vehicles in the wild. Urbansas consists of 12 hours of unlabeled data along with 3 hours of manually annotated data, including bounding boxes with vehicle classes and unique vehicle IDs, and strong audio labels featuring vehicle types and indicating off-screen sounds. We discuss the challenges presented by the dataset and how to use its annotations for the localization of vehicles in the wild through audio models.
dc.language: en
dc.publisher: IEEE
dc.relation: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23-27 May 2022, pp. 141-145.
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND 4.0)
dc.rights: Works deposited in the Repository are governed by the Ordinance on Intellectual Property Rights of the Universidad de la República (Res. No. 91 of the C.D.C. of 8/III/1994 – D.O. 7/IV/1994) and by the Ordinance of the Open Repository of the Universidad de la República (Res. No. 16 of the C.D.C. of 07/10/2014).
dc.subject: Location awareness
dc.subject: Training
dc.subject: Industries
dc.subject: Annotations
dc.subject: Conferences
dc.subject: Signal processing
dc.subject: Benchmark testing
dc.subject: Audio-visual
dc.subject: Urban research
dc.subject: Traffic
dc.subject: Dataset
dc.title: Urban sound & sight: Dataset and benchmark for audio-visual urban scene understanding
dc.type: Conference paper


This item belongs to the following institution