dc.contributor: Fuentes Magdalena, New York University, New York, NY
dc.contributor: Steers Bea, New York University, New York, NY
dc.contributor: Zinemanas Pablo, Universitat Pompeu Fabra, Barcelona, Spain
dc.contributor: Rocamora Martín, Universidad de la República, Facultad de Ingeniería, Uruguay
dc.contributor: Bondi Luca, Bosch Research, Pittsburgh, PA, USA
dc.contributor: Wilkins Julia, New York University, New York, NY
dc.contributor: Shi Qianyi, New York University, New York, NY
dc.contributor: Hou Yao, New York University, New York, NY
dc.contributor: Das Samarjit, Bosch Research, Pittsburgh, PA, USA
dc.contributor: Serra Xavier, Universitat Pompeu Fabra, Barcelona, Spain
dc.contributor: Bello Juan Pablo, New York University, New York, NY
dc.creator: Fuentes, Magdalena
dc.creator: Steers, Bea
dc.creator: Zinemanas, Pablo
dc.creator: Rocamora, Martín
dc.creator: Bondi, Luca
dc.creator: Wilkins, Julia
dc.creator: Shi, Qianyi
dc.creator: Hou, Yao
dc.creator: Das, Samarjit
dc.creator: Serra, Xavier
dc.creator: Bello, Juan Pablo
dc.date.accessioned: 2022-05-03T12:01:35Z
dc.date.accessioned: 2022-10-28T20:21:40Z
dc.date.available: 2022-05-03T12:01:35Z
dc.date.available: 2022-10-28T20:21:40Z
dc.date.created: 2022-05-03T12:01:35Z
dc.date.issued: 2022
dc.identifier: Fuentes, M., Steers, B., Zinemanas, P. et al. Urban sound & sight: Dataset and benchmark for audio-visual urban scene understanding [online]. IN: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23-27 May, pp. 141-145. Piscataway, NJ: IEEE, 2022. DOI 10.1109/ICASSP43922.2022.9747644
dc.identifier: https://ieeexplore.ieee.org/document/9747644
dc.identifier: https://hdl.handle.net/20.500.12008/31397
dc.identifier: 10.1109/ICASSP43922.2022.9747644
dc.identifier.uri: https://repositorioslatinoamericanos.uchile.cl/handle/2250/4985287
dc.description.abstract: Automatic audio-visual urban traffic understanding is a growing area of research with many potential applications of value to industry, academia, and the public sector. Yet the lack of well-curated resources for training and evaluating models hinders research in this area. To address this, we present a curated audio-visual dataset, Urban Sound & Sight (Urbansas), developed for investigating the detection and localization of sounding vehicles in the wild. Urbansas consists of 12 hours of unlabeled data along with 3 hours of manually annotated data, including bounding boxes with vehicle classes and unique vehicle IDs, and strong audio labels featuring vehicle types and indicating off-screen sounds. We discuss the challenges presented by the dataset and how to use its annotations for the localization of vehicles in the wild through audio models.
dc.language: en
dc.publisher: IEEE
dc.relation: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23-27 May 2022, pp. 141-145.
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND 4.0)
dc.rights: Works deposited in the Repository are governed by the Ordinance on Intellectual Property Rights of the Universidad de la República (Res. No. 91 of the C.D.C. of 8/III/1994 – D.O. 7/IV/1994) and by the Ordinance of the Open Repository of the Universidad de la República (Res. No. 16 of the C.D.C. of 07/10/2014).
dc.subject: Location awareness
dc.subject: Training
dc.subject: Industries
dc.subject: Annotations
dc.subject: Conferences
dc.subject: Signal processing
dc.subject: Benchmark testing
dc.subject: Audio-visual
dc.subject: Urban research
dc.subject: Traffic
dc.subject: Dataset
dc.title: Urban sound & sight: Dataset and benchmark for audio-visual urban scene understanding
dc.type: Conference paper


This item belongs to the following institution