Abstract: Query-by-Example Spoken Term Detection (QbE-STD) retrieves relevant audio files corresponding to a spoken query, without relying on explicit word-level textual transcriptions. In ...
Please cite this work with the following BibTeX: @inproceedings{cocchi2024augmenting, title={{Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering}}, ...
Abstract: Visual Odometry (VO) empowers robots with the ability to perform self-localization within unknown environments using visual cues, yet it is faced with challenges in dynamic environments. In ...