Abstract: Query-by-Example Spoken Term Detection (QbE-STD) retrieves relevant audio files corresponding to a spoken query, without relying on explicit word-level textual transcriptions. In ...
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Please cite this work with the following BibTeX: @inproceedings{cocchi2024augmenting, title={{Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering}}, ...
Abstract: Visual Odometry (VO) empowers robots with the ability to perform self-localization within unknown environments using visual cues, yet it is faced with challenges in dynamic environments. In ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results