Item metadata

dc.contributor.author: Koudouna, Daniel
dc.contributor.author: Terzić, Kasim
dc.contributor.editor: Farinella, Giovanni Maria
dc.contributor.editor: Radeva, Petia
dc.contributor.editor: Braz, Jose
dc.contributor.editor: Bouatouch, Kadi
dc.date.accessioned: 2021-03-18T10:30:17Z
dc.date.available: 2021-03-18T10:30:17Z
dc.date.issued: 2021-02-08
dc.identifier.citation: Koudouna, D & Terzić, K 2021, Few-shot linguistic grounding of visual attributes and relations using Gaussian kernels. in G M Farinella, P Radeva, J Braz & K Bouatouch (eds), Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - (Volume 5). vol. 5 VISAPP, International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol. 5, SCITEPRESS - Science and Technology Publications, pp. 146-156, 16th International Conference on Computer Vision Theory and Applications (VISAPP 2021), 8/02/21. https://doi.org/10.5220/0010261301460156 [en]
dc.identifier.citation: conference [en]
dc.identifier.isbn: 9789897584886
dc.identifier.issn: 2184-4321
dc.identifier.other: PURE: 273372330
dc.identifier.other: PURE UUID: 6e896dcb-c28c-4d22-a908-845479219cc6
dc.identifier.other: Jisc: 22da8e513675461bb4c02d1c756e80c9
dc.identifier.other: Scopus: 85102977648
dc.identifier.other: WOS: 000661288200013
dc.identifier.uri: https://hdl.handle.net/10023/21653
dc.description.abstract: Understanding complex visual scenes is one of the fundamental problems in computer vision, but learning in this domain is challenging due to the inherent richness of the visual world and the vast number of possible scene configurations. Current state-of-the-art approaches to scene understanding often employ deep networks, which require large and densely annotated datasets. This goes against the seemingly intuitive learning abilities of humans and our ability to generalise from few examples to unseen situations. In this paper, we propose a unified framework for learning visual representations of words denoting attributes such as “blue” and relations such as “left of”, based on Gaussian models operating in a simple, unified feature space. The strength of our model is that it requires only a small number of weak annotations and is able to generalise easily to unseen situations, such as recognising object relations in unusual configurations. We demonstrate the effectiveness of our model on the predicate detection task. Our model is able to outperform the state of the art on this task in both the normal and zero-shot scenarios, while training on a dataset an order of magnitude smaller.
dc.language.iso: eng
dc.publisher: SCITEPRESS - Science and Technology Publications
dc.relation.ispartof: Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - (Volume 5) [en]
dc.relation.ispartofseries: International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications [en]
dc.rights: Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. This is an open access article under the CC BY-NC-ND license. [en]
dc.subject: Few-shot learning [en]
dc.subject: Learning models [en]
dc.subject: Attribute learning [en]
dc.subject: Relation learning [en]
dc.subject: Scene understanding [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: 3rd-DAS [en]
dc.subject.lcc: QA75 [en]
dc.title: Few-shot linguistic grounding of visual attributes and relations using Gaussian kernels [en]
dc.type: Conference item [en]
dc.description.version: Publisher PDF [en]
dc.contributor.institution: University of St Andrews. School of Computer Science [en]
dc.contributor.institution: University of St Andrews. Coastal Resources Management Group [en]
dc.identifier.doi: https://doi.org/10.5220/0010261301460156
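
The abstract above describes the core idea: each attribute or relation word is grounded as a Gaussian model fitted in a shared feature space from only a few annotated examples, and new observations are scored against those Gaussians. The Python sketch below is purely illustrative and is not the authors' implementation; the feature representation (a mean-colour vector per region here; a relation such as "left of" would instead use a relational feature like the offset between two regions), the class interface, and the covariance regulariser are all assumptions made for the example.

    # Minimal sketch of few-shot Gaussian grounding (illustrative assumptions only).
    import numpy as np

    class GaussianGrounder:
        """Grounds a word (e.g. "blue" or "left of") as a Gaussian over feature vectors."""

        def __init__(self, eps=1e-3):
            self.eps = eps      # diagonal regulariser so few-shot covariances stay invertible
            self.models = {}    # word -> (mean, inverse covariance, log normaliser)

        def fit(self, word, examples):
            """Fit a Gaussian to the few feature vectors weakly annotated with `word`."""
            X = np.asarray(examples, dtype=float)              # shape: (n_examples, dim)
            mu = X.mean(axis=0)
            cov = np.cov(X, rowvar=False) + self.eps * np.eye(X.shape[1])
            cov_inv = np.linalg.inv(cov)
            _, logdet = np.linalg.slogdet(cov)
            log_norm = -0.5 * (X.shape[1] * np.log(2.0 * np.pi) + logdet)
            self.models[word] = (mu, cov_inv, log_norm)

        def log_score(self, word, x):
            """Log-density of feature vector `x` under the Gaussian grounded for `word`."""
            mu, cov_inv, log_norm = self.models[word]
            d = np.asarray(x, dtype=float) - mu
            return log_norm - 0.5 * d @ cov_inv @ d

    # Usage: ground "blue" from three mean-RGB feature vectors, then score unseen regions.
    grounder = GaussianGrounder()
    grounder.fit("blue", [[0.10, 0.15, 0.85], [0.05, 0.20, 0.90], [0.12, 0.10, 0.80]])
    print(grounder.log_score("blue", [0.08, 0.18, 0.88]))   # high log-density: plausibly "blue"
    print(grounder.log_score("blue", [0.90, 0.10, 0.05]))   # much lower: not "blue"
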

