Abstract
Due to the lack of annotation of their large video archives, multimedia content provider companies and television channels do not use the data in their archives to their full extent. To contribute a solution to this problem, we have developed a tool that combines audio and visual information to annotate video. In particular, this tool has been used by a video production company that has given us positive feedback. The main innovation of this tool is the use of environmental sound recognition to annotate video. Here we focus on the tool's audio information extraction method, which consists of a sound recognizer that learns a small set of spectral features from the data using non-negative matrix factorization. The recognizer can be used for different purposes, such as classifying musical instruments, identifying the notes that are played, and distinguishing environmental sounds like water, traffic, trains and people.
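As an illustration of the factorization step named in the abstract, the sketch below factors a non-negative matrix V (standing in for a magnitude spectrogram, frequency bins by time frames) into a small set of spectral basis vectors W and their activations H. It is not the paper's implementation: the abstract does not state the update rule, cost function or number of features, so the sketch assumes the standard multiplicative updates for the Euclidean cost, and the function name `nmf`, its parameters and the synthetic input are all hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (bins x frames) as W @ H.

    Assumption: multiplicative updates for the Euclidean (Frobenius) cost;
    the paper may use a different cost or update rule.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    # Random non-negative initialisation; eps keeps entries strictly positive.
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic stand-in for a spectrogram: 64 bins, 100 frames, 3 hidden features.
rng = np.random.default_rng(1)
V = rng.random((64, 3)) @ rng.random((3, 100))

W, H = nmf(V, rank=3)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape, rel_error)
```

In a recognizer of the kind described, the columns of W would play the role of the learned spectral features, and the activations in H (or features derived from them) would be passed to a classifier; that downstream step is outside this sketch.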
Original language | Unknown |
---|---|
Title of host publication | Proceedings of the IEEE International Conference on Audio, Language and Image Processing |
Pages | 1125-1130 |
Publication status | Published - 1 Jan 2012 |
Event | IEEE International Conference on Audio, Language and Image Processing - Duration: 1 Jan 2012 → … |