Due to the lack of annotation of their large video archives, multimedia content providers and television channels do not exploit the data in their archives to its full extent. To contribute a solution to this problem, we have developed a tool that combines audio and visual information to annotate video. In particular, this tool has been used by a video production company, which has given us positive feedback. The main innovation of the tool is its use of environmental sound recognition to annotate video. Here we focus on the tool's audio information extraction method, which consists of a sound recognizer that learns a small set of spectral features from the data using non-negative matrix factorization. The recognizer can be used for different purposes, such as classifying musical instruments, identifying the notes that are played, and distinguishing environmental sounds like water, traffic, trains and people.
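The core idea of learning spectral features via non-negative matrix factorization can be illustrated with a generic sketch: a non-negative spectrogram-like matrix V (frequency × time) is approximated as W @ H, where the columns of W act as learned spectral features and the rows of H are their activations over time. This is a minimal NumPy implementation of the standard Lee-Seung multiplicative updates on synthetic data, not the authors' exact pipeline; the matrix sizes, rank, and iteration count are illustrative assumptions.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9):
    """Factorize a non-negative matrix V (freq x time) as V ~ W @ H
    using multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, k)) + eps   # spectral basis vectors (learned features)
    H = rng.random((k, T)) + eps   # per-frame activations of each feature
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy stand-in for a spectrogram: an exactly rank-2 non-negative matrix
# built from two synthetic "sources" with distinct spectral shapes.
rng = np.random.default_rng(1)
F, T, k = 64, 100, 2
true_W = np.abs(rng.normal(size=(F, k)))
true_H = np.abs(rng.normal(size=(k, T)))
V = true_W @ true_H

W, H = nmf(V, k)
error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {error:.4f}")
```

In a recognition setting along these lines, the learned columns of W would serve as a compact feature dictionary, and a new sound could be characterized by the activations H obtained when projecting its spectrogram onto that dictionary.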
|Title of host publication||Proceedings of the IEEE International Conference on Audio, Language and Image Processing|
|Publication status||Published - 1 Jan 2012|
|Event||IEEE International Conference on Audio, Language and Image Processing|
|Duration||1 Jan 2012 → …|
|Period||1/01/12 → …|
Cavaco, S. C. F. M., & Correia, N. M. R. (2012). Automatic Instrument and Environmental Sound Recognition for Media Annotation of TV Content. In Proceedings of the IEEE International Conference on Audio, Language and Image Processing (pp. 1125-1130). https://doi.org/10.1109/ICALIP.2012.6376785