What kind of semantic information can be extracted from such media? Anything would be fine, be it differentiation between music and spoken text, detection of distinct sounds (like gunshots or birds or cars), detecting indoor/outdoor takes or intensity of camera motion.

I know that there are many, many research topics in this category, but I didn't find any applications of any of them. Does anybody have links to applications, libraries, working prototypes, or news about upcoming products on these topics?

A: 

To find applications of this, you might want to look at the research topic of "Content Based Video Retrieval and Indexing".

Other than that:

  • You can use learning techniques to classify the information received (video, single frames, or audio)
  • You can use clustering techniques to find similar sections of audio or video
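To make the clustering idea a bit more concrete, here is a minimal sketch using only the Python standard library. It is not taken from any of the papers; the features (RMS energy and zero-crossing rate per frame), the tiny k-means, and the helper names `synthetic_signal`, `frame_features` and `kmeans` are all my own assumptions for illustration. A real system would use far richer features and decoded audio rather than a synthetic signal.

```python
import math
import random


def synthetic_signal(rate=8000, seconds=0.5, seed=1):
    """Hypothetical test input: a 100 Hz tone followed by white noise."""
    rng = random.Random(seed)
    n = int(rate * seconds)
    tone = [0.5 * math.sin(2 * math.pi * 100 * i / rate) for i in range(n)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return tone + noise


def frame_features(samples, frame_len):
    """Return (rms_energy, zero_crossing_rate) per non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes between neighbouring samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        feats.append((rms, crossings / (frame_len - 1)))
    return feats


def kmeans(points, k, iterations=20, seed=0):
    """Very small k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of its members (if it has any).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


if __name__ == "__main__":
    features = frame_features(synthetic_signal(), frame_len=400)
    print(kmeans(features, k=2))
```

On this toy input the tone frames and the noise frames should end up in two different clusters, mostly because their zero-crossing rates differ so strongly. Which cluster gets label 0 or 1 depends on the random initialisation.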

One application of this is commercial removal. Commercial removers typically use a clustering approach to find and cut out the commercial sections in TV video.

monksy
Do you have any links to libraries capable of applying any of these techniques?
soulmerge
Nope, just papers talking about their approach and how successful they were.
monksy
A: 

Have a look at MP4REG, which is the registration authority for code-points in "MP4 Family" files.

Short primer: Within the MPEG4 & QuickTime world, the basic physical building block of media is called an "Atom". Atoms can contain not only the actual audio and video, but also technical and non-technical metadata. The last of these sounds like what might interest you.

E.g.:

  • albm: Album title and track number (user-data)
  • jp2i: intellectual property information

I've only looked closely at this stuff once, with respect to metadata, and my impression was that it is a fast and loose world. You might want to look at some low-level MP4 parsing tools that will let you inspect the individual atoms of real-world media files. I think there are even unofficial (unregistered), custom atoms for use within specific systems.
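If you just want to see which atoms a file contains, a rough sketch like the following may be enough to get started. It relies on the basic atom layout (a 4-byte big-endian size, then a 4-byte type code, where a size of 1 means a 64-bit size follows and a size of 0 means "runs to the end"). The function names `iter_atoms` and `make_atom` are hypothetical helpers of mine, not part of any library, and the sketch only walks one level; it does not descend into container atoms.

```python
import io
import struct


def iter_atoms(stream, end=None):
    """Yield (type, payload_offset, payload_size) for each atom at one level."""
    while True:
        start = stream.tell()
        if end is not None and start >= end:
            return
        header = stream.read(8)
        if len(header) < 8:
            return  # end of data
        size, kind = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A 64-bit "extended" size follows the type code.
            ext = stream.read(8)
            if len(ext) < 8:
                return
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        elif size == 0:
            # The atom runs to the end of the enclosing range or file.
            if end is None:
                stream.seek(0, io.SEEK_END)
                size = stream.tell() - start
            else:
                size = end - start
        if size < header_len:
            return  # malformed atom, stop rather than loop forever
        yield kind.decode("latin-1"), start + header_len, size - header_len
        stream.seek(start + size)


def make_atom(kind, payload):
    """Hypothetical helper: build a simple atom for testing."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


if __name__ == "__main__":
    data = make_atom(b"ftyp", b"isom\x00\x00\x00\x00") + make_atom(b"free", b"\x00" * 4)
    for atom in iter_atoms(io.BytesIO(data)):
        print(atom)
```

For a real file you would pass `open(path, "rb")` instead of the `BytesIO` object, and call `iter_atoms` again with the payload offset and end position to look inside container atoms such as `moov`.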

Stu Thompson
The library itself looks very interesting. But if I understood it correctly, it only provides a.) technical data and b.) data that was entered by the user. I'm rather looking for information that is extracted through analysis of the media.
soulmerge
It can provide more than just technical data. But, yes, it is just data that is specifically entered in by the creating/managing system.
Stu Thompson
A: 

Music feature analysis is a huge topic these days. Imagine the possibilities! http://en.wikipedia.org/wiki/Music%5Finformation%5Fretrieval

Also, check out the Conet Project: http://www.archive.org/details/ird059

just_wes