What kind of semantic information can be extracted from such media? Anything would be fine: distinguishing music from speech, detecting distinct sounds (such as gunshots, birds, or cars), telling indoor shots from outdoor ones, or estimating the intensity of camera motion.
I know there are many, many research topics in this category, but I haven't found any applications of them. Does anybody have links to applications, libraries, working prototypes, or news about upcoming products on these topics?