A while ago I came across an interesting set of video presentations on a German company's website. They deal with modifying a video stream while it's playing, and I was pleasantly impressed by the accuracy and smoothness of the technique. One of them I found particularly fascinating: it blends text into dynamic, playing video. It lets you type a string into a text box while the video is playing and embeds transformed variants of that text within the video, with realistic accuracy. Do you happen to know what kind of algorithm is required for such a feature? How could I programmatically embed real-time text and images in a video stream? Is there any research paper or library I should look into for details?
PS. Don't flame me for the contents of the video; it's the programming technique that I'm interested in, and the video is the best example I could find.