I ask this academically: I want to pose an important question aloud and have the community try to answer it. Can we build a system that generates a scene to play in a live, anonymous group video chatroom, reads the text typed at it, and responds with a chatbot?
Live Internet video is often blurry and low resolution, so one cannot make out many details in the scene on the other end. Modern software tools can render scenes that look very real when they are still. Making them move realistically requires a large piece of simulation software.
Faces could be rendered at 24 frames per second by a cluster of 24 machines that each render one frame per second, with each machine taking every 24th frame. The video would then lag by 1 second behind the moment the system decides which facial expression to show. Choosing and generating these facial expressions is the key problem; skin realism is something the graphics community has already solved.
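To make the cluster arithmetic concrete, here is a minimal Python sketch of round-robin frame scheduling. It assumes every machine needs exactly one second per frame and that frames can be rendered independently, and it ignores network, encoding, and compositing overhead. `ClusterPlan`, `schedule`, and `conflicts` are illustrative names, not an existing tool.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    nodes: int                # number of render machines (assumed identical)
    seconds_per_frame: float  # time one machine needs for one frame (assumed constant)

    @property
    def throughput_fps(self) -> float:
        # Machines work in parallel, so output rate scales with machine count.
        return self.nodes / self.seconds_per_frame

    @property
    def latency_s(self) -> float:
        # Each frame still takes one full render, however many machines exist.
        return self.seconds_per_frame

    def node_for_frame(self, frame: int) -> int:
        # Round-robin: frame f goes to machine f mod N.
        return frame % self.nodes


def schedule(plan: ClusterPlan, n_frames: int):
    """Return (frame, node, start, finish) tuples, requesting frames at the output rate."""
    interval = 1.0 / plan.throughput_fps
    rows = []
    for f in range(n_frames):
        start = f * interval
        rows.append((f, plan.node_for_frame(f), start, start + plan.seconds_per_frame))
    return rows


def conflicts(rows, tol: float = 1e-9) -> int:
    """Count frames handed to a machine that is still busy with its previous frame."""
    busy_until = {}
    count = 0
    for _, node, start, finish in rows:
        if start < busy_until.get(node, float("-inf")) - tol:
            count += 1
        busy_until[node] = finish
    return count


if __name__ == "__main__":
    plan = ClusterPlan(nodes=24, seconds_per_frame=1.0)
    print(plan.throughput_fps, plan.latency_s, conflicts(schedule(plan, 240)))
```

Frame f goes to machine f mod 24, so each machine receives its next request exactly when its previous frame finishes. Adding machines raises the frame rate, but the lag stays fixed at the per-frame render time; only faster rendering shortens it.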
Several researchers have categorized facial expressions, and modern computer graphics literature has shown that they can be rendered. We can produce them if we know which ones are appropriate for a given situation.
Chatbots have been in use for decades. Quite 'smart' chat programs now exist that read what they are asked and reply sensibly. They have always worked in text, but text-to-speech software can speak in a human-ish voice, and speech recognition software gets better every year.
What I propose is that it should be fairly straightforward to connect all of these disparate pieces of software and create a truly amazing Turing-test beater.
This program could join a virtual room and display a realistic scene as if on a webcam, just like the other participants. It could watch their facial expressions, listen to their speech, and read their text. It could then compose a response and either type or speak it back to the group. Choosing what to respond with is a difficult problem that even most humans have not mastered, but with a lot of work we can get close.
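The loop described above can be sketched as a single perceive, decide, respond step. This is a toy under heavy assumptions: speech recognition and expression classification are assumed to happen upstream and arrive as plain strings, and the "chatbot" is a handful of ELIZA-style keyword rules standing in for a real conversational system. Every name here (`Observation`, `RULES`, `decide`, `step`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """One moment of input from the chatroom (all fields are hypothetical)."""
    text: Optional[str] = None              # message typed into the room
    transcript: Optional[str] = None        # assumed output of a speech-recognition stage
    their_expression: Optional[str] = None  # assumed label from an expression classifier


# Placeholder ELIZA-style rules: (keyword, reply, expression for our rendered face).
RULES = [
    ("hello", "Hi there! How is everyone doing?", "smile"),
    ("sad", "I'm sorry to hear that. What happened?", "concern"),
    ("?", "Good question. What do you think?", "curious"),
]
DEFAULT = ("Interesting, tell me more.", "neutral")


def decide(obs: Observation) -> tuple[str, str]:
    """Pick a reply and the facial expression the renderer should show."""
    heard = " ".join(s for s in (obs.text, obs.transcript) if s).lower()
    for keyword, reply, expression in RULES:
        if keyword in heard:
            return reply, expression
    # Nothing in the words matched: react to another participant's face instead.
    if obs.their_expression == "frown":
        return "You look troubled. Is everything okay?", "concern"
    return DEFAULT


def step(obs: Observation) -> dict:
    """Answer in the same medium the participant used: speech gets voice, text gets text."""
    reply, expression = decide(obs)
    channel = "voice" if obs.transcript else "text"
    return {"reply": reply, "expression": expression, "channel": channel}


if __name__ == "__main__":
    print(step(Observation(text="Hello all")))
    print(step(Observation(transcript="I feel sad today")))
```

The useful part of the sketch is the interface: whatever real chatbot replaces `decide`, it has to return both the words and the expression for the face renderer, which is the "which expressions are appropriate" question in a different form.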
The Turing Test is about convincing judges that a communicator is human, where "convincing" only means being good enough to fool them. If the judges are simply everyone in the room, they are unlikely to apply a strict, formal procedure; a lucky guess or a successful trick is good enough.
Do you think we can do this?
Is this plan flawed? Are there moral implications to tricking the average viewer in this way? Can we make millions of dollars by generating personal intelligent assistants?